
Without an Nginx proxy

The application layer can read the client IP directly from the request:

request.getRemoteAddr();

Single-level Nginx proxy

Behind a proxy, request.getRemoteAddr() returns the proxy server's IP, not the real client IP.

The fix is to have Nginx save the client IP into request headers before forwarding.

Add the following to nginx.conf:

location / {
    ...
    # forward the client IP and port
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Real-Port $remote_port;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}

The application layer then reads the headers set by Nginx:

public static String getIpAddr(HttpServletRequest request) {
        String ip = request.getHeader("X-Real-IP");
        if (ip == null || ip.length() == 0 || "unknown".equalsIgnoreCase(ip)) {
            ip = request.getHeader("Proxy-Client-IP");
            log.info("【Proxy-Client-IP】 {}", ip);
        }
        if (ip == null || ip.length() == 0 || "unknown".equalsIgnoreCase(ip)) {
            ip = request.getHeader("WL-Proxy-Client-IP");
            log.info("【WL-Proxy-Client-IP】{}", ip);
        }
        if (ip == null || ip.length() == 0 || "unknown".equalsIgnoreCase(ip)) {
            ip = request.getHeader("X-Forwarded-For");
            log.info("【X-Forwarded-For】{}", ip);
        }
        if (ip == null || ip.length() == 0 || "unknown".equalsIgnoreCase(ip)) {
            ip = request.getRemoteAddr();
            log.info("【unknown】{}", ip);
        }
        return ip;
    }

Multi-level Nginx proxies

With multiple Nginx levels, capture the client IP at the first proxy and pass it along through each subsequent level.

The first-level configuration is the same as above; the Nth-level proxies use the following nginx.conf:

location / {
    # pass the headers set by the previous level onward
    # (nginx exposes incoming request headers as $http_* variables)
    proxy_set_header X-Real-IP $http_x_real_ip;
    proxy_set_header X-Real-Port $http_x_real_port;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}

The application layer stays unchanged and reads the forwarded IP.

Alibaba Cloud CDN forwarding

When a CDN is put in front of the existing Nginx proxies, the client connection terminates at the CDN, so the first-level Nginx's $remote_addr is the CDN node's address. By common convention, CDNs place the original client IP in the X-Forwarded-For header.

Change the top-level Nginx configuration to prefer the CDN-supplied address, or alternatively change the lookup priority in the application layer:

location / {
    ...
    # prefer the CDN-supplied X-Forwarded-For
    # proxy_set_header X-Real-IP $remote_addr;
    # proxy_set_header X-Real-Port $remote_port;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}

Testing against Alibaba Cloud CDN showed that X-Forwarded-For contains not only the real client IP but also CDN node IPs, so the application layer must tell them apart.

The first entry is the real client IP; the second is the CDN node IP.


public static String getIpAddr(HttpServletRequest request) {
        String ip = request.getHeader("X-Real-IP");
        if (ip == null || ip.length() == 0 || "unknown".equalsIgnoreCase(ip)) {
            ip = request.getHeader("Proxy-Client-IP");
            log.info("【Proxy-Client-IP】 {}", ip);
        }
        if (ip == null || ip.length() == 0 || "unknown".equalsIgnoreCase(ip)) {
            ip = request.getHeader("WL-Proxy-Client-IP");
            log.info("【WL-Proxy-Client-IP】{}", ip);
        }
        if (ip == null || ip.length() == 0 || "unknown".equalsIgnoreCase(ip)) {
            ip = request.getHeader("X-Forwarded-For");
            if (ip != null && ip.contains(",")) {
                // after Alibaba Cloud CDN forwarding the header may hold several IPs;
                // the first one is the real client
                String[] cdnMultiIp = ip.split(",");
                ip = cdnMultiIp[0];
            }
            log.info("【X-Forwarded-For】{}", ip);
        }
        if (ip == null || ip.length() == 0 || "unknown".equalsIgnoreCase(ip)) {
            ip = request.getRemoteAddr();
            log.info("【unknown】{}", ip);
        }
        return ip;
    }
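The comma-splitting logic can be isolated into a small helper so it is easy to unit-test. A minimal sketch (the method name `firstForwardedIp` is mine, not from the original code):

```java
public class ForwardedIp {
    /**
     * X-Forwarded-For is a comma-separated chain appended by each proxy:
     * "client, proxy1, proxy2". The leftmost entry is the original client.
     */
    public static String firstForwardedIp(String xff) {
        if (xff == null || xff.trim().isEmpty() || "unknown".equalsIgnoreCase(xff.trim())) {
            return null;
        }
        // keep only the first (client) entry and drop surrounding spaces
        return xff.split(",")[0].trim();
    }

    public static void main(String[] args) {
        System.out.println(firstForwardedIp("203.0.113.7, 198.51.100.1")); // client, then CDN
        System.out.println(firstForwardedIp("203.0.113.7"));
    }
}
```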


Official documentation

JDK command-line parameters

Standard parameters; stable across JVM versions

`-` parameters

-help -version -server -client

Non-standard parameters; may change between JVM versions

`-X` parameters

-Xint: interpreted execution only; -Xcomp: compile everything to native code up front (slow first run); -Xmixed: mixed mode (the default), where the JVM decides what to compile to native code

PS C:\Users\11860> java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)

`-XX` parameters

Boolean type

Format: -XX:[+|-]<name> enables (+) or disables (-) the named flag. -XX:+UseConcMarkSweepGC enables the CMS collector; -XX:+UseG1GC enables the G1 collector.

Value type

Format: -XX:<name>=<value> sets flag <name> to <value>. -XX:MaxGCPauseMillis=500 sets the GC pause-time target to 500 ms; -XX:GCTimeRatio=19 sets the GC time ratio.

-Xms -Xmx

-Xms is equivalent to -XX:InitialHeapSize (initial heap size); -Xmx is equivalent to -XX:MaxHeapSize (maximum heap size).
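The effect of -Xms/-Xmx can also be observed from inside the process; a minimal sketch:

```java
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects -Xmx (MaxHeapSize); totalMemory() is the currently committed heap
        System.out.println("max   = " + rt.maxMemory() / (1024 * 1024) + " MB");
        System.out.println("total = " + rt.totalMemory() / (1024 * 1024) + " MB");
    }
}
```

Running it with, for example, `java -Xmx256m HeapInfo` should show a max of about 256 MB (HotSpot may subtract a survivor space from the reported value).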

Viewing JVM runtime parameters

-XX:+PrintFlagsInitial shows initial values; -XX:+PrintFlagsFinal shows final values; -XX:+UnlockDiagnosticVMOptions unlocks diagnostic flags; -XX:+PrintCommandLineFlags prints the command-line flags

In +PrintFlagsFinal output, = marks a default value and := a value modified by the user or the JVM.

PS C:\Users\11860> java -XX:+PrintFlagsFinal -version
[Global flags]
    uintx AdaptiveSizeDecrementScaleFactor          = 4                                   {product}
    uintx AdaptiveSizeMajorGCDecayTimeScale         = 10                                  {product}
    uintx AdaptiveSizePausePolicy                   = 0                                   {product}
    uintx AdaptiveSizePolicyCollectionCostMargin    = 50                                  {product}
    uintx InitialCodeCacheSize                      = 2555904                             {pd product}
    uintx InitialHeapSize                          := 268435456                           {product}
    bool UseG1GC                                    = false                                {product}

jps

Lists Java processes only

PS C:\Users\11860> jps
14624 Jps
10248
14168 proxyee-down.exe
2120 RemoteMavenServer
5192 main\proxyee-down-core.jar

jps -l shows fully qualified names

PS C:\Users\11860> jps -l
10248
14168 proxyee-down-2.54\proxyee-down.exe
1800 sun.tools.jps.Jps
2120 org.jetbrains.idea.maven.server.RemoteMavenServer
5192 /proxyee-down-2.54/main\proxyee-down-core.jar

Command reference

jinfo

Shows the parameters of a running Java program

PS C:\Users\11860> jinfo
Usage:
    jinfo [option] <pid>
        (to connect to running process)
    jinfo [option] <executable <core>
        (to connect to a core file)
    jinfo [option] [server_id@]<remote server IP or hostname>
        (to connect to remote debug server)

where <option> is one of:
    -flag <name>         to print the value of the named VM flag
    -flag [+|-]<name>    to enable or disable the named VM flag
    -flag <name>=<value> to set the named VM flag to the given value
    -flags               to print VM flags
    -sysprops            to print Java system properties
    <no option>          to print both of the above
    -h | -help           to print this help message

View the maximum heap size

jinfo -flag MaxHeapSize

PS C:\Users\11860> jinfo -flag MaxHeapSize 2120
-XX:MaxHeapSize=805306368

Check the garbage collector

jinfo -flag UseG1GC

jstat

Shows JVM statistics

PS C:\Users\11860> jstat -help
Usage: jstat -help|-options
       jstat -<option> [-t] [-h<lines>] <vmid> [<interval> [<count>]]

Definitions:
  <option>      An option reported by the -options option
  <vmid>        Virtual Machine Identifier. A vmid takes the following form:
                     <lvmid>[@<hostname>[:<port>]]
                Where <lvmid> is the local vm identifier for the target
                Java virtual machine, typically a process id; <hostname> is
                the name of the host running the target Java virtual machine;
                and <port> is the port number for the rmiregistry on the
                target host. See the jvmstat documentation for a more complete
                description of the Virtual Machine Identifier.
  <lines>       Number of samples between header lines.
  <interval>    Sampling interval. The following forms are allowed:
                    <n>["ms"|"s"]
                Where <n> is an integer and the suffix specifies the units as
                milliseconds("ms") or seconds("s"). The default units are "ms".
  <count>       Number of samples to take before terminating.
  -J<flag>      Pass <flag> directly to the runtime system.

Official documentation

Options

-statOption Determines the statistics information the jstat command displays. The following lists the available options. Use the -options general option to display the list of options for a particular platform installation. See Stat Options and Output.

class: Displays statistics about the behavior of the class loader.

compiler: Displays statistics about the behavior of the Java HotSpot VM Just-in-Time compiler.

gc: Displays statistics about the behavior of the garbage collected heap.

gccapacity: Displays statistics about the capacities of the generations and their corresponding spaces.

gccause: Displays a summary about garbage collection statistics (same as -gcutil), with the cause of the last and current (when applicable) garbage collection events.

gcnew: Displays statistics of the behavior of the new generation.

gcnewcapacity: Displays statistics about the sizes of the new generations and its corresponding spaces.

gcold: Displays statistics about the behavior of the old generation and metaspace statistics.

gcoldcapacity: Displays statistics about the sizes of the old generation.

gcmetacapacity: Displays statistics about the sizes of the metaspace.

gcutil: Displays a summary about garbage collection statistics.

printcompilation: Displays Java HotSpot VM compilation method statistics.

Class loading

-class Stat Options and Output The following information summarizes the columns that the jstat command outputs for each statOption.

-class option Class loader statistics.

Loaded: Number of classes loaded.

Bytes: Number of kBs loaded.

Unloaded: Number of classes unloaded.

Bytes: Number of Kbytes unloaded.

Time: Time spent performing class loading and unloading operations.

PS C:\Users\11860> jstat -class 2120
Loaded  Bytes  Unloaded  Bytes     Time
  4271  7387.1       98   137.6       3.28

Garbage collection

-gc column meanings:

S0C, S1C, S0U, S1U: capacity and usage of survivor spaces S0 and S1
EC, EU: Eden capacity and usage
OC, OU: old-generation capacity and usage
MC, MU: Metaspace capacity and usage
CCSC, CCSU: compressed class space capacity and usage
YGC, YGCT: young GC count and total time
FGC, FGCT: full GC count and total time
GCT: total GC time

Notes: S0 and S1 have equal size and only one is in use at a time; S0 + S1 + Eden = Young, and heap = Young + Old. By default the command prints a single sample.

PS C:\Users\11860> jstat -gc 2120
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT
35840.0 51712.0  0.0    0.0   158720.0 11900.5   102400.0    9199.6   24192.0 22964.4 2944.0 2550.2      8    0.314   2      0.154    0.468

Sample every second, 10 times in total:

PS C:\Users\11860> jstat -gc 5192 1000 10
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT
4352.0 4352.0 2228.1  0.0   34944.0  19156.1   87424.0    75913.1   36864.0 35975.9  0.0    0.0     2338    7.004   8      0.459    7.463
4352.0 4352.0  0.0   2131.3 34944.0   7184.2   87424.0    76050.5   36864.0 35975.9  0.0    0.0     2339    7.006   8      0.459    7.466
4352.0 4352.0  0.0   2131.3 34944.0  31117.2   87424.0    76050.5   36864.0 35975.9  0.0    0.0     2339    7.006   8      0.459    7.466
4352.0 4352.0 2320.3  0.0   34944.0   8482.5   87424.0    76050.5   36864.0 35975.9  0.0    0.0     2340    7.009   8      0.459    7.468
4352.0 4352.0 2320.3  0.0   34944.0  29236.0   87424.0    76050.5   36864.0 35975.9  0.0    0.0     2340    7.009   8      0.459    7.468
4352.0 4352.0  0.0   2058.5 34944.0  11809.7   87424.0    76348.8   36864.0 35975.9  0.0    0.0     2341    7.012   8      0.459    7.471
4352.0 4352.0 2339.3  0.0   34944.0    0.0     87424.0    76348.8   36864.0 35975.9  0.0    0.0     2342    7.015   8      0.459    7.474
4352.0 4352.0 2339.3  0.0   34944.0  11533.3   87424.0    76348.8   36864.0 35975.9  0.0    0.0     2342    7.015   8      0.459    7.474
4352.0 4352.0  0.0   2076.1 34944.0   3826.1   87424.0    76520.6   36864.0 35975.9  0.0    0.0     2343    7.017   8      0.459    7.476
4352.0 4352.0  0.0   2076.1 34944.0  11102.7   87424.0    76520.6   36864.0 35975.9  0.0    0.0     2343    7.017   8      0.459    7.476
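The capacity relations above can be checked against the second sample: with S0C = S1C = 4352 KB and EC = 34944 KB, the young generation totals 43648 KB. A quick sanity check:

```java
public class JstatCheck {
    public static void main(String[] args) {
        // capacities in KB, taken from the jstat -gc sample above
        double s0c = 4352.0, s1c = 4352.0, ec = 34944.0;
        double young = s0c + s1c + ec;          // S0 + S1 + Eden = Young
        System.out.println("Young generation = " + young + " KB");
    }
}
```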

JIT compilation

-compiler

PS C:\Users\11860> jstat -compiler 2120
Compiled Failed Invalid   Time   FailedType FailedMethod
    5234      0       0    16.70          0

JVM memory layout

Out-of-memory errors

Heap OOM

Non-heap OOM

Exporting a heap dump

Dump automatically on OOM: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./

Manual export with the jmap command

PS C:\Users\11860> jmap -help
Usage:
    jmap [option] <pid>
        (to connect to running process)
    jmap [option] <executable <core>
        (to connect to a core file)
    jmap [option] [server_id@]<remote server IP or hostname>
        (to connect to remote debug server)

where <option> is one of:
    <none>               to print same info as Solaris pmap
    -heap                to print java heap summary
    -histo[:live]        to print histogram of java object heap; if the "live"
                         suboption is specified, only count live objects
    -clstats             to print class loader statistics
    -finalizerinfo       to print information on objects awaiting finalization
    -dump:<dump-options> to dump java heap in hprof binary format
                         dump-options:
                           live         dump only live objects; if not specified,
                                        all objects in the heap are dumped.
                           format=b     binary format
                           file=<file>  dump heap to <file>
                         Example: jmap -dump:live,format=b,file=heap.bin <pid>
    -F                   force. Use with -dump:<dump-options> <pid> or -histo
                         to force a heap dump or histogram when <pid> does not
                         respond. The "live" suboption is not supported
                         in this mode.
    -h | -help           to print this help message
    -J<flag>             to pass <flag> directly to the runtime system

Manually export the heap dump

PS C:\Users\11860> jmap -dump:format=b,file=heap.hprof 11720
Dumping heap to L:\heap.hprof ...
Heap dump file created

Other jmap commands

PS C:\Users\11860> jmap -heap 11720
Attaching to process ID 11720, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.151-b12

using thread-local object allocation.
Parallel GC with 8 thread(s)

Heap Configuration:
   MinHeapFreeRatio         = 0
   MaxHeapFreeRatio         = 100
   MaxHeapSize              = 33554432 (32.0MB)
   NewSize                  = 11010048 (10.5MB)
   MaxNewSize               = 11010048 (10.5MB)
   OldSize                  = 22544384 (21.5MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 7864320 (7.5MB)
   used     = 6748264 (6.435646057128906MB)
   free     = 1116056 (1.0643539428710938MB)
   85.80861409505208% used
From Space:
   capacity = 1572864 (1.5MB)
   used     = 950272 (0.90625MB)
   free     = 622592 (0.59375MB)
   60.416666666666664% used
To Space:
   capacity = 1572864 (1.5MB)
   used     = 0 (0.0MB)
   free     = 1572864 (1.5MB)
   0.0% used
PS Old Generation
   capacity = 22544384 (21.5MB)
   used     = 21187512 (20.20598602294922MB)
   free     = 1356872 (1.2940139770507812MB)
   93.9813303392987% used

16212 interned Strings occupying 2160560 bytes.

Analyzing the OOM with MAT

Download the tool; after importing the dump file, MAT reports Leak Suspects.

Open the histogram view.

Find the entries with the largest footprint and right-click to list their strong references, to see what is holding these large objects; this locates the source of the leak.


Open the dominator tree view.

It also shows a large number of loaded objects.


jstack

Analyzes thread states

PS C:\Users\11860> jstack
Usage:
    jstack [-l] <pid>
        (to connect to running process)
    jstack -F [-m] [-l] <pid>
        (to connect to a hung process)
    jstack [-m] [-l] <executable> <core>
        (to connect to a core file)
    jstack [-m] [-l] [server_id@]<remote server IP or hostname>
        (to connect to a remote debug server)

Options:
    -F  to force a thread dump. Use when jstack <pid> does not respond (process is hung)
    -m  to print both java and native frames (mixed mode)
    -l  long listing. Prints additional information about locks
    -h or -help to print this help message

Manual export

jstack 23276 > RemoteMavenServerJstack

Official thread-state documentation

[Thread state transitions](https://mp.weixin.qq.com/s/GsxeFM7QWuR--Kbpb7At2w)

Locating high CPU usage with jstack
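The usual workflow is: find the busiest thread with top -H, convert its TID to hex, then search the jstack dump for that nid. A sketch (the PID and TID values are examples only):

```shell
pid=23276                        # the Java process, found via jps
# step 1 (interactive): top -H -p "$pid"   -> note the hottest thread's TID
tid=23301                        # suppose this thread is burning CPU
nid=$(printf '%x' "$tid")        # jstack prints thread ids in hex as nid=0x...
echo "search the jstack output for: nid=0x$nid"
# step 2: jstack "$pid" | grep -A 20 "nid=0x$nid"   -> read that thread's stack
```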

Tomcat remote monitoring

Enabling Tomcat remote debugging (JPDA)

In startup.sh, add 'jpda' before 'start "$@"' on the last line.

Search catalina.sh for jpda to see the JPDA usage notes.

SpringCloud Config

Configuration cannot be read

Problem

The config server's configuration was:

eureka:
  client:
    service-url:
      defaultZone: http://localhost:8761/eureka/
spring:
  application:
    name: config
  cloud:
    config:
      server:
        git:
          uri: http://XXX/fjy8018/config-repo
          username: 1186032234@qq.com
          password: XXX
          basedir: spring-cloud\config\basedir
server:
  port: 8084

The account and password were verified in the browser to have access to the repository with the right permissions, yet after starting the config server the page reported Cannot clone or checkout repository.

Solution

In the Spring Cloud release version, the config server's Git URI must be the full path ending in .git; switching to the full path fixes it:

eureka:
  client:
    service-url:
      defaultZone: http://localhost:8761/eureka/
spring:
  application:
    name: config
  cloud:
    config:
      server:
        git:
          uri: http://XXX/fjy8018/config-repo.git
          username: 1186032234@qq.com
          password: XXX
          basedir: spring-cloud\config\basedir
server:
  port: 8084


Background

How it works

While studying Redis distributed locks, I wrote a Service class with lock and unlock methods, acquiring the lock through Redis's SETNX and GETSET commands.

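The SETNX/GETSET pattern described above can be sketched without a real Redis by backing it with a ConcurrentHashMap standing in for the store (all names below are mine, not the original project's):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RedisLockSketch {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    /** SETNX: store the value only if the key is absent. */
    boolean setnx(String key, String value) { return store.putIfAbsent(key, value) == null; }

    /** GETSET: replace the value, returning the previous one. */
    String getset(String key, String value) { return store.put(key, value); }

    /** The stored value is the lock's expiry timestamp (now + timeout). */
    public boolean lock(String key, long nowMillis, long timeoutMillis) {
        String expire = String.valueOf(nowMillis + timeoutMillis);
        if (setnx(key, expire)) return true;              // key was free: acquired
        String current = store.get(key);
        if (current != null && Long.parseLong(current) < nowMillis) {
            // the lock looks expired; GETSET guards against two threads both claiming it:
            // only the thread that sees the old value back actually wins
            String old = getset(key, expire);
            return old != null && old.equals(current);
        }
        return false;
    }

    public static void main(String[] args) {
        RedisLockSketch lock = new RedisLockSketch();
        System.out.println(lock.lock("order", 1000L, 10_000L)); // acquired
        System.out.println(lock.lock("order", 2000L, 10_000L)); // still held: refused
        // the 10*000-style bug below: with a 0 ms timeout, any later caller
        // sees the stored timestamp as already expired and "acquires" the lock too
        lock.lock("oops", 100L, 0L);
        System.out.println(lock.lock("oops", 101L, 0L));        // lock is useless
    }
}
```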

Source

For easier inspection, logback was added for log output.


The method that needs locking simulates a flash-sale purchase scenario.

The RedisLock component is auto-injected by Spring Boot. (This is where the bug was: the timeout should have been 10*1000 but was mistyped as 10*000, which evaluates to 0, so every lock timed out immediately.)

Two controllers: a purchase endpoint that decrements the stock by one per request, and a query endpoint.

Load-testing tool

Apache ab is used to simulate high-concurrency access.
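For reference, an ab invocation matching the 50-request / 2-thread run below could look like this (the URL is hypothetical):

```shell
# -n: total number of requests, -c: concurrency level
url="http://127.0.0.1:8080/order"        # hypothetical purchase endpoint
cmd="ab -n 50 -c 2 $url"
echo "$cmd"                              # echoed here; run the command directly against a live server
```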

Concurrency tests

Manual test

Manual requests confirm the endpoints work. Query:

Purchase:

Load test

10 requests from a single thread:

The count is correct.

50 requests from 2 threads:

Error

The count is clearly wrong.

The logs show that subsequent requests all obtained the lock through the expiry-check branch, yet the values and logic in the code looked consistent.

Cause

It turned out that under high concurrency, if the timeout is set too small, multiple threads can obtain identical timestamps: each one passes the expiry check and believes it holds the lock, so the locking is ineffective and the stock is "oversold".

Retesting

The count is correct.

Increasing the concurrency:

The count is still correct; problem solved.

Code

GitHub link; open-source UserAgentParser utility

Log source

The logs come from the backend of an earlier assignment-submission project; for local testing they were exported from the database and trimmed to keep only the UserAgent-related fields.

Local file test

Parse the UserAgent strings with the open-source utility:

@Test
    public void testReadFile() throws Exception {
        String path = "F:\\JAVA Workspace\\hadoopstudy\\access.log";
        BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(new File(path))));

        String line;
        int count = 0;

        // simulates the MapReduce-style aggregation
        Map<String, Integer> browserMap = new HashMap<String, Integer>();

        UserAgentParser userAgentParser = new UserAgentParser();

        while ((line = reader.readLine()) != null) {
            count++;
            if (StringUtils.isNotBlank(line)) {
                UserAgent agent = userAgentParser.parse(line);
                String browser = agent.getBrowser();
                String engine = agent.getEngine();
                String engineVersion = agent.getEngineVersion();
                String os = agent.getOs();
                String platform = agent.getPlatform();
                boolean mobile = agent.isMobile();

                Integer browserValue = browserMap.get(browser);
                if (browserValue != null) {
                    browserMap.put(browser, browserValue + 1);
                } else {
                    browserMap.put(browser, 1);
                }

                // print the parsed fields
                System.out.println(browser + "," + engine + "," + engineVersion + "," + os + "," + platform + "," + mobile);
            }
        }
        reader.close();

        System.out.println("Total records: " + count);
        System.out.println("====================================");
        for (Map.Entry<String, Integer> entry : browserMap.entrySet()) {
            System.out.println(entry.getKey() + ":" + entry.getValue());
        }
    }

Run the unit test.

Aggregating with MapReduce

The code uses WordCount as its template, combined with the local-test code above. The relevant map() excerpt:

            ...
            boolean mobile = agent.isMobile();
            if (mobile) {
                mobileuser = "手机用户";      // "mobile user"
            } else {
                mobileuser = "非手机用户";    // "non-mobile user"
            }

            // emit the map result through the context
            context.write(new Text(browser), one);
            ...

Packaging and running

Package the project together with its dependencies using mvn assembly:assembly, upload it to the VM, and put the log file into the HDFS root directory:

    [hadoop@localhost testFile]$ hadoop fs -put access.log /
    18/04/11 06:12:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [hadoop@localhost testFile]$ hadoop fs -ls /
    18/04/11 06:12:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 9 items
    -rw-r--r--   1 hadoop supergroup         73 2018-04-11 00:30 /PartitionerTest.txt
    -rw-r--r--   1 hadoop supergroup      26888 2018-04-11 06:12 /access.log
    drwxr-xr-x   - hadoop supergroup          0 2018-04-08 10:56 /hdfsapi
    -rw-r--r--   1 hadoop supergroup         60 2018-04-10 23:03 /hello.txt
    drwxrwx---   - hadoop supergroup          0 2018-04-11 01:28 /history
    drwxr-xr-x   - hadoop supergroup          0 2018-04-11 00:35 /output
    drwxr-xr-x   - hadoop supergroup          0 2018-04-08 10:35 /test
    drwx------   - hadoop supergroup          0 2018-04-11 01:36 /tmp
    drwxr-xr-x   - hadoop supergroup          0 2018-04-09 20:11 /user

Run it.


The results match the local test:

    [hadoop@localhost testFile]$ hadoop fs -ls /logaccess/browserout
    18/04/11 06:15:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 2 items
    -rw-r--r--   1 hadoop supergroup          0 2018-04-11 06:14 /logaccess/browserout/_SUCCESS
    -rw-r--r--   1 hadoop supergroup         32 2018-04-11 06:14 /logaccess/browserout/part-r-00000
    [hadoop@localhost testFile]$ hadoop fs -text /logaccess/browserout/part-r-00000
    18/04/11 06:15:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Chrome   191
    Firefox  10
    Unknown  1

After adding the other attributes and re-running the MapReduce job:

    [hadoop@localhost testFile]$ hadoop fs -ls /logaccess
    18/04/11 06:33:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 2 items
    -rw-r--r--   1 hadoop supergroup          0 2018-04-11 06:32 /logaccess/_SUCCESS
    -rw-r--r--   1 hadoop supergroup        192 2018-04-11 06:32 /logaccess/part-r-00000
    [hadoop@localhost testFile]$ hadoop fs -text /logaccess/part-r-00000
    18/04/11 06:33:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    20100101         10
    537.36           191
    604.3.5          1
    Android          48
    Chrome           191
    Firefox          10
    Gecko            10
    Linux            48
    Unknown          1
    Webkit           192
    Windows          296
    Windows 7        10
    iPhone           1
    iPhone OS 11.1   1
    手机用户         49
    非手机用户       153

Offline log analysis complete!

Official architecture overview

The evolution of MapReduce

The MapReduce 2.x architecture. Official architecture overview

In the MapReduce 1.x architecture, a JobTracker failure brought down the whole cluster, which is why it was superseded. Official architecture overview

Developing WordCount

How WordCount works

Input data enters MapReduce, and each MapReduce task is initialized as a Job, which runs in two phases: map and reduce. The map method receives a <key, value> input, where key is the byte offset of a line and value is the line's content, and emits intermediate <key, value> pairs. The reduce function receives input of the form <key, (list of values)>; the list of values (a Java collection) holds all the counts emitted for the same key. It processes this collection, and each reduce call produces zero or one <key, value> output.

Official architecture overview

Given the input text

Hadoop Hello Hadoop 

the process is roughly:

Mapping:

<Hadoop ,1>,<Hello,1>,<Hadoop ,1>

Shuffling:

<Hadoop ,1>,<Hadoop ,1>
<Hello,1>

Reducing:

<Hadoop ,2>
<Hello,1>

Official architecture overview
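The Mapping → Shuffling → Reducing flow above can be simulated in plain Java, with a HashMap standing in for the framework's grouping:

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountToy {
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> result = new HashMap<>();
        // map: emit <word, 1>; shuffle: group by key; reduce: sum the ones
        for (String word : text.split("\\s+")) {
            result.merge(word, 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // expect Hadoop -> 2, Hello -> 1
        System.out.println(count("Hadoop Hello Hadoop"));
    }
}
```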

Parallel computation across multiple nodes. Official architecture overview

Writing the code

Reading the source

The Mapper source shows that the framework already defines run(), which drives all the basic Mapper operations (the template-method pattern); the real work happens in map(), so overriding map() is enough.

The Reducer class has an analogous definition.

Overriding map()

Only part of the code is shown here; the rest is in my GitHub repository: GitHub link

/**
 * Map: reads the input file
 * Text: a String-like Hadoop type
 */
public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    LongWritable one = new LongWritable(1);

    /**
     * @param key     the byte offset of the line
     * @param value   the line's content
     * @param context the job context
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        /*super.map(key, value, context);*/

        // receive one line of input
        String line = value.toString();

        // split on the chosen delimiter
        /*line.split("\t"); // split on tabs*/
        String[] words = line.split(" "); // split on spaces

        for (String word : words) {
            // emit the map result through the context
            context.write(new Text(word), one);
        }
    }
}

Running it

Upload the jar to the server via the BT (BaoTa) panel and make sure all Hadoop services are running.

Create the test file in HDFS in advance, and confirm it is reachable via the absolute path: hadoop fs -ls hdfs://192.168.79.129:8020/

Run with:

    hadoop jar hadoopstudy-1.0-SNAPSHOT.jar com.fjy.hadoop.mapreduce.WordCountApp hdfs://192.168.79.129:8020/hello.txt hdfs://192.168.79.129:8020/output/wc

Startup, up to FileInputFormat

    [hadoop@localhost testFile]$ hadoop jar hadoopstudy-1.0-SNAPSHOT.jar com.fjy.hadoop.mapreduce.WordCountA2.168.79.129:8020/hello.txt hdfs://192.168.79.129:8020/output/wc
    18/04/10 09:57:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    18/04/10 09:57:44 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    18/04/10 09:57:45 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    18/04/10 09:57:45 INFO input.FileInputFormat: Total input paths to process : 1

Job execution

    18/04/10 09:57:46 INFO mapreduce.JobSubmitter: number of splits:1
    18/04/10 09:57:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1523322221448_0001
    18/04/10 09:57:47 INFO impl.YarnClientImpl: Submitted application application_1523322221448_0001
    18/04/10 09:57:47 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1523322221448_0001/
    18/04/10 09:57:47 INFO mapreduce.Job: Running job: job_1523322221448_0001
    18/04/10 09:57:56 INFO mapreduce.Job: Job job_1523322221448_0001 running in uber mode : false
    18/04/10 09:57:56 INFO mapreduce.Job:  map 0% reduce 0%
    18/04/10 09:58:04 INFO mapreduce.Job:  map 100% reduce 0%
    18/04/10 09:58:10 INFO mapreduce.Job:  map 100% reduce 100%
    18/04/10 09:58:12 INFO mapreduce.Job: Job job_1523322221448_0001 completed successfully
    18/04/10 09:58:12 INFO mapreduce.Job: Counters: 49
            File System Counters
                    FILE: Number of bytes read=39
                    FILE: Number of bytes written=222915
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=113
                    HDFS: Number of bytes written=17
                    HDFS: Number of read operations=6
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=2
            Job Counters

Map phase

                    Launched map tasks=1
                    Launched reduce tasks=1
                    Data-local map tasks=1
                    Total time spent by all maps in occupied slots (ms)=5920
                    Total time spent by all reduces in occupied slots (ms)=3321
                    Total time spent by all map tasks (ms)=5920
                    Total time spent by all reduce tasks (ms)=3321
                    Total vcore-seconds taken by all map tasks=5920
                    Total vcore-seconds taken by all reduce tasks=3321
                    Total megabyte-seconds taken by all map tasks=6062080
                    Total megabyte-seconds taken by all reduce tasks=3400704
            Map-Reduce Framework
                    Map input records=1
                    Map output records=2
                    Map output bytes=29
                    Map output materialized bytes=39

Reduce phase

                    Input split bytes=101
                    Combine input records=0
                    Combine output records=0

                    Reduce input groups=2
                    Reduce shuffle bytes=39
                    Reduce input records=2
                    Reduce output records=2
                    Spilled Records=4
                    Shuffled Maps =1
                    Failed Shuffles=0
                    Merged Map outputs=1
                    GC time elapsed (ms)=115
                    CPU time spent (ms)=1080
                    Physical memory (bytes) snapshot=300331008
                    Virtual memory (bytes) snapshot=5493092352
                    Total committed heap usage (bytes)=165810176

Shuffle phase

            Shuffle Errors
                    BAD_ID=0
                    CONNECTION=0
                    IO_ERROR=0
                    WRONG_LENGTH=0
                    WRONG_MAP=0
                    WRONG_REDUCE=0

`File Input` and `File Output` counters:

            File Input Format Counters
                    Bytes Read=12
            File Output Format Counters
                    Bytes Written=17

The run result:

Further optimization

Using a Combiner

A Combiner performs a first aggregation on the map side, before the data is sent on to the shuffle, which improves efficiency. Configure it in the driver:

        // set the combiner class on the job; its logic matches the reducer.
        // Caution: do not use a Combiner for non-associative results such as averages!
        job.setCombinerClass(MyReducer.class);
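The caveat about averages exists because combiners run per map task on uneven subsets of the data; a quick demonstration of why averaging is not combinable:

```java
public class CombinerCaveat {
    public static void main(String[] args) {
        // values 1,2,3,4 split unevenly across two map tasks: {1,2,3} and {4}
        double trueAvg  = (1 + 2 + 3 + 4) / 4.0;           // 2.5
        double mapAvg1  = (1 + 2 + 3) / 3.0;               // per-map "combined" average: 2.0
        double mapAvg2  = 4.0;                             // per-map "combined" average: 4.0
        double combined = (mapAvg1 + mapAvg2) / 2.0;       // 3.0, not 2.5
        System.out.println(trueAvg + " vs " + combined);
        // sums (as in WordCount) are associative, which is why reusing MyReducer is safe
    }
}
```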

After editing, repackage with Maven:

    mvn clean package -DskipTests

Upload to the server and run the job again:

    hadoop jar hadoopstudy-0.3.jar com.fjy.hadoop.mapreduce.WordCountApp hdfs://192.168.79.129:8020/hello.txt hdfs://192.168.79.129:8020/output/wc

The web UI shows whether the job has finished.


The counters now show a non-zero Combine input records (it was 0 without the Combiner), confirming it took effect:

    Map-Reduce Framework
                    Map input records=4
                    ...
                    Combine input records=9
                    Combine output records=5
                    ...
                    Physical memory (bytes) snapshot=298844160
                    Virtual memory (bytes) snapshot=5493088256
                    Total committed heap usage (bytes)=165810176
            Shuffle Errors
                    BAD_ID=0
                    ...

Checking the result

    [hadoop@localhost testFile]$ hadoop fs -cat /output/wc/part-r-00000
    18/04/10 23:06:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    !       1
    Hadoop  3
    Hello   2
    MapReduce       2
    YARN    1

Using a Partitioner

Purpose

The Partitioner decides which ReduceTask handles each record emitted by a MapTask; the default implementation takes the hash of the key modulo the number of ReduceTasks.
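That default hash-mod rule can be re-implemented without the Hadoop types; a minimal sketch of the same logic Hadoop's HashPartitioner uses:

```java
public class HashPartitionerSketch {
    /** Non-negative hash of the key, modulo the number of reducers. */
    public static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // with 4 reducers, every key lands deterministically in partition 0..3
        for (String k : new String[]{"xiaomi", "huawei", "iphone", "sony"}) {
            System.out.println(k + " -> " + getPartition(k, 4));
        }
    }
}
```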

Preparing the file

Prepare PartitionerTest.txt in the test directory /www/hadoop/testFile and upload it to the HDFS root:

    [hadoop@localhost root]$ cd /www/hadoop/testFile
    [hadoop@localhost testFile]$ ls
    PartitionerTest.txt  hello.txt
    [hadoop@localhost testFile]$ hadoop fs -put PartitionerTest.txt /
    18/04/11 00:30:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [hadoop@localhost testFile]$ hadoop fs -ls /
    18/04/11 00:30:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 7 items
    -rw-r--r--   1 hadoop supergroup         73 2018-04-11 00:30 /PartitionerTest.txt
    drwxr-xr-x   - hadoop supergroup          0 2018-04-08 10:56 /hdfsapi
    -rw-r--r--   1 hadoop supergroup         60 2018-04-10 23:03 /hello.txt
    drwxr-xr-x   - hadoop supergroup          0 2018-04-10 23:05 /output
    drwxr-xr-x   - hadoop supergroup          0 2018-04-08 10:35 /test
    drwx------   - hadoop supergroup          0 2018-04-09 20:11 /tmp
    drwxr-xr-x   - hadoop supergroup          0 2018-04-09 20:11 /user
    [hadoop@localhost testFile]$ hadoop fs -text /PartitionerTest.txt
    18/04/11 00:30:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    xiaomi 200
    huawei 300
    xiaomi 100
    huawei 200
    iphone 300
    iphone 200
    sony 50

Code changes

The delimiter handling and the emitted pair change; in the map class:

                // the first field is the phone brand, the second the quantity sold
                context.write(new Text(words[0]), new LongWritable(Long.parseLong(words[1])));

Add the Partitioner class:

    /**
     * Partitioner implementation
     */
    public static class MyPartitioner extends Partitioner<Text, LongWritable> {
         @Override
         public int getPartition(Text key, LongWritable value, int numPartitions) {

             if ("xiaomi".equals(key.toString())) {
                 return 0; // xiaomi goes to ReduceTask 0
             }
             if ("huawei".equals(key.toString())) {
                 return 1; // huawei goes to ReduceTask 1
             }
             if ("iphone".equals(key.toString())) {
                 return 2; // iphone goes to ReduceTask 2
             }

             return 3; // everything else goes to ReduceTask 3
         }
     }

Finally, configure the Partitioner in the driver:

        //set the job's Partitioner
        job.setPartitionerClass(MyPartitioner.class);
        //use four reducers, one per partition; otherwise the Partitioner setting does not take effect
        job.setNumReduceTasks(4);
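The routing above can be checked in plain Java, with no Hadoop needed. This sketch mirrors `getPartition` and the four-reducer setup (the `routeBrand` method and class name are assumptions for illustration):

```java
// Plain-Java mirror of MyPartitioner: maps each brand key to the
// reducer index that produces its part-r-0000N output file.
public class PartitionRouting {

    static final int NUM_REDUCE_TASKS = 4; // must match job.setNumReduceTasks(4)

    static int routeBrand(String brand) {
        if ("xiaomi".equals(brand)) return 0;
        if ("huawei".equals(brand)) return 1;
        if ("iphone".equals(brand)) return 2;
        return 3; // any other brand, e.g. sony
    }

    public static void main(String[] args) {
        for (String brand : new String[] {"xiaomi", "huawei", "iphone", "sony"}) {
            int p = routeBrand(brand);
            // Every returned index must be < NUM_REDUCE_TASKS,
            // otherwise the job fails at runtime.
            System.out.println(brand + " -> part-r-0000" + p);
        }
    }
}
```

This also shows why `setNumReduceTasks(4)` is required: the partitioner may return indices 0 through 3, so there must be at least four reducers to receive them.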

Copy the fully qualified class name (from the IDE, e.g. IDEA) and update the run command accordingly.

The full command is then:

    hadoop jar hadoopstudy-0.4.jar com.fjy.hadoop.mapreduce.WordCountPartitionerApp hdfs://192.168.79.129:8020/PartitionerTest.txt hdfs://192.168.79.129:8020/output/partitioner

Run it

Package with Maven again, upload to the test directory, and run the job (screenshots: before, during, and after the run).

Check the output:

    [hadoop@localhost testFile]$ hadoop fs -ls /output/partitioner
    18/04/11 00:36:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 5 items
    -rw-r--r--   1 hadoop supergroup          0 2018-04-11 00:35 /output/partitioner/_SUCCESS
    -rw-r--r--   1 hadoop supergroup         11 2018-04-11 00:35 /output/partitioner/part-r-00000
    -rw-r--r--   1 hadoop supergroup         11 2018-04-11 00:35 /output/partitioner/part-r-00001
    -rw-r--r--   1 hadoop supergroup         11 2018-04-11 00:35 /output/partitioner/part-r-00002
    -rw-r--r--   1 hadoop supergroup          8 2018-04-11 00:35 /output/partitioner/part-r-00003
    [hadoop@localhost testFile]$ hadoop fs -text /output/partitioner/part-r-00000
    18/04/11 00:36:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    xiaomi  300
    [hadoop@localhost testFile]$ hadoop fs -text /output/partitioner/part-r-00001
    18/04/11 00:36:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    huawei  500
    [hadoop@localhost testFile]$ hadoop fs -text /output/partitioner/part-r-00002
    18/04/11 00:37:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    iphone  500
    [hadoop@localhost testFile]$ hadoop fs -text /output/partitioner/part-r-00003
    18/04/11 00:37:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    sony    50

The results are correct!
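The per-brand totals can be sanity-checked outside Hadoop by summing the input lines directly (a minimal sketch; the `BrandTotals` class name is an assumption):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sum sales per brand from the PartitionerTest.txt lines to verify
// the reducer outputs (xiaomi 300, huawei 500, iphone 500, sony 50).
public class BrandTotals {

    public static Map<String, Long> totals(String[] lines) {
        Map<String, Long> sums = new LinkedHashMap<>();
        for (String line : lines) {
            String[] words = line.trim().split("\\s+");
            // Accumulate the count for this brand key.
            sums.merge(words[0], Long.parseLong(words[1]), Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        String[] input = {
            "xiaomi 200", "huawei 300", "xiaomi 100",
            "huawei 200", "iphone 300", "iphone 200", "sony 50"
        };
        System.out.println(totals(input));
        // prints {xiaomi=300, huawei=500, iphone=500, sony=50}
    }
}
```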

Enabling JobHistory

What JobHistory does

JobHistory records information about completed MapReduce jobs to a specified HDFS directory, but it is disabled by default. Accessing it without any configuration looks like this:

(screenshot: JobHistory web page, unavailable without configuration)

Configuring JobHistory

Add the following to mapred-site.xml:

    <!-- JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>192.168.79.129:10020</value>
        <description>MapReduce JobHistory Server IPC host:port</description>
    </property>
    <!-- JobHistory web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>192.168.79.129:19888</value>
        <description>MapReduce JobHistory Server Web UI host:port</description>
    </property>
    <!-- directory for completed jobs -->
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/history/done</value>
    </property>
    <!-- directory for in-flight jobs -->
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/history/done_intermediate</value>
    </property>


After configuring, restart YARN:

    [hadoop@localhost mapreduce]$ cd /www/hadoop/hadoop-2.6.0-cdh5.7.0/sbin
    [hadoop@localhost sbin]$ ./stop-yarn.sh
    stopping yarn daemons
    stopping resourcemanager
    localhost: stopping nodemanager
    no proxyserver to stop
    [hadoop@localhost sbin]$ jps
    2353 DataNode
    80224 Jps
    2228 NameNode
    2522 SecondaryNameNode
    [hadoop@localhost sbin]$ ./start-yarn.sh
    starting yarn daemons
    starting resourcemanager, logging to /www/hadoop/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-resourcemanager-localhost.localdomain.out
    localhost: starting nodemanager, logging to /www/hadoop/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-localhost.localdomain.out
    [hadoop@localhost mapreduce]$ jps
    2353 DataNode
    80336 ResourceManager
    2228 NameNode
    81510 Jps
    2522 SecondaryNameNode
    63582 JobHistoryServer
    80447 NodeManager

Start the JobHistory service:

    [hadoop@localhost sbin]$ ./mr-jobhistory-daemon.sh start historyserver
    starting historyserver, logging to /www/hadoop/hadoop-2.6.0-cdh5.7.0/logs/mapred-hadoop-historyserver-localhost.localdomain.out

Run the PI test:

    [hadoop@localhost sbin]$ cd /www/hadoop/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce
    [hadoop@localhost mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 1 2

Access error

Opening the logs shows they still cannot be accessed.

The page reports that log aggregation is not enabled, and the redirect link points to localhost.

Additional configuration

Add the following to yarn-site.xml:

    <!-- enable YARN log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

To keep the VM's redirect link from pointing to localhost, also add a property specifying the IP:

    <property>
        <name>yarn.nodemanager.hostname</name>
        <value>192.168.79.129</value>
    </property>


Restart the services

Restart YARN again, and restart the JobHistory service along with it:

    [hadoop@localhost sbin]$ ./mr-jobhistory-daemon.sh stop historyserver
    stopping historyserver
    [hadoop@localhost sbin]$ ./stop-yarn.sh
    stopping yarn daemons
    stopping resourcemanager
    localhost: stopping nodemanager
    no proxyserver to stop
    [hadoop@localhost sbin]$ ./start-yarn.sh
    starting yarn daemons
    starting resourcemanager, logging to /www/hadoop/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-resourcemanager-localhost.localdomain.out
    localhost: starting nodemanager, logging to /www/hadoop/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-localhost.localdomain.out
    [hadoop@localhost sbin]$ ./mr-jobhistory-daemon.sh start historyserver
    starting historyserver, logging to /www/hadoop/hadoop-2.6.0-cdh5.7.0/logs/mapred-hadoop-historyserver-localhost.localdomain.out

[Note] Even if the configuration is correct, restarting only YARN at this point may still produce the "Aggregation is not enabled" error; the JobHistory server must be restarted as well.

Verify

Run the PI computation again:

    [hadoop@localhost sbin]$ cd /www/hadoop/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce
    [hadoop@localhost mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 1 2

Checking the Logs entry now shows the log records. Configuration successful!