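Code
GitHub link, open-source UserAgentParser utility class link
Log source
The logs come from the back end of an earlier homework-submission system project. For local testing, they were first backed up from the database and lightly processed so that only the UserAgent information remains.
Local file test
Use the open-source utility class to parse the information in each UserAgent string.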
// Needs java.io.*, java.util.*, org.junit.Test, plus StringUtils and the
// UserAgentParser classes on the classpath.
@Test
public void testReadFile() throws Exception {
    String path = "F:\\JAVA Workspace\\hadoopstudy\\access.log";
    // Number of lines read from the log, blank lines included
    int count = 0;
    // Simulates MapReduce key/value storage: browser name -> hit count
    Map<String, Integer> browserMap = new HashMap<String, Integer>();
    UserAgentParser userAgentParser = new UserAgentParser();
    // try-with-resources closes the reader even if parsing throws
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(new File(path))))) {
        String line;
        // Read first, then test for null, so the end-of-file null is never counted
        while ((line = reader.readLine()) != null) {
            count++;
            if (StringUtils.isNotBlank(line)) {
                UserAgent agent = userAgentParser.parse(line);
                // Fields extracted for the test
                String browser = agent.getBrowser();
                String engine = agent.getEngine();
                String engineVersion = agent.getEngineVersion();
                String os = agent.getOs();
                String platform = agent.getPlatform();
                boolean mobile = agent.isMobile();
                Integer browserValue = browserMap.get(browser);
                if (browserValue != null) {
                    browserMap.put(browser, browserValue + 1);
                } else {
                    browserMap.put(browser, 1);
                }
                // Print the parsed information
                System.out.println(browser + "," + engine + "," + engineVersion + "," + os + "," + platform + "," + mobile);
            }
        }
    }
    System.out.println("Total records: " + count);
    System.out.println("====================================");
    for (Map.Entry<String, Integer> entry : browserMap.entrySet()) {
        System.out.println(entry.getKey() + ":" + entry.getValue());
    }
}
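The read-and-count loop in the test can be tried without the log file or the parser library. The following is a minimal sketch under stated assumptions: `LineTally`, `tally`, and `toyBrowser` are hypothetical names introduced here, and `toyBrowser` is only a substring check standing in for `userAgentParser.parse(line).getBrowser()`. It shows the usual idiom of assigning `readLine()` inside the loop condition, so the terminating `null` is never processed, and of skipping blank lines before counting.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

class LineTally {

    /** Reads lines until end of stream, skips blank ones, and counts keyOf(line). */
    static Map<String, Integer> tally(BufferedReader reader, Function<String, String> keyOf) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps keys sorted
        try {
            String line;
            // readLine() returns null at end of stream, which ends the loop
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // blank lines carry no record
                }
                // Insert 1 for a new key, otherwise add 1 to the stored count
                counts.merge(keyOf.apply(line), 1, Integer::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    /** Hypothetical stand-in for a real UserAgent parser: substring checks only. */
    static String toyBrowser(String userAgent) {
        if (userAgent.contains("Firefox")) {
            return "Firefox";
        }
        if (userAgent.contains("Chrome")) {
            return "Chrome";
        }
        return "Unknown";
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample lines, including one blank line that must be skipped
        String sample = "Mozilla/5.0 Chrome/65.0\n\nMozilla/5.0 Firefox/59.0\nMozilla/5.0 Chrome/64.0\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(tally(reader, LineTally::toyBrowser)); // {Chrome=2, Firefox=1}
        }
    }
}
```

Passing the key function as a parameter keeps the counting logic independent of whichever parser is plugged in.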
Run the unit test
Counting with MapReduce
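The code is modeled on WordCount and combined with the local test code.
...
boolean mobile = agent.isMobile();
if (mobile) {
    // "手机用户" means "mobile user"
    mobileuser = "手机用户";
} else {
    // "非手机用户" means "non-mobile user"
    mobileuser = "非手机用户";
}
// Emit the map result through the context
context.write(new Text(browser), one);
...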
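The WordCount pattern used here can be illustrated in plain Java, without any Hadoop classes. This is only a sketch of the idea, not the job from this post: `AttributeCountSketch`, `Parsed`, `map`, and `run` are hypothetical names, the records are made up, and the English labels "mobile" and "non-mobile" stand in for the Chinese strings used in the mapper. Each record emits one key per attribute with an implicit count of 1, and the grouping step sums the ones per key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AttributeCountSketch {

    /** Hypothetical holder for already-parsed fields; not a class from the post. */
    static final class Parsed {
        final String browser;
        final String os;
        final boolean mobile;

        Parsed(String browser, String os, boolean mobile) {
            this.browser = browser;
            this.os = os;
            this.mobile = mobile;
        }
    }

    /** "Map" step: one record emits one key per attribute (each with value 1). */
    static List<String> map(Parsed p) {
        List<String> keys = new ArrayList<>();
        keys.add(p.browser);
        keys.add(p.os);
        keys.add(p.mobile ? "mobile" : "non-mobile");
        return keys;
    }

    /** "Shuffle" and "reduce" steps: group equal keys and sum their ones. */
    static Map<String, Integer> run(List<Parsed> records) {
        Map<String, Integer> totals = new TreeMap<>(); // sorted by key
        for (Parsed p : records) {
            for (String key : map(p)) {
                totals.merge(key, 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Parsed> records = new ArrayList<>();
        records.add(new Parsed("Chrome", "Windows", false));
        records.add(new Parsed("Chrome", "Android", true));
        records.add(new Parsed("Firefox", "Windows", false));
        // {Android=1, Chrome=2, Firefox=1, Windows=2, mobile=1, non-mobile=2}
        System.out.println(run(records));
    }
}
```

Because all attributes share one key space, the counts add up to the number of records times the number of attributes emitted per record (here 3 x 3 = 9), not to the number of records.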
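Package and run
Run mvn assembly:assembly to package the project together with the plugin it depends on, upload the resulting jar to the virtual machine, and upload the log to the HDFS root directory.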
[hadoop@localhost testFile]$ hadoop fs -put access.log /
18/04/11 06:12:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@localhost testFile]$ hadoop fs -ls /
18/04/11 06:12:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 9 items
-rw-r--r-- 1 hadoop supergroup 73 2018-04-11 00:30 /PartitionerTest.txt
-rw-r--r-- 1 hadoop supergroup 26888 2018-04-11 06:12 /access.log
drwxr-xr-x - hadoop supergroup 0 2018-04-08 10:56 /hdfsapi
-rw-r--r-- 1 hadoop supergroup 60 2018-04-10 23:03 /hello.txt
drwxrwx--- - hadoop supergroup 0 2018-04-11 01:28 /history
drwxr-xr-x - hadoop supergroup 0 2018-04-11 00:35 /output
drwxr-xr-x - hadoop supergroup 0 2018-04-08 10:35 /test
drwx------ - hadoop supergroup 0 2018-04-11 01:36 /tmp
drwxr-xr-x - hadoop supergroup 0 2018-04-09 20:11 /user
Run
The result matches the local test.
[hadoop@localhost testFile]$ hadoop fs -ls /logaccess/browserout
18/04/11 06:15:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2018-04-11 06:14 /logaccess/browserout/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 32 2018-04-11 06:14 /logaccess/browserout/part-r-00000
[hadoop@localhost testFile]$ hadoop fs -text /logaccess/browserout/part-r-00000
18/04/11 06:15:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Chrome	191
Firefox	10
Unknown	1
After adding the other attributes and rerunning the MapReduce job, the result is as follows:
[hadoop@localhost testFile]$ hadoop fs -ls /logaccess
18/04/11 06:33:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2018-04-11 06:32 /logaccess/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 192 2018-04-11 06:32 /logaccess/part-r-00000
[hadoop@localhost testFile]$ hadoop fs -text /logaccess/part-r-00000
18/04/11 06:33:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20100101	10
537.36	191
604.3.5	1
Android	48
Chrome	191
Firefox	10
Gecko	10
Linux	48
Unknown	1
Webkit	192
Windows	296
Windows 7	10
iPhone	1
iPhone OS 11.1	1
手机用户	49
非手机用户	153
Offline log statistics complete!
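As a sanity check on the figures above, the browser counts and the mobile/non-mobile counts should describe the same set of records. The sketch below assumes the reducer output is tab-separated "key, count" lines (the default for text output) and that every record yields exactly one browser key and one mobile/non-mobile key, which the post does not state explicitly. `OutputCheck`, `parse`, and `sum` are hypothetical helper names; the input string copies five lines from the output shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OutputCheck {

    /** Parses "key<TAB>count" lines; keys may contain spaces, e.g. "Windows 7". */
    static Map<String, Integer> parse(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // blank or malformed line
            }
            String key = line.substring(0, tab);
            int value = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(key, value);
        }
        return counts;
    }

    /** Sums the counts of the given keys; a missing key counts as 0. */
    static int sum(Map<String, Integer> counts, String... keys) {
        int total = 0;
        for (String key : keys) {
            total += counts.getOrDefault(key, 0);
        }
        return total;
    }

    public static void main(String[] args) {
        // Five lines copied from the job output in this post
        String output = "Chrome\t191\nFirefox\t10\nUnknown\t1\n手机用户\t49\n非手机用户\t153\n";
        Map<String, Integer> counts = parse(output);
        int byBrowser = sum(counts, "Chrome", "Firefox", "Unknown");
        int byMobile = sum(counts, "手机用户", "非手机用户");
        System.out.println(byBrowser + " " + byMobile); // 202 202
    }
}
```

Both sums come to 202 (191 + 10 + 1 and 49 + 153), which is consistent with each record being counted once per attribute.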