使用hive分析nginx访问日志方法

发布时间:2017-7-1 11:36:00编辑:www.fx114.net 分享查询网我要评论
本篇文章主要介绍了"使用hive分析nginx访问日志方法 ",主要涉及到使用hive分析nginx访问日志方法 方面的内容,对于使用hive分析nginx访问日志方法 感兴趣的同学可以参考一下。

以下案例是使用hive分析nginx的访问日志案例,其中字段分隔通过正则表达式匹配,具体步骤如下:

日志格式:
192.168.5.139 - - [08/Jun/2017:17:09:12 +0800] "GET //oportal/static/ui/layer/skin/default/icon.png HTTP/1.1" 200 9905 http://192.168.100.126//oportal/static/ui/layer/skin/layer.css "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36" -
192.168.5.139 - - [08/Jun/2017:17:09:25 +0800] "GET //oportal/page/homepage/images/icon-02.png HTTP/1.1" 200 1322 http://192.168.100.126//dsfdsal/page/homepage/css/indet.css "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36" -
192.168.5.139 - - [08/Jun/2017:17:09:25 +0800] "GET /dsfdortal/page/waittodo/waittodo.jsp?registesfsdame=%25E7%25BB%25BC%25E5%sdf2590%2588%25E9%25A2%2584%25E7%25AE%25sdf97&registerAsdfsdppid=bsdfsdas,ssdfsdfpf,bsdfsdgt,insdfsddi,hqrsdfdseport,hqosdfa,hqsfdsbi&resdfgisterId=FD748AA3sd82851A37F1693D3880C844EF&allviewsdfnum=10&appSource=undefined&tokenid=5728A0ED7998CC84B88FE8717A33FAB8aK79UkfS&waittodoNums=0&showway=0 HTTP/1.1" 200 3121 http://192.168.100.126//fposdfsdrtal/page/homdsfdepage/homepage.jsp?tokenid=5728A0ED7998CC84B88FE8717A33FAB8aK79UkfS "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36" -
192.168.5.139 - - [08/Jun/2017:17:09:25 +0800] "GET //fposdfgfrtal/page/waittodo/css/db_index.css HTTP/1.1" 200 6310 http://192.168.100.126/fpdsfdsfortal/page/waittodo/waittodo.jsp?registerName=%25E7%25BB%25BC%25E5%2590%2588%25E9%25A2%2584%25E7%25AE%2597&registerAppid=bas,spf,bgt,indi,hqreport,hqoa,hqbi&registerId=FD748AA382851A37F1693D3880C844EF&allviewnum=10&appSource=undefined&tokenid=5728A0ED7998CC84B88FE8717A33FAB8aK79UkfS&waittodoNums=0&showway=0 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36" -

正则匹配:
测试网站:http://wpjam.qiniudn.com/tool/regexpal/

([^ |^\n]*) ([^ ]*) ([^ ]*) (\[.*\]) (\".*?\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (-)

建立原表:

drop table if exists chavin.nginx_access_log;
CREATE TABLE chavin.nginx_access_log(
   host STRING,
   identity STRING,
   user STRING,
   time STRING,
   request STRING,
   status STRING,
   size STRING,
   referer STRING,
   agent STRING,
   other STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
   "input.regex" = "([^ |^\n]*) ([^ ]*) ([^ ]*) (\\[.*\\]) (\".*?\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (-)",
   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s"
)
STORED AS TEXTFILE;

清除access.log日志中空白行:

sed -i '/^$/d' access.log

加载access.log日志数据到hive中:

load data local inpath '/opt/datas/access.log' overwrite into table chavin.nginx_access_log;

查询数据,进行验证:

select * from chavin.nginx_access_log limit 5;

接下来可以建立业务子表,进行定向分析了。


上一篇:问题 1058: C二级辅导-求偶数和
下一篇:ArcGIS Engine开发之地图基本操作(1)

相关文章

相关评论

本站评论功能暂时取消,后续此功能例行通知。

一、不得利用本站危害国家安全、泄露国家秘密,不得侵犯国家社会集体的和公民的合法权益,不得利用本站制作、复制和传播不法有害信息!

二、互相尊重,对自己的言论和行为负责。

好贷网好贷款