# Log line parsing

## Design of the parser

The parser is designed to read like plain language, avoiding complex notations such as [Logstash Grok Patterns](https://coralogix.com/blog/logstash-grok-tutorial-with-examples/).

The design follows five principles:

1. Side-by-side principle: the sample and the pattern are written one above the other; each `#` in the pattern marks the corresponding character in the sample as an anchor.
2. Anchoring principle: to capture the value between two anchors, give it an identifier (for example `ip` or `time`); to skip the value, leave spaces instead.
3. Naming principle: a value named `time` denotes a date-time, and the corresponding time value in the sample must be rewritten as a Go [time layout](https://golang.org/src/time/format.go).
4. Conversion principle: a vertical bar `|` introduces a conversion filter. Currently only the `path` filter is supported; it extracts the path (without the query) from a URI (with the query).
5. Type principle: when the sample value for a captured identifier is an integer, the value is parsed as `int`; when it is a decimal, the value is parsed as `float64`.

### Example 1

```go
// pattern="%h %l %u %t %r %s %b %S %D %T %F %{Referer}i %{X-Forwarded-For}i %{User-Agent}i %{X-Real-IP}i"
const samplee = `127.0.0.1 - - [02/Jan/2006:15:04:05 -0700] GET /path?indent=true HTTP/1.1 200 41824 - 8 0.008 6 - - Nginx/1.1`
const pattern = `ip # # ##time ##method#uri|path # #code#bytesSent#-#millis#seconds#`
```

The sample (`samplee`) and the pattern (`pattern`) above are written so that one lines up with the other. In the pattern, `#` marks the anchor characters in the sample, and a name placed between two anchors captures the corresponding piece of the sample. A captured value can be followed by `|` and a filter to convert it. The type of each value is taken from the corresponding example value in the sample (four types are currently supported: string, integer, date-time, and float).

### Example 2

```go
// pattern: '%h %l %u %t "%r" %s %b "%{Referer}i" "%{User-Agent}i" %D'
const samplee = "10.1.6.1 - - [02/Jan/2006:15:04:05 -0700] !HEAD / HTTP/1.0! 200 94 !-! !-! 0 "
const pattern = "ip # # #time # #method#path|path# ##code#bytesSent## # # ##millis"
```

## [Tomcat access log format settings](https://qsli.github.io/2016/12/23/tomcat-access-log/)

### Tomcat access log format

1. File location: `conf/server.xml`
2. Default configuration:

```xml
<!-- Access log processes all example.
     Documentation at: /docs/config/valve.html
     Note: The pattern used is equivalent to using pattern="common" -->
<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
       prefix="localhost_access_log." suffix=".txt"
       pattern="%h %l %u %t &quot;%r&quot; %s %b"/>
```

Name|Meaning
---|---
%a|Remote IP address
%A|Local IP address
%b|Bytes sent, excluding HTTP headers, or '-' if zero
%B|Bytes sent, excluding HTTP headers
%h|Remote host name (or IP address if enableLookups for the connector is false)
%H|Request protocol
%l|Remote logical username from identd (always returns '-')
%m|Request method (GET, POST, etc.)
%p|Local port on which this request was received
%q|Query string (prepended with a '?' if it exists)
%r|First line of the request (method and request URI)
%s|HTTP status code of the response
%S|User session ID
%t|Date and time, in Common Log Format
%u|Remote user that was authenticated (if any), else '-'
%U|Requested URL path
%v|Local server name
%D|Time taken to process the request, in millis
%T|Time taken to process the request, in seconds
%F|Time taken to commit the response, in millis
%I|Current request thread name (can compare later with stacktraces)

The default configuration produces access log lines like this:

> 127.0.0.1 - - [07/Oct/2016:22:31:56 +0800] "GET /dubbo/ HTTP/1.1" 404 963

> remote IP, logical username, remote user, date and time, first line of the HTTP request, status code, bytes sent excluding HTTP headers

### Support for headers, cookies, sessions, and other fields

> There is also support to write information incoming or outgoing headers, cookies, session or request attributes and special timestamp formats. It is modeled after the Apache HTTP Server log configuration syntax:

Name|Meaning
---|---
%{xxx}i|for incoming headers
%{xxx}o|for outgoing response headers
%{xxx}c|for a specific cookie
%{xxx}r|xxx is an attribute in the ServletRequest
%{xxx}s|xxx is an attribute in the HttpSession
%{xxx}t|xxx is an enhanced SimpleDateFormat pattern

For example, `%{X-Forwarded-For}i` logs the real client IP address when requests arrive through a reverse proxy such as Nginx.

The HTTP header generally looks like this:

`X-Forwarded-For: client1, proxy1, proxy2`

> The value is a list of IP addresses separated by a comma and a space. The leftmost entry (client1) is the IP address of the original client, and each proxy server that successfully receives the request appends the address it received the request from to the right. In the example above, the request passed through three proxy servers: proxy1, proxy2, and proxy3. The request was sent by client1 and reached proxy3 (proxy3 may be the final destination). When the request leaves client1, XFF is empty and the request goes to proxy1. When it passes through proxy1, client1 is added to XFF and the request goes to proxy2. When it passes through proxy2, proxy1 is added to XFF and the request goes to proxy3. When it passes through proxy3, proxy2 is added to XFF. Where the request goes next is unknown; if proxy3 is not the final destination, the request is forwarded further.

> Because this field is very easy to forge, X-Forwarded-For should be used with care. Under normal circumstances the last IP address in XFF is the address that connected to the last proxy server, which is usually a fairly reliable source of information.

### References

1. [Apache Tomcat 7 The Valve Component](http://tomcat.apache.org/tomcat-7.0-doc/config/valve.html)
1. [X-Forwarded-For](https://zh.wikipedia.org/wiki/X-Forwarded-For)
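
The naming, conversion, and type principles from the design section (principles 3 to 5) can be illustrated with a small Go sketch. It is not the library's implementation: `pathOf`, `typedValue`, and `accessLogLayout` are hypothetical names introduced here for illustration, and the sketch infers the type from each value it sees, whereas the parser described here takes the type from the sample.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// accessLogLayout is the Go time layout used for the time value in the samples.
// Hypothetical constant, named here for illustration only.
const accessLogLayout = "02/Jan/2006:15:04:05 -0700"

// pathOf mimics the `path` filter: it drops the query string from a URI.
// Hypothetical helper, not the library's API.
func pathOf(uri string) string {
	path, _, _ := strings.Cut(uri, "?")
	return path
}

// typedValue mimics the type principle: integers become int, decimals become
// float64, and anything else stays a string. Hypothetical helper.
func typedValue(raw string) interface{} {
	if i, err := strconv.Atoi(raw); err == nil {
		return i
	}
	if f, err := strconv.ParseFloat(raw, 64); err == nil {
		return f
	}
	return raw
}

func main() {
	fmt.Println(pathOf("/path?indent=true")) // /path

	// prints: int float64 string
	fmt.Printf("%T %T %T\n", typedValue("200"), typedValue("0.008"), typedValue("-"))

	// A log timestamp is parsed with the same layout that appears in the sample.
	t, err := time.Parse(accessLogLayout, "07/Oct/2016:22:31:56 +0800")
	if err != nil {
		panic(err)
	}
	fmt.Println(t.Year(), t.Month(), t.Day()) // 2016 October 7
}
```

Writing the sample's time value as a Go layout means the sample itself documents how log timestamps are parsed, so no separate format option is needed.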