
Regular expression named groups in golang

Go
扬帆大鱼 2022-05-17 16:48:38
I need help integrating a regular expression with golang. I want to parse log files and have created a regex that looks fine on https://regex101.com/r/p4mbiS/1/

A log line looks like this:

57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"

The regex looks like this:

(?P<ip>([^\s]+)).+?\[(?P<localtime>(.*?))\].+?GET\s\/\?(?P<request>.+?)\".+?\"(?P<ref>.+?)\".\"(?P<agent>.+?)\"

The named groups should yield the following:

ip: 57.157.87.86
localtime: 06/Feb/2020:00:11:04 +0100
request: parammore=1&customer_id=1&...HTTP/1.1
ref: https://www.somewebsite.com/more/andheresomemore/
agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0)...

regex101.com generates golang code that does not work for me. I tried to improve it, without success. The golang code only returns the whole string instead of the groups.

package main

import (
    "regexp"
    "fmt"
)

func main() {
    var re = regexp.MustCompile(`(?P<ip>([^\s]+)).+?\[(?P<localtime>(.*?))\].+?GET\s\/\?(?P<request>.+?)\".+?\"(?P<ref>.+?)\".\"(?P<agent>.+?)\"`)
    var str = `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`

    if len(re.FindStringIndex(str)) > 0 {
        fmt.Println(re.FindString(str),"found at index",re.FindStringIndex(str)[0])
    }
}

The fiddle is here: https://play.golang.org/p/e0_8PM-Nv6i

2 Answers

慕运维8079593

Single-match solution


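Since you defined capturing groups and need to extract their values, you need to use FindStringSubmatch rather than FindString, which only returns the text of the whole match. See this Go demo: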


package main

import (
    "fmt"
    "regexp"
)

func main() {
    var re = regexp.MustCompile(`(?P<ip>\S+).+?\[(?P<localtime>.*?)\].+?GET\s/\?(?P<request>.+?)".+?"(?P<ref>.+?)"\s*"(?P<agent>.+?)"`)
    var str = `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`

    // FindStringSubmatch returns the whole match at index 0 followed by one
    // entry per capture group, or nil when the pattern does not match.
    match := re.FindStringSubmatch(str)
    if match == nil {
        fmt.Println("no match")
        return
    }

    // Map each named group to the text it captured.
    result := make(map[string]string)
    for i, name := range re.SubexpNames() {
        if i != 0 && name != "" {
            result[name] = match[i]
        }
    }
    fmt.Printf("IP: %s\nLocal Time: %s\nRequest: %s\nRef: %s\nAgent: %s\n", result["ip"], result["localtime"], result["request"], result["ref"], result["agent"])
}

Output:

IP: 57.157.87.86
Local Time: 06/Feb/2020:00:11:04 +0100
Request: parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1
Ref: https://www.somewebsite.com/more/andheresomemore/
Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0

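Using .+? this often in a pattern is not a good idea, because it makes the pattern less precise and can hurt performance. It is better to replace those dot patterns with negated character classes and to make the pattern more explicit about what each field may contain.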
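The advice above can be sketched as follows. This is only an illustration: the pattern in `lineRe` and the helper `parseLine` are hypothetical names and are not taken from the answer, and the pattern assumes the fields are separated by single spaces as in the sample log line. Each field is bounded by a negated character class such as `[^\]]+` or `[^"]*` instead of a lazy dot.

```go
package main

import (
	"fmt"
	"regexp"
)

// lineRe is a hypothetical, stricter variant of the pattern: every field is
// delimited by \S+ or a negated character class instead of ".+?".
var lineRe = regexp.MustCompile(`^(?P<ip>\S+) \S+ \S+ \[(?P<localtime>[^\]]+)\] "GET /\?(?P<request>[^"]+)" \S+ \S+ "(?P<ref>[^"]*)" "(?P<agent>[^"]*)"`)

// parseLine returns the named groups of one log line and reports whether
// the line matched at all.
func parseLine(line string) (map[string]string, bool) {
	match := lineRe.FindStringSubmatch(line)
	if match == nil {
		return nil, false
	}
	result := make(map[string]string)
	for i, name := range lineRe.SubexpNames() {
		if i != 0 && name != "" {
			result[name] = match[i]
		}
	}
	return result, true
}

func main() {
	line := `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`

	fields, ok := parseLine(line)
	if !ok {
		fmt.Println("no match")
		return
	}
	fmt.Printf("IP: %s\nLocal Time: %s\nRequest: %s\nRef: %s\nAgent: %s\n",
		fields["ip"], fields["localtime"], fields["request"], fields["ref"], fields["agent"])
}
```

Because the pattern is anchored with `^` and spells out every separator, a line that deviates from the expected layout is rejected instead of being matched loosely.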


Multi-match solution


Here, you need to use the FindAllStringSubmatch method of the compiled regexp. See this Go demo:


package main

import (
    "fmt"
    "regexp"
)

func main() {
    var re = regexp.MustCompile(`(?P<ip>\S+).+?\[(?P<localtime>.*?)\].+?GET\s/\?(?P<request>.+?)".+?"(?P<ref>.+?)"\s*"(?P<agent>.+?)"`)
    var str = `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`

    // Collect one map of named groups per match; -1 means "return all matches".
    result := make([]map[string]string, 0)
    for _, match := range re.FindAllStringSubmatch(str, -1) {
        res := make(map[string]string)
        for i, name := range re.SubexpNames() {
            if i != 0 && name != "" {
                res[name] = match[i]
            }
        }
        result = append(result, res)
    }

    // Displaying the matches
    for i, match := range result {
        fmt.Printf("--------------\nMatch %d:\n", i+1)
        // Iterate over SubexpNames so the groups print in pattern order.
        for j, name := range re.SubexpNames() {
            if j != 0 && name != "" {
                fmt.Printf("Group %s: %s\n", name, match[name])
            }
        }
    }
}

Output:

--------------
Match 1:
Group ip: 57.157.87.86
Group localtime: 06/Feb/2020:00:11:04 +0100
Group request: parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1
Group ref: https://www.somewebsite.com/more/andheresomemore/
Group agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0


Answered 2022-05-17
当年话下

You can use regroup for this: https://github.com/oriser/regroup

Example:


package main

import (
    "fmt"

    "github.com/oriser/regroup"
)

type LogEntry struct {
    IP        string `regroup:"ip"`
    LocalTime string `regroup:"localtime"`
    Request   string `regroup:"request"`
    Ref       string `regroup:"ref"`
    Agent     string `regroup:"agent"`
}

func main() {
    var re = regroup.MustCompile(`(?P<ip>([^\s]+)).+?\[(?P<localtime>(.*?))\].+?GET\s\/\?(?P<request>.+?)\".+?\"(?P<ref>.+?)\".\"(?P<agent>.+?)\"`)
    var str = `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`

    logEntry := &LogEntry{}
    if err := re.MatchToTarget(str, logEntry); err != nil {
        panic(err)
    }

    fmt.Printf("%#v\n", logEntry)
}
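If a third-party dependency is not wanted, a similar struct can be filled with the standard library alone. The sketch below is an assumption, not part of this answer: `parseEntry` is a hypothetical helper, the struct carries no tags, and it relies on `(*regexp.Regexp).SubexpIndex`, which is available from Go 1.15 on.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// LogEntry mirrors the struct above, without the regroup tags.
type LogEntry struct {
	IP        string
	LocalTime string
	Request   string
	Ref       string
	Agent     string
}

var entryRe = regexp.MustCompile(`(?P<ip>\S+).+?\[(?P<localtime>.*?)\].+?GET\s/\?(?P<request>.+?)".+?"(?P<ref>.+?)"\s*"(?P<agent>.+?)"`)

// parseEntry (hypothetical helper) fills a LogEntry from one log line.
func parseEntry(line string) (LogEntry, error) {
	match := entryRe.FindStringSubmatch(line)
	if match == nil {
		return LogEntry{}, errors.New("line does not match the log pattern")
	}
	// SubexpIndex returns -1 for unknown names; every name used below
	// exists in entryRe, so the index is always valid here.
	group := func(name string) string { return match[entryRe.SubexpIndex(name)] }
	return LogEntry{
		IP:        group("ip"),
		LocalTime: group("localtime"),
		Request:   group("request"),
		Ref:       group("ref"),
		Agent:     group("agent"),
	}, nil
}

func main() {
	line := `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`

	entry, err := parseEntry(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%#v\n", entry)
}
```

The trade-off is that the mapping from group names to fields is written out by hand, whereas regroup derives it from the struct tags.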


Answered 2022-05-17