Why can't my regular expression extract the content I want?

阮壬宏 2020-5-21 1015



First, the source code:

 

arrayData = UiElement.DataScrap({"html":[{"tag":"MAIN"}],"wnd":[{"app":"chrome","cls":"Chrome_WidgetWin_1","title":"*"},{"cls":"Chrome_RenderWidgetHostHWND","title":"Chrome Legacy Window"}]},{"Columns":[{"props":["text"],"selecors":[{"className":"container","index":0,"prefix":"","tag":"div","value":"div.container"},{"className":"row","index":0,"prefix":">","tag":"div","value":"div.row"},{"className":"col-lg-9 main","index":0,"prefix":">","tag":"div","value":"div.col-lg-9.main"},{"className":"card card-thread","index":0,"prefix":">","tag":"div","value":"div.card.card-thread"},{"className":"card-body","index":0,"prefix":">","tag":"div","value":"div.card-body"},{"className":"message break-all","index":0,"prefix":">","tag":"div","value":"div.message.break-all"},{"index":0,"prefix":">","tag":"p","value":"p"}]}],"ExtractTable":0},{"objNextLinkElement":"","iMaxNumberOfPage":5,"iMaxNumberOfResult":-1,"iDelayBetweenMS":1000,"bContinueOnError":false})

TracePrint(arrayData)

For Each value In arrayData
    arrRet = Regex.FindAll(value,".+?:")
    TracePrint(arrRet)
Next


Target URL:

https://forum.uibot.com.cn/thread-50.htm

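I want to use regular expressions to pull all the titles and links out of the scraped data into two separate arrays, filtering out everything else. I only need the titles and the links.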

The title array should contain:

9/28更新-邮箱操作 二

9/28更新-邮箱操作 三

The link array should contain:

https://forum.uibot.com.cn/thread-2853.htm

https://forum.uibot.com.cn/thread-2854.htm
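Splitting each scraped "title:link" string into those two arrays can be sketched with one regular expression and two capture groups. The sketch is in Python only because it can be run outside UiBot; it is not UiBot's Regex API. It assumes every link starts with http:// or https://, and the helper name split_titles_links is hypothetical.

```python
import re

# Each scraped entry looks like "<title>:<url>"; the link is assumed to start with http:// or https://.
# The lazy (.*?) stops at the first ":" that is immediately followed by the URL.
PATTERN = re.compile(r"^(.*?):(https?://\S+)$")


def split_titles_links(entries):
    """Split "title:url" strings into a list of titles and a list of links (hypothetical helper)."""
    titles, links = [], []
    for entry in entries:
        m = PATTERN.match(entry)
        if m is None:
            continue  # skip entries that do not look like "title:url"
        titles.append(m.group(1))
        links.append(m.group(2))
    return titles, links


entries = [
    "9/28更新-邮箱操作 二:https://forum.uibot.com.cn/thread-2853.htm",
    "9/28更新-邮箱操作 三:https://forum.uibot.com.cn/thread-2854.htm",
]
titles, links = split_titles_links(entries)
print(titles)  # ['9/28更新-邮箱操作 二', '9/28更新-邮箱操作 三']
print(links)   # ['https://forum.uibot.com.cn/thread-2853.htm', 'https://forum.uibot.com.cn/thread-2854.htm']
```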


Return value of arrayData:

["9/28更新-邮箱操作 二:https://forum.uibot.com.cn/thread-2853.htm"],
["9/28更新-邮箱操作 三:https://forum.uibot.com.cn/thread-2854.htm"],
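The brackets in this output show that arrayData is an array of arrays, so in the loop above each value is an inner array rather than a string. A Python analogue makes the difference visible. The list simply copies the two rows shown. Python raises TypeError here, whereas UiBot's Regex.FindAll may just return nothing; that UiBot behavior is an assumption, not verified.

```python
import re

# Same shape as the arrayData printed above: every element is itself a one-item list.
array_data = [
    ["9/28更新-邮箱操作 二:https://forum.uibot.com.cn/thread-2853.htm"],
    ["9/28更新-邮箱操作 三:https://forum.uibot.com.cn/thread-2854.htm"],
]

for value in array_data:
    # Mirrors the question's loop: value is a list, so the regex engine rejects it.
    try:
        re.findall(r".+?:", value)
    except TypeError as err:
        print("cannot search a", type(value).__name__, "->", err)
    # The string the regex actually needs is one level deeper.
    print(re.findall(r".+?:", value[0]))
```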

I want a regular expression that extracts every string without the link, i.e.:

9/28更新-邮箱操作 二:

9/28更新-邮箱操作 三:

But my regular expression does not extract any data at all.

What am I doing wrong?
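Besides the nesting, the pattern .+?: is lazy but not anchored, so a find-all search resumes after the title and also matches the "https:" at the start of the link. A Python sketch, assuming the title itself contains no ":", anchors the pattern with ^ so only the leading title part is returned. Whether UiBot's Regex.FindAll treats ^ the same way is an assumption.

```python
import re

entry = "9/28更新-邮箱操作 二:https://forum.uibot.com.cn/thread-2853.htm"

# Unanchored: after matching the title, the search resumes and also finds "https:".
print(re.findall(r".+?:", entry))   # ['9/28更新-邮箱操作 二:', 'https:']

# Anchored with ^ (no MULTILINE flag): a match may only start at position 0,
# so the colon inside the URL is never reached. Assumes the title has no ":".
print(re.findall(r"^.+?:", entry))  # ['9/28更新-邮箱操作 二:']
```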


Latest replies (4)
  • 财酱 2020-5-21
    2
    After scraping the data, you can loop over the array: the data ending with ":" is the name, and the data starting with ":" is the link.
  • 换个昵称 2020-5-21
    3
    arrayData = [["9/28更新-邮箱操作 二:https://forum.uibot.com.cn/thread-2853.htm"],["9/28更新-邮箱操作 三:https://forum.uibot.com.cn/thread-2854.htm"]]
    For Each 数组 In arrayData
        For Each 字符串 In 数组
            arrRet = Regex.FindAll(字符串,".+?:")
            TracePrint(arrRet)
        Next
    Next


    Result: [screenshot]

  • 阮壬宏 2020-5-22
    4
    Replying to 财酱:
    Thank you for the explanation, 财酱; now I understand.
    May I ask you one more question?
    SRT = "9/28更新-邮箱操作 三:https://forum.uibot.com.cn/thread-2854.htm"
    I'd like to write it like this and get back only the text matched inside the ( ):
    arrRet = Regex.FindAll(SRT,"9/28更新-"(.+?:))
    and get back "邮箱操作 三:https://forum.uibot.com.cn/thread-2854.htm".
    How should I write this? Thanks.
  • 阮壬宏 2020-5-22
    5
    Replying to 换个昵称:
    Thank you very much, I finally get it. Because each [ ] contains another [ ], the array has to be looped over twice to reach the strings.
    Only then can the regular expression find the strings we want.
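Reply #2's idea, that the text before the first ":" is the name and the text after it is the link, also works without a regular expression. A Python sketch using str.partition, which splits at the first occurrence of the separator only. It assumes titles never contain ":" and that the strings have already been taken out of their inner arrays.

```python
# Flat list of scraped strings, i.e. after both levels of the array have been walked.
entries = [
    "9/28更新-邮箱操作 二:https://forum.uibot.com.cn/thread-2853.htm",
    "9/28更新-邮箱操作 三:https://forum.uibot.com.cn/thread-2854.htm",
]

titles, links = [], []
for entry in entries:
    # partition splits at the FIRST ":" only, so the "https:" inside the link stays intact.
    title, sep, link = entry.partition(":")
    if sep:  # sep is "" when the entry contains no ":" at all; such entries are skipped
        titles.append(title)
        links.append(link)

print(titles)  # ['9/28更新-邮箱操作 二', '9/28更新-邮箱操作 三']
print(links)   # ['https://forum.uibot.com.cn/thread-2853.htm', 'https://forum.uibot.com.cn/thread-2854.htm']
```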
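The fix in reply #3, summed up in reply #5, is the nested loop: the outer For Each walks the rows of arrayData and the inner one walks the strings inside each row. The same control flow is shown here in Python, as a mirror of the UiBot code rather than UiBot itself, collecting the matches in a list.

```python
import re

array_data = [
    ["9/28更新-邮箱操作 二:https://forum.uibot.com.cn/thread-2853.htm"],
    ["9/28更新-邮箱操作 三:https://forum.uibot.com.cn/thread-2854.htm"],
]

results = []
for row in array_data:                       # corresponds to: For Each 数组 In arrayData
    for text in row:                         # corresponds to: For Each 字符串 In 数组
        matches = re.findall(r".+?:", text)  # same pattern as in the reply
        results.append(matches)
        print(matches)
```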
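Reply #4 asks about capture groups. In the attempted line, the ( ) sit outside the string literal, so they are not part of the pattern at all. The group must be inside the pattern string. Because the desired result includes the link, it should capture everything after the prefix with (.+) rather than stop at the first ":" as .+?: does. This is a sketch with Python's re; whether UiBot's Regex.FindAll returns the group's text or the whole match is not confirmed here.

```python
import re

srt = "9/28更新-邮箱操作 三:https://forum.uibot.com.cn/thread-2854.htm"

# The ( ) must sit inside the pattern string; (.+) captures everything after the fixed prefix.
m = re.search(r"9/28更新-(.+)", srt)
if m:
    print(m.group(1))  # 邮箱操作 三:https://forum.uibot.com.cn/thread-2854.htm

# With exactly one group in the pattern, re.findall returns the group's text, not the whole match.
print(re.findall(r"9/28更新-(.+)", srt))
```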