Why can't my regular expression extract the content I want?

阮壬宏 2020-5-21 1015

Here is the source code first:

arrayData = UiElement.DataScrap({"html":[{"tag":"MAIN"}],"wnd":[{"app":"chrome","cls":"Chrome_WidgetWin_1","title":"*"},{"cls":"Chrome_RenderWidgetHostHWND","title":"Chrome Legacy Window"}]},{"Columns":[{"props":["text"],"selecors":[{"className":"container","index":0,"prefix":"","tag":"div","value":"div.container"},{"className":"row","index":0,"prefix":">","tag":"div","value":"div.row"},{"className":"col-lg-9 main","index":0,"prefix":">","tag":"div","value":"div.col-lg-9.main"},{"className":"card card-thread","index":0,"prefix":">","tag":"div","value":"div.card.card-thread"},{"className":"card-body","index":0,"prefix":">","tag":"div","value":"div.card-body"},{"className":"message break-all","index":0,"prefix":">","tag":"div","value":"div.message.break-all"},{"index":0,"prefix":">","tag":"p","value":"p"}]}],"ExtractTable":0},{"objNextLinkElement":"","iMaxNumberOfPage":5,"iMaxNumberOfResult":-1,"iDelayBetweenMS":1000,"bContinueOnError":false})

TracePrint(arrayData)

For Each value In arrayData
    arrRet = Regex.FindAll(value,".+?：")
    TracePrint(arrRet)
Next

Target URL:

https://forum.uibot.com.cn/thread-50.htm

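I want to use regular expressions to split the scraped data into two separate arrays: one holding all the titles and one holding all the links.

Everything else should be filtered out. I only need the titles and the links.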

The title array would contain:

9/28更新-邮箱操作二

9/28更新-邮箱操作三

The link array would contain:

https://forum.uibot.com.cn/thread-2853.htm

https://forum.uibot.com.cn/thread-2854.htm

Value returned in arrayData:

[
    ["9/28更新-邮箱操作二：https://forum.uibot.com.cn/thread-2853.htm"],
    ["9/28更新-邮箱操作三：https://forum.uibot.com.cn/thread-2854.htm"]
]

I want the regex to extract each string without its link:

9/28更新-邮箱操作二：

9/28更新-邮箱操作三：

But my regex does not extract any data at all.

What am I doing wrong?

Latest replies (4)

财酱 2020-5-21

#2

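After scraping the data, you can iterate over the array: the part that ends with "：" is the name, and the part that follows "：" is the link.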
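
The suggestion above can be sketched in Python. This is only an illustration, not UiBot code: Python's `re` module stands in for UiBot's `Regex`, the data is assumed to be a list of one-element lists (as the printed arrayData suggests), every entry is assumed to use the full-width colon "：" as separator, and the names `PATTERN` and `split_titles_and_links` are made up for this example.

```python
import re

# Assumed shape of the scraped data: a list of one-element lists.
array_data = [
    ["9/28更新-邮箱操作二：https://forum.uibot.com.cn/thread-2853.htm"],
    ["9/28更新-邮箱操作三：https://forum.uibot.com.cn/thread-2854.htm"],
]

# Group 1: everything before the first full-width colon (the title).
# Group 2: the URL that follows the colon (the link).
PATTERN = re.compile(r"^(.+?)：(https?://\S+)$")


def split_titles_and_links(rows):
    """Return (titles, links) collected from a list of lists of strings."""
    titles, links = [], []
    for row in rows:
        for text in row:
            match = PATTERN.match(text)
            if match is None:
                continue  # skip entries that are not "title：link"
            titles.append(match.group(1))
            links.append(match.group(2))
    return titles, links


titles, links = split_titles_and_links(array_data)
print(titles)
print(links)
```

Entries that do not look like "title：link" are skipped, which covers the wish to filter out unwanted data.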

换个昵称 2020-5-21

#3

arrayData = [["9/28更新-邮箱操作二：https://forum.uibot.com.cn/thread-2853.htm"],["9/28更新-邮箱操作三：https://forum.uibot.com.cn/thread-2854.htm"]]
For Each 数组 In arrayData
    For Each 字符串 In 数组
        arrRet = Regex.FindAll(字符串,".+?：")
        TracePrint(arrRet)
    Next
Next

Result:

阮壬宏 2020-5-22

#4

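Quoting 财酱: After scraping the data, you can iterate over the array: the part that ends with "：" is the name, and the part that follows "：" is the link.
Thank you for the explanation, 财酱. Now I understand.
May I ask you one more question?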
SRT ="9/28更新-邮箱操作三：https://forum.uibot.com.cn/thread-2854.htm"
I would like to extract only the content matched by the part of the regex inside ( ), something like this:
arrRet = Regex.FindAll(SRT,"9/28更新-"(.+?：))
The result I want is "邮箱操作三：https://forum.uibot.com.cn/thread-2854.htm".
How should I write it? Thanks.
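
One way to think about this question is a capture group: the whole pattern goes inside the string literal, the fixed prefix stays outside the parentheses, and only the parenthesized part is returned. The sketch below uses Python's `re.findall`, which returns the group contents when the pattern has exactly one group. Whether UiBot's `Regex.FindAll` treats groups the same way is not confirmed anywhere in this thread, so take it as an assumption to verify.

```python
import re

srt = "9/28更新-邮箱操作三：https://forum.uibot.com.cn/thread-2854.htm"

# The prefix must match but sits outside the group, so it is not returned.
# A greedy (.+) captures everything after the prefix, link included.
rest = re.findall(r"9/28更新-(.+)", srt)

# The lazy group from the question stops at the first full-width colon,
# so it captures the title only and drops the link.
title_only = re.findall(r"9/28更新-(.+?：)", srt)

print(rest)
print(title_only)
```

The difference between the two patterns explains why `(.+?：)` cannot return the link: the lazy quantifier ends the match as soon as it reaches "：".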

阮壬宏 2020-5-22

#5

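Quoting 换个昵称: arrayData = [["9/28更新-邮箱操作二：https://forum.uibot.com.cn/thread-2853.htm"],[" ...
Thank you very much. I finally understand. Because the outer [ ] contains further [ ] inside it, the data has to be iterated twice before we reach the actual strings.
Only then can the regex find the strings we want.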
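
The point about nesting can be shown with a small Python sketch. It assumes the same list-of-lists shape as in reply #3, and `find_titles` is a name invented for this example. In Python the single-loop mistake surfaces as a `TypeError`; the thread only reports that in UiBot nothing was extracted.

```python
import re

array_data = [
    ["9/28更新-邮箱操作二：https://forum.uibot.com.cn/thread-2853.htm"],
    ["9/28更新-邮箱操作三：https://forum.uibot.com.cn/thread-2854.htm"],
]


def find_titles(rows):
    """Collect every "title：" prefix from a list of lists of strings."""
    found = []
    for row in rows:        # outer loop: each row is itself a list
        for text in row:    # inner loop: only here do we hold a string
            found.extend(re.findall(r".+?：", text))
    return found


# With a single loop level the regex receives a list, not a string.
try:
    re.findall(r".+?：", array_data[0])
    single_loop_error = None
except TypeError as exc:
    single_loop_error = exc

print(find_titles(array_data))
print(type(single_loop_error).__name__)
```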
