Counting substrings in an HTML page using BeautifulSoup

Python

PIPIONE 2023-03-08 15:43:41

I need to use the BeautifulSoup module to find and count every occurrence of the words "python" and "c++" as substrings in the HTML code. On the Wikipedia page, these words appear 1 and 9 times, respectively. Why does my code print 0 and 0?

from urllib.request import urlopen, urlretrieve
from bs4 import BeautifulSoup

resp = urlopen("https://stepik.org/media/attachments/lesson/209717/1.html")
html = resp.read().decode('utf8')
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', attrs = {'class' : 'wikitable sortable'})

cnt = 0
for tr in soup.find_all("python"):
    cnt += 1
print(cnt)

cnt1 = 0
for tr in soup.find_all("c++"):
    cnt += 1
print(cnt)


1 Answer

慕码人8056858


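You are calling find_all incorrectly. Its first positional argument is a tag name, so soup.find_all("python") looks for <python> tags and finds none. To search the page text, pass the string argument instead.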

# This matches only a text node whose entire content is exactly "Python"

soup.find_all(string="Python")

# It does not match "python", nor a longer string such as "Python is best"

# A regex fixes that, since it also matches substrings

import re

soup.find_all(string=re.compile(r"[cC]\+\+"))

soup.find_all(string=re.compile(r"[Pp]ython"))
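Note that find_all(string=...) returns the matching text nodes, so taking len() of the result counts nodes, not occurrences; a node containing the word twice is counted once. A minimal sketch of counting every occurrence follows. The inline HTML and the helper name count_occurrences are made up for illustration and stand in for the downloaded page, so the counts here are not the 1 and 9 expected for the real page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the downloaded page
html = """
<table class="wikitable sortable">
  <tr><td>Python</td><td>C++ and more C++</td></tr>
  <tr><td>python is best</td><td>c++</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def count_occurrences(soup, pattern):
    """Sum regex matches over every text node that contains the pattern."""
    regex = re.compile(pattern)
    # find_all(string=regex) yields the text nodes in which regex.search succeeds;
    # regex.findall then counts each occurrence inside a node.
    return sum(len(regex.findall(text)) for text in soup.find_all(string=regex))


python_count = count_occurrences(soup, r"[Pp]ython")
cpp_count = count_occurrences(soup, r"[cC]\+\+")
print(python_count, cpp_count)
```

To restrict the count to the table from the question, the same helper could be called with table instead of soup, assuming soup.find located the table.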

2023-03-08
