3 Answers
Answer by a user with 1829 experience points, over 4 likes
If you want to loop over the first 6 pages, change the range in the loop:
for i in range(6):  # the first page is addressed at index `0`
Or:
for i in range(0, 6):
instead of:
for i in range(1, 5):  # this starts from the second page, since the second page is indexed at `1`
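As a quick check of which page indices each form produces, here is a small illustrative sketch. The variable names are made up for this example and nothing here touches the network:

```python
# Page indices produced by each range; result pages are numbered from 0.
first_six = list(range(6))      # equivalent to range(0, 6)
explicit = list(range(0, 6))
original = list(range(1, 5))    # skips page 0 and stops after page 4

print(first_six)   # [0, 1, 2, 3, 4, 5]
print(explicit)    # [0, 1, 2, 3, 4, 5]
print(original)    # [1, 2, 3, 4]
```

The stop value is exclusive, so `range(1, 5)` covers only four pages and never reaches pages 0 or 5.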
Answer by a user with 1810 experience points, over 4 likes
Try:
import requests
from bs4 import BeautifulSoup

for i in range(6):
    url = 'https://www.nairaland.com/search/ipob/0/0/0/{}'.format(i)
    the_word = 'afonja'
    r = requests.get(url, allow_redirects=False)
    soup = BeautifulSoup(r.content, 'lxml')
    # find_all returns every text node that contains the word;
    # find would return only the first node, and len() of that is its character count
    words = soup.find_all(string=lambda text: text and the_word in text)
    print(words)
    # number of text nodes containing the word (0 if there are none)
    count = len(words)
    print('\nUrl: {}\ncontains {} occurrences of word: {}'.format(url, count, the_word))
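To count how many times the word itself appears in a page's text, rather than how many elements contain it, one option is a regular expression over the extracted text. This is a minimal sketch, not part of the answer above: `count_word` is a hypothetical helper, and the sample string stands in for something like `soup.get_text()`.

```python
import re

def count_word(text, word):
    """Count case-insensitive, whole-word occurrences of `word` in `text`."""
    pattern = r'\b{}\b'.format(re.escape(word))
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Hypothetical sample text standing in for the text of a fetched page
sample = 'Afonja said hello. The afonja thread has 3 replies; afonjas do not count.'
print(count_word(sample, 'afonja'))  # 2
```

The `\b` word boundaries mean that longer words such as "afonjas" are not counted; drop them if substring matches should count too.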
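Edit after the new specification.
Assuming the word to count is the same as the search term in the URL, you can see that the word is highlighted on the page and can be identified in the HTML by span class=highlight.
So you can use this code: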
import requests
from bs4 import BeautifulSoup

for i in range(6):
    url = 'https://www.nairaland.com/search/afonja/0/0/0/{}'.format(i)
    the_word = 'afonja'
    r = requests.get(url, allow_redirects=False)
    soup = BeautifulSoup(r.content, 'lxml')
    count = len(soup.find_all('span', {'class': 'highlight'}))
    print('\nUrl: {}\ncontains {} occurrences of word: {}'.format(url, count, the_word))
You get:
Url: https://www.nairaland.com/search/afonja/0/0/0/0
contains 30 occurrences of word: afonja
Url: https://www.nairaland.com/search/afonja/0/0/0/1
contains 31 occurrences of word: afonja
Url: https://www.nairaland.com/search/afonja/0/0/0/2
contains 36 occurrences of word: afonja
Url: https://www.nairaland.com/search/afonja/0/0/0/3
contains 30 occurrences of word: afonja
Url: https://www.nairaland.com/search/afonja/0/0/0/4
contains 45 occurrences of word: afonja
Url: https://www.nairaland.com/search/afonja/0/0/0/5
contains 50 occurrences of word: afonja
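The counting idea can be tried offline without fetching anything. The sketch below swaps BeautifulSoup for the standard library's `html.parser`, so it is a different technique from the code above. `HighlightCounter`, `count_highlights`, and the sample markup are all made up for illustration and not taken from the real site.

```python
from html.parser import HTMLParser

class HighlightCounter(HTMLParser):
    """Counts <span> start tags whose class attribute includes 'highlight'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != 'span':
            return
        # attrs is a list of (name, value) pairs; value can be None
        classes = (dict(attrs).get('class') or '').split()
        if 'highlight' in classes:
            self.count += 1

def count_highlights(html):
    parser = HighlightCounter()
    parser.feed(html)
    parser.close()
    return parser.count

# Made-up markup for illustration only
sample_html = (
    '<div><span class="highlight">afonja</span> wrote: '
    '<span class="highlight bold">Afonja</span> and <span class="other">x</span></div>'
)
print(count_highlights(sample_html))  # 2
```

Splitting the class attribute on whitespace lets a span with several classes still match, which is also how a class lookup in BeautifulSoup behaves.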
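Answer by a user with 2051 experience points, over 10 likes
By the way, the search term has its own class name, so you can simply count the elements with that class. The following correctly returns 0 when the term is not found on the page. You can use the same approach in a loop.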
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://www.nairaland.com/search?q=afonja&board=0&topicsonly=2')
soup = bs(r.content, 'lxml')
occurrences = len(soup.select('.highlight'))
print(occurrences)
# The same approach in a loop over the first 9 result pages
import requests
from bs4 import BeautifulSoup as bs

for i in range(9):
    r = requests.get('https://www.nairaland.com/search/afonja/0/0/0/{}'.format(i))
    soup = bs(r.content, 'lxml')
    occurrences = len(soup.select('.highlight'))
    print(occurrences)
