
My web scraping code using BeautifulSoup doesn't go past the first page

Python

FFIVE 2022-01-18 16:59:10

It doesn't seem to go past the first page. What's wrong? Also, if the word you are looking for is inside a link, it does not report the correct number of occurrences: it shows 5 outputs, each with 5 as the number of occurrences.

    import requests
    from bs4 import BeautifulSoup

    for i in range (1,5):
        url = 'https://www.nairaland.com/search/ipob/0/0/0/{}'.format(i)
        the_word = 'is'
        r = requests.get(url, allow_redirects=False)
        soup = BeautifulSoup(r.content, 'lxml')
        words = soup.find(text=lambda text: text and the_word in text)
        print(words)
        count = len(words)
        print('\nUrl: {}\ncontains {} occurrences of word: {}'.format(url, count, the_word))


3 answers

浮云间

Contributed 1829 experience points, received over 4 likes

If you want to go through the first 6 pages, change the range in the loop:

    for i in range (6):   # the first page is addressed at index `0`

or:

    for i in range (0,6):

instead of:

    for i in range (1,5):    # this will start from the second page, since the second page is indexed at `1`
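To make the effect of the range concrete, here is a minimal sketch that only builds the URLs and sends no requests. The helper name `page_urls` is hypothetical; the URL pattern is the one from the question, and treating the last path segment as a zero-based page index is the assumption this answer makes.

```python
# Hypothetical helper: builds search-result URLs without fetching them.
BASE = 'https://www.nairaland.com/search/ipob/0/0/0/{}'


def page_urls(indices):
    """Return one URL per page index (assumed to be zero-based)."""
    return [BASE.format(i) for i in indices]


first_six = page_urls(range(6))     # indices 0..5, includes the first page
original = page_urls(range(1, 5))   # indices 1..4, never requests index 0

print(first_six[0])
print(original[0])
print(len(first_six), len(original))
```

Printing the two lists side by side shows that `range(1, 5)` never produces the URL ending in `/0`.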

Replied 2022-01-18

慕莱坞森

Contributed 1810 experience points, received over 4 likes

Try:

    import requests
    from bs4 import BeautifulSoup

    the_word = 'afonja'

    for i in range(6):  # pages are indexed from 0
        url = 'https://www.nairaland.com/search/ipob/0/0/0/{}'.format(i)
        r = requests.get(url, allow_redirects=False)
        soup = BeautifulSoup(r.content, 'lxml')
        # find_all returns every matching text node; find returns only the first
        words = soup.find_all(string=lambda text: text and the_word in text)
        print(words)
        # count occurrences inside each matching node; an empty list gives 0
        count = sum(text.count(the_word) for text in words)
        print('\nUrl: {}\ncontains {} occurrences of word: {}'.format(url, count, the_word))
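As a side note on the counting itself, the sketch below runs on a made-up HTML fragment (not Nairaland's real markup) and uses the built-in `html.parser`, so it needs neither the network nor `lxml`. It illustrates that `find` returns only the first matching text node, so `len()` of that result is a character count, whereas `find_all` combined with `str.count` tallies every occurrence. The function name `contains_word` is hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up fragment, for illustration only.
html = (
    '<div><p>afonja and afonja again</p>'
    '<p>no match here</p>'
    '<p>one more afonja</p></div>'
)
the_word = 'afonja'
soup = BeautifulSoup(html, 'html.parser')


def contains_word(text):
    """Filter for text nodes that contain the search word."""
    return bool(text) and the_word in text


first = soup.find(string=contains_word)      # only the first matching text node
nodes = soup.find_all(string=contains_word)  # every matching text node

chars_in_first = len(first)                  # length of a string, not a word count
total = sum(node.count(the_word) for node in nodes)

print(chars_in_first, len(nodes), total)
```

Here `chars_in_first` is the length of the first paragraph's text, while `total` is the number of times the word appears across all matching nodes.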

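Edit after the updated requirements.

Assuming the word to count is the same as the word in the URL, note that the word is highlighted on the page and can be identified in the HTML by span class=highlight.

So you can use this code: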

    import requests
    from bs4 import BeautifulSoup

    for i in range(6):
        url = 'https://www.nairaland.com/search/afonja/0/0/0/{}'.format(i)
        the_word = 'afonja'
        r = requests.get(url, allow_redirects=False)
        soup = BeautifulSoup(r.content, 'lxml')
        count = len(soup.find_all('span', {'class':'highlight'}))
        print('\nUrl: {}\ncontains {} occurrences of word: {}'.format(url, count, the_word))

You get:

    Url: https://www.nairaland.com/search/afonja/0/0/0/0
    contains 30 occurrences of word: afonja

    Url: https://www.nairaland.com/search/afonja/0/0/0/1
    contains 31 occurrences of word: afonja

    Url: https://www.nairaland.com/search/afonja/0/0/0/2
    contains 36 occurrences of word: afonja

    Url: https://www.nairaland.com/search/afonja/0/0/0/3
    contains 30 occurrences of word: afonja

    Url: https://www.nairaland.com/search/afonja/0/0/0/4
    contains 45 occurrences of word: afonja

    Url: https://www.nairaland.com/search/afonja/0/0/0/5
    contains 50 occurrences of word: afonja

Replied 2022-01-18

侃侃无极

Contributed 2051 experience points, received over 10 likes

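Incidentally, the search term has its own class name, so you can simply count the elements that carry it. The following correctly returns 0 when the term is not found on the page. You can use this approach in a loop.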

    import requests
    from bs4 import BeautifulSoup as bs

    r = requests.get('https://www.nairaland.com/search?q=afonja&board=0&topicsonly=2')
    soup = bs(r.content, 'lxml')
    occurrences = len(soup.select('.highlight'))
    print(occurrences)

In a loop:

    import requests
    from bs4 import BeautifulSoup as bs

    for i in range(9):
        r = requests.get('https://www.nairaland.com/search/afonja/0/0/0/{}'.format(i))
        soup = bs(r.content, 'lxml')
        occurrences = len(soup.select('.highlight'))
        print(occurrences)
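The class-based count can also be tried offline. The fragments below are invented for illustration and only assume, as the answers here report, that matches are wrapped in `span class=highlight`; `count_highlights` is a hypothetical helper name, and `html.parser` is used so that `lxml` is not required.

```python
from bs4 import BeautifulSoup


def count_highlights(html):
    """Count elements carrying the 'highlight' class in an HTML string."""
    soup = BeautifulSoup(html, 'html.parser')
    return len(soup.select('.highlight'))


# Invented pages: the first has two highlighted matches, the second has none.
pages = [
    '<p><span class="highlight">afonja</span> text '
    '<span class="highlight">afonja</span></p>',
    '<p>nothing relevant on this page</p>',
]

counts = [count_highlights(page) for page in pages]
print(counts)

# find_all with a class filter selects the same span elements in this fragment.
soup = BeautifulSoup(pages[0], 'html.parser')
same = len(soup.find_all('span', {'class': 'highlight'}))
print(same)
```

Note that `.highlight` matches any element with that class, while `find_all('span', {'class': 'highlight'})` is restricted to `span` tags; the two agree only when the class is used on spans alone.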

Replied 2022-01-18
