
Unable to get a div tag with Beautiful Soup in Python

Python

翻阅古今 2023-12-29 17:02:24

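I'm trying to download all of the Pokémon images available on the official website, because I want high-quality images. Here is the code I wrote:

from bs4 import BeautifulSoup as bs4
import requests

request = requests.get('https://www.pokemon.com/us/pokedex/')
soup = bs4(request.text, 'html')
print(soup.findAll('div',{'class':'container pokedex'}))

The output is:

[]

Am I doing something wrong? Also, is it legal to scrape the official website? Is there a tag or anything else that indicates whether it is allowed? Thanks.

PS: I'm new to BS and HTML.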


2 Answers

噜噜哒

Contributed 1784 experience points · received more than 7 likes

The images are loaded dynamically, so you have to use selenium to scrape them. Here is the complete code to do that:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import requests

driver = webdriver.Chrome()
driver.get('https://www.pokemon.com/us/pokedex/')

# Give the page time to insert the dynamically loaded list items
time.sleep(4)

# Locate the list items by class name (By API, required by Selenium 4.3+)
li_tags = driver.find_elements(By.CLASS_NAME, 'animating')[:-3]

li_num = 1
for li in li_tags:
    img_link = li.find_element(By.XPATH, './/img').get_attribute('src')
    name = li.find_element(By.XPATH, f'/html/body/div[4]/section[5]/ul/li[{li_num}]/div/h5').text
    r = requests.get(img_link)
    # Save each image under the Pokémon's name
    with open(f"D:\\{name}.png", "wb") as f:
        f.write(r.content)
    li_num += 1

driver.close()

Output: 12 Pokémon images.

[Image 1 and Image 2: the first two downloaded images]
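The fixed time.sleep(4) in the code above either waits longer than needed or not long enough on a slow connection. A common alternative is to poll until the elements actually appear. The helper below is only a minimal sketch of that idea in plain Python; wait_until and its parameters are hypothetical names chosen for illustration, not part of Selenium (which ships its own WebDriverWait class for the same purpose).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(probe: Callable[[], Optional[T]],
               timeout: float = 10.0,
               interval: float = 0.25) -> T:
    """Call probe() repeatedly until it returns a truthy value.

    Raises TimeoutError if no truthy value is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

With a driver in hand, this could be used as li_tags = wait_until(lambda: driver.find_elements(By.CLASS_NAME, 'animating')), since find_elements returns an empty list while nothing matches.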

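Also, I noticed there is a Load more button at the bottom of the page. Clicking it loads more images, and after clicking it we have to keep scrolling down to load the rest. If I remember correctly, there are 893 images on the website in total. To scrape all 893 images, you can use the following code: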

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import requests

driver = webdriver.Chrome()
driver.get('https://www.pokemon.com/us/pokedex/')
time.sleep(3)

# Click the "Load more" button via JavaScript
load_more = driver.find_element(By.XPATH, '//*[@id="loadMore"]')
driver.execute_script("arguments[0].click();", load_more)

# Scroll to the bottom and return the resulting page height
scroll_js = "window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;"
lenOfPage = driver.execute_script(scroll_js)

# Keep scrolling until the page height stops growing
match = False
while not match:
    lastCount = lenOfPage
    time.sleep(1.5)
    lenOfPage = driver.execute_script(scroll_js)
    if lastCount == lenOfPage:
        match = True

# Locate the list items by class name (By API, required by Selenium 4.3+)
li_tags = driver.find_elements(By.CLASS_NAME, 'animating')[:-3]

li_num = 1
for li in li_tags:
    img_link = li.find_element(By.XPATH, './/img').get_attribute('src')
    name = li.find_element(By.XPATH, f'/html/body/div[4]/section[5]/ul/li[{li_num}]/div/h5').text
    r = requests.get(img_link)
    # Save each image under the Pokémon's name
    with open(f"D:\\{name}.png", "wb") as f:
        f.write(r.content)
    li_num += 1

driver.close()
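One thing to watch when saving all the images: the file name is built directly from the Pokémon's name, and Windows rejects certain characters in file names, the colon among them. If a name is rendered with such a character (Type: Null would be one, assuming the site displays it with a colon), the open() call would fail. The helper below is a minimal sketch of one way to clean names first; safe_filename is a hypothetical function written for illustration, not something from the code above.

```python
import re

# Characters Windows does not allow in file names, plus ASCII control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(name: str, replacement: str = "_") -> str:
    """Return `name` with characters that are invalid in file names replaced."""
    cleaned = _FORBIDDEN.sub(replacement, name)
    # Windows also rejects names that end in a dot or a space.
    cleaned = cleaned.strip(" .")
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or "unnamed"
```

The save step could then use open(f"D:\\{safe_filename(name)}.png", "wb") instead of the raw name.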

Posted 2023-12-29

元芳怎么了

Contributed 1798 experience points · received more than 7 likes

This may be easier to do if you first check the Network tab of your browser's developer tools:

import time
import requests

endpoint = "https://www.pokemon.com/us/api/pokedex/kalos"

# contains all metadata
data = requests.get(endpoint).json()

# collect keys needed to save the picture
items = [{"name": item["name"], "link": item["ThumbnailImage"]} for item in data]

# remove duplicates
d = [dict(t) for t in {tuple(d.items()) for d in items}]
assert len(d) == 893

for pokemon in d:
    response = requests.get(pokemon["link"])
    # pause between requests so the server is not hammered
    time.sleep(1)
    with open(f"{pokemon['name']}.png", "wb") as f:
        f.write(response.content)
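The "remove duplicates" line above goes through a set, so the order of the entries returned by the endpoint is not preserved. If the original order matters (for example, to download in Pokédex order), a small loop does the same job while keeping first occurrences in place. This is only a sketch: dedupe_records is a hypothetical helper, and the sample records use made-up placeholder links rather than real API output.

```python
from typing import Dict, Iterable, List


def dedupe_records(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop repeated dicts while keeping the first occurrence of each, in order."""
    seen = set()
    unique = []
    for record in records:
        # Dicts are not hashable, so use a sorted tuple of their items as the key.
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical sample data, shaped like the `items` list in the answer above.
sample = [
    {"name": "Bulbasaur", "link": "https://example.invalid/001.png"},
    {"name": "Bulbasaur", "link": "https://example.invalid/001.png"},
    {"name": "Ivysaur", "link": "https://example.invalid/002.png"},
]
print(dedupe_records(sample))
```

In the answer's code this would replace the set-based line with d = dedupe_records(items).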

Posted 2023-12-29

