Web scraping real-time data

Html5

qq_笑_17 2024-01-11 16:44:10

I'm currently trying to scrape real-time stock market data from a Yahoo Finance page. I'm using bs4. My problem is that whenever I run the script, it doesn't update to reflect the current price of the stock. If anyone has suggestions on what to change, I'd appreciate it.

import requests
from bs4 import BeautifulSoup

while True:
    page = requests.get("https://nz.finance.yahoo.com/quote/NZDUSD=X?p=NZDUSD=X")
    soup = BeautifulSoup(page.text, "html.parser")
    price = soup.find("div", {"class": "My(6px) Pos(r) smartphone_Mt(6px)"}).find("span").text
    print(price)


1 Answer

守着一只汪

Contributed 1872 experience points, received more than 3 likes

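This is not possible with BS4 alone.

The site uses JavaScript to update the page. Libraries such as urllib and requests only download the page's HTML, and BS4 only parses that HTML; none of them run JavaScript or load AJAX content. PhantomJS or Selenium drive a real browser programmatically, so the JavaScript that powers a dynamic site can actually run. Try using one of those :)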
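To see why the price never changes in the question's script, here is a minimal sketch. The HTML string, the `price` id, and the numbers are invented for illustration and are not Yahoo's actual markup. The point is only that BeautifulSoup parses the text it is given and does not run the script that would update the value in a browser.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the server sends a placeholder, and a script
# replaces it once the page is loaded in a browser. All names and
# values here are made up for the example.
STATIC_HTML = """
<html><body>
  <span id="price">0.0000</span>
  <script>
    document.getElementById("price").textContent = "0.6123";
  </script>
</body></html>
"""


def price_seen_by_parser(html: str) -> str:
    """Return the text of the #price element as an HTML parser sees it."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find("span", {"id": "price"})
    if node is None:
        raise ValueError("price element not found")
    return node.get_text(strip=True)


static_price = price_seen_by_parser(STATIC_HTML)
print(static_price)  # the placeholder, not the value the script would set
```

A browser driven by Selenium would execute the script first, so its `page_source` would contain the updated value instead.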

With Selenium it can be done like this:

import time

from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Configure Chrome and run it headless (no visible browser window).
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")

# Start Chrome once, with the options above. This assumes Selenium 4, which
# takes the driver path through Service; adjust the path to your chromedriver.
service = Service(executable_path="C:/Users/shary/Downloads/chromedriver.exe")
driver = webdriver.Chrome(service=service, options=chrome_options)

# URL to open
url = "https://nz.finance.yahoo.com/quote/NZDUSD=X?p=NZDUSD=X"

# Load the page in the browser, just as if you opened it in Chrome,
# so that its JavaScript runs.
driver.get(url)

try:
    while True:
        # Re-read the rendered HTML on every pass to pick up the changing price.
        html = driver.page_source
        page_soup = soup(html, features="lxml")  # requires the lxml package
        price = page_soup.find("div", {"class": "D(ib) Mend(20px)"}).text
        print(price)
        time.sleep(5)
finally:
    # Close the browser when the loop is interrupted (e.g. Ctrl+C).
    driver.quit()
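The loop above prints every reading, including repeats, and never stops on its own. One possible refinement, offered as a sketch rather than as part of the original answer: wrap the polling in a generator that yields a value only when it changes and gives up after a fixed number of polls. `poll_changes`, `fetch`, and the fake readings below are hypothetical names introduced for this example; in real use `fetch` would be a function that reads the price from `driver.page_source`.

```python
import time
from typing import Callable, Iterator, Optional


def poll_changes(
    fetch: Callable[[], Optional[str]],
    interval: float = 5.0,
    max_polls: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Call fetch() up to max_polls times and yield each new, non-empty value."""
    last = None
    for i in range(max_polls):
        value = fetch()
        # Skip missing readings and readings identical to the previous one.
        if value and value != last:
            last = value
            yield value
        # Wait between polls, but not after the final one.
        if i < max_polls - 1:
            sleep(interval)


# Stand-in for a real fetch function; the readings are invented.
fake_readings = iter(["0.6120", "0.6120", None, "0.6123"])
seen = list(
    poll_changes(lambda: next(fake_readings), max_polls=4, sleep=lambda s: None)
)
print(seen)
```

Passing `sleep` in as a parameter keeps the helper easy to try without waiting for real time to pass.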

2024-01-11
