
Accessing detail pages from anchor tags on a web page


人到中年有点甜 2022-06-22 17:46:52
I am scraping a web page and have managed to extract the table data into a CSV file with Selenium. What I am struggling with is getting the information behind the anchor tag on each row of the table. I tried clicking every anchor tag in the table to get the information from the corresponding URL, but it stops after clicking the first one with this error: stale element reference: element is not attached to the page document. I am not sure this is the right way to approach the problem. Here is the code I have tried so far. Sorry if the code is badly formatted; I am new to Python and Stack Overflow.

import csv
import requests
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome(executable_path=r"D:\jewel\chromedriver.exe")
browser.get(('https://e-sourcingni.bravosolution.co.uk/web/login.shtml'))
signInButton = browser.find_element_by_css_selector(".only")
signInButton.click()
time.sleep(5)
table = browser.find_element_by_css_selector(".list-table")

for a in browser.find_elements_by_css_selector(".detailLink"):
    a.click()
    time.sleep(2)
    browser.execute_script("window.history.go(-1)")
    time.sleep(2)

with open('output.csv', "w") as f:
    writer = csv.writer(f)
    writer.writerow(["S.No","Status","Organization","Project Title","First Publishing Date","Work Category","Listing Deadline"])
    for row in table.find_elements_by_css_selector('tr'):
        writer.writerow([d.text for d in row.find_elements_by_css_selector('td')])

browser.close()

What I need is to get the data from the href of the tags with class detailLink. I cannot find a proper way to do this.

2 Answers

猛跑小猪


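I used a plain indexed for loop instead of a for-each loop, and I look the links up again on every iteration so that each click uses a fresh reference. Try this and let me know how it goes.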


import csv
import time

from selenium import webdriver

# Written against the Selenium 3 API (find_element_by_*), as in the question.
# The driver path is optional; if omitted, chromedriver is searched for on PATH.
browser = webdriver.Chrome('/usr/local/bin/chromedriver')
browser.implicitly_wait(5)

# Open the site in a named tab and sign in.
browser.execute_script("window.open('about:blank','tab1');")
browser.switch_to.window("tab1")
browser.get('https://e-sourcingni.bravosolution.co.uk/web/login.shtml')
signInButton = browser.find_element_by_css_selector(".only")
signInButton.click()
time.sleep(5)

# Count the links once, then re-locate them on every iteration: references
# found before a navigation are stale once the page has reloaded.
links = browser.find_elements_by_css_selector(".detailLink")
for i in range(len(links)):
    links = browser.find_elements_by_css_selector(".detailLink")
    links[i].click()
    time.sleep(2)
    browser.execute_script("window.history.go(-1)")
    time.sleep(2)

# newline='' stops the csv module from writing blank lines on Windows.
with open('output.csv', "w", newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(["S.No","Status","Organization","Project Title","First Publishing Date","Work Category","Listing Deadline"])
    rows = browser.find_elements_by_xpath("//table[@class='list-table']//tr")
    # XPath positions are 1-based; the parentheses index the whole row set.
    for row in range(1, len(rows) + 1):
        cells = browser.find_elements_by_xpath("(//table[@class='list-table']//tr)[" + str(row) + "]//td")
        if cells:  # skip rows without <td> cells, such as a header row
            writer.writerow([d.text for d in cells])

# quit() closes every tab and ends the driver session.
browser.quit()
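To see why re-locating the links by index avoids the error, here is a minimal browser-free sketch. `FakePage`, `FakeLink` and `StaleElementError` are hypothetical stand-ins invented for this illustration (they are not Selenium classes), and the model simply assumes that every navigation invalidates references located earlier, which is roughly how the stale element reference error arises.

```python
class StaleElementError(Exception):
    """Hypothetical stand-in for Selenium's stale element reference error."""


class FakeLink:
    """A link reference that remembers which page load it was located in."""

    def __init__(self, page, index, generation):
        self.page = page
        self.index = index
        self.generation = generation

    def click(self):
        # A reference located before the last navigation is no longer valid.
        if self.generation != self.page.generation:
            raise StaleElementError(f"link {self.index} is stale")
        self.page.visited.append(self.index)
        self.page.generation += 1  # navigating away rebuilds the document


class FakePage:
    """Toy page model: every navigation invalidates older references."""

    def __init__(self, link_count):
        self.link_count = link_count
        self.generation = 0
        self.visited = []

    def find_links(self):
        return [FakeLink(self, i, self.generation) for i in range(self.link_count)]

    def back(self):
        self.generation += 1  # going back reloads the listing page


def click_all_for_each(page):
    """Reuse references from a single lookup, as the question's loop does."""
    for link in page.find_links():
        link.click()
        page.back()
    return page.visited


def click_all_by_index(page):
    """Look the links up again before every click, as the loop above does."""
    for i in range(len(page.find_links())):
        page.find_links()[i].click()
        page.back()
    return page.visited
```

In this toy model, `click_all_for_each(FakePage(3))` raises `StaleElementError` on the second link, while `click_all_by_index(FakePage(3))` visits all three.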


2022-06-22
海绵宝宝撒


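Yes. Clicking a link takes you to another page, and once the page has changed, the elements you located on the previous page can no longer be found. You can try this instead, which opens each detail page in a second tab so the listing tab never navigates away: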


import csv
import time

from selenium import webdriver

# Written against the Selenium 3 API (find_element_by_*), as in the question.
browser = webdriver.Chrome(executable_path=r"D:\jewel\chromedriver.exe")

# Keep the listing in a named tab so its elements stay attached to the page.
browser.execute_script("window.open('about:blank','tab1');")
browser.switch_to.window("tab1")
browser.get("https://e-sourcingni.bravosolution.co.uk/web/login.shtml")
signInButton = browser.find_element_by_css_selector(".only")
signInButton.click()
time.sleep(5)

table = browser.find_element_by_css_selector(".list-table")

for a in table.find_elements_by_tag_name("a"):
    try:
        if a.get_attribute("class") == "detailLink":
            # The onclick value looks like
            # javascript:goToDetail('<id>', '02260');stopEventPropagation(event);
            # so stripping the prefix and suffix leaves the opportunity id.
            opp_id = a.get_attribute("onclick")
            opp_id = opp_id.replace("javascript:goToDetail('", "")
            opp_id = opp_id.replace("', '02260');stopEventPropagation(event);", "")

            # Load the detail page in a second tab; the listing tab is untouched.
            browser.execute_script("window.open('about:blank','tab2');")
            browser.switch_to.window("tab2")
            browser.get("https://e-sourcingni.bravosolution.co.uk/esop/toolkit/opportunity/opportunityDetail.do?opportunityId=" + opp_id + "&oppList=CURRENT")
            time.sleep(2)  # crude wait for the detail page to load
            # Read whatever you need from the detail page here.
    except Exception as exc:
        print("could not open the detail page:", exc)
    finally:
        # Always return to the listing tab before handling the next link.
        browser.switch_to.window("tab1")

# newline='' stops the csv module from writing blank lines on Windows.
with open('output.csv', "w", newline='') as f:
    writer = csv.writer(f)
    writer.writerow(["S.No","Status","Organization","Project Title","First Publishing Date","Work Category","Listing Deadline"])
    for row in table.find_elements_by_css_selector('tr'):
        writer.writerow([d.text for d in row.find_elements_by_css_selector('td')])

# quit() closes every tab and ends the driver session.
browser.quit()
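The two chained `replace` calls above only work when the second argument of `goToDetail` is exactly `'02260'`. As a hedged alternative, the id could be pulled out with a regular expression and the URL assembled with `urlencode`. This is a sketch only: the helper names `opportunity_id` and `detail_url` are made up for illustration, the ids in the usage note are invented, and the `onclick` format is assumed to be the one shown in the answer.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Base URL taken from the answer above.
DETAIL_URL = (
    "https://e-sourcingni.bravosolution.co.uk"
    "/esop/toolkit/opportunity/opportunityDetail.do"
)

# Captures the first quoted argument of goToDetail('<id>', ...).
_GO_TO_DETAIL = re.compile(r"goToDetail\(\s*'([^']+)'")


def opportunity_id(onclick: Optional[str]) -> Optional[str]:
    """Return the id inside an onclick attribute, or None if it is absent."""
    if not onclick:
        return None
    match = _GO_TO_DETAIL.search(onclick)
    return match.group(1) if match else None


def detail_url(opp_id: str) -> str:
    """Build the detail-page URL for one opportunity id."""
    query = urlencode({"opportunityId": opp_id, "oppList": "CURRENT"})
    return DETAIL_URL + "?" + query
```

For example, `opportunity_id("javascript:goToDetail('12345', '02260');stopEventPropagation(event);")` returns `'12345'`, and a link without a matching `onclick` yields `None` rather than raising an exception. Because the ids are plain strings, they can all be collected before any navigation happens, and strings never go stale.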


2022-06-22