
Getting a specific link from a list


慕妹3242003 2023-10-24 21:44:40
I want to print the text of the second <a> tag in each of the first five "groupings", and select the first one in each of the last five "groupings". How would I go about this?

https://www.carehome.co.uk/care_search_results.cfm/searchunitary/Tower-Hamlets

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
from selenium import webdriver

# Grab the page and parse it, ready for picking apart
my_url = "https://www.carehome.co.uk/care_search_results.cfm/searchunitary/Tower-Hamlets"
driver = webdriver.Chrome(executable_path='C:/Users/lemonade/Documents/work/chromedriver')
driver.get(my_url)
page_s = soup(driver.page_source, features='html.parser')

# Find the relevant divs
containers = page_s.findAll("div", {"class": "home-name"})
for container in containers:
    name_container = container.p
    all_a = name_container.findAll("a")
    print(all_a)

Output:

[<a name="member_21310"></a>, <a href="https://www.carehome.co.uk/carehome.cfm/searchazref/20001005SILA" style="font-weight:bold;font-size:28px">Silk Court</a>]
[<a name="member_35665"></a>, <a href="https://www.carehome.co.uk/carehome.cfm/searchazref/10001005FITA" style="">Westport Care Home</a>]
[<a name="member_34393"></a>, <a href="https://www.carehome.co.uk/carehome.cfm/searchazref/20001005ASPA" style="font-weight:bold;font-size:28px">Aspen Court Care Home</a>]
[<a name="member_4936"></a>, <a href="https://www.carehome.co.uk/carehome.cfm/searchazref/10001005SYDA" style="">Beaumont Court</a>]
[<a name="member_40189"></a>, <a href="https://www.carehome.co.uk/carehome.cfm/searchazref/20001005HAWA" style="font-weight:bold;font-size:28px">Hawthorn Green Residential and Nursing Home</a>]

2 Answers

慕雪6442864

Contributed 1812 experience points · received over 5 likes

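You can do this with a CSS selector passed to select(). Because the selector only matches <a> tags that have an href attribute, the empty name-only anchors are skipped and you get the expected output.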


from bs4 import BeautifulSoup as soup
from selenium import webdriver

# Grab the page and parse it, ready for picking apart
my_url = "https://www.carehome.co.uk/care_search_results.cfm/searchunitary/Tower-Hamlets"

driver = webdriver.Chrome(executable_path='C:/Users/lemonade/Documents/work/chromedriver')
driver.get(my_url)
page_s = soup(driver.page_source, features='html.parser')

# Match only <a> tags with an href that are direct children of div.home-name > p
containers = page_s.select("div.home-name>p>a[href]")

for container in containers:
    print(container.text.strip())

Output:

Silk Court
Westport Care Home
Aspen Court Care Home
Beaumont Court
Hawthorn Green Residential and Nursing Home
Coxley House
Toby Lodge
Hotel in the Park
34/35 Huddleston Close
Approach Lodge
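
If you want to check the selector without launching a browser, here is a minimal sketch that runs the same `div.home-name > p > a[href]` selector against a static string. The `SAMPLE_HTML` markup and the `extract_home_links` helper are hypothetical, put together from the output printed in the question, so the live page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup modelled on the output printed in the question.
SAMPLE_HTML = """
<div class="home-name">
  <p><a name="member_21310"></a><a href="https://www.carehome.co.uk/carehome.cfm/searchazref/20001005SILA">Silk Court</a></p>
</div>
<div class="home-name">
  <p><a name="member_35665"></a><a href="https://www.carehome.co.uk/carehome.cfm/searchazref/10001005FITA">Westport Care Home</a></p>
</div>
"""


def extract_home_links(html):
    """Return (name, url) pairs for every <a href=...> directly under div.home-name > p."""
    page = BeautifulSoup(html, "html.parser")
    # a[href] skips the name-only anchors such as <a name="member_21310"></a>
    links = page.select("div.home-name > p > a[href]")
    return [(a.get_text(strip=True), a["href"]) for a in links]


for name, url in extract_home_links(SAMPLE_HTML):
    print(name, url)
```

Returning the href together with the text means the same call gives you both the name and the link to follow for each home.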


Answered 2023-10-24
慕姐8265434

Contributed 1813 experience points · received over 2 likes

Could you try the following solution? It waits for the links to become visible before reading their text:


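from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `driver` is the webdriver.Chrome instance created in the question
driver.get("https://www.carehome.co.uk/care_search_results.cfm/searchunitary/Tower-Hamlets")

containers = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(@class,'home-name')]//p//a[@href]")))

for container in containers:
    print(container.text)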

https://img1.sycdn.imooc.com/6537caa000019b7908890315.jpg
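
The `[@href]` predicate is what filters out the empty name-only anchors. As a rough offline check, the sketch below applies a similar path using the standard library's `xml.etree.ElementTree`. ElementTree supports only a small XPath subset and has no `contains()`, so this version assumes an exact `class='home-name'` match and well-formed markup. `SAMPLE_XML` and `link_texts` are hypothetical, built from the anchors shown in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment modelled on the anchors shown in the question.
SAMPLE_XML = """
<body>
  <div class="home-name">
    <p><a name="member_34393"></a><a href="https://www.carehome.co.uk/carehome.cfm/searchazref/20001005ASPA">Aspen Court Care Home</a></p>
  </div>
  <div class="home-name">
    <p><a name="member_4936"></a><a href="https://www.carehome.co.uk/carehome.cfm/searchazref/10001005SYDA">Beaumont Court</a></p>
  </div>
</body>
"""


def link_texts(markup):
    """Return the text of each <a> that carries an href under div.home-name/p."""
    root = ET.fromstring(markup)
    # Exact class match stands in for contains(@class, 'home-name'),
    # which ElementTree's limited XPath subset does not offer.
    anchors = root.findall(".//div[@class='home-name']/p/a[@href]")
    return [a.text.strip() for a in anchors]


print(link_texts(SAMPLE_XML))
```

This only exercises the path logic. On the real page you still need Selenium (or another renderer) to obtain the HTML, and real-world HTML is often not well-formed enough for `ET.fromstring`.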

Answered 2023-10-24