Best practice for appending/finding only elements that have an attribute or text containing a specific string, using Python BS4

尚方宝剑之说 2022-09-06 16:23:17
Current discord.py (asyncio) code that prints a link to a random New York Times article:

@client.command()
async def news(ctx):
    url = 'https://www.nytimes.com/section/us'
    r = requests.get(url)
    soup = BeautifulSoup(r.content)
    articles = soup.find_all('a')
    newslist = []
    for article in articles:
        newslist.append(article['href'])
    news = random.choice(newslist)
    while "2020" not in news:
        news = random.choice(newslist)
    else:
        await ctx.send('https://www.nytimes.com' + news)

Because every link on the page gets appended to newslist, the code has to check for "2020" (which marks a link as an article) before printing / awaiting ctx.send. Is there a way to make find_all('a') return only links that contain "2020", or to append only the links that contain "2020"? Would either approach be more efficient?

1 Answer

慕容3067478

import re

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.nytimes.com/section/us")
soup = BeautifulSoup(r.content, "html.parser")

urls = []
# Match only <a> tags whose href contains "2020".
for link in soup.find_all("a", href=re.compile("2020")):
    href = link.get("href")
    # Turn site-relative paths into absolute URLs.
    if not href.startswith("http"):
        href = f"https://www.nytimes.com{href}"
    # Skip duplicates while keeping the page order.
    if href not in urls:
        urls.append(href)
        print(href)
Output:


https://www.nytimes.com/2020/03/18/us/coronavirus-immigrants.html
https://www.nytimes.com/2020/03/18/us/coronavirus-nebraska-biocontainment.html
https://www.nytimes.com/2020/03/18/us/coronavirus-janitors-cleaners.html
https://www.nytimes.com/2020/03/18/us/small-business-coronavirus-charlotte.html
https://www.nytimes.com/2020/03/19/us/politics/coronavirus-heaven-frilot-mark-frilot.html
https://www.nytimes.com/2020/03/19/us/politics/coronavirus-state-department-travel.html
https://www.nytimes.com/2020/03/19/us/politics/coronavirus-congress-voting.html
https://www.nytimes.com/2020/03/19/us/coronavirus-foster-pets.html
https://www.nytimes.com/2020/03/19/books/molly-brodak-dies.html
https://www.nytimes.com/2020/03/19/us/politics/joe-biden-vice-president.html
https://www.nytimes.com/2020/03/19/us/coronavirus-location-tracking.html
https://www.nytimes.com/2020/03/19/climate/us-flood-season-forescast.html
https://www.nytimes.com/2020/03/19/health/coronavirus-masks-shortage.html
https://www.nytimes.com/2020/03/19/health/coronavirus-travel-ban.html
https://www.nytimes.com/2020/03/19/arts/mal-sharpe-dead.html
https://www.nytimes.com/2020/03/19/business/coronavirus-unemployment-states.html
https://www.nytimes.com/2020/03/19/us/work-from-home-mothers-coronavirus-covid19.html
https://www.nytimes.com/2020/03/19/us/politics/1000-checks-coronavirus-stimulus.html
https://www.nytimes.com/news-event/2020-election
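
To tie this back to the original question, here is a minimal offline sketch of the same filtering done with a CSS attribute selector instead of a regex, followed by a single random pick. SAMPLE_HTML, BASE_URL and article_urls are hypothetical names made up for illustration, and the inline HTML only stands in for the downloaded page. It assumes a BeautifulSoup version with select() attribute-selector support (4.7 or later).

```python
import random
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<a href="/2020/03/18/us/coronavirus-immigrants.html">Article one</a>
<a href="/section/us">Section page</a>
<a href="https://www.nytimes.com/2020/03/19/arts/mal-sharpe-dead.html">Article two</a>
<a href="/2020/03/18/us/coronavirus-immigrants.html">Article one again</a>
<a name="no-href">Anchor without an href</a>
"""

BASE_URL = "https://www.nytimes.com"  # assumed site root for relative links


def article_urls(html, needle="2020"):
    """Return unique absolute URLs of <a> tags whose href contains `needle`."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # a[href*="..."] matches anchors whose href contains the substring;
    # anchors without an href never match, so link["href"] is safe here.
    for link in soup.select(f'a[href*="{needle}"]'):
        # urljoin resolves relative paths and leaves absolute URLs unchanged.
        url = urljoin(BASE_URL, link["href"])
        if url not in urls:
            urls.append(url)
    return urls


urls = article_urls(SAMPLE_HTML)
print(urls)
# Every candidate is already an article link, so one pick is enough.
print(random.choice(urls))
```

Since the list contains only matching links, the while loop that keeps re-drawing until it finds "2020" is no longer needed; in the command from the question, the result of random.choice(urls) could be passed straight to ctx.send. Any speed gain is likely small next to the network request, so the main benefit is simpler control flow.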


Answered 2022-09-06