Capture the text between list tags and print it from a BeautifulSoup scrape

沧海一幻觉 2022-06-02 14:38:53
I just started using BeautifulSoup and Requests for web scraping. I am trying to write a script that scrapes the messages in the ordered list on the page below, but I am stuck on how to print only the 2nd message listed there. This is my script so far:

from bs4 import BeautifulSoup
import requests

res = requests.get("https://www.serenataflowers.com/pollennation/love-text-messages/")
soup = BeautifulSoup(res.text, 'html.parser')

ol = soup.find('ol')
print(ol.prettify())

The script only prints out the whole text. How do I print text 2, or text 3, and so on? Thanks in advance.

Edit: this is the result I get when I run the script.

C:\Users\XXXX\MyPythonScripts>scrape.py
<ol class="simple-list">
 <li>
  Meeting you was the best day of my life.
 </li>
 <li>
  When you are next to me, or when we are apart, You are always the first in my thoughts and in my heart.
 </li>
 <li>
  I never ever thought I’d like you this much and I never planned to have you on my mind this often.
 </li>
 <li>
  I love the way you love me.
 </li>
 <li>
  Spring drops and the sun outside the window tell me that this spring will be the flowering of our love.
 </li>
 <li>
  I can’t spend a day without you, can’t you see? I love you so much. You are a part of me and this is forever.
 </li>
 <li>
  You make me happy in a thousand ways. I love you to the moon and back, and I have no idea what I would do, if I lost you, because I feel like I will lose my entire world.
 </li>
 <li>
  Nothing is going change my love for you, you are the man, who helped me to find myself in this life.
 </li>
 <li>
  I can’t imagine living a life without you. You are my reason to be.
 </li>
 <li>
  The wind whispers your name, stars illuminate my way to you, we will meet soon, love you!
 </li>

What I actually want is to print the content of the next <li> tag.

3 Answers

肥皂起泡泡

Contributed 1829 experience points, received over 6 likes

Try using find_all to get a list of results. find only returns the first match it encounters, so call find_all on the li tags instead.
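As a minimal sketch of that suggestion, the snippet below runs find_all on a small inline HTML sample instead of the live page. The sample is an assumption: it is an abbreviated stand-in that mirrors the structure of the output shown in the question, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Assumption: abbreviated inline sample with the same structure as the page in the question.
html = """
<ol class="simple-list">
  <li>Meeting you was the best day of my life.</li>
  <li>I love the way you love me.</li>
  <li>I can't imagine living a life without you. You are my reason to be.</li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")

# find returns only the first <ol>; find_all returns every <li> inside it as a list.
items = soup.find("ol").find_all("li")

# Lists are 0-indexed, so the second message sits at index 1.
second_message = items[1].get_text(strip=True)
print(second_message)
```

Against the real page, the same two calls would be applied to the soup built from the requests response.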



Answered 2022-06-02
呼如林

Contributed 1798 experience points, received over 3 likes

You can use a class selector, which is faster, to get the parent ol, and then use nth-of-type to isolate a specific line:


import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.serenataflowers.com/pollennation/love-text-messages/')
soup = bs(r.content, 'lxml')  # the 'lxml' parser requires the lxml package to be installed

line_number = 2  # nth-of-type counts from 1, so 2 selects the second <li>
print(soup.select_one(f'.simple-list li:nth-of-type({line_number})').text)

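If you want all the lines at some point, indexing into the whole list will be faster, but nth-of-type and related tricks are still worth knowing.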
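The two approaches can be compared side by side in a short sketch. It assumes an abbreviated inline HTML sample in place of the live page and uses the built-in html.parser so no extra parser is needed; the variable names are illustrative. Note that nth-of-type counts from 1 while Python list indexes count from 0.

```python
from bs4 import BeautifulSoup

# Assumption: abbreviated inline sample with the same structure as the page in the question.
html = """
<ol class="simple-list">
  <li>Meeting you was the best day of my life.</li>
  <li>I love the way you love me.</li>
  <li>I can't imagine living a life without you. You are my reason to be.</li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")
line_number = 2

# Option 1: let the CSS selector pick a single <li> (1-based position).
via_selector = soup.select_one(
    f".simple-list li:nth-of-type({line_number})"
).get_text(strip=True)

# Option 2: collect every line once, then index into the list (0-based position).
all_lines = [li.get_text(strip=True) for li in soup.select(".simple-list li")]
via_index = all_lines[line_number - 1]

print(via_selector)
print(via_index)
```

Option 2 parses the list once and reuses it, which is the cheaper choice when several lines are needed.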


Answered 2022-06-02
呼唤远方

Contributed 1856 experience points, received over 11 likes

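To get a list of the text of each message, use the find_all() method (also available under the older alias findAll()) on the ol block you have already isolated.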


ol = soup.find('ol')
# Collect the text of each <li> message; find_all is the current name for the older findAll alias
messages = [msg.text for msg in ol.find_all('li')]

Now you can access the messages by their index. Remember that lists are 0-indexed, which means item 1 is actually at index 0.


print(messages[0])  # index 0 is the first message
# output: Meeting you was the best day of my life.
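To avoid off-by-one mistakes when asking for "message 2" or "message 3", the index arithmetic can be wrapped in a small helper. This is only a sketch: get_message is a hypothetical name, not part of BeautifulSoup, and the sample list stands in for the messages scraped from the page.

```python
def get_message(messages, number):
    """Return the message at 1-based position `number`, or None if out of range."""
    if 1 <= number <= len(messages):
        # Convert the human-friendly 1-based number to a 0-based list index.
        return messages[number - 1]
    return None


# Assumption: a short stand-in for the list built from the scraped <li> tags.
sample_messages = [
    "Meeting you was the best day of my life.",
    "I love the way you love me.",
]

print(get_message(sample_messages, 2))
```

Returning None for an out-of-range number is one possible design; raising an IndexError would be equally reasonable.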


Answered 2022-06-02