3 Answers
Contributor with 1895 experience points · more than 7 upvotes
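With bs4 4.7.1+ this is easy. You can use :has and :contains to select the parent div whose child strong contains the string "Published:", then use the adjacent sibling combinator (+) to get the next div.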
from bs4 import BeautifulSoup
html = '''
<div class="featured-item-meta">
<div><strong>Published:</strong></div>
<div>October 14, 2015</div>
<ul class="creatorList">
<li>
<div><strong>Writer:</strong></div>
<div><a href="https://www.marvel.com/comics/creators/10329/g_willow_wilson">G. Willow Wilson</a>, <a href="https://www.marvel.com/comics/creators/12441/marguerite_bennett">Marguerite Bennett</a></div>
</li>
<li>
<div><strong>Cover Artist:</strong></div>
<div><a href="https://www.marvel.com/comics/creators/8825/jorge_molina">Jorge Molina</a></div>
</li>
</ul>
</div>
'''
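# The import above brings in the name BeautifulSoup, so use it here (bs is undefined)
soup = BeautifulSoup(html, 'lxml')
# Newer soupsieve releases deprecate :contains in favor of :-soup-contains
print(soup.select_one('div:has(strong:contains("Published:")) + div').text)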
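The same selector idea can be written with the newer pseudo-class name. This is a minimal sketch, assuming a soupsieve version recent enough to support :-soup-contains (2.1 or later); the shortened HTML sample, the html.parser backend, and the variable names node and published are illustrative choices, not part of the answer above.

```python
from bs4 import BeautifulSoup

# Shortened sample with the same label/value layout as the question
html = '''
<div class="featured-item-meta">
<div><strong>Published:</strong></div>
<div>October 14, 2015</div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# "> strong" limits :has to a direct child, so only the label div matches;
# "+ div" then steps to the div immediately after it.
node = soup.select_one('div:has(> strong:-soup-contains("Published:")) + div')

# select_one returns None when nothing matches, so guard before reading text
published = node.get_text(strip=True) if node else None
print(published)
```

Guarding against None keeps the script from raising AttributeError on pages where the "Published:" label is missing.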
Contributor with 1719 experience points · more than 6 upvotes
Here is a workaround:
from bs4 import BeautifulSoup
text = '<div class="featured-item-meta">\
<div><strong>Published:</strong></div>\
<div>October 14, 2015</div>\
<ul class="creatorList">\
<li>\
<div><strong>Writer:</strong></div>\
<div><a href="https://www.marvel.com/comics/creators/10329/g_willow_wilson">G. Willow Wilson</a>, <a href="https://www.marvel.com/comics/creators/12441/marguerite_bennett">Marguerite Bennett</a></div>\
</li>\
<li>\
<div><strong>Cover Artist:</strong></div>\
<div><a href="https://www.marvel.com/comics/creators/8825/jorge_molina">Jorge Molina</a></div>\
</li>\
</ul>\
</div>'
soup = BeautifulSoup(text, 'html.parser')
# The second <div> inside the container holds the publication date
print(soup.find('div', attrs={'class': 'featured-item-meta'})
      .find_all('div')[1].text)
Output:
October 14, 2015
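Indexing with [1] depends on the date always being the second div. A possible alternative, sketched here under the assumption that the label text is exactly "Published:", is to locate the label first and then move to the following sibling. The shortened sample and the names label, value_div, and published_date are illustrative.

```python
from bs4 import BeautifulSoup

text = '''<div class="featured-item-meta">
<div><strong>Published:</strong></div>
<div>October 14, 2015</div>
</div>'''

soup = BeautifulSoup(text, 'html.parser')

# Find the <strong> whose text is exactly the label
label = soup.find('strong', string='Published:')

# The value lives in the <div> that follows the label's parent <div>
value_div = label.parent.find_next_sibling('div') if label else None

published_date = value_div.get_text(strip=True) if value_div else None
print(published_date)
```

This keeps working if extra div elements are added before the date, as long as the label and its value stay adjacent.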
Contributor with 1860 experience points · more than 9 upvotes
from bs4 import BeautifulSoup as bsp
s = '''
<div class="featured-item-meta">
<div><strong>Published:</strong></div>
<div>October 14, 2015</div>
<ul class="creatorList">
<li>
<div><strong>Writer:</strong></div>
<div><a href="https://www.marvel.com/comics/creators/10329/g_willow_wilson">G. Willow Wilson</a>, <a href="https://www.marvel.com/comics/creators/12441/marguerite_bennett">Marguerite Bennett</a></div>
</li>
<li>
<div><strong>Cover Artist:</strong></div>
<div><a href="https://www.marvel.com/comics/creators/8825/jorge_molina">Jorge Molina</a></div>
</li>
</ul>
</div>
'''
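# Pass a parser explicitly to avoid the "no parser specified" warning;
# find_all is the current name for the legacy findChildren alias.
# This prints the whole <div> tag; append .text to get only the date.
print(bsp(s, 'html.parser').find('div').find_all('div')[1])
The code may need slight changes depending on your full page and its structure.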
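Since the exact structure can vary from page to page, one way to reduce the dependence on position is to collect every label/value pair at once. The sketch below assumes each label is a strong element inside a div and its value is the next sibling div, as in the sample HTML; extract_meta is a hypothetical helper name, and the href="#" values are placeholders rather than the real links.

```python
from bs4 import BeautifulSoup

def extract_meta(html):
    """Return a dict mapping each label (without the colon) to its value text."""
    soup = BeautifulSoup(html, 'html.parser')
    meta = {}
    for strong in soup.select('div.featured-item-meta strong'):
        label = strong.get_text(strip=True).rstrip(':')
        # The value is assumed to sit in the <div> right after the label's <div>
        value_div = strong.parent.find_next_sibling('div')
        if value_div is not None:
            meta[label] = value_div.get_text().strip()
    return meta

# Shortened sample; href="#" is a placeholder for the real creator links
sample = '''
<div class="featured-item-meta">
<div><strong>Published:</strong></div>
<div>October 14, 2015</div>
<ul class="creatorList">
<li>
<div><strong>Writer:</strong></div>
<div><a href="#">G. Willow Wilson</a>, <a href="#">Marguerite Bennett</a></div>
</li>
</ul>
</div>
'''

meta = extract_meta(sample)
print(meta.get('Published'))
```

Looking values up by label, for example meta.get('Writer'), avoids hard-coded indexes and returns None when a field is absent.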
