
BeautifulSoup: parsing HTML lists

Python

慕田峪9158850 2024-01-27 15:30:54

I'm new to parsing. I have a simple HTML page whose lists have no class attributes, for example:

<h2><a href="..">Title 1</a></h2>
<ol>
<li>Line 1..</li>
<li>Line 2...</li>
...
</ol>
<h2><a href="..">Title 2</a></h2>
<ol>
<li>Line 2-1..</li>
<li>Line 2-2...</li>
...
</ol>
...and so on. I run this code:

import requests
from bs4 import BeautifulSoup as BS

r = requests.get('http://...')
html = BS(r.content, 'html.parser')
H2 = html.find_all('h2')
for h2 in H2:
    title = h2.text
    print(title)

This gets the titles, but how do I get the <ol> list that belongs to each title within the same loop?


2 Answers

largeQ


A simple approach is to use zip. Try:

from bs4 import BeautifulSoup as BS

source = '''
<h2><a href="..">Title 1</a></h2>
<ol>
<li>Line 1..</li>
<li>Line 2...</li>
</ol>
<h2><a href="..">Title 2</a></h2>
<ol>
<li>Line 2-1..</li>
<li>Line 2-2...</li>
</ol>
'''

html = BS(source, 'html.parser')

# zip pairs the n-th <h2> with the n-th <ol>
for title, element in zip(html.find_all('h2'), html.find_all('ol')):
    print(title.text, element.text)

Result:

Title 1
Line 1..
Line 2...

Title 2
Line 2-1..
Line 2-2...

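Note: zip pairs elements purely by position and stops at the shorter sequence. If the numbers of <h2> and <ol> elements differ, you can use itertools.zip_longest instead of zip; it pads the missing side with None rather than silently dropping elements.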
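A minimal sketch of that note, assuming a hypothetical page where "Title 2" has no <ol> after it. The helper name pair_lists is illustrative and not part of the answer; it uses only find_all, get_text(strip=True) and itertools.zip_longest.

```python
from itertools import zip_longest

from bs4 import BeautifulSoup

# Hypothetical input: "Title 2" has no <ol> after it.
source = """
<h2><a href="..">Title 1</a></h2>
<ol>
<li>Line 1..</li>
<li>Line 2...</li>
</ol>
<h2><a href="..">Title 2</a></h2>
"""


def pair_lists(soup):
    """Pair the n-th <h2> with the items of the n-th <ol>, keeping unmatched ones."""
    pairs = []
    # zip_longest pads the shorter sequence with None instead of truncating it.
    for title, ol in zip_longest(soup.find_all("h2"), soup.find_all("ol")):
        title_text = title.get_text(strip=True) if title is not None else None
        items = [li.get_text(strip=True) for li in ol.find_all("li")] if ol is not None else []
        pairs.append((title_text, items))
    return pairs


soup = BeautifulSoup(source, "html.parser")

# Plain zip stops at the shorter sequence, so "Title 2" disappears.
print(len(list(zip(soup.find_all("h2"), soup.find_all("ol")))))
# zip_longest keeps "Title 2" and gives it an empty list.
print(pair_lists(soup))
```

Keep in mind that zip_longest only pads at the end: if a title in the middle of the page has no list, every later title is still paired with the wrong <ol>.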

2024-01-27

HUX布斯


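Another solution: you can use .find_previous(). For every <li> inside an <ol>, look up the nearest preceding <h2> and group the items under its text: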

from bs4 import BeautifulSoup

txt = '''
<h2><a href="..">Title 1</a></h2>
<ol>
...
</ol>
<h2><a href="..">Title 2</a></h2>
<ol>
...
</ol>
'''

soup = BeautifulSoup(txt, 'html.parser')

out = {}
for li in soup.select('ol li'):
    # group each <li> under the text of the nearest <h2> before it
    out.setdefault(li.find_previous('h2').text, []).append(li.text)

print(out)

Prints:

{'Title 1': ['Line 1', 'Line 2'],
 'Title 2': ['Line 2-1', 'Line 2-2']}
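A further sketch, not from either answer as posted: it keeps the question's own loop over the <h2> tags and uses find_next_sibling to fetch the list that follows each heading. It assumes each <h2> and its <ol> are siblings under the same parent, as in the sample HTML; the names result and nxt are illustrative.

```python
from bs4 import BeautifulSoup

# Sample HTML assumed to have the same shape as the page in the question.
source = """
<h2><a href="..">Title 1</a></h2>
<ol>
<li>Line 1..</li>
<li>Line 2...</li>
</ol>
<h2><a href="..">Title 2</a></h2>
<ol>
<li>Line 2-1..</li>
<li>Line 2-2...</li>
</ol>
"""

html = BeautifulSoup(source, "html.parser")

result = {}
for h2 in html.find_all("h2"):
    title = h2.get_text(strip=True)
    # Look for the next sibling that is either a list or another heading.
    # If a heading comes first, this title has no list of its own.
    nxt = h2.find_next_sibling(["ol", "h2"])
    ol = nxt if nxt is not None and nxt.name == "ol" else None
    items = [li.get_text(strip=True) for li in ol.find_all("li")] if ol is not None else []
    result[title] = items
    print(title, items)
```

Unlike the find_previous version, this walks from each heading, so a title without a list still shows up (with an empty list) instead of being skipped.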

2024-01-27
