Parsing XML with Python: keeping the text inside an element while removing the tags around it

Python

潇潇雨雨 2022-07-05 19:45:21

Input:

<milestone n="14" unit="verse" /> The name of the third river is <placeName key="tgn,1130850" authname="tgn,1130850">Hiddekel</placeName>: this is the one which flows in front of Assyria. The fourth river is the <placeName key="tgn,1123842" authname="tgn,1123842">Euphrates</placeName>.

Expected output:

<milestone n="14" unit="verse" /> The name of the third river is Hiddekel: this is the one which flows in front of Assyria. The fourth river is the Euphrates.

Hello, I am looking for a way to extract the text from a child element (placeName) and put it back into the surrounding body text. I have similar cases elsewhere in the XML file, such as personal names. I want to be able to extract the names and places without getting rid of the milestones. Thanks for your help!

Current code:

for p in chapter.findall('p'):
    i = 1
    for text in p.itertext():
        file.write(body.attrib["n"] + " " + chapter.attrib["n"] + ":" + str(i) + text)
        i = i + 1

1 Answer

繁华开满天机

You can do this with BeautifulSoup and its unwrap() method:

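from bs4 import BeautifulSoup as bs

snippet = """your html above"""  # paste the XML from the question here

# The 'lxml' HTML parser lowercases tag names, so placeName becomes placename
soup = bs(snippet, 'lxml')

# unwrap() removes the tag itself but leaves its text in place
for p in soup.find_all('placename'):
    p.unwrap()

print(soup)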

Output:

The name of the third river is
Hiddekel: this is the one which flows in front of Assyria. The fourth
river is the Euphrates.
</body></html>
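Since the question's code already uses ElementTree (findall, itertext), the same unwrapping can probably be done with the standard library alone. The following is only a sketch under some assumptions: the fragment is wrapped in a <p> element as in the question's loop, and unwrap_tag is a helper name made up for this example, not an ElementTree API. It moves each placeName's text and tail onto the preceding sibling (or the parent) before removing the element, so the milestone stays in place.

```python
import xml.etree.ElementTree as ET


def unwrap_tag(parent, tag):
    """Remove every `tag` element below `parent`, keeping its text in place.

    Hypothetical helper for illustration. Any markup nested inside the
    removed element is flattened to plain text.
    """
    previous = None  # last sibling that was kept
    for child in list(parent):  # iterate over a copy, since we remove children
        unwrap_tag(child, tag)
        if child.tag != tag:
            previous = child
            continue
        # Text inside the element plus the text that followed it
        kept = "".join(child.itertext()) + (child.tail or "")
        if previous is None:
            parent.text = (parent.text or "") + kept
        else:
            previous.tail = (previous.tail or "") + kept
        parent.remove(child)


# Assumption: the question's fragment sits inside a <p> element
sample = (
    '<p><milestone n="14" unit="verse" /> The name of the third river is '
    '<placeName key="tgn,1130850" authname="tgn,1130850">Hiddekel</placeName>: '
    'this is the one which flows in front of Assyria. The fourth river is the '
    '<placeName key="tgn,1123842" authname="tgn,1123842">Euphrates</placeName>.</p>'
)

p = ET.fromstring(sample)
unwrap_tag(p, "placeName")
print(ET.tostring(p, encoding="unicode"))
```

Because ElementTree parses the input as XML, tag names keep their original case and no <html><body> wrapper is added. Calling unwrap_tag again with another tag name should handle the personal names mentioned in the question in the same way.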

Answered 2022-07-05
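A related point about the loop in the question: itertext() yields a separate string for each text fragment, so the counter i goes up at every placeName rather than at every verse. If the goal is one output line per verse, one option is to start a new line at each milestone and take the number from its n attribute. This is a rough sketch only; walk and split_verses are made-up names, the <p> wrapper is assumed, and the second milestone with its placeholder text is invented for the demonstration.

```python
import xml.etree.ElementTree as ET


def walk(elem):
    """Yield ('milestone', n) and ('text', string) events in document order."""
    for child in elem:
        if child.tag == "milestone" and child.get("unit") == "verse":
            yield "milestone", child.get("n")
        else:
            if child.text:
                yield "text", child.text
            yield from walk(child)
        if child.tail:
            yield "text", child.tail


def split_verses(p):
    """Return a list of (verse_number, text) pairs for one <p> element."""
    verses = []
    current, chunks = None, [p.text or ""]

    def flush():
        text = " ".join("".join(chunks).split())  # normalise whitespace
        if current is not None or text:
            verses.append((current, text))

    for kind, value in walk(p):
        if kind == "milestone":
            flush()
            current, chunks = value, []
        else:
            chunks.append(value)
    flush()
    return verses


# Assumption: a <p> wrapper; milestone 15 and its text are placeholders
sample = (
    '<p><milestone n="14" unit="verse" /> The name of the third river is '
    '<placeName key="tgn,1130850">Hiddekel</placeName>: this is the one which '
    'flows in front of Assyria. The fourth river is the '
    '<placeName key="tgn,1123842">Euphrates</placeName>.'
    '<milestone n="15" unit="verse" /> Placeholder text for the next verse.</p>'
)

verses = split_verses(ET.fromstring(sample))
for number, text in verses:
    print(number, text)
```

In the question's loop, each pair could then be written out as body.attrib["n"] + " " + chapter.attrib["n"] + ":" + number + " " + text, replacing the manual counter.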
