Python: efficiently extracting nested elements from XML

慕斯709654 2024-01-16 10:46:32
I'm trying to parse a large number of XML files that contain many nested elements, in order to collect specific information for later use. Because there are so many files, I want to do this as efficiently as possible to reduce processing time. I can extract the information I need with xpath, but it seems very inefficient; in particular, I have to run a second for loop with another xpath search to extract the result values. Is there a more efficient way to get the output below? Can I collect the information I need with a single xpath query?

Desired parsed format:

Id             Object    Type             Result
Packages       total     totalPackages    1200
DeliveryMethod priority  packagesSent     100
DeliveryMethod express   packagesSent     200
DeliveryMethod ground    packagesSent     300
DeliveryMethod priority  packagesReceived 100
DeliveryMethod express   packagesReceived 200
DeliveryMethod ground    packagesReceived 300

Sample XML:

<?xml version="1.0" encoding="utf-8"?>
<Data>
    <Location localDn="Chicago"/>
    <Info Id="Packages">
        <job jobId="1"/>
        <Type pos="1">totalPackages</Type>
        <Value Object="total">
            <result pos="1">1200</result>
        </Value>
    </Info>
    <Info Id="DeliveryMethod">
        <job jobId="1"/>
        <Type pos="1">packagesSent</Type>
        <Type pos="2">packagesReceived</Type>
        <Value Object="priority">
            <result pos="1">100</result>
            <result pos="2">100</result>
        </Value>
        <Value Object="express">
            <result pos="1">200</result>
            <result pos="2">200</result>
        </Value>
        <Value Object="ground">
            <result pos="1">300</result>
            <result pos="2">300</result>
        </Value>
    </Info>
</Data>

Is it possible to get all the information by iterating over tree.xpath('//*')?

2 Answers

慕村225694


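One optimization is to stop traversing every tag and checking it with if statements, as you currently do with tree.xpath('//*'). That can be replaced with tree.xpath('//Type').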


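The next thing to optimize is the iteration over values. Instead of iterating over every Value again and again (tree.xpath('//Value')), you can get all the Value tags that follow the Type tag as siblings with elem.xpath('./following-sibling::Value').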


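from lxml import etree

# etree.parse accepts a filename directly, so no file handle is left open
tree = etree.parse('stack_sample.xml')

# Visit only the <Type> elements instead of every element (//*)
for elem in tree.xpath('//Type'):
    _id = elem.getparent().attrib["Id"]
    _type = elem.text
    _position = elem.attrib["pos"]
    # All <Value> siblings that come after this <Type> inside the same <Info>
    values = elem.xpath('./following-sibling::Value')
    for value in values:
        _object = value.attrib['Object']
        # Pick the <result> whose pos matches this <Type>'s pos
        _result = value.xpath(f'./result[@pos="{_position}"]/text()')[0]
        print(_id, _type, _object, _result)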

This prints:


Packages totalPackages total 1200
DeliveryMethod packagesSent priority 100
DeliveryMethod packagesSent express 200
DeliveryMethod packagesSent ground 300
DeliveryMethod packagesReceived priority 100
DeliveryMethod packagesReceived express 200
DeliveryMethod packagesReceived ground 300
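The question also asks whether a single xpath query can collect everything. A minimal sketch, assuming the same layout as the sample (every result sits in Info/Value and its pos matches one Type sibling of that Value): select all result elements with one query and reach the Value and Info elements through getparent(). The function name rows_single_query and the embedded XML constant are illustrative, not part of either answer. Rows come out in document order (grouped by Object), so sort them if the Type-first order of the desired format matters.

```python
from lxml import etree

# The question's sample, embedded as bytes so the block runs on its own.
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<Data>
  <Location localDn="Chicago"/>
  <Info Id="Packages">
    <job jobId="1"/>
    <Type pos="1">totalPackages</Type>
    <Value Object="total">
      <result pos="1">1200</result>
    </Value>
  </Info>
  <Info Id="DeliveryMethod">
    <job jobId="1"/>
    <Type pos="1">packagesSent</Type>
    <Type pos="2">packagesReceived</Type>
    <Value Object="priority">
      <result pos="1">100</result>
      <result pos="2">100</result>
    </Value>
    <Value Object="express">
      <result pos="1">200</result>
      <result pos="2">200</result>
    </Value>
    <Value Object="ground">
      <result pos="1">300</result>
      <result pos="2">300</result>
    </Value>
  </Info>
</Data>"""


def rows_single_query(root):
    """Return (Id, Object, Type, Result) tuples using a single XPath query."""
    rows = []
    current_info, types = None, {}
    # The only XPath call: every <result> under Info/Value, in document order.
    for result in root.xpath("./Info/Value/result"):
        value = result.getparent()
        info = value.getparent()
        # Results of one <Info> are contiguous, so build its pos -> Type map once.
        if info is not current_info:
            current_info = info
            types = {t.get("pos"): t.text for t in info.iterfind("Type")}
        rows.append((info.get("Id"), value.get("Object"),
                     types[result.get("pos")], result.text))
    return rows


if __name__ == "__main__":
    for row in rows_single_query(etree.fromstring(XML)):
        print(*row)
```

The pos-to-Type dictionary is rebuilt once per Info rather than once per result, which is the part that replaces the second xpath search from the question.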

Edit

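This is a solution for the specific case where we are sure that every Value tag contains exactly one result for each Type sibling, matched by the pos attribute. It fetches all Object attributes and all matching result texts for a Type in two bulk xpath queries and pairs them with zip, so if any result is missing, zip will pair objects with the wrong values.

Keep in mind that this is a very specific solution, not a general one.

from lxml import etree

tree = etree.parse('stack_sample.xml')

for elem in tree.xpath('//Type'):
    _id = elem.getparent().attrib["Id"]
    _type = elem.text
    _position = elem.attrib["pos"]
    # Object attribute of every <Value> sibling, in document order
    _objects = elem.xpath('./following-sibling::Value/@Object')
    # Only the <result> whose pos matches this <Type>, one per <Value>
    _results = elem.xpath(f'./following-sibling::Value/result[@pos="{_position}"]/text()')
    for _object, _result in zip(_objects, _results):
        print(_id, _type, _object, _result)

Output:

Packages totalPackages total 1200
DeliveryMethod packagesSent priority 100
DeliveryMethod packagesSent express 200
DeliveryMethod packagesSent ground 300
DeliveryMethod packagesReceived priority 100
DeliveryMethod packagesReceived express 200
DeliveryMethod packagesReceived ground 300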
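Because pairing with zip silently mismatches values when a result is missing, a variant of the bulk approach can check that assumption explicitly. In this sketch (rows_bulk is a hypothetical name), the pos value is passed as an XPath variable $p, which lxml's xpath() binds from keyword arguments, instead of being formatted into the expression, and a ValueError is raised when the counts differ. The check only catches missing or extra results; it cannot detect every possible misalignment.

```python
from lxml import etree

# The question's sample, embedded as bytes so the block runs on its own.
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<Data>
  <Location localDn="Chicago"/>
  <Info Id="Packages">
    <job jobId="1"/>
    <Type pos="1">totalPackages</Type>
    <Value Object="total">
      <result pos="1">1200</result>
    </Value>
  </Info>
  <Info Id="DeliveryMethod">
    <job jobId="1"/>
    <Type pos="1">packagesSent</Type>
    <Type pos="2">packagesReceived</Type>
    <Value Object="priority">
      <result pos="1">100</result>
      <result pos="2">100</result>
    </Value>
    <Value Object="express">
      <result pos="1">200</result>
      <result pos="2">200</result>
    </Value>
    <Value Object="ground">
      <result pos="1">300</result>
      <result pos="2">300</result>
    </Value>
  </Info>
</Data>"""


def rows_bulk(root):
    """Return (Id, Object, Type, Result) tuples, two bulk XPath queries per <Type>."""
    rows = []
    for typ in root.iterfind("Info/Type"):
        info_id = typ.getparent().get("Id")
        pos = typ.get("pos")
        objects = typ.xpath("following-sibling::Value/@Object")
        # $p is an XPath variable bound from the keyword argument p=pos
        results = typ.xpath("following-sibling::Value/result[@pos = $p]/text()", p=pos)
        # Guard the one-result-per-Value assumption before zipping
        if len(objects) != len(results):
            raise ValueError(
                f"Info {info_id!r}, pos {pos}: "
                f"{len(objects)} Value tags but {len(results)} results"
            )
        rows.extend((info_id, obj, typ.text, res) for obj, res in zip(objects, results))
    return rows


if __name__ == "__main__":
    for row in rows_bulk(etree.fromstring(XML)):
        print(*row)
```

Rows are produced Type first, which matches the order of the desired format in the question.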


Answered 2024-01-16
茅侃侃


Maybe performance would be better if you don't iterate over all tags (//*) but only over the <Value> tags:


from lxml import etree

tree = etree.parse('stack_sample.xml')

for val in tree.xpath('//Value'):
    info = val.getparent()
    # Map each Type's pos to its name, e.g. {'1': 'packagesSent', '2': 'packagesReceived'}
    t = {typ.get('pos'): typ.text for typ in info.xpath('./Type')}
    for r in val.xpath('./result'):
        # Look up the Type name through the result's pos attribute
        print(info.get('Id'), val.get('Object'), t[r.get('pos')], r.text)

Prints:


Packages total totalPackages 1200
DeliveryMethod priority packagesSent 100
DeliveryMethod priority packagesReceived 100
DeliveryMethod express packagesSent 200
DeliveryMethod express packagesReceived 200
DeliveryMethod ground packagesSent 300
DeliveryMethod ground packagesReceived 300
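Since the question is about a huge number of files, building a full tree for each file also costs memory, not just query time. A sketch of an incremental alternative, assuming every file has the same Data/Info layout: lxml's etree.iterparse with tag="Info" hands over each <Info> element as soon as it is complete, rows are produced with the same pos lookup as the answer above, and the finished subtree is then cleared. iter_rows is a hypothetical helper name; whether this beats etree.parse on your files is something to measure rather than assume.

```python
import io

from lxml import etree

# The question's sample, embedded as bytes so the block runs on its own.
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<Data>
  <Location localDn="Chicago"/>
  <Info Id="Packages">
    <job jobId="1"/>
    <Type pos="1">totalPackages</Type>
    <Value Object="total">
      <result pos="1">1200</result>
    </Value>
  </Info>
  <Info Id="DeliveryMethod">
    <job jobId="1"/>
    <Type pos="1">packagesSent</Type>
    <Type pos="2">packagesReceived</Type>
    <Value Object="priority">
      <result pos="1">100</result>
      <result pos="2">100</result>
    </Value>
    <Value Object="express">
      <result pos="1">200</result>
      <result pos="2">200</result>
    </Value>
    <Value Object="ground">
      <result pos="1">300</result>
      <result pos="2">300</result>
    </Value>
  </Info>
</Data>"""


def iter_rows(source):
    """Yield (Id, Object, Type, Result) tuples, parsing one <Info> at a time.

    `source` may be a filename or a binary file-like object.
    """
    for _event, info in etree.iterparse(source, events=("end",), tag="Info"):
        info_id = info.get("Id")
        # pos -> Type name for this <Info> only
        types = {t.get("pos"): t.text for t in info.iterfind("Type")}
        for value in info.iterfind("Value"):
            obj = value.get("Object")
            for result in value.iterfind("result"):
                yield info_id, obj, types[result.get("pos")], result.text
        # Release the finished subtree and any earlier siblings (e.g. <Location>)
        info.clear()
        while info.getprevious() is not None:
            del info.getparent()[0]


if __name__ == "__main__":
    for row in iter_rows(io.BytesIO(XML)):
        print(*row)
```

For many files, the same generator can be called once per path in an ordinary loop; memory use then depends on the size of one <Info> element rather than the whole file.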


Answered 2024-01-16