Is there a way to create an XML element tree?


Python

Asked by 慕的地6264312 on 2024-01-24 16:19:33

I'm currently writing some XSDs and DTDs to validate a set of XML files. I'm writing them by hand because I've had very bad experiences with XSD generators (for example, the one in Oxygen). I already have a sample XML file that the schema has to cover, but it is huge: one element, for example, has 4312 child elements. Given how poorly the generators worked for me, I'd like to produce a kind of XML tree that contains only the unique tags and attributes, so that I don't have to wade through repeated elements while looking at the XML I'm writing an XSD for.

Here is what I mean. I have this XML (taken from W3):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
  <food some_attribute="1.0">
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description> Two of our famous Belgian Waffles with plenty of real maple syrup </description>
    <calories>650</calories>
  </food>
  <food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description> Light Belgian waffles covered with strawberries and whipped cream </description>
    <calories>900</calories>
  </food>
  <food>
    <name>Berry-Berry Belgian Waffles</name>
    <price>$8.95</price>
    <description> Belgian waffles covered with assorted fresh berries and whipped cream </description>
    <calories>900</calories>
  </food>
  <food>
    <name>French Toast</name>
    <price>$4.50</price>
    <description> Thick slices made from our homemade sourdough bread </description>
    <calories>600</calories>
    <some_complex_type_element_1>
      <some_simple_type_element_1>Text.</some_simple_type_element_1>
    </some_complex_type_element_1>
  </food>
  <food>
    <name>Homestyle Breakfast</name>
    <price>$6.95</price>
    <description> Two eggs, bacon or sausage, toast, and our ever-popular hash browns </description>
    <calories>950</calories>
    <some_simple_type_element_2>Text.</some_simple_type_element_2>
  </food>
</breakfast_menu>
```

As you can see, there are 4 distinct kinds of element under the root: element 1 (has an attribute), elements 2 and 3 (identical in structure), element 4 (contains another complex-type element), and element 5 (contains another simple-type element). What I want is some kind of tree representation of this XML that contains only the unique elements and no text.
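A different take, not from the thread: instead of keeping one copy of each distinct element shape, you can fold every element with the same tag into a single node, so the resulting tree lists each tag and attribute name exactly once. A minimal sketch with the standard library's `xml.etree.ElementTree`, assuming the file fits in memory; `merge` and `render` are hypothetical helper names and the sample is a trimmed version of the XML above:

```python
import xml.etree.ElementTree as ET


def merge(elem, node=None):
    """Fold elem into node = {'attrs': set of attribute names, 'children': {tag: node}}."""
    if node is None:
        node = {"attrs": set(), "children": {}}
    node["attrs"].update(elem.attrib)  # keep attribute names only, values are dropped
    for child in elem:
        # All siblings with the same tag share one entry, so thousands of
        # <food> elements collapse into a single <food> node.
        sub = node["children"].setdefault(child.tag, {"attrs": set(), "children": {}})
        merge(child, sub)
    return node


def render(tag, node, depth=0):
    """Return the merged tree as indented XML-like lines, without any text."""
    attrs = "".join(f' {name}=""' for name in sorted(node["attrs"]))
    pad = "  " * depth
    if not node["children"]:
        return [f"{pad}<{tag}{attrs}/>"]
    lines = [f"{pad}<{tag}{attrs}>"]
    for child_tag, child in node["children"].items():
        lines.extend(render(child_tag, child, depth + 1))
    lines.append(f"{pad}</{tag}>")
    return lines


SAMPLE = """<breakfast_menu>
<food some_attribute="1.0"><name>A</name><price>$1</price></food>
<food><name>B</name><price>$2</price><some_complex_type_element_1><some_simple_type_element_1>T</some_simple_type_element_1></some_complex_type_element_1></food>
<food><name>C</name><some_simple_type_element_2>T</some_simple_type_element_2></food>
</breakfast_menu>"""

root = ET.fromstring(SAMPLE)
out = render(root.tag, merge(root))
print("\n".join(out))
```

The merged tree shows every child element and attribute that can occur, but not which ones are optional or how often they repeat; that still has to be read off the original data when choosing `minOccurs`/`maxOccurs`.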


1 Answer

Answer by 小唯快跑啊

See whether this meets your needs.

```python
from simplified_scrapy import SimplifiedDoc, utils

xml = '''
<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
<food some_attribute="1.0">
  <name>Belgian Waffles</name>
  <price>$5.95</price>
  <description>
  Two of our famous Belgian Waffles with plenty of real maple syrup
  </description>
  <calories>650</calories>
</food>
<food>
  <name>Strawberry Belgian Waffles</name>
  <price>$7.95</price>
  <description>
  Light Belgian waffles covered with strawberries and whipped cream
  </description>
  <calories>900</calories>
</food>
<food>
  <name>Berry-Berry Belgian Waffles</name>
  <price>$8.95</price>
  <description>
  Belgian waffles covered with assorted fresh berries and whipped cream
  </description>
  <calories>900</calories>
</food>
<food>
  <name>French Toast</name>
  <price>$4.50</price>
  <description>
  Thick slices made from our homemade sourdough bread
  </description>
  <calories>600</calories>
  <some_complex_type_element_1>
    <some_simple_type_element_1>Text.</some_simple_type_element_1>
  </some_complex_type_element_1>
</food>
<food>
  <name>Homestyle Breakfast</name>
  <price>$6.95</price>
  <description>
  Two eggs, bacon or sausage, toast, and our ever-popular hash browns
  </description>
  <calories>950</calories>
  <some_simple_type_element_2>Text.</some_simple_type_element_2>
</food>
</breakfast_menu>
'''

def loop(node):
    # Collect every attribute name of this node (skipping the internal
    # 'tag' and 'html' keys) and map it to an empty value
    para = {}
    for k in node:
        if k == 'tag' or k == 'html': continue
        para[k] = ''
    if para: node.setAttrs(para)  # Keep attribute names, blank their values
    children = node.children
    if children:
        for c in children:
            loop(c)  # Recurse into child elements
    else:
        if node.text:
            node.setContent('')  # Remove the text value of a leaf element

doc = SimplifiedDoc(xml)
# Remove text values and attribute values
loop(doc.breakfast_menu)

# After stripping, identical structures serialize to identical strings,
# so keep the first occurrence of each and delete the rest
dicNode = {}
for node in doc.breakfast_menu.children:
    key = node.outerHtml
    if dicNode.get(key):
        node.remove()  # Delete duplicate
    else:
        dicNode[key] = True
print(doc.html)
```
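For comparison, the same idea (blank out values, serialize each child of the root, drop repeats) can be sketched with only the standard library. This is not the answerer's code: it assumes `xml.etree.ElementTree` in place of simplified_scrapy, and `strip_values` and `unique_children` are hypothetical names:

```python
import xml.etree.ElementTree as ET


def strip_values(elem):
    """Blank out text, tails and attribute values in place, keeping the structure."""
    for e in elem.iter():
        e.text = None
        e.tail = None
        for name in e.attrib:
            e.attrib[name] = ""  # keep the attribute name, drop its value


def unique_children(root):
    """Return a new root holding one copy of each distinct child structure."""
    strip_values(root)
    seen = set()
    out = ET.Element(root.tag, root.attrib)
    for child in root:
        # With values stripped, equal structures serialize to equal strings
        key = ET.tostring(child, encoding="unicode")
        if key not in seen:
            seen.add(key)
            out.append(child)
    return out


SAMPLE = """<breakfast_menu>
<food some_attribute="1.0"><name>Belgian Waffles</name><calories>650</calories></food>
<food><name>Strawberry Belgian Waffles</name><calories>900</calories></food>
<food><name>Berry-Berry Belgian Waffles</name><calories>900</calories></food>
<food><name>Homestyle Breakfast</name><some_simple_type_element_2>Text.</some_simple_type_element_2></food>
</breakfast_menu>"""

root = unique_children(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

As in the answer, two `<food>` elements only count as the same if their children appear in the same order and their attributes serialize in the same order.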

Result:

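```xml
<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
<food some_attribute="">
  <name></name>
  <price></price>
  <description></description>
  <calories></calories>
</food>
<food>
  <name></name>
  <price></price>
  <description></description>
  <calories></calories>
</food>
<food>
  <name></name>
  <price></price>
  <description></description>
  <calories></calories>
  <some_complex_type_element_1>
    <some_simple_type_element_1></some_simple_type_element_1>
  </some_complex_type_element_1>
</food>
<food>
  <name></name>
  <price></price>
  <description></description>
  <calories></calories>
  <some_simple_type_element_2></some_simple_type_element_2>
</food>
</breakfast_menu>
```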

For large files, try the following approach.

```python
from simplified_scrapy import SimplifiedDoc, utils
from simplified_scrapy.core.regex_helper import replaceReg

filePath = 'test.xml'
doc = SimplifiedDoc()
# lineByline=True: process the file line by line (suited to large files)
doc.loadFile(filePath, lineByline=True)
# The output is appended to dest.xml, so start without an old dest.xml
utils.appendFile('dest.xml', '<?xml version="1.0" encoding="UTF-8"?><breakfast_menu>')
dicNode = {}
for node in doc.getIterable('food'):
    key = node.outerHtml
    key = replaceReg(key, '>[^>]*?<', '><')  # Remove the text between tags
    key = replaceReg(key, '"[^"]*?"', '""')  # Blank the attribute values
    if not dicNode.get(key):
        dicNode[key] = True
        utils.appendFile('dest.xml', key)  # Write each distinct structure once
utils.appendFile('dest.xml', '</breakfast_menu>')
```
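To see what the two `replaceReg` calls do to a key, here is the same pair of patterns applied with Python's `re.sub`. This assumes `replaceReg` replaces every match the way `re.sub` does; its implementation isn't shown in the answer, and `normalize` is a hypothetical name:

```python
import re


def normalize(fragment):
    """Reduce an element's markup to its structure (assumes replaceReg behaves like re.sub)."""
    fragment = re.sub(r'>[^>]*?<', '><', fragment)  # drop the text between tags
    fragment = re.sub(r'"[^"]*?"', '""', fragment)  # blank every double-quoted attribute value
    return fragment


a = '<food some_attribute="1.0"> <name>Belgian Waffles</name> <price>$5.95</price></food>'
b = '<food some_attribute="2.0"> <name>French Toast</name> <price>$4.50</price></food>'
print(normalize(a))  # <food some_attribute=""><name></name><price></price></food>
print(normalize(a) == normalize(b))  # True
```

Because the patterns work on raw text, a text node that contains `>` (which XML allows unescaped) or a single-quoted attribute value will not be fully normalized, so such elements may show up as spurious "distinct" structures.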

Answered 2024-01-24
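If adding simplified_scrapy isn't an option, the streaming idea can also be sketched with the standard library's `xml.etree.ElementTree.iterparse`. This swaps in a different technique from the answer: instead of normalizing serialized strings with regexes, each record is compared by a structural signature (tag, sorted attribute names, child signatures in order), so attribute order and text containing `>` don't matter. `signature`, `skeleton`, `unique_records`, `write_unique` and the `record_tag` parameter are hypothetical names:

```python
import io
import xml.etree.ElementTree as ET


def signature(elem):
    """Structural key: tag, sorted attribute names, and child signatures in order."""
    return (elem.tag, tuple(sorted(elem.attrib)), tuple(signature(c) for c in elem))


def skeleton(elem):
    """Copy of elem with text and tails dropped and attribute values blanked."""
    copy = ET.Element(elem.tag, {name: "" for name in elem.attrib})
    copy.extend([skeleton(c) for c in elem])
    return copy


def unique_records(source, record_tag):
    """Stream source (a path or binary file object), yielding one skeleton per distinct record shape."""
    seen = set()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        key = signature(elem)
        if key not in seen:
            seen.add(key)
            yield skeleton(elem)
        # Free the finished record's children and text; an empty shell
        # stays attached to the root, which is small per record.
        elem.clear()


def write_unique(source, out, root_tag, record_tag):
    """Write the distinct record skeletons, wrapped in root_tag, to the text stream out."""
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n<%s>\n' % root_tag)
    for rec in unique_records(source, record_tag):
        out.write(ET.tostring(rec, encoding="unicode") + "\n")
    out.write("</%s>\n" % root_tag)


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
<food some_attribute="1.0"><name>Belgian Waffles</name><calories>650</calories></food>
<food><name>Strawberry Belgian Waffles</name><calories>900</calories></food>
<food><name>Berry-Berry Belgian Waffles</name><calories>900</calories></food>
</breakfast_menu>"""

buf = io.StringIO()
write_unique(io.BytesIO(SAMPLE), buf, "breakfast_menu", "food")
print(buf.getvalue())
```

For real files, pass `open("test.xml", "rb")` as the source and a file opened with `open("dest.xml", "w", encoding="utf-8")` as `out`.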
