
How Can I Prevent Lxml From Auto-closing Empty Elements When Serializing To String?

I am parsing a huge XML file which contains many empty elements. When serializing with etree.tostring(root_element, pretty_print=True), lxml collapses those empty elements into self-closing tags. How can I prevent that?

Solution 1:

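Here is a way to do it. lxml writes an element as a self-closing tag when it has no children and its text is None, so ensure that the text value of every empty element is not None. An empty string is enough.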

Example:

from lxml import etree

XML = """
<root>
  <MemoryEnv></MemoryEnv>
  <AlsoEmpty></AlsoEmpty>
  <foo>bar</foo>
</root>"""

doc = etree.fromstring(XML)

# Give text-less elements an empty string so they are not written as <tag/>
for elem in doc.iter():
    if elem.text is None:
        elem.text = ''

print(etree.tostring(doc, encoding="unicode"))

Output:

<root>
  <MemoryEnv></MemoryEnv>
  <AlsoEmpty></AlsoEmpty>
  <foo>bar</foo>
</root>
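
If the same fix is needed in several places, the loop can be wrapped in a small helper. This is only a sketch: the name keep_empty_tags_open is hypothetical and not part of lxml. It assumes you only want to touch elements that have no children, and it passes tag=etree.Element to iter() so that comments and processing instructions are skipped.

```python
from lxml import etree


def keep_empty_tags_open(root):
    """Give every childless, text-less element an empty string as text.

    Hypothetical helper (not part of lxml). Modifies the tree in place.
    """
    # tag=etree.Element restricts iteration to real elements,
    # skipping comments and processing instructions.
    for elem in root.iter(tag=etree.Element):
        if elem.text is None and len(elem) == 0:
            elem.text = ""
    return root


doc = etree.fromstring("<root><MemoryEnv/><foo>bar</foo></root>")
keep_empty_tags_open(doc)
result = etree.tostring(doc, encoding="unicode")
print(result)
```

Note that the helper changes the tree itself, so every later serialization of the same tree keeps the explicit closing tags.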

An alternative is to use the write_c14n() method to write canonical XML (which does not use the special empty-element syntax) to a file.

from lxml import etree

XML = """
<root>
  <MemoryEnv></MemoryEnv>
  <AlsoEmpty></AlsoEmpty>
  <foo>bar</foo>
</root>"""

doc = etree.fromstring(XML)

doc.getroottree().write_c14n("out.xml")

Solution 2:

Serialize with the c14n (canonical XML) method. etree.tostring() in lxml supports it, and it does not collapse empty elements.

>>> from lxml import etree
>>> s = "<MemoryEnv></MemoryEnv>"
>>> root_element = etree.XML(s)
>>> etree.tostring(root_element, method="c14n")
b'<MemoryEnv></MemoryEnv>'
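
To see the difference side by side, here is a short sketch that serializes the same tree with the default method and with c14n. The variable names are illustrative only. Keep in mind that canonical XML is a normalized form: the result is UTF-8 bytes without an XML declaration, and attributes are written in a canonical order, so it may differ from the default output in more than just the empty elements.

```python
from lxml import etree

root = etree.XML("<root><MemoryEnv></MemoryEnv><foo>bar</foo></root>")

# Default serialization collapses the empty element to <MemoryEnv/>.
default_bytes = etree.tostring(root)

# Canonical XML always writes a start tag and an end tag.
canonical_bytes = etree.tostring(root, method="c14n")

print(default_bytes)
print(canonical_bytes)

# c14n output is UTF-8 encoded bytes; decode it if a str is needed.
canonical_text = canonical_bytes.decode("utf-8")
print(canonical_text)
```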
