How Can I Prevent Lxml From Auto-closing Empty Elements When Serializing To String?
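I am parsing a huge XML file which contains many empty elements. When serializing with etree.tostring(root_element, pretty_print=True), those elements are collapsed into the self-closing form. How can I keep the explicit opening and closing tags?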
Solution 1:
Here is a way to do it. Ensure that the text value for all empty elements is not None.
Example:
from lxml import etree
XML = """
<root>
<MemoryEnv></MemoryEnv>
<AlsoEmpty></AlsoEmpty>
<foo>bar</foo>
</root>"""
doc = etree.fromstring(XML)
for elem in doc.iter():
    # An empty string (instead of None) makes lxml emit an explicit closing tag
    if elem.text is None:
        elem.text = ''

print(etree.tostring(doc).decode())
Output:
<root>
<MemoryEnv></MemoryEnv>
<AlsoEmpty></AlsoEmpty>
<foo>bar</foo>
</root>
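The same idea can be wrapped in a small helper. This is only a sketch: the function name tostring_keep_empty is hypothetical, and limiting the change to elements without children is my own refinement (not part of the answer above), meant to leave container elements untouched.

```python
from lxml import etree


def tostring_keep_empty(root, **kwargs):
    """Serialize root so that childless, textless elements keep a closing tag.

    Hypothetical helper; note that it modifies the tree in place.
    """
    # iter(etree.Element) yields only elements, skipping comments and PIs
    for elem in root.iter(etree.Element):
        if len(elem) == 0 and elem.text is None:
            elem.text = ''
    return etree.tostring(root, **kwargs)


doc = etree.fromstring("<root><MemoryEnv/><foo>bar</foo></root>")
result = tostring_keep_empty(doc)
print(result)  # b'<root><MemoryEnv></MemoryEnv><foo>bar</foo></root>'
```

Any keyword arguments are passed straight through to etree.tostring().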
An alternative is to use the write_c14n() method to write canonical XML (which does not use the special empty-element syntax) to a file.
from lxml import etree
XML = """
<root>
<MemoryEnv></MemoryEnv>
<AlsoEmpty></AlsoEmpty>
<foo>bar</foo>
</root>"""
doc = etree.fromstring(XML)
doc.getroottree().write_c14n("out.xml")
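To check what actually lands on disk, the file can be read back. The following is a minimal sketch, assuming a temporary directory is acceptable for the test; the input here deliberately uses the self-closing form to show that canonical XML expands it.

```python
import os
import tempfile

from lxml import etree

doc = etree.fromstring("<root><MemoryEnv/><foo>bar</foo></root>")

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "out.xml")
    # Canonical XML is always written as UTF-8
    doc.getroottree().write_c14n(path)
    with open(path, "rb") as fh:
        data = fh.read()

# The written bytes contain <MemoryEnv></MemoryEnv> instead of <MemoryEnv/>
print(data.decode("utf-8"))
```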
Solution 2:
Serialize with the c14n (canonical XML) method. lxml supports it, and it does not collapse empty elements.
>>> from lxml import etree
>>> s = "<MemoryEnv></MemoryEnv>"
>>> root_element = etree.XML(s)
>>> etree.tostring(root_element, method="c14n")
b'<MemoryEnv></MemoryEnv>'
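One thing to keep in mind is that canonicalization normalizes more than empty elements. As a small illustration (the attributes a and b are made up for this example), comparing the default serializer with c14n on the same element shows that attributes are also put into sorted order:

```python
from lxml import etree

elem = etree.XML('<MemoryEnv b="2" a="1"></MemoryEnv>')

default_out = etree.tostring(elem)
c14n_out = etree.tostring(elem, method="c14n")

print(default_out)  # b'<MemoryEnv b="2" a="1"/>'
print(c14n_out)     # b'<MemoryEnv a="1" b="2"></MemoryEnv>'
```

If the exact attribute order of the original file matters, the elem.text = '' approach from Solution 1 may be the safer choice.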