Get Text From Mixed Element XML Tags With ElementTree
I'm using ElementTree to parse an XML document. I am getting the text from the <u> tags. Some of them have mixed content that I need to either filter out or keep as text. Two ex
Solution 1:
The lost text bits, "¿Sí?" and "A mí no me suena.", are available as the tail property of each <vocal> element (the text following the element's end tag).
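As a minimal sketch of the text/tail split (written for Python 3; the one-element snippet below is a hypothetical fragment modelled on the question's data, not the full file), this shows where each piece of mixed content ends up:

```python
from xml.etree import ElementTree as ET

# Hypothetical single <u> element, modelled on the data in the question.
u = ET.fromstring(
    '<u>Pues... <vocal type="non-ling"><desc>laugh</desc></vocal>'
    'A mí no me suena.</u>'
)
vocal = u.find("vocal")

leading_text = u.text        # text between <u> and the first child
trailing_text = vocal.tail   # text between </vocal> and the next tag

print(repr(leading_text))
print(repr(trailing_text))
```

Note that the trailing text belongs to the child element, not to its parent, which is why reading only u.text loses it.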
Here is a way to get the wanted output (tested with Python 2.7).
Assume that vocal.xml looks like this:
<root><u><vocal type="filler"><desc>eh</desc></vocal>¿Sí?
</u><u>Pues...
<vocal type="non-ling"><desc>laugh</desc></vocal>A mí no me suena.
</u></root>
Code:
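from xml.etree import ElementTree as ET

root = ET.parse("vocal.xml")
for u in root.findall(".//u"):
    v = u.find("vocal")
    if v.get("type") == "filler":
        # Keep the filler's description as part of the text.
        frags = [u.text, v.findtext("desc"), v.tail]
    else:
        # Drop other vocal elements; keep only the surrounding text.
        frags = [u.text, v.tail]
    # u.text is None when <vocal> directly follows <u>, so fall back to "".
    print " ".join((t or "").encode("utf-8").strip() for t in frags).strip()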
Output:
eh ¿Sí?
Pues... A mí no me suena.
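For Python 3, the same idea might be sketched as follows. This is an adaptation rather than the tested code above: the helper name utterance_text is hypothetical, the XML is inlined as a string instead of being read from vocal.xml, and the loop assumes a <u> element may contain any number of child elements.

```python
from xml.etree import ElementTree as ET

# Same sample data as vocal.xml, inlined so the sketch is self-contained.
XML = """<root><u><vocal type="filler"><desc>eh</desc></vocal>¿Sí?
</u><u>Pues...
<vocal type="non-ling"><desc>laugh</desc></vocal>A mí no me suena.
</u></root>"""


def utterance_text(u):
    """Join the text of a <u> element, keeping <desc> only for filler vocals."""
    frags = [u.text]
    for child in u:
        if child.tag == "vocal" and child.get("type") == "filler":
            frags.append(child.findtext("desc"))
        # The tail is the text that follows the child's end tag.
        frags.append(child.tail)
    # Skip None and whitespace-only fragments, then normalise spacing.
    return " ".join(f.strip() for f in frags if f and f.strip())


root = ET.fromstring(XML)
lines = [utterance_text(u) for u in root.iter("u")]
for line in lines:
    print(line)
```

Looping over every child, instead of calling u.find("vocal") once, also covers <u> elements that have no <vocal> child or more than one.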