Most Pythonic Way To Find The Sibling Of An Element In Xml
There are a couple of ways of doing this: with an XPath expression in lxml (Solution 1) or with CSS selectors in BeautifulSoup (Solution 2).
Solution 1:
Relying on XPath to do most of the work, this expression should work:
//*[@class='p_cat_heading'][contains(text(),'DEFINITION')]/following-sibling::*[1]
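Using lxml:
from lxml import html
# Sample markup taken from Solution 2; substitute your own snippet here.
data = '''
<p class="p_cat_heading">THIS YOU DONT WANT</p>
<p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p>
<p class="p_cat_heading">DEFINITION</p>
<p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p>
<p class="p_cat_heading">PRONUNCIATION </p>'''
exp = "//*[@class='p_cat_heading'][contains(text(),'DEFINITION')]/following-sibling::*[1]"
tree = html.fromstring(data)
target = tree.xpath(exp)  # list of matching elements
for i in target:
    print(i.text_content())
Output:
This, these.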
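To make explicit what following-sibling::*[1] selects, here is a minimal sketch using only the standard library's xml.etree.ElementTree, which has no sibling axis in its XPath subset. It is an illustration under assumptions, not a replacement for the lxml answer: the <root> wrapper and the helper name next_sibling_after are hypothetical, the input must be well-formed XML, and only the first matching heading among the direct children of one parent is considered.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed sample: a <root> wrapper is added here because
# ElementTree needs a single root element.
DATA = """<root>
<p class="p_cat_heading">THIS YOU DONT WANT</p>
<p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p>
<p class="p_cat_heading">DEFINITION</p>
<p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p>
<p class="p_cat_heading">PRONUNCIATION </p>
</root>"""


def next_sibling_after(parent, cls, needle):
    """Return the element right after the first child of `parent` whose
    class equals `cls` and whose text contains `needle`, or None."""
    children = list(parent)
    # The last child can never have a following sibling, so skip it.
    for index, child in enumerate(children[:-1]):
        if child.get("class") == cls and needle in (child.text or ""):
            return children[index + 1]
    return None


root = ET.fromstring(DATA)
sibling = next_sibling_after(root, "p_cat_heading", "DEFINITION")
if sibling is not None:
    # itertext() walks nested text, similar in spirit to lxml's text_content().
    print("".join(sibling.itertext()).strip())
```

Unlike the XPath expression, which returns the next sibling of every matching heading anywhere in the tree, this helper stops at the first match and returns None when the heading is missing or is the last child.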
Solution 2:
You can use BeautifulSoup with CSS selectors for this task. The selector .p_cat_heading:contains("DEFINITION") ~ .p_cat_heading will select all elements with class p_cat_heading that are preceded by a sibling element with class p_cat_heading containing the string "DEFINITION":
data = '''
<p class="p_cat_heading">THIS YOU DONT WANT</p>
<p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p>
<p class="p_cat_heading">DEFINITION</p>
<p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p>
<p class="p_cat_heading">PRONUNCIATION </p>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'lxml')
# Newer Soup Sieve releases (2.1+) prefer :-soup-contains(); :contains() is a deprecated alias.
for heading in soup.select('.p_cat_heading:contains("DEFINITION") ~ .p_cat_heading'):
    print(heading.text)
Prints:
PRONUNCIATION
EDIT: To select the direct sibling after the DEFINITION heading:
data = '''
<p class="p_cat_heading">THIS YOU DONT WANT</p>
<p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p>
<p class="p_cat_heading">DEFINITION</p>
<p class="p_numberedbullet"><span class="calibre10">This is after DEFINITION</span>, <span class="calibre10">these</span>. </p>
<p class="p_cat_heading">PRONUNCIATION </p>
<p class="p_numberedbullet"><span class="calibre10">This is after PRONUNCIATION</span>, <span class="calibre10">these</span>. </p>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'lxml')
# "+" matches only the element immediately following the heading.
s = soup.select_one('.p_cat_heading:contains("DEFINITION") + :not(.p_cat_heading)')
print(s.text)
Prints:
This is after DEFINITION, these.
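The same adjacent-sibling lookup can also be written with BeautifulSoup's navigation methods instead of a CSS selector. The following is a minimal sketch under a few assumptions: the helper name text_after_heading is hypothetical, the built-in "html.parser" is used so that lxml is not required, the heading text is matched exactly after stripping (rather than by substring), and the class of the sibling is not checked.

```python
from bs4 import BeautifulSoup

data = '''
<p class="p_cat_heading">DEFINITION</p>
<p class="p_numberedbullet"><span class="calibre10">This is after DEFINITION</span>, <span class="calibre10">these</span>. </p>
<p class="p_cat_heading">PRONUNCIATION </p>
<p class="p_numberedbullet"><span class="calibre10">This is after PRONUNCIATION</span>, <span class="calibre10">these</span>. </p>'''


def text_after_heading(soup, heading_text):
    """Hypothetical helper: text of the tag right after the matching heading."""
    heading = soup.find(
        "p",
        class_="p_cat_heading",
        # Guard against None: tags with several children have no .string.
        string=lambda s: s is not None and s.strip() == heading_text,
    )
    if heading is None:
        return None
    # find_next_sibling() returns the next tag, skipping whitespace text nodes.
    sibling = heading.find_next_sibling()
    return sibling.get_text().strip() if sibling is not None else None


soup = BeautifulSoup(data, "html.parser")
print(text_after_heading(soup, "DEFINITION"))
print(text_after_heading(soup, "PRONUNCIATION"))
```

Wrapping the lookup in a function makes it easy to reuse for other headings, and returning None for a missing heading avoids the AttributeError that calling .text on a failed select_one() would raise.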