
normalize-space Only Works With XPath, Not With CSS Selectors

I am extracting data using Scrapy and Python. The data sometimes includes extra whitespace. I was using normalize-space with XPath to remove it, like this: xpath('normalize-space(.//

Solution 1:

Unfortunately, XPath functions are not available with CSS selectors in Scrapy.

You could first translate your div[class=location]::text CSS selector to the equivalent XPath expression and then wrap it in normalize-space() as input to .xpath().

However, since you are only interested in the final whitespace-normalized string, you could achieve the same result by applying a Python function to the strings returned by the CSS selector's extract().

See for example http://snipplr.com/view/50410/normalize-whitespace/ :

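import re

def normalize_whitespace(text):
    # Trim both ends, then collapse each run of whitespace into one space.
    # The parameter is named "text" to avoid shadowing the built-in str.
    return re.sub(r'\s+', ' ', text.strip())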
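One caveat worth illustrating: the function above is not an exact replacement for XPath's normalize-space(). XPath 1.0 treats only space, tab, carriage return and line feed as whitespace, while Python's \s on a str also matches characters such as the non-breaking space (U+00A0). The sketch below shows the difference; xpath_normalize_space is a hypothetical helper name, not part of Scrapy, and the sample string is made up for illustration.

```python
import re

# Whitespace as defined by XPath 1.0: space, tab, carriage return, line feed.
XPATH_WS = " \t\r\n"


def xpath_normalize_space(text):
    """Hypothetical helper that mimics XPath 1.0 normalize-space()."""
    # Strip only XPath whitespace, then collapse runs of it to one space.
    return re.sub(r"[ \t\r\n]+", " ", text.strip(XPATH_WS))


def normalize_whitespace(text):
    """Regex version from the answer: \\s also matches Unicode whitespace."""
    return re.sub(r"\s+", " ", text.strip())


# Made-up sample containing a non-breaking space between "San" and "Jose".
sample = "\n  San\u00a0Jose,\t CA \n"

print(repr(xpath_normalize_space(sample)))  # non-breaking space is kept
print(repr(normalize_whitespace(sample)))   # non-breaking space becomes ' '
```

If the scraped pages contain non-breaking spaces and you want them collapsed too, the \s version is probably the one you want; if you need output identical to what normalize-space() would produce, the narrower character class is closer.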

If you include this function somewhere in your Scrapy project, you could use it like this:

    car['Location'] = normalize_whitespace(
        u''.join(site.css('div[class=location]::text').extract()))

or, to use only the first matched text node (note that [0] raises IndexError when nothing matches):

    car['Location'] = normalize_whitespace(
        site.css('div[class=location]::text').extract()[0])
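The two variants differ when the div contains more than one text node, and indexing with [0] fails on an empty result. A minimal sketch of a wrapper that handles both cases follows; normalized_text and its join and default parameters are hypothetical names, and the fragments list stands in for whatever .extract() would return on a real page.

```python
import re


def normalize_whitespace(text):
    # Trim both ends, then collapse each run of whitespace into one space.
    return re.sub(r"\s+", " ", text.strip())


def normalized_text(fragments, join=True, default=""):
    """Normalize a list of strings such as the one returned by .extract().

    join=True concatenates all fragments first; join=False uses only the
    first fragment. An empty list yields `default` instead of raising.
    """
    if not fragments:
        return default
    raw = "".join(fragments) if join else fragments[0]
    return normalize_whitespace(raw)


# Stand-in for site.css('div[class=location]::text').extract()
fragments = ["\n   Paris,", "\n   France  \n"]

print(normalized_text(fragments))              # all text nodes, joined
print(normalized_text(fragments, join=False))  # first text node only
print(repr(normalized_text([])))               # no match: default value
```

In a spider this could be called as car['Location'] = normalized_text(site.css('div[class=location]::text').extract()).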

Solution 2:

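css() translates the CSS selector into XPath internally and returns a selector list, so you can chain an xpath() call onto it to normalize the whitespace. css() itself accepts only CSS syntax, so normalize-space() has to go in the xpath() call, not in the css() call. Change your code to:

car['Location'] = site.css('div[class=location]').xpath('normalize-space(text())').extract()

Note that extract() returns a list, with one normalized string per matched div.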
