How To Add New Column To Scrapy Output From Csv?
I parse websites and it works fine, but I need to add a new column with IDs to the output. That column is saved in a CSV file together with the URLs: https://www.ceneo.pl/48523541, 1362 https://www.ceneo.pl/4
Solution 1:
You can get the ID in start_requests and assign it to the request using meta={'id': id_}. Later, in parse, you can get the ID using response.meta['id']. This way you will have the correct ID in parse.
I use string data instead of a file to create a working example.
#!/usr/bin/env python3

import scrapy

data = '''https://www.ceneo.pl/48523541, 1362
https://www.ceneo.pl/46374217, 2457'''


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        #f = open('urls.csv', 'r')
        f = data.split('\n')
        for row in f:
            url, id_ = row.split(',')
            url = url.strip()
            id_ = id_.strip()
            #print(url, id_)
            # use meta to assign value
            yield scrapy.Request(url=url, callback=self.parse, meta={'id': id_})

    def parse(self, response):
        # use meta to receive value
        id_ = response.meta["id"]
        all_prices = response.xpath('(//td[@class="cell-price"] /a/span/span/span[@class="value"]/text())[position() <= 10]').extract()
        all_sellers = response.xpath('(//tr/td/div/ul/li/a[@class="js_product-offer-link"]/text())[position()<=10]').extract()
        all_sellers = [item.replace('Opinie o ', '') for item in all_sellers]
        for price, seller in zip(all_prices, all_sellers):
            yield {'urlid': id_, 'price': price.strip(), 'seller': seller.strip()}


# --- it runs without project and saves in `output.csv` ---

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
    'FEED_FORMAT': 'csv',
    'FEED_URI': 'output.csv',
})
c.crawl(QuotesSpider)
c.start()
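To go back from the string to the real file (the commented-out open('urls.csv', 'r') line), the rows could be read with the standard csv module instead of a manual split(','). This is only a sketch: the helper name read_url_ids is made up for illustration, and it assumes every useful line has the form "url, id" as in the question.

```python
import csv
import io


def read_url_ids(lines):
    """Yield (url, id_) pairs from an iterable of CSV lines (hypothetical helper)."""
    # skipinitialspace drops the blank after the comma in "url, 1362"
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        yield row[0].strip(), row[1].strip()


# A StringIO stands in for the file here; with the real file it would be
# something like: with open('urls.csv', newline='') as f: read_url_ids(f)
sample = io.StringIO(
    "https://www.ceneo.pl/48523541, 1362\n"
    "https://www.ceneo.pl/46374217, 2457\n"
)
pairs = list(read_url_ids(sample))
print(pairs)
```

Inside start_requests each pair would then be passed on the same way as before, with meta={'id': id_} on the request.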
BTW: there is a standard function id(), so I use the variable id_ instead of id.
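A small illustration of why the name matters (not part of the spider; both function names are invented for this example): once a local variable is called id, the built-in id() is no longer reachable under that name inside the function.

```python
def without_shadowing():
    id_ = "1362"  # trailing underscore keeps the built-in id() usable
    return isinstance(id(id_), int)


def with_shadowing():
    id = "1362"  # this local name hides the built-in id() in this function
    try:
        id(object())  # tries to call the string, not the built-in
    except TypeError as exc:
        return str(exc)
    return None


print(without_shadowing())
print(with_shadowing())
```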