Not Able To Follow Link Using Scrapy
I am not able to follow the link and get back the values. I tried the code below: I am able to crawl the first link, but after that it doesn't go on to the second follow link (fun
Solution 1:
You forgot to return your Request in the parse() method. Try this code:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.http.request import Request


class ScrapyOrgSpider(BaseSpider):
    name = "example.com"
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com/abcd"]

    def parse(self, response):
        self.log('@@ Original response: %s' % response)
        req = Request("http://www.example.com/follow", callback=self.a_1)
        self.log('@@ Next request: %s' % req)
        # Returning the Request is what gets it scheduled for download.
        return req

    def a_1(self, response):
        hxs = HtmlXPathSelector(response)
        self.log('@@ extraction: %s' %
                 hxs.select("//a[@class='channel-link']").extract())
Log output:
2012-11-22 12:20:06-0600 [scrapy] INFO: Scrapy 0.17.0 started (bot: oneoff)
2012-11-22 12:20:06-0600 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2012-11-22 12:20:06-0600 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2012-11-22 12:20:06-0600 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2012-11-22 12:20:06-0600 [scrapy] DEBUG: Enabled item pipelines:
2012-11-22 12:20:06-0600 [example.com] INFO: Spider opened
2012-11-22 12:20:06-0600 [example.com] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2012-11-22 12:20:06-0600 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2012-11-22 12:20:06-0600 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2012-11-22 12:20:07-0600 [example.com] DEBUG: Redirecting (302) to <GET http://www.iana.org/domains/example/> from <GET http://www.example.com/abcd>
2012-11-22 12:20:07-0600 [example.com] DEBUG: Crawled (200) <GET http://www.iana.org/domains/example/> (referer: None)
2012-11-22 12:20:07-0600 [example.com] DEBUG: @@ Original response: <200 http://www.iana.org/domains/example/>
2012-11-22 12:20:07-0600 [example.com] DEBUG: @@ Next request: <GET http://www.example.com/follow>
2012-11-22 12:20:07-0600 [example.com] DEBUG: Redirecting (302) to <GET http://www.iana.org/domains/example/> from <GET http://www.example.com/follow>
2012-11-22 12:20:08-0600 [example.com] DEBUG: Crawled (200) <GET http://www.iana.org/domains/example/> (referer: http://www.iana.org/domains/example/)
2012-11-22 12:20:08-0600 [example.com] DEBUG: @@ extraction: []
2012-11-22 12:20:08-0600 [example.com] INFO: Closing spider (finished)
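To see why the missing return matters, here is a rough toy model in plain Python. It is not Scrapy's engine, only a sketch of the idea: the crawl loop can schedule a follow-up request only if the callback hands it back. The names ToyRequest, run_crawl, parse_without_return, parse_with_return and extract are all made up for this illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyRequest:
    """Hypothetical stand-in for a crawler request: a URL plus a callback."""
    url: str
    callback: Callable[[str], Optional["ToyRequest"]]


def run_crawl(start):
    """Process requests until the queue is empty and return the URLs visited.

    Only a request that a callback returns is added to the queue.
    """
    visited = []
    queue = deque([start])
    while queue:
        request = queue.popleft()
        visited.append(request.url)
        result = request.callback(request.url)
        if isinstance(result, ToyRequest):
            queue.append(result)
    return visited


def extract(url):
    # Final callback: nothing further to schedule.
    return None


def parse_without_return(url):
    # The request is built and then discarded, so the crawl stops here.
    ToyRequest("http://www.example.com/follow", callback=extract)


def parse_with_return(url):
    # Returning the request hands it back to the loop for scheduling.
    return ToyRequest("http://www.example.com/follow", callback=extract)


broken = run_crawl(ToyRequest("http://www.example.com/abcd", parse_without_return))
fixed = run_crawl(ToyRequest("http://www.example.com/abcd", parse_with_return))
print(broken)  # only the start URL is visited
print(fixed)   # the start URL, then the follow URL
```

In the toy model the first crawl never reaches the follow URL, which matches the symptom in the question. The second one does, which corresponds to the "@@ Next request" and "@@ extraction" lines appearing in the log once parse() returns the Request.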