
How To Use Scrapy With An Internet Connection Through A Proxy With Authentication

My internet connection goes through a proxy with authentication, and the problem appears when I try to run the Scrapy library on even the simplest example, such as: scrapy shell http://stackoverflow.

Solution 1:

Scrapy supports proxies by using HttpProxyMiddleware:

This middleware sets the HTTP proxy to use for requests, by setting the proxy meta value to Request objects. Like the Python standard library modules urllib and urllib2, it obeys the following environment variables:

  • http_proxy
  • https_proxy
  • no_proxy
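As a minimal sketch of the environment-variable approach, the snippet below builds a proxy URL with embedded credentials and sets the variables from Python before any crawl starts. The host `proxy.example.com`, the port `8080`, the credentials, and the helper name `build_proxy_url` are all placeholders invented for illustration. It only shows what the standard library detects; whether Scrapy uses credentials embedded in the URL is worth checking against the HttpProxyMiddleware documentation for your version.

```python
import os
from urllib.parse import quote
from urllib.request import getproxies


def build_proxy_url(host, port, username, password):
    """Return an http:// proxy URL with percent-encoded credentials.

    Hypothetical helper: encoding the credentials keeps characters such
    as '@' or ':' in a password from breaking the URL structure.
    """
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"http://{user}:{pwd}@{host}:{port}"


# Placeholder values, not taken from the original post.
proxy_url = build_proxy_url("proxy.example.com", 8080, "USERNAME", "p@ss:word")

# The same variables the middleware documentation lists.
os.environ["http_proxy"] = proxy_url
os.environ["https_proxy"] = proxy_url
os.environ["no_proxy"] = "localhost,127.0.0.1"

# getproxies() reports what the standard library sees in the environment.
detected = getproxies()
print(detected.get("http"))
print(detected.get("https"))
```

In practice these variables are more often exported in the shell before running scrapy shell or scrapy crawl; setting them from Python only helps if it happens before Scrapy initializes its middlewares.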


Solution 2:

Repeating the answer by Mahmoud M. Abdel-Fattah, because the page is not available now. Credit goes to him; however, I made slight modifications.

If middlewares.py already exists, add the following code to it.

import base64

class ProxyMiddleware(object):
    # Override process_request so every outgoing request goes through the proxy
    def process_request(self, request, spider):
        # Set the location of the proxy
        request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

        # Use the following lines if your proxy requires authentication
        proxy_user_pass = "USERNAME:PASSWORD"
        # Set up basic authentication for the proxy. b64encode returns bytes
        # (with no trailing newline), so decode them before building the header.
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode()).decode('ascii')
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass

In the settings.py file, add the following code:

DOWNLOADER_MIDDLEWARES = {
    'project_name.middlewares.ProxyMiddleware': 100,
}
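The Proxy-Authorization value is easy to get wrong on Python 3, because the base64 functions return bytes and calling str() on bytes yields a literal b'...' wrapper. The sketch below checks the header value in isolation, using only the standard library. The function name `proxy_authorization_value` is hypothetical and USERNAME/PASSWORD are the same placeholders used in the middleware.

```python
import base64


def proxy_authorization_value(username, password):
    """Build the value for a Basic Proxy-Authorization header.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    # b64encode returns bytes, so decode to get a clean text token.
    token = base64.b64encode(credentials).decode("ascii")
    return "Basic " + token


header_value = proxy_authorization_value("USERNAME", "PASSWORD")

# What happens if the bytes are passed through str() instead of decoded:
buggy_value = "Basic " + str(base64.b64encode(b"USERNAME:PASSWORD"))

# Round-trip check: decoding the token gives back the original credentials.
decoded = base64.b64decode(header_value.split(" ", 1)[1]).decode("utf-8")

print(header_value)
print(buggy_value)
print(decoded)
```

If the proxy keeps rejecting requests with a 407 response, comparing the header actually sent against a value built this way is a quick first check.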

This should work the same way as setting http_proxy. In my case, however, I'm trying to access a URL over HTTPS and need to set https_proxy, which I'm still investigating. Any lead on that would be of great help.
