URLDecoding Requests

I am trying to get the original URL from requests. Here is what I have so far:

res = requests.get(...)
url = urllib.unquote(res.url).decode('utf8')

I then get an error that says:

UnicodeEncodeError: 'ascii' codec can't encode characters

Solution 1:

You are calling .decode() on a string that is already Unicode. On Python 3 this raises AttributeError, because a Unicode string has no .decode() method there. On Python 2, the Unicode string is first implicitly encoded to bytes using sys.getdefaultencoding() ('ascii') before being passed to .decode('utf8'), and that implicit encoding step raises UnicodeEncodeError when the string contains non-ASCII characters.

In short, do not call .decode() on Unicode strings. Encode the URL to bytes first, unquote it, and decode the resulting bytestring:

print urllib.unquote(res.url.encode('ascii')).decode('utf-8')

Without the .decode() call, the code prints bytes (assuming a bytestring is passed to unquote()), which may produce mojibake if the character encoding used by your environment is not utf-8. To avoid mojibake, always print Unicode rather than bytes, and do not hardcode the character encoding of your environment inside your script. That is why .decode() is necessary here.
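For comparison, here is a minimal sketch of the same decoding on Python 3, where unquote() lives in urllib.parse and accepts a str directly. The decode_url helper and the example URL are hypothetical stand-ins for res.url and are not part of the original question:

```python
from urllib.parse import unquote


def decode_url(url: str) -> str:
    """Percent-decode a URL, interpreting the escaped bytes as UTF-8."""
    # errors="strict" raises UnicodeDecodeError on invalid UTF-8 instead of
    # silently inserting replacement characters (the default is "replace").
    return unquote(url, encoding="utf-8", errors="strict")


# Hypothetical URL standing in for res.url from a requests response.
example_url = "http://example.com/caf%C3%A9?q=%C3%A4"
decoded = decode_url(example_url)
print(decoded)
```

On Python 3 the result is already a str, so no .encode() or .decode() call is needed around unquote().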


There is a bug in urllib.unquote() if you pass it a Unicode string:

>>> print urllib.unquote(u'%C3%A4')
Ã¤
>>> print urllib.unquote('%C3%A4') # utf-8 output
ä

Pass bytestrings to unquote() on Python 2.
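The reason is that Python 2's unquote() turns each %XX escape in a Unicode string into the code point with that number, which has the same effect as decoding the escaped bytes as Latin-1 instead of UTF-8. The following sketch reproduces both results on Python 3 using unquote_to_bytes(); the variable names are only illustrative:

```python
from urllib.parse import unquote_to_bytes

# The two escaped bytes are the UTF-8 encoding of U+00E4 (a with diaeresis).
raw = unquote_to_bytes("%C3%A4")

# Correct interpretation: both bytes together form one UTF-8 character.
as_utf8 = raw.decode("utf-8")

# Mojibake: each byte is treated as its own character, which is what
# Python 2's unquote() effectively does for Unicode input.
as_latin1 = raw.decode("latin-1")

print(repr(raw), as_utf8, as_latin1)
```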

