How To Get Rid Of Ascii Encoding Error In Python
Solution 1:
You have a U+2026 HORIZONTAL ELLIPSIS character in your string definition:
... Deepika Padukone, Esha Gupta or Yami Gautam…. ...
                                               ^
Python 2 requires that you declare the source code encoding if you use any non-ASCII characters in your source; Python 3 assumes UTF-8 unless a declaration says otherwise.
Your options are to:
- Declare the encoding, as specified in PEP 263. It is a comment that must be the first or second line of your source file. What you set it to depends on your code editor. If you are saving files encoded as UTF-8, then the comment looks something like:
  # coding: utf-8
  but the format is flexible. You can spell it encoding too, for example, and use = instead of :.
- Replace the horizontal ellipsis with three dots, as used in the rest of the string.
- Replace the codepoint with \xhh escape sequences to represent encoded data. U+2026 encoded to UTF-8 is \xe2\x80\xa6.
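The byte values quoted for the escape-sequence option can be checked with a short sketch. It assumes Python 3, and the variable names are only illustrative: encoding U+2026 as UTF-8 should yield exactly those three bytes.

```python
# Python 3 sketch: confirm the UTF-8 bytes for U+2026 HORIZONTAL ELLIPSIS.
import unicodedata

# Written as an escape, so this source file itself stays pure ASCII.
ellipsis_char = "\u2026"

utf8_bytes = ellipsis_char.encode("utf-8")

print(unicodedata.name(ellipsis_char))  # HORIZONTAL ELLIPSIS
print(utf8_bytes)                       # b'\xe2\x80\xa6'

# Decoding the escaped bytes gives the original character back.
round_trip = b"\xe2\x80\xa6".decode("utf-8")
```

In Python 2, a plain string literal containing \xe2\x80\xa6 holds those same three bytes, which is why that option keeps the source file ASCII-only.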
Solution 2:
Add # coding: utf-8 to the top of your file:
# coding: utf-8
string = "Deepika Padukone, Esha Gupta or Yami Gautam - Who's looks hotter and sexier? Vote! - It's ... Deepika Padukone, Esha Gupta or Yami Gautam…. Deepika Padukone$
# Write the string out; the with block closes the file automatically.
with open("test.txt", "w") as fp:
    fp.write(string)
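For comparison, here is a minimal Python 3 sketch of the same write. It assumes a shortened, illustrative string and a temporary output path, neither of which comes from the original post. In Python 3 the source file is read as UTF-8 by default, so the remaining thing to pin down is the encoding of the file being written.

```python
# Python 3 sketch: write text containing U+2026 with an explicit file encoding.
import os
import tempfile

# Shortened, illustrative string (not the full text from the question).
text = "Deepika Padukone, Esha Gupta or Yami Gautam\u2026."

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# encoding="utf-8" makes the result independent of the platform default.
with open(path, "w", encoding="utf-8") as fp:
    fp.write(text)

# Read the raw bytes back to see how the ellipsis was stored.
with open(path, "rb") as fp:
    raw = fp.read()

print(raw[-4:])  # b'\xe2\x80\xa6.'
```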
Explanation:
The error is caused by standard characters such as the apostrophe (') being replaced by non-ASCII look-alikes such as the curly quotation mark (’) during copying. It happens quite often when you copy text from a PDF file. The difference is visually subtle, but it matters a great deal to Python: the ASCII apostrophe is a legal string delimiter, while the curly quotation mark is not.
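Because such characters are hard to spot by eye, a small scan can help. The following is a sketch under the assumption of Python 3; find_non_ascii and the sample text are hypothetical names introduced only for illustration.

```python
# Python 3 sketch: list every non-ASCII character in a piece of source text.
import unicodedata


def find_non_ascii(text):
    """Return (line, column, character, Unicode name) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, char in enumerate(line, start=1):
            if ord(char) > 127:
                name = unicodedata.name(char, "UNKNOWN")
                hits.append((line_no, col_no, char, name))
    return hits


# Illustrative input: curly quotes on line 1, an ellipsis on line 2.
sample = "print(\u2018hello\u2019)\nprint('ok\u2026')"

for hit in find_non_ascii(sample):
    print(hit)
```

Each reported position can then be fixed by hand, either by substituting the ASCII equivalent or by keeping the character and declaring the encoding.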
Technically, it’s not illegal to use any characters we want. We just have to tell Python which encoding we are using so that it knows how to interpret these non-ASCII characters. Adding # coding: utf-8 to the top of the file tells Python that your encoding is UTF-8.
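To see which encoding a given declaration selects, the standard library's tokenize.detect_encoding can be asked directly. This is a Python 3 sketch; declared_encoding is a hypothetical helper, and cp1252 is used in the samples only so that the result is distinguishable from the Python 3 default of UTF-8.

```python
# Python 3 sketch: ask the standard library which encoding a declaration selects.
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python's tokenizer detects for the given source bytes."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# A declaration on the first line.
print(declared_encoding(b"# coding: cp1252\nx = 1\n"))
# The alternative spelling, with "encoding" and "=".
print(declared_encoding(b"# -*- encoding=cp1252 -*-\nx = 1\n"))
# A declaration on the second line, after a shebang.
print(declared_encoding(b"#!/usr/bin/env python\n# coding: cp1252\nx = 1\n"))
# No declaration at all: Python 3 falls back to utf-8.
print(declared_encoding(b"x = 1\n"))
```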
UTF-8 is an encoding for the characters in the Unicode set and is used very widely on the web. Unicode is the industry standard for representing and handling text across many platforms, including the web, enterprise software, and printing.