Regular Expression To Replace "escaped" Characters With Their Originals

November 17, 2024

NOTE: I'm not parsing lots of HTML or generic HTML with regex. I know that's bad.

TL;DR: I have strings like

A sentence with an exclamation\! Next is a \* character

Where there

Solution 1:

You are missing something, namely the r prefix:

r = re.compile(r"\\.") # Slash followed by anything

Both Python and re attach meaning to \; your doubled backslash becomes just one backslash when Python evaluates the string literal, so by the time the value reaches re.compile(), re sees \., meaning a literal full stop:

>>> print("""\\.""")
\.

By using r'' you tell Python not to interpret escape codes, so now re is given a string with \\., meaning a literal backslash followed by any character:

>>> print(r"""\\.""")
\\.
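To make the difference concrete, here is a minimal sketch comparing the two literals and what re does with each. The variable names plain, raw, and sample are illustrative only and not part of the original question.

```python
import re

plain = "\\."   # Python collapses the doubled backslash: 2 characters, \ and .
raw = r"\\."    # the raw literal keeps both backslashes: 3 characters

print(len(plain), len(raw))  # 2 3

# Hypothetical sample text; its value is: a.b \! c
sample = "a.b \\! c"

# re receives \. and matches only a literal full stop
print(re.findall(plain, sample))  # ['.']

# re receives \\. and matches a backslash followed by any character
print(re.findall(raw, sample))    # ['\\!']
```

The last result is shown as a repr, so the single backslash in the match is printed doubled.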

Demo:

>>> import re
>>> s = "test \\* \\! test * !! **"
>>> r = re.compile(r"\\.")  # Slash followed by anything
>>> r.sub("-", s)
'test - - test * !! **'

The rule of thumb is: when defining regular expressions, use r'' raw string literals. That saves you from having to double-escape everything that has meaning to both Python and regular expression syntax.

Next, you want to replace each match with the 'escaped' character itself. Use a capturing group for that; re.sub() lets you reference groups in the replacement value:

r = re.compile(r"\\(.)") # Note the parentheses; that's a capturing group
r.sub(r'\1', s)          # \1 means: replace with value of first capturing group

Now the output is:

>>> r = re.compile(r"\\(.)")  # Note the parentheses; that's a capturing group
>>> r.sub(r'\1', s)
'test * ! test * !! **'
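Put together, the approach could be wrapped in a small helper. This is only a sketch: the names ESCAPED and unescape are made up for illustration, and it assumes that every backslash in the input is meant to escape the single character after it.

```python
import re

# Backslash followed by any one character, captured as group 1.
# Note: "." does not match a newline unless re.DOTALL is passed.
ESCAPED = re.compile(r"\\(.)")

def unescape(text):
    """Replace each backslash-escaped character with the character itself."""
    return ESCAPED.sub(r"\1", text)

print(unescape(r"A sentence with an exclamation\! Next is a \* character"))
# A sentence with an exclamation! Next is a * character

print(unescape(r"a double \\ backslash"))
# a double \ backslash
```

Because the pattern consumes the backslash and the character after it in one match, an escaped backslash collapses to a single backslash rather than being treated as the start of another escape.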
