Python Regular Expressions Ending At "
I'm trying to take a long string and extract all the URLs it contains. page.findall(r'http://.+') is what I have, but that doesn't give me what I want. The URLs are all wrapped in double quotes.
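Here is a minimal sketch of why the original pattern over-matches. It assumes `page` is an ordinary string of HTML and uses `re.findall` directly; the sample text is hypothetical. Because `.+` is greedy, everything after `http://` up to the end of the line is consumed, including the closing quote.

```python
import re

# Hypothetical sample input: two quoted URLs on a single line.
page = '<a href="http://example.com/a">A</a> <a href="http://example.org/b">B</a>'

# Greedy .+ runs to the end of the line, swallowing the closing quote
# and everything after it, so both URLs end up in one match.
greedy = re.findall(r'http://.+', page)
print(greedy)  # a single match covering both URLs
```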
Solution 1:
There are very complex URL-parsing regexes out there, but if you want to stop at a double quote ("), just use [^\"]+ for the URL part. Or switch to a single-quoted string and drop the backslash: [^"]+.
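The negated character class can be sketched as follows, using the same hypothetical sample string. The sketch also shows that the escaped form in a double-quoted raw string and the unescaped form in a single-quoted one behave identically, because the regex engine treats `\"` as a literal `"`.

```python
import re

page = '<a href="http://example.com/a">A</a> <a href="http://example.org/b">B</a>'

# [^"]+ matches one or more characters that are NOT a double quote,
# so each match stops right before the closing quote.
urls = re.findall(r'http://[^"]+', page)
print(urls)  # ['http://example.com/a', 'http://example.org/b']

# In a double-quoted raw string the quote has to be escaped; the regex
# engine reads \" as a plain ", so this pattern is equivalent.
same = re.findall(r"http://[^\"]+", page)
```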
Also, if you have https URLs mixed in, a pattern starting with http:// will miss them, so you might want to just go with:
page.findall(r'"(http[^"]+)"')
But now we're getting into URL-parsing regexes.
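The capture-group version can be sketched like this with `re.findall`, on a hypothetical string that mixes `http` and `https` links. When a pattern contains one group, `findall` returns only the captured text, so the surrounding quotes are dropped.

```python
import re

page = ('<a href="http://example.com/a">A</a> '
        '<a href="https://secure.example.net/c">C</a>')

# A pattern anchored on "http://" never matches "https://",
# so the second link is silently skipped.
missed = re.findall(r'http://[^"]+', page)
print(missed)  # ['http://example.com/a']

# The quotes bound the match, and findall() returns only the text
# captured by (...), so the quotes are not included in the result.
urls = re.findall(r'"(http[^"]+)"', page)
print(urls)  # ['http://example.com/a', 'https://secure.example.net/c']
```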
Solution 2:
It is better to use a non-greedy expression here instead of [^\"]+. That way your regex would be r'"http://.+?"'. The question mark after the plus makes the match lazy, so it stops at the first double quote it encounters.
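A minimal sketch of the lazy quantifier on the same hypothetical sample string. As written, the pattern includes the quotes in each match; the second call is an illustrative variant (not part of the answer above) that adds a group so `findall` returns the bare URLs.

```python
import re

page = '<a href="http://example.com/a">A</a> <a href="http://example.org/b">B</a>'

# .+? matches as few characters as possible, so each match ends
# at the first closing quote. The quotes are part of the match.
lazy = re.findall(r'"http://.+?"', page)
print(lazy)  # ['"http://example.com/a"', '"http://example.org/b"']

# Illustrative variant: a group makes findall() return only the URLs.
urls = re.findall(r'"(http://.+?)"', page)
print(urls)  # ['http://example.com/a', 'http://example.org/b']
```

One practical difference between the two approaches: by default `.` does not match a newline, while `[^"]` does, so they can behave differently if a quoted value spans more than one line.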