
Python Regular Expressions Ending At "

I'm trying to take a long string and extract all the URLs it contains. page.findall(r'http://.+') is what I have, but that doesn't give me what I want. The URLs are all wrapped

Solution 1:

There are very complex URL-parsing regexes out there, but if you just want to stop at a ", use [^\"]+ for the URL part.

The backslash is only needed when the pattern sits inside a double-quoted string; in a single-quoted string you can drop it.

Also, a pattern that starts with http:// will miss any https URLs mixed in, so you might want to just go with

page.findall(r'"(http[^"]+)"')

But now we're getting into URL-parsing regexes.
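To make the difference concrete, here is a minimal sketch using re.findall. The sample text in page is made up for illustration, since the question does not show the real input.

```python
import re

# Hypothetical sample input; the real page content is not shown in the question.
page = '<a href="http://example.com/a">A</a> <a href="https://example.org/b?x=1">B</a>'

# Greedy .+ keeps going to the end of the line, so the first match
# swallows everything that follows the first URL.
greedy = re.findall(r'http://.+', page)

# [^"]+ stops at the closing quote, and http[^"]+ also covers https.
# The parentheses make findall return only the URL, without the quotes.
urls = re.findall(r'"(http[^"]+)"', page)

print(greedy)
print(urls)  # ['http://example.com/a', 'https://example.org/b?x=1']
```

Because the pattern is written in a single-quoted raw string, the " inside the character class needs no backslash.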

Solution 2:

It is better to use a non-greedy expression here instead of [^\"]+. That way your regex would be r'"http://.+?"'. The question mark after the plus makes the match stop at the first double quote it encounters instead of the last one.
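A short sketch of the non-greedy version, again on a made-up sample string. Note that the pattern as written returns the surrounding quotes as part of each match; the capturing group in the second call is an addition for illustration, not part of the answer above.

```python
import re

# Hypothetical sample input for illustration only.
page = 'x "http://example.com/a" y "http://example.com/b" z'

# .+? expands one character at a time and stops at the first closing quote.
# The quotes are part of the pattern, so they appear in each result.
with_quotes = re.findall(r'"http://.+?"', page)

# Adding a group (an assumption, not in the original answer) drops the quotes.
captured = re.findall(r'"(http://.+?)"', page)

print(with_quotes)  # ['"http://example.com/a"', '"http://example.com/b"']
print(captured)     # ['http://example.com/a', 'http://example.com/b']
```

As with the earlier patterns, this one only matches http:// URLs; https URLs would need a pattern such as https?://.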
