Skip to content Skip to sidebar Skip to footer

Split Twitter Rss String Using Python

I am trying to parse Twitter RSS feeds and put the information in a sqlite database, using Python. Here's an example: MiamiPete: today's 'Last Call' is now up http://bit.ly/MGDzu #

Solution 1:

It seems like your data-driven design is rather flawed. Unless all your entries have a text part, an url and up to 4 tags, it's not going to work.

You also need to separate saving to db from parsing. Parsing could be easily done with a regexep (or even string methods):

>>> s = your_string
>>> s.split()
['MiamiPete:', "today's", '"Last', 'Call"', 'is', 'now', 'up', 'http://bit.ly/MGDzu', '#stocks', '#stockmarket', '#finance', '#money']
>>> url = [i for i in s.split() if i.startswith('http://')]
>>> url
['http://bit.ly/MGDzu']
>>> tags = [i for i in s.split() if i.startswith('#')]
>>> tags
['#stocks', '#stockmarket', '#finance', '#money']
>>> ' '.join(i for i in s.split() if i notin url+tags)
'MiamiPete: today\'s "Last Call" is now up'

Single-table db design would probably have to go, though.

Solution 2:

Also, you can parse your strings using regexps:

>>> s = (u'MiamiPete: today\'s "Last Call" is now up http://bit.ly/MGDzu ''#stocks #stockmarket #finance #money')
>>> re.match(r'(.*) (http://[^ ]+)', s).groups()
(u'MiamiPete: today\'s "Last Call" is now up', u'http://bit.ly/MGDzu')
>>> re.findall(r'(#\w+)', s)
[u'#stocks', u'#stockmarket', u'#finance', u'#money']

Solution 3:

Twitter has an api that may be easier for you to use here, http://apiwiki.twitter.com/Twitter-API-Documentation.

You can get the results as JSON or XML and use one of the many Python libraries to parse the results.

Or if you must your the RSS there are Python feed parsers like, http://www.feedparser.org/.

Solution 4:

I would highly recommend using the Twitter API. There are actually two APIs, one for the main twitter server and one for the search server. They are used for different things.

You can find sample code, pytwitter on svn. Add simplejson and you can be doing very powerful things in a matter of minutes.

Good luck

Post a Comment for "Split Twitter Rss String Using Python"