Parsing Twitter’s User Timeline with Python
When you want to show your tweets on a website, the usual approach is to embed some JavaScript. Twitter offers a very useful widget for this purpose: http://twitter.com/about/resources/widgets/
This is the easiest way to show a user timeline without worries: you just copy and paste some JavaScript. Sometimes, though, you need more flexibility, and a copy-and-paste widget is not enough.
In this post you’ll find a way to parse a Twitter user timeline from its JSON feed with Python. There are many ways to do that; here’s mine.
This is not a Python wrapper around the Twitter API. If you are looking for one, please visit http://code.google.com/p/python-twitter/
Getting the Twitter user timeline
To download the latest tweets, I use the GET statuses/user_timeline API method, which returns JSON. I’ve written a function that downloads the JSON and parses it, and a Tweet class that stores the information for each tweet.
The “read_tweets()” function parses the JSON and stores each tweet in a list of Tweet instances. The Tweet class also provides some helper methods that compute and store values such as the date, the HTML text, and the tweet and profile URLs.
Let’s see the code.
from Tweet import Tweet  # the Tweet class below, saved as Tweet.py
import simplejson  # the standard-library json module (Python 2.6+) works the same way
import urllib2


def read_tweets(user, num_tweets):
    tweets = []
    url = ("http://api.twitter.com/1/statuses/user_timeline.json?"
           "screen_name=%s&count=%s&include_rts=true" % (user, num_tweets))
    response = urllib2.urlopen(url)
    content = response.read()
    data = simplejson.loads(content)
    for js_tweet in data:
        tweet = Tweet()
        tweet.id = js_tweet['id']
        tweet.username = js_tweet['user']['screen_name']
        try:
            # only retweets carry a 'retweeted_status' object
            tweet.retweet_user = js_tweet['retweeted_status']['user']['screen_name']
            tweet.retweeted = True
        except KeyError:
            tweet.retweeted = False
        tweet.set_date(js_tweet['created_at'])
        # tweet.id and tweet.username must exist
        tweet.set_tweet_url()
        # convert plain text to HTML text
        tweet.set_text(js_tweet['text'])
        # tweet.retweeted, tweet.retweet_user and tweet.username must exist
        tweet.set_profile_url()
        if tweet.retweeted:
            tweet.user_avatar_url = js_tweet['retweeted_status']['user']['profile_image_url']
        else:
            tweet.user_avatar_url = js_tweet['user']['profile_image_url']
        tweets.append(tweet)
    return tweets

In “read_tweets()” I use urllib2 to download the file and simplejson to parse it, and the Tweet class methods to store the info.
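If you are on Python 3, urllib2 and simplejson are replaced by urllib.request and the standard json module. The following is a minimal sketch of the same loop under those assumptions, not a drop-in replacement: it returns plain dicts instead of Tweet instances, splits the work into two hypothetical helpers (timeline_url and parse_timeline) so the parsing can be tested offline, and assumes the same v1 field names that “read_tweets()” reads. It also parses created_at with %z, which gives timezone-aware dates.

```python
import json
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen

# Same v1 endpoint as read_tweets(); the query string is built with urlencode
API_URL = "http://api.twitter.com/1/statuses/user_timeline.json"


def timeline_url(user, num_tweets):
    """Return the user_timeline URL for the given screen name."""
    query = urlencode({"screen_name": user, "count": num_tweets, "include_rts": "true"})
    return API_URL + "?" + query


def parse_timeline(content):
    """Turn the user_timeline JSON text into a list of dicts."""
    tweets = []
    for js_tweet in json.loads(content):
        # retweets carry the original tweet in 'retweeted_status'
        original = js_tweet.get("retweeted_status")
        # for retweets, show the avatar of the original author
        author = original["user"] if original else js_tweet["user"]
        tweets.append({
            "id": js_tweet["id"],
            "username": js_tweet["user"]["screen_name"],
            "retweeted": original is not None,
            "retweet_user": original["user"]["screen_name"] if original else None,
            "user_avatar_url": author["profile_image_url"],
            # %z accepts "+0000", so the result is an aware UTC datetime
            "date": datetime.strptime(js_tweet["created_at"], "%a %b %d %H:%M:%S %z %Y"),
            "text": js_tweet["text"],
        })
    return tweets


def read_tweets(user, num_tweets):
    """Download and parse the timeline (needs network access)."""
    with urlopen(timeline_url(user, num_tweets)) as response:
        return parse_timeline(response.read().decode("utf-8"))
```

Keeping the download separate from the parsing means you can feed parse_timeline() a saved copy of the JSON while developing, without hitting the API on every run.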
from datetime import datetime
import re


class Tweet(object):
    """Store the tweet info
    """
    id = None
    username = None
    url = None
    user_avatar_url = None
    tweet_url = None
    profile_url = None
    html_text = None
    retweeted = None
    retweet_user = None
    date = None

    def set_date(self, date_str):
        """Convert string to datetime
        """
        # e.g. "Tue Apr 26 08:57:55 +0000 2011"; the result is a naive datetime in UTC
        self.date = datetime.strptime(date_str, "%a %b %d %H:%M:%S +0000 %Y")

    def set_text(self, plain_text):
        """Convert plain text into HTML text with http, user and hashtag links
        """
        # http:// and https:// URLs
        re_url = re.compile(r'(https?://[^ ]+)')
        self.html_text = re_url.sub(r'<a href="\1">\1</a>', plain_text)
        # @username mentions link to the user's profile
        re_user = re.compile(r'@([0-9a-zA-Z_]+)')
        self.html_text = re_user.sub(r'<a href="http://www.twitter.com/\1">@\1</a>', self.html_text)
        # #hashtags link to a Twitter search ('#' is URL-encoded as %23; search URL format assumed)
        re_hash = re.compile(r'#([0-9a-zA-Z_]+)')
        self.html_text = re_hash.sub(r'<a href="http://twitter.com/search?q=%23\1">#\1</a>', self.html_text)

    def set_profile_url(self):
        """Create the url profile
        """
        if self.retweeted:
            self.profile_url = "http://www.twitter.com/%s" % self.retweet_user
        else:
            self.profile_url = "http://www.twitter.com/%s" % self.username

    def set_tweet_url(self):
        """Create the url of the tweet
        """
        self.tweet_url = "http://www.twitter.com/%s/status/%s" % (self.username, self.id)

The most important Tweet class method is “set_text()”. It converts the plain tweet text into HTML with links for URLs, users and hashtags. I use Python regular expressions to find http://xxx, https://xxx, @xxx and #xxx and replace each match with the corresponding link.
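Running one substitution after another has a side effect: a later pattern can match inside a link produced by an earlier one, for example the #b in http://x.com/a#b or an @ inside a URL. A common alternative, sketched below, combines the three patterns into a single regular expression and builds each link in a callback, so the text is scanned once from left to right. This is a swapped-in technique, not the code above; the name linkify is hypothetical, and the profile and hashtag search URLs are illustrative assumptions.

```python
import re

# One pattern with three alternatives; exactly one named group matches each time.
TOKEN_RE = re.compile(
    r"(?P<url>https?://[^ ]+)"
    r"|@(?P<user>[0-9a-zA-Z_]+)"
    r"|#(?P<tag>[0-9a-zA-Z_]+)"
)


def _to_link(match):
    """Build the <a> element for a single URL, @mention or #hashtag."""
    if match.group("url"):
        url = match.group("url")
        return '<a href="%s">%s</a>' % (url, url)
    if match.group("user"):
        user = match.group("user")
        return '<a href="http://www.twitter.com/%s">@%s</a>' % (user, user)
    tag = match.group("tag")
    # '#' is URL-encoded as %23 (written %%23 because of % formatting)
    return '<a href="http://twitter.com/search?q=%%23%s">#%s</a>' % (tag, tag)


def linkify(plain_text):
    """Convert tweet text to HTML in a single left-to-right pass."""
    return TOKEN_RE.sub(_to_link, plain_text)
```

Because the URL alternative consumes the whole URL up to the next space, any # or @ inside it is never seen as a hashtag or mention, and a mention that appears twice simply produces two independent links.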
Django advice
If you’re going to use this code with Django, let me give you some advice.
“read_tweets()” downloads the JSON every time you call it, so please don’t call it from a view: the HTTP request to Twitter would slow down every page load. Instead, fetch the tweets with “read_tweets()” ahead of time, store them, and have the view read the stored tweets. There are many ways to store them. I use a “manage.py” script, scheduled with cron, that stores them in memcache, but it’s your choice.
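The pattern described above boils down to two functions: one that the scheduled job runs to fetch and store the tweets, and one the view calls, which only reads what was stored. The sketch below is framework-free so it runs anywhere. DictCache is a hypothetical in-memory stand-in whose get(key, default) and set(key, value, timeout) methods mirror the shape of Django’s cache API (django.core.cache.cache); fetch stands for “read_tweets()” or any other download function, and the cache key and timeout are illustrative.

```python
import time


class DictCache:
    """Hypothetical in-memory stand-in for memcache / Django's cache."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        value, expires = self._data.get(key, (default, None))
        if expires is not None and time.time() > expires:
            # the entry has expired: drop it and behave as a cache miss
            del self._data[key]
            return default
        return value

    def set(self, key, value, timeout=None):
        # timeout=None means the entry never expires
        expires = time.time() + timeout if timeout is not None else None
        self._data[key] = (value, expires)


TWEETS_KEY = "latest_tweets"  # illustrative cache key


def refresh_tweets(cache, fetch, timeout=3600):
    """Run from cron: download once and store the result for the views."""
    tweets = fetch()
    cache.set(TWEETS_KEY, tweets, timeout)
    return len(tweets)


def tweets_for_view(cache):
    """Called by the view: never touches the network, only the cache."""
    return cache.get(TWEETS_KEY, [])
```

In a project the cron job would call something like refresh_tweets(cache, lambda: read_tweets("someuser", 10)) (the screen name is a placeholder), and the view would pass tweets_for_view(cache) to the template, so a slow or failing Twitter request never blocks a page.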
If you’re going to show the HTML stored in Tweet.html_text in a Django template, don’t forget to use the safe filter:
{{ tweet.html_text|safe }}