02 November 2012

Saving your tweets in a database using sqlite, rhino, scribe, javascript

In the current post, I 'll describe a simple method to save your tweets in a sqlite database using Mozilla Rhino.

Prerequisites

  • sqlite
  • Apache Rhino. I think it should be de-facto available when the java developer toolkit (JDK) is installed
  • Scribe, the simple OAuth library for Java . It also requires Apache codec

The config.js file

Open an account on https://dev.twitter.com/ and create an App to receive an API-key and an API-secret.
Create the following file 'config.js' filled with the correct parameters.

The javascript

The following javascript file opens a Oauth connection, retrieves the tweets and stores them into sqlite. I've commented the code, I hope it is clear enough.

Running the script using Rhino

scribe.libs=/path/to/scribe-1.3.2.jar:/path/to/commons-codec.jar
rhino.libs=/usr/share/java/js.jar:/usr/share/java/jline.jar
sqlite.libs=/path/to/sqlitejdbc-v056.jar
CLASSPATH=${rhino.libs}:${scribe.libs}:${sqlite.libs}

java -cp ${CLASSPATH} org.mozilla.javascript.tools.shell.Main -f twitter2sqlite.js
At the first time, the user is asked to authorize the application to use the twitter API

The script runs forever (Ctrl-C to break), listening to the new tweets.

As a test, I wrote the following tweet:


... and the tweet was later inserted in the database...

Sleep...

Inserted ({created_at:"Fri Nov 02 20:29:04 +0000 2012", id:264464160664981500, id_str:"264464160664981504", text:"wrote a tool to save my tweets: This is a test . ( #rhino, #jdbc, #sqlite, #scribe #javascript )", source:"web", truncated:false, in_reply_to_status_id:null, in_reply_to_status_id_str:null, in_reply_to_user_id:null, in_reply_to_user_id_str:null, in_reply_to_screen_name:null, geo:null, coordinates:null, place:null, contributors:null, retweet_count:0, entities:{hashtags:[{text:"rhino", indices:[51, 57]}, {text:"jdbc", indices:[59, 64]}, {text:"sqlite", indices:[66, 73]}, {text:"scribe", indices:[75, 82]}, {text:"javascript", indices:[83, 94]}], urls:[], user_mentions:[]}, favorited:false, retweeted:false})

Sleep...
Sleep...
Sleep...

Later, the tweets can be extracted using the sqlite command line:

$  sqlite3 tweets.sqlite 'select * from tweet'

264464160664981504|({created_at:"Fri Nov 02 20:29:04 +0000 2012", id:264464160664981500, id_str:"264464160664981504", text:"wrote a tool to save my tweets: This
264421310841638913|({created_at:"Fri Nov 02 17:38:47 +0000 2012", id:264421310841638900, id_str:"264421310841638913", text:"The tools for recalibration have cha
264264932097400832|({created_at:"Fri Nov 02 07:17:24 +0000 2012", id:264264932097400830, id_str:"264264932097400832", text:"@warandpeace you're welcome. Your sh
264158323287416832|({created_at:"Fri Nov 02 00:13:46 +0000 2012", id:264158323287416830, id_str:"264158323287416832", text:"Drawing of the day November 1, 2012.
264142732174438400|({created_at:"Thu Nov 01 23:11:49 +0000 2012", id:264142732174438400, id_str:"264142732174438400", text:"[delicious] PLOS Collections: How th
264064117558624256|({created_at:"Thu Nov 01 17:59:26 +0000 2012", id:264064117558624260, id_str:"264064117558624256", text:"I've added a stupid basic dependency
264025607724204034|({created_at:"Thu Nov 01 15:26:24 +0000 2012", id:264025607724204030, id_str:"264025607724204034", text:"in the desert lab, checking my on-go
264013563704795136|({created_at:"Thu Nov 01 14:38:33 +0000 2012", id:264013563704795140, id_str:"264013563704795136", text:"Drawing of the day November 1, 2012.
263996436679630848|({created_at:"Thu Nov 01 13:30:29 +0000 2012", id:263996436679630850, id_str:"263996436679630848", text:"RT @RealistComics: he's tall, dark a
263966759210590208|({created_at:"Thu Nov 01 11:32:34 +0000 2012", id:263966759210590200, id_str:"263966759210590208", text:"RT @guermonprez: #Aubry Un avion nor
263946369847398402|({created_at:"Thu Nov 01 10:11:33 +0000 2012", id:263946369847398400, id_str:"263946369847398402", text:"[delicious] OVal: object validation 
263946366919790593|({created_at:"Thu Nov 01 10:11:32 +0000 2012", id:263946366919790600, id_str:"263946366919790593", text:"[delicious] MyBatis #tweet: a first 
263941020729896960|({created_at:"Thu Nov 01 09:50:17 +0000 2012", id:263941020729896960, id_str:"263941020729896960", text:"RT @josh_wills: I have never been pr
263938670187388928|({created_at:"Thu Nov 01 09:40:57 +0000 2012", id:263938670187388930, id_str:"263938670187388928", text:"RT @softmodeling @peterneubauer: Usi
263936362716200960|({created_at:"Thu Nov 01 09:31:47 +0000 2012", id:263936362716200960, id_str:"263936362716200960", text:"declined to review an article about 
263934528186351616|({created_at:"Thu Nov 01 09:24:29 +0000 2012", id:263934528186351600, id_str:"263934528186351616", text:"@figshare Thanks, ( was http://t.co/
263815846139412480|({created_at:"Thu Nov 01 01:32:53 +0000 2012", id:263815846139412480, id_str:"263815846139412480", text:"Drawing of the day October 30, 2012.
263731855919026176|({created_at:"Wed Oct 31 19:59:09 +0000 2012", id:263731855919026180, id_str:"263731855919026176", text:"[delicious] An integrated map of gen
263726281647067136|({created_at:"Wed Oct 31 19:36:59 +0000 2012", id:263726281647067140, id_str:"263726281647067136", text:"RT @bryan_howie: 1000 Genomes paper 
263695076516052992|({created_at:"Wed Oct 31 17:33:00 +0000 2012", id:263695076516053000, id_str:"263695076516052992", text:"\"Forget your Past\" ( abandoned Bul

That's it
Pierre

No comments: