28 April 2011

Mapping the people I'm following on Twitter (KML, Java, geonames.org)

I wrote a Java tool to map the people I'm following on Twitter. The tool invokes the Twitter API to fetch the profiles of my contacts and uses the geonames.org web services to guess the geolocation of the places they declare.

The source code is available on github at https://github.com/lindenb/jsandbox/blob/master/src/sandbox/TwitterToKML.java, together with the build.xml, in the same repository.

Compilation

ant twitterkml
# get your numeric twitter-id from "http://api.twitter.com/1/users/show.xml?screen_name=<your-twitter-username>"
java -jar dist/twitterkml.jar -g <geonames-id> -o result.kml <twitter-numeric-id>
# NOTE: I don't use the OAuth API, so when the 'rate limit' is reached my program waits until it is reset.
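The output is a KML file with one placemark per geolocated contact. A minimal, hypothetical sketch of that last step (the `User` record and `buildKml()` are my simplifications, not the actual TwitterToKML code):

```java
// Sketch: write one KML <Placemark> per geolocated Twitter contact.
// The User class and buildKml() are hypothetical simplifications,
// not the actual TwitterToKML.java code.
import java.util.List;

public class KmlSketch {
    static class User {
        final String screenName;
        final double longitude, latitude; // as guessed from geonames.org
        User(String screenName, double longitude, double latitude) {
            this.screenName = screenName;
            this.longitude = longitude;
            this.latitude = latitude;
        }
    }

    static String buildKml(List<User> users) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<kml xmlns=\"http://www.opengis.net/kml/2.2\"><Document>\n");
        for (User u : users) {
            sb.append(" <Placemark><name>").append(u.screenName).append("</name>");
            // KML coordinates are longitude,latitude[,altitude]
            sb.append("<Point><coordinates>")
              .append(u.longitude).append(',').append(u.latitude)
              .append("</coordinates></Point></Placemark>\n");
        }
        sb.append("</Document></kml>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildKml(List.of(new User("yokofakun", -1.55, 47.22))));
    }
}
```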

'following': [map embedded in the original post]

'followers': [map embedded in the original post]

That's it,
Pierre

dbNSFP: a lightweight database of human non-synonymous SNPs and their functional predictions

People from the "Human Genetics Center" in Houston have compiled a new resource named dbNSFP and described it in http://www.ncbi.nlm.nih.gov/pubmed/21520341.

Hum Mutat. 2011 Apr 21. doi:10.1002/humu.21517.
dbNSFP: a lightweight database of human non-synonymous SNPs and their functional predictions.
Liu X, Jian X, Boerwinkle E.


They have compiled the "prediction scores from four new and popular algorithms (SIFT, Polyphen2, LRT and MutationTaster), along with a conservation score (PhyloP) and other related information, for every potential NS in the human genome (a total of 75,931,005)".

So you don't have to submit new jobs to SIFT or Polyphen: everything has already been computed and merged here.

The database is available from http://sites.google.com/site/jpopgen/dbNSFP.

Downloading

lindenb@yokofakun:~$ wget "http://dl.dropbox.com/u/17001647/dbNSFP/dbNSFP.chr1-22XY.zip"
--2011-04-27 13:50:26-- http://dl.dropbox.com/u/17001647/dbNSFP/dbNSFP.chr1-22XY.zip
Proxy request sent, awaiting response... 200 OK
Length: 1200703405 (1.1G) [application/zip]
Saving to: `dbNSFP.chr1-22XY.zip'

100%[=================================================================================================================>] 1,200,703,405 1.82M/s in 10m 11s

2011-04-27 14:00:38 (1.87 MB/s) - `dbNSFP.chr1-22XY.zip' saved [1200703405/1200703405]

Content

unzip -t dbNSFP.chr1-22XY.zip
Archive: dbNSFP.chr1-22XY.zip
testing: dbNSFP.chr1 OK
testing: dbNSFP.chr10 OK
testing: dbNSFP.chr11 OK
testing: dbNSFP.chr12 OK
testing: dbNSFP.chr13 OK
testing: dbNSFP.chr14 OK
testing: dbNSFP.chr15 OK
testing: dbNSFP.chr16 OK
testing: dbNSFP.chr17 OK
testing: dbNSFP.chr18 OK
testing: dbNSFP.chr19 OK
testing: dbNSFP.chr2 OK
testing: dbNSFP.chr20 OK
testing: dbNSFP.chr21 OK
testing: dbNSFP.chr22 OK
testing: dbNSFP.chr3 OK
testing: dbNSFP.chr4 OK
testing: dbNSFP.chr5 OK
testing: dbNSFP.chr6 OK
testing: dbNSFP.chr7 OK
testing: dbNSFP.chr8 OK
testing: dbNSFP.chr9 OK
testing: dbNSFP.chrX OK
testing: dbNSFP.chrY OK
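Each entry is a plain tab-delimited text file, one per chromosome. If you'd rather not inflate the whole 1.1G archive on disk, the entries can be read directly with the standard java.util.zip API; a generic sketch (nothing here is specific to dbNSFP):

```java
// Sketch: read the entries of a zip archive (e.g. dbNSFP.chr1-22XY.zip)
// directly, without inflating it on disk. Plain java.util.zip.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipPeek {
    /** returns "entryName\tfirstLine" for each entry of the archive */
    public static List<String> firstLines(String zipPath) throws Exception {
        List<String> result = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                try (BufferedReader r = new BufferedReader(
                        new InputStreamReader(zf.getInputStream(entry)))) {
                    // the first line of each dbNSFP file is the column header
                    result.add(entry.getName() + "\t" + r.readLine());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String s : firstLines(args[0])) System.out.println(s);
    }
}
```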

Sample (verticalized)

>>2
$1 #chr : 22
$2 pos(1-based) : 15453440
$3 ref : T
$4 alt : G
$5 aaref : M
$6 aaalt : L
$7 hg19pos(1-based) : 17073440
$8 genename : CCT8L2
$9 geneid : 150160
$10 CCDSid : CCDS13738.1
$11 refcodon : ATG
$12 codonpos : 1
$13 fold-degenerate : 0
$14 aapos : 1
$15 cds_strand : -
$16 LRT_Omega : 1.116940
$17 PhyloP_score : 0.963611
$18 PlyloP_pred : C
$19 SIFT_score : 1.0
$20 SIFT_pred : D
$21 Polyphen2_score : 0.25
$22 Polyphen2_pred : P
$23 LRT_score : 0.419288
$24 LRT_pred : U
$25 MutationTaster_score : 1.0
$26 MutationTaster_pred : D
<<2
>>3
$1 #chr : 22
$2 pos(1-based) : 15453440
$3 ref : T
$4 alt : C
$5 aaref : M
$6 aaalt : V
$7 hg19pos(1-based) : 17073440
$8 genename : CCT8L2
$9 geneid : 150160
$10 CCDSid : CCDS13738.1
$11 refcodon : ATG
$12 codonpos : 1
$13 fold-degenerate : 0
$14 aapos : 1
$15 cds_strand : -
$16 LRT_Omega : 1.116940
$17 PhyloP_score : 0.963611
$18 PlyloP_pred : C
$19 SIFT_score : 1.0
$20 SIFT_pred : D
$21 Polyphen2_score : 0.25
$22 Polyphen2_pred : P
$23 LRT_score : 0.419288
$24 LRT_pred : U
$25 MutationTaster_score : 1.0
$26 MutationTaster_pred : D
<<3
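Each row is tab-delimited with the 26 columns shown in the verticalized sample above ($1..$26). A quick, hypothetical sketch of extracting a few of the prediction columns from one line (this is my code, not part of dbNSFP):

```java
// Sketch: parse one tab-delimited dbNSFP row and pull out a few columns.
// The 0-based array indexes follow the verticalized sample ($1=chr ... $26).
public class DbNsfpRow {
    final String chrom; final int pos; final String ref, alt;
    final double siftScore; final String siftPred;
    final double polyphen2Score; final String polyphen2Pred;

    DbNsfpRow(String line) {
        String[] tokens = line.split("\t");
        this.chrom = tokens[0];                 // $1  #chr
        this.pos = Integer.parseInt(tokens[1]); // $2  pos(1-based)
        this.ref = tokens[2];                   // $3  ref
        this.alt = tokens[3];                   // $4  alt
        this.siftScore = Double.parseDouble(tokens[18]);      // $19 SIFT_score
        this.siftPred = tokens[19];             // $20 SIFT_pred
        this.polyphen2Score = Double.parseDouble(tokens[20]); // $21 Polyphen2_score
        this.polyphen2Pred = tokens[21];        // $22 Polyphen2_pred
    }

    public static void main(String[] args) {
        // the first record of the sample above, flattened back to one line
        String line = "22\t15453440\tT\tG\tM\tL\t17073440\tCCT8L2\t150160"
            + "\tCCDS13738.1\tATG\t1\t0\t1\t-\t1.116940\t0.963611\tC"
            + "\t1.0\tD\t0.25\tP\t0.419288\tU\t1.0\tD";
        DbNsfpRow row = new DbNsfpRow(line);
        System.out.println(row.chrom + ":" + row.pos + " " + row.ref + ">" + row.alt
            + " SIFT=" + row.siftScore + "/" + row.siftPred);
    }
}
```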


That's it,

Pierre

22 April 2011

Playing with the HTML5 File API: translating a Fasta file.

In the current post, I'm using the new HTML5 File API. This API can read the content of a file on the client side, without sending it to a remote server. Let me repeat this:

YOU DO NOT NEED A SERVER
YOU DO NOT NEED TO COPY AND PASTE THE CONTENT OF THE FILE INTO A TEXTAREA

As an example, the following code reads a whole DNA fasta file stored on your computer and translates each DNA sequence into a protein. When the user selects a new file, a FileReader object is created, and a callback function translating the DNA is invoked once the fasta file has been loaded.
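The translation step itself is independent of the File API. Here is a sketch of it in Java (the demo itself runs as JavaScript in the browser; this mirrors the logic, it is not the actual page code):

```java
// Sketch of the DNA -> protein step of the demo: standard genetic code,
// '*' for stop codons, '?' for codons containing a non-ACGT base.
public class Translate {
    // 64 amino acids, codon bases ordered T,C,A,G (index = 16*b1 + 4*b2 + b3)
    private static final String AA =
        "FFLLSSSSYY**CC*W" + // TTT..TGG
        "LLLLPPPPHHQQRRRR" + // CTT..CGG
        "IIIMTTTTNNKKSSRR" + // ATT..AGG
        "VVVVAAAADDEEGGGG";  // GTT..GGG

    private static int baseIndex(char c) {
        switch (Character.toUpperCase(c)) {
            case 'T': return 0;
            case 'C': return 1;
            case 'A': return 2;
            case 'G': return 3;
            default:  return -1; // N, gaps, etc.
        }
    }

    public static String translate(String dna) {
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 2 < dna.length(); i += 3) {
            int b1 = baseIndex(dna.charAt(i));
            int b2 = baseIndex(dna.charAt(i + 1));
            int b3 = baseIndex(dna.charAt(i + 2));
            protein.append(b1 < 0 || b2 < 0 || b3 < 0
                ? '?' : AA.charAt(16 * b1 + 4 * b2 + b3));
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("ATGGAATTCTAA")); // prints MEF*
    }
}
```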

Test (your browser must support HTML5): [interactive demo embedded in the original post]

Source code: [embedded in the original post]

That's it,

Pierre

15 April 2011

"404 not found": An update for "bioinformatics/cabios"

Yesterday, I blogged about the persistence of the URLs present in the abstracts of NAR. Today, I've updated my tool and used it to scan the abstracts returned by the following pubmed query: "Bioinformatics"[JOUR] OR "Comput Appl Biosci"[JOUR].

Here is the result:

Year       Total  Alive  %
(no year)  18     1      5
1995       1      0      0
1996       9      3      33
1997       13     3      23
1998       86     19     22
1999       70     17     24
2000       83     25     30
2001       110    64     58
2002       121    78     64
2003       284    170    59
2004       402    257    63
2005       495    359    72
2006       374    297    79
2007       448    381    85
2008       466    415    89
2009       507    462    91
2010       605    566    93
2011       283    268    94


Again, even if we can reach a web site, it doesn't mean that the service described in an article is still available or maintained.

As suggested by Egon Willighagen, I've uploaded the RDF output of my program on figshare: http://figshare.com/figures/index.php/Bioinformatics.404_20110415.rdf.

That's it,

Pierre

14 April 2011

"404 not found": a database of non-functional resources in the NAR database collection

Today, Andra Waagmeester asked on Biostar: "NAR nicely lists all their database issues on http://www.oxfordjournals.org/nar/database/c/. Is the list also available in a downloadable format?".

I suggested downloading from pubmed all the articles published in an annual database issue of NAR, extracting the URLs from the abstracts, and checking whether they were still active. I just wrote a java program doing this job (it is available on github at https://github.com/lindenb/jsandbox/blob/master/src/sandbox/NucleicAcidsResearch404.java).

A few comments:

  • The connection timeout was set to 10 seconds.
  • Some URLs are poorly written, e.g.: http://www.ncbi.nlm.nih.gov/pubmed/14681415
  • An abstract can contain more than one URL.
  • There can be different URLs for the same database.
  • Getting an HTTP 404 error doesn't mean that the database has really been discontinued.
  • Getting an HTTP 200 status doesn't mean that the database is still active and/or maintained.
  • 1155 URLs were extracted from this pubmed query: `"Nucleic Acids Res"[JOUR] "Database issue"[ISS]` (as far as I can see, this query only goes back to 2004). Edit: OK, that was because NCBI eFetch is limited to 10K records.
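The core of the job is just two steps: pull URLs out of the abstract text with a regular expression, then issue a request with a 10-second timeout and look at the HTTP status. A simplified, hypothetical sketch (not the actual NucleicAcidsResearch404.java):

```java
// Simplified sketch: extract URLs from an abstract and test each one with
// a 10-second timeout. Hypothetical code, not the actual program.
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Url404 {
    // deliberately naive pattern; abstracts contain poorly written URLs
    private static final Pattern URL_RX =
        Pattern.compile("https?://[^\\s\\)\\(\"']+");

    public static List<String> extractUrls(String abstractText) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_RX.matcher(abstractText);
        while (m.find()) {
            String url = m.group();
            // trim trailing punctuation glued to the URL by the sentence
            while (url.endsWith(".") || url.endsWith(",")) {
                url = url.substring(0, url.length() - 1);
            }
            urls.add(url);
        }
        return urls;
    }

    /** returns the HTTP status code, or -1 on timeout/error */
    public static int status(String url) {
        try {
            HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
            con.setConnectTimeout(10 * 1000); // the 10-second timeout mentioned above
            con.setReadTimeout(10 * 1000);
            con.setRequestMethod("HEAD");
            return con.getResponseCode();
        } catch (Exception err) {
            return -1;
        }
    }
}
```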


Year   Count(URL)   Count(Active)   %
2004   157          100             63
2005   162          114             70
2006   186          147             79
2007   194          158             81
2008   206          180             87
2009   208          186             89
2010   147          136             92
2011   200          193             96


... a snapshot of the output...

(...)

Credit for the Title: Neil Saunders ;-)


Update:
It seems that the URLs in the abstracts are broken where they were cut across lines in the PDF!
(via openwetware.org)


That's it,
Pierre