14 October 2012

Calculating time from submission to publication / Degree of burden in submitting a paper

After "404 not found": a database of non-functional resources in the NAR database collection, I've uploaded my second dataset on figshare: Calculating time from submission to publication / Degree of burden in submitting a paper.

Calculating time from submission to publication / Degree of burden in submitting a paper. Pierre Lindenbaum,  Ryan Delahanty.
figshare.
Retrieved 10:13, Oct 14, 2012 (GMT)
http://dx.doi.org/10.6084/m9.figshare.96403

This dataset was inspired by this post on biostar, initially asked by Ryan Delahanty: I was wondering if it would be possible to calculate some kind of a metric for the speed-of-publication for each journal. I'm not sure submitted and accepted dates are available for all papers, but I noticed in XML data there are fields like the following:
<PubmedData>
        <History>
            <PubMedPubDate PubStatus="received">
                <Year>2011</Year>
                <Month>11</Month>
                <Day>29</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </PubMedPubDate>
            <PubMedPubDate PubStatus="accepted">
                <Year>2011</Year>
                <Month>12</Month>
                <Day>20</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </PubMedPubDate>
           (...)

In this dataset, the script 'pubmed.sh' downloads the journals from http://www.ncbi.nlm.nih.gov/books/NBK3827/table/pubmedhelp.pubmedhelptable45/ and the 'eigenfactors' from http://www.eigenfactor.org.

For each journal, it scans PubMed (starting from year=2000) and gets the difference between the date[@PubStatus='received'] and the date[@PubStatus='accepted'].
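A minimal sketch of that date difference in Python, for one record. It is not the code of 'pubmed.sh': the sample record is the fragment quoted above with closing tags added so that it parses on its own, and the names `history_date` and `days_to_acceptance` are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Record built from the fragment quoted above; the closing tags are added
# here (an assumption) so that the snippet parses on its own.
SAMPLE = """<PubmedData>
  <History>
    <PubMedPubDate PubStatus="received">
      <Year>2011</Year><Month>11</Month><Day>29</Day>
    </PubMedPubDate>
    <PubMedPubDate PubStatus="accepted">
      <Year>2011</Year><Month>12</Month><Day>20</Day>
    </PubMedPubDate>
  </History>
</PubmedData>"""


def history_date(pubmed_data, status):
    """Return the History date carrying the given PubStatus, or None."""
    node = pubmed_data.find(f"./History/PubMedPubDate[@PubStatus='{status}']")
    if node is None:
        return None
    try:
        return date(int(node.findtext("Year")),
                    int(node.findtext("Month")),
                    int(node.findtext("Day")))
    except (TypeError, ValueError):
        # Missing or invalid Year/Month/Day: treat the date as unavailable.
        return None


def days_to_acceptance(pubmed_data):
    """Days between 'received' and 'accepted', or None if either is missing."""
    received = history_date(pubmed_data, "received")
    accepted = history_date(pubmed_data, "accepted")
    if received is None or accepted is None:
        return None
    return (accepted - received).days


print(days_to_acceptance(ET.fromstring(SAMPLE)))  # prints 21
```

Returning None when a date is missing keeps records without a history out of any later average.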

title	issn	eigenfactor	days
"Acta biochimica Polonica"	0001-527X	0.003996119.770935960591
"Acta biomaterialia"	1742-7061	0.02152129.682692307692
"Acta biotheoretica"	0001-5342	0.000844161.897058823529
"Acta cirurgica brasileira / Sociedade Brasileira para Desenvolvimento Pesquisa em Cirurgia"	0102-8650	0.00128122.038461538462
"Acta cytologica"	0001-5547	0.00230565.3006134969325
"Acta diabetologica"	0940-5429	0.001851299.6
"Acta haematologica"	0001-5792	0.002825118.654676258993
"Acta histochemica"	0065-1281	0.002162110.471204188482
"Acta histochemica et cytochemica"	0044-5991	0.00067781.6455696202532
"Acta neurochirurgica"	0001-6268	0.009685204.371830985916
"Acta neuropathologica"	0001-6322	0.02347169.7277882797732
"Acta theriologica"	0001-7051	0.000901147.0
"Acta tropica"	0001-706X	0.01011196.577777777778
"Acta veterinaria Scandinavica"	0044-605X	0.00161282.0
"Addictive behaviors"	0306-4603	0.017915163.049731182796
"Advances in space research "	0273-1177	0.021217205.0
Ambio	0044-7447	0.007463181.878048780488
"American journal of human genetics"	0002-9297	0.12015667.1898928024502
"American journal of hypertension"	0895-7061	0.017359104.074576271186
(....)

Here is the kind of figure I got:

As far as I remember, "Cell" is the point having the highest eigenfactor.


Note: PubMed contains some errors, e.g. a received date later than the accepted date (http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=20591334&retmode=xml) or some dates in the future (http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=12921703&retmode=xml).
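One way to keep such records out of a per-journal figure is to drop them before averaging. The sketch below is only an illustration: the date pairs are invented, `is_plausible` and `mean_days` are hypothetical names, and taking a plain mean of the remaining delays is an assumption, not necessarily how the 'days' column of the dataset was computed.

```python
from datetime import date


def is_plausible(received, accepted, today):
    """Keep a record only if received <= accepted and neither date is in the future."""
    return received <= accepted <= today


def mean_days(pairs, today=None):
    """Mean received-to-accepted delay over the plausible pairs, or None if none remain."""
    today = today or date.today()
    delays = [(accepted - received).days
              for received, accepted in pairs
              if is_plausible(received, accepted, today)]
    return sum(delays) / len(delays) if delays else None


# Invented (received, accepted) pairs, for illustration only.
PAIRS = [
    (date(2011, 11, 29), date(2011, 12, 20)),  # 21 days, kept
    (date(2012, 1, 1), date(2012, 1, 11)),     # 10 days, kept
    (date(2012, 3, 1), date(2012, 2, 1)),      # received > accepted, dropped
    (date(2012, 1, 1), date(2030, 1, 1)),      # accepted in the future, dropped
]

print(mean_days(PAIRS, today=date(2012, 10, 14)))  # prints 15.5
```

Passing `today` explicitly makes the result reproducible; without it the current date is used.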


That's it,

Pierre

4 comments:

Mike Taylor said...

But note that some journals give what can most charitably be described as very misleading information on submission-to-acceptance times. See http://svpow.com/2012/10/03/dear-royal-society-please-stop-lying-to-us-about-publication-times/

Christian said...

Awesome. Any reason I don't find bioinformatics papers in the CSV file?

Pierre Lindenbaum said...

@Christian, many journals do not contain any information about the dates.

sharmanedit said...

This is really interesting, Pierre, and I commend you for putting it on figshare. It is useful to know how to extract these dates automatically and manipulate them.
However, the submitted to accepted time is not a particularly useful statistic, as it includes the time the authors take to revise (if revision is invited). More useful would be submission to first decision time (which I think would not be possible to obtain from PubMed) and acceptance to publication (which presumably would). I've written more about acceptance to publication times here: http://sharmanedit.wordpress.com/2012/06/13/acceptance-to-publication-time/
Have you thought about doing a similar analysis for acceptance to publication time? If you do, I would predict a bimodal distribution, with one peak for journals that publish very soon after acceptance before or without copyediting or typesetting, and another peak for those that edit and typeset before publication.