30 January 2014

Parallelizing #RStats using #make

In the current post, I'll show how to use R as the main SHELL of GNU-Make instead of using a classical linux shell like 'bash'. Why would you do this ?

  • awesomeness
  • Make-based workflow management
  • Make-based execution with --jobs. GNU make knows how to execute several recipes at once. Normally, make will execute only one recipe at a time, waiting for it to finish before executing the next. However, the '-j' or '--jobs' option tells make to execute many recipes simultaneously.
The following recipe has been tested with GNU-Make 4.0 and I'm not sure it would world with '<=3.81'.

The only problem is that R doesn't accept a multiline-argument on the command line (see http://stackoverflow.com/questions/21442674) so I created a wrapper 'mockR' that save the argument '-e "code"' into a file and pipe it into R:

(Edit1: A comment from madscientist : Re your script; you can save yourself some wear-and-tear on your disk and avoid the need for temp files and cleanup by just piping the input directly: echo "$R" | R --vanilla --no-readline --quiet . Just a thought. ")

(Edit2: the exit value of 'R' should also be returned by 'mockR'.)

This file is set as executable:
$ chmod u+x ./mockR
In the makefile, we tell 'make' to use 'mockR' instead of '/usr/bin/sh':
SHELL = ./mockR
The R code will be passed to 'mockR' using the argument '-e "code"'
.SHELLFLAGS= -e
We also set 'ONESHELL': "If .ONESHELL is mentioned as a target, then when a target is built all lines of the recipe will be given to a single invocation of the shell rather than each line being invoked separately"
.ONESHELL:

Example 1

We download the table 'knownGene' from the UCSC and we plot a pdf file 'countExons=f(txStart)'. Please, note that the targets are created using some R statements, NOT bash statements:

Now Invoke make


Example 2

Using a the eval and the call function we can make the previous 'Makefile' applicable for all the chromosomes:

Now Invoke make USING TRHEE PARALLEL JOBS





You can now watch the final pdf files:




That's it,
Pierre

Mapping the UCSC/Web-Sequences to a world map.

People at the UCSC have recently released a new track for the GenomeBrowser


"We're pleased to announce the release of the Web Sequences track on the UCSC Genome Browser. This track, produced in collaboration with Microsoft Research, contains the results of a 30-day scan for DNA sequences from over 40 billion different webpages. The sequences were then mapped with Blat to the human genome (...) The data were extracted from a variety of sources including patents, online textbooks, help forums, and any other webpages that contain DNA sequence. In essence, this track displays the Blat alignments of nearly every DNA sequence on the internet!"

I've mapped each genomic location from this track to a country and generated the following (unreadable) picture:

How this picture was generated

  • I've downloaded the data from the UCSC using the Table browser. The data look like this:
    #bin    chrom   chromStart      chromEnd        name    score   strand  thickStart      thickEnd        reserved        blockCount      blockSizes      chromStarts     tSeqTypes     seqIds  seqRanges       publisher       pmid    doi     issn    journal title   firstAuthor     year    impact  classes locus
    585     chr1    14789   15004   3500336380      75              14789   15004   8421504 2       40,35   0,180   g       350033638000000000      0-75                          Tophat, Cufflinks and replicates - Page 2 - SEQanswers  seqanswers.com  0       0               WASH2P,WASH7P
    585     chr1    15017   15590   3500327042      381             15017   15590   8421504 2       326,55  0,518   g       350032704200000008      0-747                        Research Technologies at Indiana University      biomedapp.iu.edu        0       0               WASH7P
    585     chr1    68858   68895   3500020489      37              68858   68895   8421504 1       37      0       g       350002048900000000,350002048900000001   0-36,0-36    Genome mapability - Musings from a PhD candidate davetang.org    0       0               OR4F5
    585     chr1    69170   69479   3500359797      142             69170   69479   8421504 2       76,66   0,243   c       350035979700000000,350035979700000002   0-76,10-76    CRAM compression and TLEN SAM's field - SEQanswers      seqanswers.com  0       0               OR4F5
    585     chr1    70013   70230   3500427570      150             70013   70230   8421504 2       75,75   0,142   g       350042757000000000,350042757000000001   0-75,0-75     Inconsistency with SAM flag output? - SEQanswers        seqanswers.com  0       0               OR4F5
    585     chr1    98860   98888   3500207083      26              98860   98888   8421504 3       5,7,14  0,6,14  g       350020708300000108,350020708300000060,350020708300000239      0-24,0-21,0-21                                          Method For The Simultaneous Determination Of Blood Group And Platelet Antigen Genotypes         .freshpatents.com     0       0               OR4F5
    586     chr1    137603  138008  3500170315      405             137603  138008  8421504 1       405     0       p       350017031500015076,350017031500015074   0-135,0-270  Balding D. (2007) Handbook of Statistical Genetics       www.scribd.com  0       0               OR4F5
    586     chr1    139485  143008  3500419332      1794            139485  143008  8421504 2       65,1729 0,1794  g       350041933200000004,350041933200000000,350041933200000001,350041933200000002,350041933200000003        0-1263,0-1859,0-1852,0-1860,0-576                                               PPT  Evolution by Genome Duplication PowerPoint presentation | free to view   www.powershow.com       0       0               OR4F5
    586     chr1    141535  143008  3500270480      1372            141535  143008  8421504 24      57,60,58,59,61,59,60,59,59,62,61,58,60,58,16,59,59,59,59,57,57,59,58,58 0,61,125,187,250,314,377,441,503,566,631,695,756,819,881,919,981,1044,1107,1170,1230,1291,1353,1415   g       350027048000000003,350027048000000002   0-902,0-525                  Chen-Kung Chou 3-22-2004 www.dls.ym.edu.tw       0       0               OR4F5
  • I want to generate BED file: 'chrom/start/end/country'. The 23rd column contains the URL of the web-sequence. I use the domain of the URL to try to guess the country. The following awk script was used to generate the file:
    BEGIN   {
            FS="[\t]";
            }
    
            {
            country=$23;
            for(;;)
                    {
                    slash=index(country,"/");
                    if(slash==0) break;
                    country=substr(country,1,slash-1);
                    }
            for(;;)
                    {
                    colon=index(country,":");
                    if(colon==0) break;
                    country=substr(country,1,colon-1);
                    }
            if( country ~ /\.$/ ) next;
            if( country ~ /\.com$/ ) next;
            if( country ~ /\.org$/ ) next;
            if( country ~ /\.cat$/ ) next;
            if( country ~ /\.net$/ ) next;
            if( country ~ /\.gov$/ ) next;
            if( country ~ /\.edu$/ ) next;
            if( country ~ /\.name$/ ) next;
            if( country ~ /\.info$/ ) next;
            if( country ~ /\.biz$/ ) next;
            if( country ~ /\.[0-9]+$/ ) next;
            if( index(country,".")==0) next;
            if( index(country," ")!=0) next;
            for(;;)
                    {
                    dot=index(country,".");
                    if(dot==0) break;
                    country=substr(country,dot+1);
                    }
    
                    if( country== "af") {country="afghanistan";}
                    else if( country== "ax") {country="Ålandislands";}
                    else if( country== "al") {country="albania";}
                    else if( country== "dz") {country="algeria";}
                    else if( country== "as") {country="americansamoa";}
                    else if( country== "ad") {country="andorra";}
                    else if( country== "ao") {country="angola";}
                    else if( country== "ai") {country="anguilla";}
                    else if( country== "aq") {country="antarctica";}
                    else if( country== "ag") {country="antiguaandbarbuda";}
                    else if( country== "ar") {country="argentina";}
                    else if( country== "am") {country="armenia";}
                    else if( country== "aw") {country="aruba";}
                    else if( country== "au") {country="australia";}
                    else if( country== "at") {country="austria";}
                    else if( country== "az") {country="azerbaijan";}
                    else if( country== "bs") {country="bahamas";}
                    else if( country== "bh") {country="bahrain";}
                    else if( country== "bd") {country="bangladesh";}
                    else if( country== "bb") {country="barbados";}
                    else if( country== "by") {country="belarus";}
                    else if( country== "be") {country="belgium";}
                    else if( country== "bz") {country="belize";}
                    else if( country== "bj") {country="benin";}
                    else if( country== "bm") {country="bermuda";}
                    else if( country== "bt") {country="bhutan";}
                    else if( country== "bo") {country="bolivia,plurinationalstateof";}
                    else if( country== "bq") {country="bonaire,sinteustatiusandsaba";}
                    else if( country== "ba") {country="bosniaandherzegovina";}
                    else if( country== "bw") {country="botswana";}
                    else if( country== "bv") {country="bouvetisland";}
                    else if( country== "br") {country="brazil";}
                    else if( country== "io") {country="britishindianoceanterritory";}
                    else if( country== "bn") {country="bruneidarussalam";}
                    else if( country== "bg") {country="bulgaria";}
                    else if( country== "bf") {country="burkinafaso";}
                    else if( country== "bi") {country="burundi";}
                    else if( country== "kh") {country="cambodia";}
                    else if( country== "cm") {country="cameroon";}
                    else if( country== "ca") {country="canada";}
                    else if( country== "cv") {country="capeverde";}
                    else if( country== "ky") {country="caymanislands";}
                    else if( country== "cf") {country="centralafricanrepublic";}
                    else if( country== "td") {country="chad";}
                    else if( country== "cl") {country="chile";}
                    else if( country== "cn") {country="china";}
                    else if( country== "cx") {country="christmasisland";}
                    else if( country== "cc") {country="cocos(keeling)islands";}
                    else if( country== "co") {country="colombia";}
                    else if( country== "km") {country="comoros";}
                    else if( country== "cg") {country="congo";}
                    else if( country== "cd") {country="congo,thedemocraticrepublicofthe";}
                    else if( country== "ck") {country="cookislands";}
                    else if( country== "cr") {country="costarica";}
                    else if( country== "ci") {country="cÔted'ivoire";}
                    else if( country== "hr") {country="croatia";}
                    else if( country== "cu") {country="cuba";}
                    else if( country== "cw") {country="curaÇao";}
                    else if( country== "cy") {country="cyprus";}
                    else if( country== "cz") {country="czechrepublic";}
                    else if( country== "dk") {country="denmark";}
                    else if( country== "dj") {country="djibouti";}
                    else if( country== "dm") {country="dominica";}
                    else if( country== "do") {country="dominicanrepublic";}
                    else if( country== "ec") {country="ecuador";}
                    else if( country== "eg") {country="egypt";}
                    else if( country== "sv") {country="elsalvador";}
                    else if( country== "gq") {country="equatorialguinea";}
                    else if( country== "er") {country="eritrea";}
                    else if( country== "ee") {country="estonia";}
                    else if( country== "et") {country="ethiopia";}
                    else if( country== "fk") {country="falklandislands(malvinas)";}
                    else if( country== "fo") {country="faroeislands";}
                    else if( country== "fj") {country="fiji";}
                    else if( country== "fi") {country="finland";}
                    else if( country== "fr") {country="france";}
                    else if( country== "gf") {country="frenchguiana";}
                    else if( country== "pf") {country="frenchpolynesia";}
                    else if( country== "tf") {country="frenchsouthernterritories";}
                    else if( country== "ga") {country="gabon";}
                    else if( country== "gm") {country="gambia";}
                    else if( country== "ge") {country="georgia";}
                    else if( country== "de") {country="germany";}
                    else if( country== "gh") {country="ghana";}
                    else if( country== "gi") {country="gibraltar";}
                    else if( country== "gr") {country="greece";}
                    else if( country== "gl") {country="greenland";}
                    else if( country== "gd") {country="grenada";}
                    else if( country== "gp") {country="guadeloupe";}
                    else if( country== "gu") {country="guam";}
                    else if( country== "gt") {country="guatemala";}
                    else if( country== "gg") {country="guernsey";}
                    else if( country== "gn") {country="guinea";}
                    else if( country== "gw") {country="guinea-bissau";}
                    else if( country== "gy") {country="guyana";}
                    else if( country== "ht") {country="haiti";}
                    else if( country== "hm") {country="heardislandandmcdonaldislands";}
                    else if( country== "va") {country="holysee(vaticancitystate)";}
                    else if( country== "hn") {country="honduras";}
                    else if( country== "hk") {country="china";}
                    else if( country== "hu") {country="hungary";}
                    else if( country== "is") {country="iceland";}
                    else if( country== "in") {country="india";}
                    else if( country== "id") {country="indonesia";}
                    else if( country== "ir") {country="iran";}
                    else if( country== "iq") {country="iraq";}
                    else if( country== "ie") {country="ireland";}
                    else if( country== "im") {country="isleofman";}
                    else if( country== "il") {country="israel";}
                    else if( country== "it") {country="italy";}
                    else if( country== "jm") {country="jamaica";}
                    else if( country== "jp") {country="japan";}
                    else if( country== "je") {country="jersey";}
                    else if( country== "jo") {country="jordan";}
                    else if( country== "kz") {country="kazakhstan";}
                    else if( country== "ke") {country="kenya";}
                    else if( country== "ki") {country="kiribati";}
                    else if( country== "kp") {country="northkorea";}
                    else if( country== "kr") {country="southkorea";}
                    else if( country== "kw") {country="kuwait";}
                    else if( country== "kg") {country="kyrgyzstan";}
                    else if( country== "la") {country="laopeople'sdemocraticrepublic";}
                    else if( country== "lv") {country="latvia";}
                    else if( country== "lb") {country="lebanon";}
                    else if( country== "ls") {country="lesotho";}
                    else if( country== "lr") {country="liberia";}
                    else if( country== "ly") {country="libya";}
                    else if( country== "li") {country="liechtenstein";}
                    else if( country== "lt") {country="lithuania";}
                    else if( country== "lu") {country="luxembourg";}
                    else if( country== "mo") {country="macao";}
                    else if( country== "mk") {country="macedonia,theformeryugoslavrepublicof";}
                    else if( country== "mg") {country="madagascar";}
                    else if( country== "mw") {country="malawi";}
                    else if( country== "my") {country="malaysia";}
                    else if( country== "mv") {country="maldives";}
                    else if( country== "ml") {country="mali";}
                    else if( country== "mt") {country="malta";}
                    else if( country== "mh") {country="marshallislands";}
                    else if( country== "mq") {country="martinique";}
                    else if( country== "mr") {country="mauritania";}
                    else if( country== "mu") {country="mauritius";}
                    else if( country== "yt") {country="mayotte";}
                    else if( country== "mx") {country="mexico";}
                    else if( country== "fm") {country="micronesia,federatedstatesof";}
                    else if( country== "md") {country="moldova,republicof";}
                    else if( country== "mc") {country="monaco";}
                    else if( country== "mn") {country="mongolia";}
                    else if( country== "me") {country="montenegro";}
                    else if( country== "ms") {country="montserrat";}
                    else if( country== "ma") {country="morocco";}
                    else if( country== "mz") {country="mozambique";}
                    else if( country== "mm") {country="myanmar";}
                    else if( country== "na") {country="namibia";}
                    else if( country== "nr") {country="nauru";}
                    else if( country== "np") {country="nepal";}
                    else if( country== "nl") {country="netherlands";}
                    else if( country== "nc") {country="newcaledonia";}
                    else if( country== "nz") {country="newzealand";}
                    else if( country== "ni") {country="nicaragua";}
                    else if( country== "ne") {country="niger";}
                    else if( country== "ng") {country="nigeria";}
                    else if( country== "nu") {country="niue";}
                    else if( country== "nf") {country="norfolkisland";}
                    else if( country== "mp") {country="northernmarianaislands";}
                    else if( country== "no") {country="norway";}
                    else if( country== "om") {country="oman";}
                    else if( country== "pk") {country="pakistan";}
                    else if( country== "pw") {country="palau";}
                    else if( country== "ps") {country="palestine,stateof";}
                    else if( country== "pa") {country="panama";}
                    else if( country== "pg") {country="papuanewguinea";}
                    else if( country== "py") {country="paraguay";}
                    else if( country== "pe") {country="peru";}
                    else if( country== "ph") {country="philippines";}
                    else if( country== "pn") {country="pitcairn";}
                    else if( country== "pl") {country="poland";}
                    else if( country== "pt") {country="portugal";}
                    else if( country== "pr") {country="puertorico";}
                    else if( country== "qa") {country="qatar";}
                    else if( country== "re") {country="france";}
                    else if( country== "ro") {country="romania";}
                    else if( country== "ru") {country="russia";}
                    else if( country== "rw") {country="rwanda";}
                    else if( country== "bl") {country="saintbarthÉlemy";}
                    else if( country== "sh") {country="sainthelena,ascensionandtristandacunha";}
                    else if( country== "kn") {country="saintkittsandnevis";}
                    else if( country== "lc") {country="saintlucia";}
                    else if( country== "mf") {country="saintmartin(frenchpart)";}
                    else if( country== "pm") {country="saintpierreandmiquelon";}
                    else if( country== "vc") {country="saintvincentandthegrenadines";}
                    else if( country== "ws") {country="samoa";}
                    else if( country== "sm") {country="sanmarino";}
                    else if( country== "st") {country="saotomeandprincipe";}
                    else if( country== "sa") {country="saudiarabia";}
                    else if( country== "sn") {country="senegal";}
                    else if( country== "rs") {country="serbia";}
                    else if( country== "sc") {country="seychelles";}
                    else if( country== "sl") {country="sierraleone";}
                    else if( country== "sg") {country="singapore";}
                    else if( country== "sx") {country="sintmaarten(dutchpart)";}
                    else if( country== "sk") {country="slovakia";}
                    else if( country== "si") {country="slovenia";}
                    else if( country== "sb") {country="solomonislands";}
                    else if( country== "so") {country="somalia";}
                    else if( country== "za") {country="southafrica";}
                    else if( country== "gs") {country="southgeorgiaandthesouthsandwichislands";}
                    else if( country== "ss") {country="southsudan";}
                    else if( country== "es") {country="spain";}
                    else if( country== "lk") {country="srilanka";}
                    else if( country== "sd") {country="sudan";}
                    else if( country== "sr") {country="suriname";}
                    else if( country== "sj") {country="svalbardandjanmayen";}
                    else if( country== "sz") {country="swaziland";}
                    else if( country== "se") {country="sweden";}
                    else if( country== "ch") {country="switzerland";}
                    else if( country== "sy") {country="syrianarabrepublic";}
                    else if( country== "tw") {country="taiwan";}
                    else if( country== "tj") {country="tajikistan";}
                    else if( country== "tz") {country="tanzania";}
                    else if( country== "th") {country="thailand";}
                    else if( country== "tl") {country="timor-leste";}
                    else if( country== "tg") {country="togo";}
                    else if( country== "tk") {country="tokelau";}
                    else if( country== "to") {country="tonga";}
                    else if( country== "tt") {country="trinidadandtobago";}
                    else if( country== "tn") {country="tunisia";}
                    else if( country== "tr") {country="turkey";}
                    else if( country== "tm") {country="turkmenistan";}
                    else if( country== "tc") {country="turksandcaicosislands";}
                    else if( country== "tv") {country="tuvalu";}
                    else if( country== "ug") {country="uganda";}
                    else if( country== "ua") {country="ukraine";}
                    else if( country== "ae") {country="unitedarabemirates";}
                    else if( country== "gb") {country="unitedkingdom";}
                    else if( country== "uk") {country="unitedkingdom";}
                    else if( country== "us") {country="USA";}
                    else if( country== "um") {country="unitedstatesminoroutlyingislands";}
                    else if( country== "uy") {country="uruguay";}
                    else if( country== "uz") {country="uzbekistan";}
                    else if( country== "vu") {country="vanuatu";}
                    else if( country== "ve") {country="venezuela";}
                    else if( country== "vn") {country="vietnam";}
                    else if( country== "vg") {country="virginislands,british";}
                    else if( country== "vi") {country="virginislands,u.s.";}
                    else if( country== "wf") {country="wallisandfutuna";}
                    else if( country== "eh") {country="westernsahara";}
                    else if( country== "ye") {country="yemen";}
                    else if( country== "zm") {country="zambia";}
                    else if( country== "zw") {country="zimbabwe";}
                    else { next;}
    
            printf("%s\t%s\t%s\t%s\n",$2,$3,$4,country);
            }
    
  • For the world map, I've used a SVG-vectorial map from wikipedia: https://commons.wikimedia.org/wiki/File:World_V2.0.svg.

    The coordinates of the boundaries of each country are defined in a SVG 'path' element:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN" "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/
    <svg xmlns="http://www.w3.org/2000/svg" width="8.88889in" height="4.44444in" viewBox="0 0 800 400">
      <path id="Taiwan" fill="none" stroke="black" stroke-width="1" d="M 668.85,151.22  C 668.98,150.71 ...
      <path id="Estonia" fill="none" stroke="black" stroke-width="1" d="M 460.75,68.26  C 459.95,68.11 4 ...
      <path id="Latvia" fill="none" stroke="black" stroke-width="1" d="M 461.23,72.27  C 460.75,72.20 46 ...
      <path id="Lithuania" fill="none" stroke="black" stroke-width="1" d="M 452.39,79.42  C 452.67,79.72 ...
      <path id="Byelarus" fill="none" stroke="black" stroke-width="1" d="M 453.57,81.92  C 453.87,82.37  ...
      <path id="Ukraine" fill="none" stroke="black" stroke-width="1" d="M 453.09,85.95  C 453.43,86.30 4 ...
      <path id="Moldova" fill="none" stroke="black" stroke-width="1" d="M 460.57,93.00  C 461.00,93.70 4 ...
      <path id="Syria" fill="none" stroke="black" stroke-width="1" d="M 480.33,127.61  C 481.03,127.90 4 ...
      <path id="Turkey" fill="none" stroke="black" stroke-width="1" d="M 499.47,116.91  C 499.31,116.32  ...
      <path id="Kuwait" fill="none" stroke="black" stroke-width="1" d="M 505.26,133.84  C 504.97,134.56  ...
      <path id="Saudi Arabia" fill="none" stroke="black" stroke-width="1" d="M 495.83,163.75  C 496.31,1 ...
      <path id="United Arab Emirates" fill="none" stroke="black" stroke-width="1" d="M 516.03,150.12  C  ...
      <path id="Yemen" fill="none" stroke="black" stroke-width="1" d="M 517.68,161.79  C 517.01,160.50 5 ...
      <path id="Slovenia" fill="none" stroke="black" stroke-width="1" d="M 430.55,97.07  C 429.62,97.36  ...
      <path id="Croatia" fill="none" stroke="black" stroke-width="1" d="M 439.09,103.97  C 439.01,103.46 ...
      <path id="Bosnia and Herzegovina" fill="none" stroke="black" stroke-width="1" d="M 440.44,105.02   ...
    (...)
  • I've joined the data using a custom java program (available on github at: https://github.com/lindenb/jvarkit/wiki/WorldMapGenome ). The program transforms the 'path' elements to a GeneralPath
    $  cat map.bed |\
         java -jar dist/worldmapgenome.jar \
         -u World_V2.0.svg \
         -w 2000 -o ~/ouput.jpg \
         -R hg19.fasta
That's it,
Pierre

20 December 2013

(video) #renabigo2013 "Make & Bioinformatics:everything but #usegalaxy"

(In French): "Mon Make à moi": parallélisations, workflows et pipelines pour le NGS, tout sauf Galaxy [34:50 mn]

Les workflows en Bio-Informatique

Centre Inria de Rennes - IRISA

29 Novembre 2013

12 December 2013

Inside Jvarkit: view BAM, cut, stats, head, tail, shuffle, downsample, group-by-gene VCFs...

Here are a few tools I recently wrote (and reinvented) for Jvarkit.

BamViewGui
a simple java-Swing-based BAM viewer.
VcfShuffle
Shuffle a VCF.
GroupByGene
Group VCF data by Gene
$ curl -s -k "https://raw.github.com/arq5x/gemini/master/test/test4.vep.snpeff.vcf" |\
java -jar dist/groupbygene.jar |\
head | column  -t

#chrom  min.POS    max.POS    gene.name  gene.type         samples.affected  count.variations  M10475  M10478  M10500  M128215
chr10   52004315   52004315   ASAH2      snpeff-gene-name  2                 1                 0       0       1       1
chr10   52004315   52004315   ASAH2      vep-gene-name     2                 1                 0       0       1       1
chr10   52497529   52497529   ASAH2B     snpeff-gene-name  2                 1                 0       1       1       0
chr10   52497529   52497529   ASAH2B     vep-gene-name     2                 1                 0       1       1       0
chr10   48003992   48003992   ASAH2C     snpeff-gene-name  3                 1                 1       1       1       0
chr10   48003992   48003992   ASAH2C     vep-gene-name     3                 1                 1       1       1       0
chr10   126678092  126678092  CTBP2      snpeff-gene-name  1                 1                 0       0       0       1
chr10   126678092  126678092  CTBP2      vep-gene-name     1                 1                 0       0       0       1
chr10   135336656  135369532  CYP2E1     snpeff-gene-name  3                 2                 0       2       1       1
DownSampleVcf
Down sample a VCF.
VcfHead
Print the first variants of a VCF.
VcfTail
Print the last variants of a VCF
VcfCutSamples
Select/Exclude some samples from a VCF
VcfStats>
Generate some statistics from a VCF. The ouput is a XML file that can be processed with xslt.
$ curl  "https://raw.github.com/arq5x/gemini/master/test/test4.vep.snpeff.vcf" |\
  java -jar dist/vcfstats.jar |\
  xmllint --format -

<?xml version="1.0" encoding="UTF-8"?>
<vcf-statistics version="314bf88924a4003e6d6189ad3280d8b4df485aa1" input="stdin" date="Thu Dec 12 16:20:14 CET 2013">
  <section name="General">
    <statistics name="general" description="general">
      <counts name="general" description="General" keytype="string">
        <property key="num.dictionary.chromosomes">93<
         (...)

That's it,

Pierre

20 November 2013

Inside Jvarkit: Shrink your fastqs by 40% by aligning them to REF before compression.

BamToFastq is an implementation of https://twitter.com/DNAntonie/status/402909852277932032"




Example : piping bwa mem


$ bwa mem -M  human_g1k_v37.fasta  Sample1_L001_R1_001.fastq.gz Sample2_S5_L001_R2_001.fastq.gz |\
  java -jar dist/bam2fastq.jar  -F tmpR1.fastq.gz -R tmpR2.fastq.gz
before:
$ ls -lah Sample1_L001_R1_001.fastq.gz Sample2_S5_L001_R2_001.fastq.gz
-rw-r--r-- 1 lindenb lindenb 181M Jun 14 15:20 Sample1_L001_R1_001.fastq.gz
-rw-r--r-- 1 lindenb lindenb 190M Jun 14 15:20 Sample1_L001_R2_001.fastq.gz

after (these are Haloplex Data, with a lot of duplicates )

$ ls -lah tmpR1.fastq.gz  tmpR2.fastq.gz
-rw-rw-r-- 1 lindenb lindenb  96M Nov 20 17:10 tmpR1.fastq.gz
-rw-rw-r-- 1 lindenb lindenb 106M Nov 20 17:10 tmpR2.fastq.gz

using BZ2:

$  ls -lah *.bz2
-rw-rw-r-- 1 lindenb lindenb 77M Nov 20 17:55 tmpR1.fastq.bz2
-rw-rw-r-- 1 lindenb lindenb 87M Nov 20 17:55 tmpR2.fastq.bz2

That's it
Pierre

30 October 2013

GNU Make: saving the versions of the tools using 'order-only-prerequisites' : my notebook

Rule 3 of "Ten Simple Rules for Reproducible Computational Research". is
:

Archive the Exact Versions of All External Programs Used
.
I work with Makefile-based workflows: how can I save the version of each software used when I invoke 'make', whatever the target is ? A naive solution is to add a dependency to each target. For example, the following makefile takes a simple SAM file, convert it to BAM, sort and index. For each target, I've added a dependency named "dump_params" that append the version of samtools to a file "config.txt".

But that solution doesn't work because make re-builds all targets even if the top target already exists.
$ make

date << config.txt && \
 echo -n "Samtools " << config.txt && \
 samtools  2<&1 | grep Version << config.txt
samtools view -Sb samtools-0.1.18/examples/toy.sam < unsorted.bam
[samopen] SAM header is present: 2 sequences.
samtools sort unsorted.bam sorted
samtools index sorted.bam


$ make

date << config.txt && \
 echo -n "Samtools " << config.txt && \
 samtools  2<&1 | grep Version << config.txt
samtools view -Sb samtools-0.1.18/examples/toy.sam < unsorted.bam
[samopen] SAM header is present: 2 sequences.
samtools sort unsorted.bam sorted
samtools index sorted.bam

The solution I got via Stackoverflow is to use a order-only-prerequisites: "Order-only prerequisites can be specified by placing a pipe symbol (|) in the prerequisites list: any prerequisites to the left of the pipe symbol are normal; any prerequisites to the right are order-only... (...) Note that if you declare the same file to be both a normal and an order-only prerequisite, the normal prerequisite takes precedence (...)". The makefile with the 'order-only-prerequisites' is now:




And that works ! the final target is generated only once, but the file 'config.txt' is always generated.
$ make
date << config.txt && \
 echo -n "Samtools " << config.txt && \
 samtools  2<&1 | grep Version << config.txt
samtools view -Sb samtools-0.1.18/examples/toy.sam < unsorted.bam
[samopen] SAM header is present: 2 sequences.
samtools sort unsorted.bam sorted
samtools index sorted.bam

$ make
date << config.txt && \
 echo -n "Samtools " << config.txt && \
 samtools  2<&1 | grep Version << config.txt

$ make
date << config.txt && \
 echo -n "Samtools " << config.txt && \
 samtools  2<&1 | grep Version << config.txt
That's it,
Pierre

Update :another solution

Citing MadScientist's answer on stackoverflow : Another option is to use immediately expanded shell functions, like:
__dummy := $(shell echo "Makefile was run." >> config.txt)
Since it's immediately expanded the shell script will be invoked once, as the makefile is read in. There's no need to define a dump_params target or include it as a prerequisite. This is more old-school, but has the advantage that it will run for every invocation of make, without having to go through and ensure every target has the proper order-only prerequisite defined.




25 October 2013

YES: "Choice of transcripts and software has a large effect on variant annotation"

This post was inspired by Aaron Quinlan's tweet:


Here is an example of a missense mutation found with VCFPredictions, a simple tool I wrote for variant effect prediction.

#CHROM POS ID ALT REF 
1 23710805 rs605468 A G
my tool uses the UCSC knownGene track, here is the context of the variant in the UCSC genome browser. There is one exon for TCEA3 (uc021oig.1) on the reverse strand.


If the base at 23710805 is changed from 'A'→'G' on the forward strand, there will be a non-synonymous variation Y(TAT)→H(CAT) on the reverse strand.


At the NCBI rs605468 is said to be " located in the intron region of NM_003196.1."

VEP cannot find this missense variation:

Uploaded Variation Location Allele Gene Feature Feature type Consequence Position in cDNA Position in CDS Position in protein Amino acid change Codon change Co-located Variation Extra
rs605468 1:23710805 G - ENSR00001518296 Transcript regulatory_region_variant - - - - - rs605468 GMAF=A:0.1204
rs605468 1:23710805 G ENSG00000204219 ENST00000476978 Transcript intron_variant - - - - - rs605468 GMAF=A:0.1204
rs605468 1:23710805 G ENSG00000204219 ENST00000450454 Transcript intron_variant - - - - - rs605468 GMAF=A:0.1204

(of course, my tool doesn't find some variations found in VEP too)

That's it,
Pierre

24 October 2013

PubMed Commons & Bioinformatics: a call for action


NCBI pubmed Commons/@PubMedCommons is a new system that enables researchers to share their opinions about scientific publications. Researchers can comment on any publication indexed by PubMed, and read the comments of others.
Now that we can add some comments to the papers in pubmed, I suggest to flag the articles to mark the deprecated softwares, databases, hyperlinks using a simple controlled syntax. Here are a few examples: the line starts with '!MOVED' or '!NOTFOUND' and is followed by a URL or/and a list of PMID or/and a quoted comment.

Examples

!MOVED: for http://www.ncbi.nlm.nih.gov/pubmed/8392714 (Rebase/1993) to http://www.ncbi.nlm.nih.gov/pubmed/19846593 (Rebase/2010)
!MOVED PMID:19846593 "A more recent version" 
In http://www.ncbi.nlm.nih.gov/pubmed/19919682 the URL has moved to http://biogps.org.
!MOVED <http://biogps.org> 
I moved the sources of http://www.ncbi.nlm.nih.gov/pubmed/9682060 to github
!MOVED <https://github.com/lindenb/cloneit/tree/master/c> 
!NOTFOUND: for http://www.ncbi.nlm.nih.gov/pubmed/9545460 ( Biocat EXCEL template ) url http://www.ebi.ac.uk/biocat/biocat.html returns a 404.
!NOTFOUND "The URL http://www.ebi.ac.uk/biocat/biocat.html was not found " 


That's it,

Pierre

23 October 2013

Inside the variation toolkit: Generating a structured document describing an Illumina directory.

I wrote a tool named "Illuminadir" : it creates a structured (JSON or XML) representation of a directory containing some Illumina FASTQs (I only tested it with HiSeq , paired end-data and indexes).

Motivation

Illuminadir scans folders , search for FASTQs and generate a structured summary of the files (xml or json).
Currently only tested with HiSeq data having an index

Compilation

See also Compilation

ant illuminadir

Options

Option Description
IN=File root directories This option may be specified 1 or more times.
JSON=Boolean json output Default value: false. Ouput, could be used with jsvelocity https://github.com/lindenb/jsvelocity

Example

$ java  -jar dist/illuminadir.jar \
	I=dir1 \
	I=dir2 | xsltproc xml2script.xslt > script.bash
(...)

XML output

The XML ouput looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<illumina>
  <!--com.github.lindenb.jvarkit.tools.misc.IlluminaDirectory IN=[RUN62_XFC2DM8ACXX/data]    JSON=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false-->
  <directory path="RUN62_XFC2DM8ACXX/data">
    <samples>
      <sample name="SAMPLE1">
        <pair md5="cd4b436ce7aff4cf669d282c6d9a7899" lane="8" index="ATCACG" split="2">
          <fastq md5filename="3369c3457d6603f06379b654cb78e696" side="1" path="RUN62_XFC2DM8ACXX/data/OUT/Sample_SAMPLE1/SAMPLE1_ATCACG_L008_R1_002.fastq.gz" file-size="359046311"/>
          <fastq md5filename="832039fa00b5f40108848e48eb437e0b" side="2" path="RUN62_XFC2DM8ACXX/data/OUT/Sample_SAMPLE1/SAMPLE1_ATCACG_L008_R2_002.fastq.gz" file-size="359659451"/>
        </pair>
        <pair md5="b3050fa3307e63ab9790b0e263c5d240" lane="8" index="ATCACG" split="3">
          <fastq md5filename="091727bb6b300e463c3d708e157436ab" side="1" path="RUN62_XFC2DM8ACXX/data/OUT/Sample_SAMPLE1/SAMPLE1_ATCACG_L008_R1_003.fastq.gz" file-size="206660736"/>
          <fastq md5filename="20235ef4ec8845515beb4e13da34b5d3" side="2" path="RUN62_XFC2DM8ACXX/data/OUT/Sample_SAMPLE1/SAMPLE1_ATCACG_L008_R2_003.fastq.gz" file-size="206715143"/>
        </pair>
        <pair md5="9f7ee49e87d01610372c43ab928939f6" lane="8" index="ATCACG" split="1">
          <fastq md5filename="54cb2fd33edd5c2e787287ccf1595952" side="1" path="RUN62_XFC2DM8ACXX/data/OUT/Sample_SAMPLE1/SAMPLE1_ATCACG_L008_R1_001.fastq.gz" file-size="354530831"/>
          <fastq md5filename="e937cbdf32020074e50d3332c67cf6b3" side="2" path="RUN62_XFC2DM8ACXX/data/OUT/Sample_SAMPLE1/SAMPLE1_ATCACG_L008_R2_001.fastq.gz" file-size="356908963"/>
        </pair>
        <pair md5="0697846a504158eef523c0f4ede85288" lane="7" index="ATCACG" split="2">
          <fastq md5filename="6fb35d130efae4dcfa79260281504aa3" side="1" path="RUN62_XFC2DM8ACXX/data/OUT/Sample_SAMPLE1/SAMPLE1_ATCACG_L007_R1_002.fastq.gz" file-size="357120615"/>
(...)
      <pair md5="634cbb29ca64604174963a4fff09f37a" lane="7" split="1">
        <fastq md5filename="bc0b283a58946fd75a95b330e0aefdc8" side="1" path="RUN62_XFC2DM8ACXX/data/Undetermined_indices/Sample_lane7/lane7_Undetermined_L007_R1_001.fastq.gz" file-size="371063045"/>
        <fastq md5filename="9eab26c5b593d50d642399d172a11835" side="2" path="RUN62_XFC2DM8ACXX/data/Undetermined_indices/Sample_lane7/lane7_Undetermined_L007_R2_001.fastq.gz" file-size="372221753"/>
      </pair>
      <pair md5="bf31099075d6c3c7ea052b8038cb4a03" lane="8" split="2">
        <fastq md5filename="f229389da36a3efc20888bffdec09b80" side="1" path="RUN62_XFC2DM8ACXX/data/Undetermined_indices/Sample_lane8/lane8_Undetermined_L008_R1_002.fastq.gz" file-size="374331268"/>
        <fastq md5filename="417fd9f28d24f63ce0d0808d97543315" side="2" path="RUN62_XFC2DM8ACXX/data/Undetermined_indices/Sample_lane8/lane8_Undetermined_L008_R2_002.fastq.gz" file-size="372181102"/>
      </pair>
      <pair md5="95cab850b0608c53e8c83b25cfdb3b2b" lane="8" split="3">
        <fastq md5filename="23f5be8a962697f50e2a271394242e2f" side="1" path="RUN62_XFC2DM8ACXX/data/Undetermined_indices/Sample_lane8/lane8_Undetermined_L008_R1_003.fastq.gz" file-size="60303589"/>
        <fastq md5filename="3f39f212c36d0aa884b81649ad56630c" side="2" path="RUN62_XFC2DM8ACXX/data/Undetermined_indices/Sample_lane8/lane8_Undetermined_L008_R2_003.fastq.gz" file-size="59123627"/>
      </pair>
      <pair md5="ab108b1dda7df86f33f375367b86bfe4" lane="8" split="1">
        <fastq md5filename="14f8281cf7d1a53d29cd03cb53a45b4a" side="1" path="RUN62_XFC2DM8ACXX/data/Undetermined_indices/Sample_lane8/lane8_Undetermined_L008_R1_001.fastq.gz" file-size="371255111"/>
        <fastq md5filename="977fd388e1b3451dfcdbf9bdcbb89ed4" side="2" path="RUN62_XFC2DM8ACXX/data/Undetermined_indices/Sample_lane8/lane8_Undetermined_L008_R2_001.fastq.gz" file-size="370744530"/>
      </pair>
    </undetermined>
  </directory>
</illumina>

How to use that file ? here is a example of XSLT stylesheet that can generate a Makefile to generate a LaTeX about the number of reads per Lane/Sample/Index

<?xml version='1.0'  encoding="ISO-8859-1"?>
<xsl:stylesheet
	xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
	version='1.0' 
	> 
<xsl:output method="text"/>


<xsl:template match="/">
.PHONY:all clean

all: report.pdf

report.pdf: report.tex 
	pdflatex $&lt;

report.tex : all.count
	echo 'T&lt;-read.table("$&lt;",head=TRUE,sep="\t");$(foreach FTYPE,Index Sample Lane, T2&lt;-tapply(T$$count,T$$${FTYPE},sum);png("${FTYPE}.png");barplot(T2,las=3);dev.off();)' | R --no-save
	echo "\documentclass{report}" &gt; $@
	echo "\usepackage{graphicx}" &gt;&gt; $@
	echo "\date{\today}" &gt;&gt; $@
	echo "\title{FastQ Report}" &gt;&gt; $@
	echo "\begin{document}" &gt;&gt; $@
	echo "\maketitle" &gt;&gt; $@
	$(foreach FTYPE,Index Sample Lane, echo "\section{By ${FTYPE}}#\begin{center}#\includegraphics{${FTYPE}.png}#\end{center}" | tr "#" "\n" &gt;&gt; $@ ; )
	echo "\end{document}" &gt;&gt; $@
	


all.count : $(addsuffix .count, <xsl:for-each select="//fastq" ><xsl:value-of select="@md5filename"/><xsl:text> </xsl:text></xsl:for-each>) 
	echo -e "Lane\tsplit\tside\tsize\tcount\tIndex\tSample"  &gt; $@ &amp;&amp; \
	cat $^ &gt;&gt; $@

<xsl:apply-templates select="//fastq" mode="count"/>

clean:
	rm -f all.count report.pdf report.tex $(addsuffix .count, <xsl:for-each select="//fastq" ><xsl:value-of select="@md5filename"/><xsl:text> </xsl:text></xsl:for-each>) 

</xsl:template>

<xsl:template match="fastq" mode="count">
$(addsuffix .count, <xsl:value-of select="@md5filename"/>): <xsl:value-of select="@path"/>
	gunzip -c $&lt; | awk '(NR%4==1)' | wc -l  | xargs  printf "<xsl:value-of select="../@lane"/>\t<xsl:value-of select="../@split"/>\t<xsl:value-of select="@side"/>\t<xsl:value-of select="@file-size"/>\t%s\t<xsl:choose><xsl:when test="../@index"><xsl:value-of select="../@index"/></xsl:when><xsl:otherwise>Undetermined</xsl:otherwise></xsl:choose>\t<xsl:choose><xsl:when test="../../@name"><xsl:value-of select="../../@name"/></xsl:when><xsl:otherwise>Undetermined</xsl:otherwise></xsl:choose>\n"   &gt; $@

</xsl:template>
</xsl:stylesheet>
$ xsltproc  illumina.xml illumina2makefile.xsl > Makefile
output:
.PHONY:all clean

all: report.pdf

report.pdf: report.tex 
	pdflatex $<

report.tex : all.count
	echo 'T<-read.table("$<",head=TRUE,sep="\t");$(foreach FTYPE,Index Sample Lane, T2<-tapply(T$$count,T$$${FTYPE},sum);png("${FTYPE}.png");barplot(T2,las=3);dev.off();)' | R --no-save
	echo "\documentclass{report}" > $@
	echo "\usepackage{graphicx}" >> $@
	echo "\date{\today}" >> $@
	echo "\title{FastQ Report}" >> $@
	echo "\begin{document}" >> $@
	echo "\maketitle" >> $@
	$(foreach FTYPE,Index Sample Lane, echo "\section{By ${FTYPE}}#\begin{center}#\includegraphics{${FTYPE}.png}#\end{center}" | tr "#" "\n" >> $@ ; )
	echo "\end{document}" >> $@



all.count : $(addsuffix .count, 3369c3457d6603f06379b654cb78e696 832039fa00b5f40108848e48eb437e0b 091727bb6b300e463c3d708e157436ab 20235ef4ec88....)
	echo -e "Lane\tsplit\tside\tsize\tcount\tIndex\tSample"  > $@ && \
	cat $^ >> $@


$(addsuffix .count, 3369c3457d6603f06379b654cb78e696): RUN62_XFC2DM8ACXX/data/OUT/Sample_SAMPLE1/SAMPLE1_ATCACG_L008_R1_002.fastq.gz
	gunzip -c $< | awk '(NR%4==1)' | wc -l  | xargs  printf "8\t2\t1\t359046311\t%s\tATCACG\tSAMPLE1\n"   > $@


$(addsuffix .count, 832039fa00b5f40108848e48eb437e0b): RUN62_XFC2DM8ACXX/data/OUT/Sample_SAMPLE1/SAMPLE1_ATCACG_L008_R2_002.fastq.gz
	gunzip -c $< | awk '(NR%4==1)' | wc -l  | xargs  printf "8\t2\t2\t359659451\t%s\tATCACG\tSAMPLE1\n"   > $@
(....)

JSON output

The JSON output looks like this

[{"directory":"RUN62_XFC2DM8ACXX/data","samples":[{"sample":"SAMPLE1","files":[{
"md5pair":"cd4b436ce7aff4cf669d282c6d9a7899","lane":8,"index":"ATCACG","split":2
,"forward":{"md5filename":"3369c3457d6603f06379b654cb78e696","path":"20131001_SN
L149_0062_XFC2DM8ACXX/data/OUT/Sample_SAMPLE1/SAMPLE1_ATCACG_L008_R1_002.fastq.g
z","side":1,"file-size":359046311},"reverse":{"md5filename":"832039fa00b5f401088
48e48eb437e0b","path":"20131001_SNL149_0062_XFC2DM8ACXX/data/OUT/Sample_SAMPLE1/
SAMPLE1_ATCACG_L008_R2_002.fastq.gz","side":2,"file-size":359659451}},{"md5pair"
:"b3050fa3307e63ab9790b0e263c5d240","lane":8,"index":"ATCACG","split":3,"forward
":{"md5filename":"091727bb6b300e463c3d708e157436ab","path":"20131001_SNL149_0062
_XFC2DM8ACXX/data/OUT/Sample_SAMPLE1/SAMPLE1_ATCACG_L008_R1_003.fastq.gz","side"
:1,"file-size":206660736},"reverse":{"md5filename":"20235ef4ec8845515beb4e13da34
b5d3","path":"20131001_SNL149_0062_XFC2DM8ACXX/data/OUT/Sample_SAMPLE1/SAMPLE1_A
TCACG_L008_R2_003.fastq.gz","side":2,"file-size":206715143}},{"md5pair":"9f7ee49
e87d01610372c43ab928939f6","lane":8,"index":"ATCACG","split":1,"forward":{"md5fi
lename":"54cb2fd33edd5c2e787287ccf1595952","path":"20131001_SNL149_0062_XFC2DM8A
CXX/data/OUT/Sample_SAMPLE1/SAMPLE1_ATCACG_L008_R1_001.fastq.gz","side":1,"file-
size":354530831},"reverse":{"md5filename":"e937cbdf32020074e50d3332c67cf6b3","pa
th":"20131001_SNL149_0062_XFC2DM8ACXX/data/OUT/Sample_SAMPLE1/SAMPLE1_ATCACG_L00
8_R2_001.fastq.gz","side":2,"file-size":356908963}},{"md5pair":"0697846a504158ee
f523c0f4ede85288","lane":7,"index":"ATCACG","split":2,"forward":{"md5filename":"
It can be processed using a tool like jsvelocity to generate the same kind of Makefile:

The velocity template for jsvelocity


#macro(maketarget $fastq)

$(addsuffix .count, ${fastq.md5filename}): ${fastq.path}
	gunzip -c $< | awk '(NR%4==1)' | wc -l  | xargs  printf "${fastq.parentNode.lane}\t${fastq.parentNode.split}\t${fastq.side}\t${fastq['file-size']}\t%s\t#if(${fastq.parentNode.containsKey("index")})${fastq.parentNode.index}#{else}Undetermined#{end}\t#if(${fastq.parentNode.parentNode.containsKey("name")})${fastq.parentNode.parentNode.name}#{else}Undetermined#{end}\n"   > $@

#end

.PHONY:all clean

all: report.pdf

report.pdf: report.tex 
	pdflatex $<

report.tex : all.count
	echo 'T<-read.table("$<",head=TRUE,sep="\t");$(foreach FTYPE,Index Sample Lane, T2<-tapply(T$$count,T$$${FTYPE},sum);png("${FTYPE}.png");barplot(T2,las=3);dev.off();)' | R --no-save
	echo "\documentclass{report}" > $@
	echo "\usepackage{graphicx}" >> $@
	echo "\date{\today}" >> $@
	echo "\title{FastQ Report}" >> $@
	echo "\begin{document}" >> $@
	echo "\maketitle" >> $@
	$(foreach FTYPE,Index Sample Lane, echo "\section{By ${FTYPE}}#\begin{center}#\includegraphics{${FTYPE}.png}#\end{center}" | tr "#" "\n" >> $@ ; )
	echo "\end{document}" >> $@

all.count : $(addsuffix .count, #foreach($dir in $all) #foreach($sample in ${dir.samples})#foreach($pair in ${sample.files}) ${pair.forward.md5filename}  ${pair.reverse.md5filename} #end #end #foreach($pair in   ${dir.undetermined}) ${pair.forward.md5filename}  ${pair.reverse.md5filename} #end  #end )



#foreach($dir in $all)
#foreach($sample in ${dir.samples})
#foreach($pair in ${sample.files})
#maketarget($pair.forward)
#maketarget($pair.reverse)
#end
#end
#foreach($pair in   ${dir.undetermined})
#maketarget($pair.forward)
#maketarget($pair.reverse)
#end 
#end


clean:
	rm -f all.count  $(addsuffix .count,  #foreach($dir in $all)
#foreach($sample in ${dir.samples})
#foreach($pair in ${sample.files}) ${pair.forward.md5filename}  ${pair.reverse.md5filename}  #end #end
#foreach($pair in   ${dir.undetermined}) ${pair.forward.md5filename}  ${pair.reverse.md5filename}  #end  #end )

transform using jsvelocity:


java -jar dist/jsvelocity.jar \
     -d all illumina.json \
      illumina.vm > Makefile

ouput: same as above

That's it,

Pierre

PS: This post was generated using the XSLT stylesheet :"github2html.xsl" and https://github.com/lindenb/jvarkit/wiki/Illuminadir.