05 December 2014

Divide-and-conquer in a #Makefile : recursivity and #parallelism.

This post is my notebook about implementing a divide-and-conquer strategy in GNU make.
Say you have a list of 'N' VCFs files. You want to create a list of:

  • common SNPs in vcf1 and vcf2
  • common SNPs in vcf3 and the previous list
  • common SNPs in vcf4 and the previous list
  • (...)
  • common SNPs in vcfN and the previous list
Yes, I know I can do this using:grep -v '^#' f.vcf|cut -f 1,2,4,5 | sort | uniq

Using a linear Makefile it could look like:

list2: vcf1 vcf2
    grep -v '^#' $^ |cut -f 1,2,4,5 | sort | uniq > $@
list3: vcf3 list2
    grep -v '^#' $^ |cut -f 1,2,4,5 | sort | uniq > $@
list4: vcf4 list3
    grep -v '^#' $^ |cut -f 1,2,4,5 | sort | uniq > $@
list5: vcf5 list4
    grep -v '^#' $^ |cut -f 1,2,4,5 | sort | uniq > $@
(...)

We can speed-up the workflow using the parallel option of make -j (number-of-parallel-jobs) and using a divide-and-conquer strategy. Here, the targets 'list1_2' and 'list3_4' can be processed independently in parallel.

list1_2: vcf1 vcf2
    grep -v '^#' $^ |cut -f 1,2,4,5 | sort | uniq > $@

list3_4: vcf3 vcf4
    grep -v '^#' $^ |cut -f 1,2,4,5 | sort | uniq > $@

list1_4: list1_2 list3_4
    grep -v '^#' $^ |cut -f 1,2,4,5 | sort | uniq > $@

By using the internal 'Make' functions $(eval), $(call), $(shell) we can define a recursive method "recursive". This method takes two arguments which are the 0-based indexes of a VCF in the list of VCFs. Here is the Makefile:
Running the makefile:
$ make 
gunzip -c Sample18.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target0_1
gunzip -c Sample13.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target1_2
LC_ALL=C comm -12 target0_1 target1_2 > target0_2
gunzip -c Sample1.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target2_3
gunzip -c Sample19.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target3_4
gunzip -c Sample12.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target4_5
LC_ALL=C comm -12 target3_4 target4_5 > target3_5
LC_ALL=C comm -12 target2_3 target3_5 > target2_5
LC_ALL=C comm -12 target0_2 target2_5 > target0_5
gunzip -c Sample17.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target5_6
gunzip -c Sample16.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target6_7
LC_ALL=C comm -12 target5_6 target6_7 > target5_7
gunzip -c Sample9.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target7_8
gunzip -c Sample15.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target8_9
gunzip -c Sample5.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target9_10
LC_ALL=C comm -12 target8_9 target9_10 > target8_10
LC_ALL=C comm -12 target7_8 target8_10 > target7_10
LC_ALL=C comm -12 target5_7 target7_10 > target5_10
LC_ALL=C comm -12 target0_5 target5_10 > target0_10
gunzip -c Sample14.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target10_11
gunzip -c Sample3.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target11_12
LC_ALL=C comm -12 target10_11 target11_12 > target10_12
gunzip -c Sample11.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target12_13
gunzip -c Sample2.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target13_14
gunzip -c Sample6.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target14_15
LC_ALL=C comm -12 target13_14 target14_15 > target13_15
LC_ALL=C comm -12 target12_13 target13_15 > target12_15
LC_ALL=C comm -12 target10_12 target12_15 > target10_15
gunzip -c Sample20.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target15_16
gunzip -c Sample10.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target16_17
LC_ALL=C comm -12 target15_16 target16_17 > target15_17
gunzip -c Sample4.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target17_18
gunzip -c Sample8.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target18_19
gunzip -c Sample7.vcf.gz | grep -v '^#' | cut -f 1,2,4,5 | LC_ALL=C sort | uniq > target19_20
LC_ALL=C comm -12 target18_19 target19_20 > target18_20
LC_ALL=C comm -12 target17_18 target18_20 > target17_20
LC_ALL=C comm -12 target15_17 target17_20 > target15_20
LC_ALL=C comm -12 target10_15 target15_20 > target10_20
LC_ALL=C comm -12 target0_10 target10_20 > target0_20
and here is the generated workflow (drawn with make2graph ).
:
That's it
Pierre.

04 December 2014

XML+XSLT = #Makefile -based #workflows for #bioinformatics

I've recently read some conversations on Twitter about Makefile-based bioinformatics workflows. I've suggested on biostars.org (Standard simple format to describe a bioinformatics analysis pipeline) that a XML file could be used to describe a model of data and XSLT could transform this model to a Makefile-based workflow. I've already explored this idea in a previous post (Generating a pipeline of analysis (Makefile) for NGS with xslt. ) and in our lab, we use JSON and jsvelocity to generate our workflows (e.g: https://gist.github.com/lindenb/3c07ca722f793cc5dd60).
In the current post, I'll describe my github repository containing a complete XML+XSLT example: https://github.com/lindenb/ngsxml.

Download the data

git clone "https://github.com/lindenb/ngsxml.git"


The model
The XML model is self-explanatory:
<?xml version="1.0" encoding="UTF-8"?>
<model name="myProject" description="my project" directory="OUT">
  <project name="Proj1">
    <sample name="Sample1">
      <fastq>
        <for>test/fastq/sample_1_01_R1.fastq.gz</for>
        <rev>test/fastq/sample_1_01_R2.fastq.gz</rev>
      </fastq>
      <fastq id="groupid2" lane="2" library="lib1" platform="ILMN" median-size="98">
        <for>test/fastq/sample_1_02_R1.fastq.gz</for>
        <rev>test/fastq/sample_1_02_R2.fastq.gz</rev>
      </fastq>
    </sample>
    <sample name="Sample2">
      <fastq>
        <for>test/fastq/sample_2_01_R1.fastq.gz</for>
        <rev>test/fastq/sample_2_01_R2.fastq.gz</rev>
      </fastq>
    </sample>
  </project>
</model>


Validating the model:
This XML document is possibly validated with a XML schema:
$ xmllint  --schema xsd/schema01.xsd --noout test/model01.xml
test/model01.xml validates


Generating the Makefile:
The XML document is transformed into a Makefile using the following XSLT stylesheet: https://github.com/lindenb/ngsxml/blob/master/stylesheets/model2make.xsl
xsltproc --output Makefile stylesheets/model2make.xsl test/model01.xml
The Makefile:
# Name
#  myProject
# Description:
#  my project
# 
include config.mk
OUTDIR=OUT
BINDIR=$(abspath ${OUTDIR})/bin


# if tools are undefined
bwa.exe ?=${BINDIR}/bwa-0.7.10/bwa
samtools.exe ?=${BINDIR}/samtools-0.1.19/samtools
bcftools.exe ?=${BINDIR}/samtools-0.1.19/bcftools/bcftools
tabix.exe ?=${BINDIR}/tabix-0.2.6/tabix
bgzip.exe ?=${BINDIR}/tabix-0.2.6/bgzip

.PHONY=all clean all_bams all_vcfs

all: all_vcfs


all_vcfs:  \
 $(OUTDIR)/Projects/Proj1/VCF/Proj1.vcf.gz


all_bams:  \
 $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/Proj1_Sample1.bam \
 $(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/Proj1_Sample2.bam

#
# VCF for project 'Proj1'
# 
$(OUTDIR)/Projects/Proj1/VCF/Proj1.vcf.gz : $(addsuffix .bai, $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/Proj1_Sample1.bam $(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/Proj1_Sample2.bam) \
 $(addsuffix .fai,${REFERENCE}) ${samtools.exe}  ${bgzip.exe} ${tabix.exe} ${bcftools.exe}
 mkdir -p $(dir $@) && \
 ${samtools.exe} mpileup -uf ${REFERENCE} $(basename $(filter %.bai,$^)) | \
 ${bcftools.exe} view -vcg - > $(basename $@)  && \
 ${bgzip.exe} -f $(basename $@) && \
 ${tabix.exe} -f -p vcf $@
 
    



#
# index final BAM for Sample 'Sample1'
# 
$(addsuffix .bai, $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/Proj1_Sample1.bam): $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/Proj1_Sample1.bam ${samtools.exe}
 mkdir -p $(dir $@) && \
 ${samtools.exe} index  $<
#
# prepare final BAM for Sample 'Sample1'
# 
$(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/Proj1_Sample1.bam : $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}rmdup.bam
 mkdir -p $(dir $@) && \
 cp  $< $@


$(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}rmdup.bam : $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}merged.bam ${samtools.exe}
 mkdir -p $(dir $@) && \
 ${samtools.exe} rmdup  $<  $@



#
# merge BAMs 
#
$(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}merged.bam :  \
  $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}1_sorted.bam \
  $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}2_sorted.bam ${samtools.exe}
 mkdir -p $(dir $@) && \
  ${samtools.exe} merge -f $@ $(filter %.bam,$^)
  



 
#
# Index BAM $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}1_sorted.bam
#
$(addsuffix .bai,$(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}1_sorted.bam ): $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}1_sorted.bam ${samtools.exe}
 ${samtools} index $<

#
# Align test/fastq/sample_1_01_R1.fastq.gz and test/fastq/sample_1_01_R2.fastq.gz
#
$(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}1_sorted.bam : \
 test/fastq/sample_1_01_R1.fastq.gz  \
 test/fastq/sample_1_01_R2.fastq.gz \
 $(addsuffix .bwt,${REFERENCE}) \
 ${bwa.exe} ${samtools.exe}
 mkdir -p $(dir $@) && \
 ${bwa.exe} mem -R '@RG\tID:idp20678948\tSM:Sample1\tLB:Sample1\tPL:ILLUMINA\tPU:1' \
  ${REFERENCE} \
  test/fastq/sample_1_01_R1.fastq.gz \
  test/fastq/sample_1_01_R2.fastq.gz |\
 ${samtools.exe} view -uS - |\
 ${samtools.exe} sort - $(basename $@) 





 
#
# Index BAM $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}2_sorted.bam
#
$(addsuffix .bai,$(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}2_sorted.bam ): $(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}2_sorted.bam ${samtools.exe}
 ${samtools} index $<

#
# Align test/fastq/sample_1_02_R1.fastq.gz and test/fastq/sample_1_02_R2.fastq.gz
#
$(OUTDIR)/Projects/Proj1/Samples/Sample1/BAM/${tmp.prefix}2_sorted.bam : \
 test/fastq/sample_1_02_R1.fastq.gz  \
 test/fastq/sample_1_02_R2.fastq.gz \
 $(addsuffix .bwt,${REFERENCE}) \
 ${bwa.exe} ${samtools.exe}
 mkdir -p $(dir $@) && \
 ${bwa.exe} mem -R '@RG\tID:groupid2\tSM:Sample1\tLB:lib1\tPL:ILMN\tPU:2\tPI:98' \
  ${REFERENCE} \
  test/fastq/sample_1_02_R1.fastq.gz \
  test/fastq/sample_1_02_R2.fastq.gz |\
 ${samtools.exe} view -uS - |\
 ${samtools.exe} sort - $(basename $@) 




#
# index final BAM for Sample 'Sample2'
# 
$(addsuffix .bai, $(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/Proj1_Sample2.bam): $(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/Proj1_Sample2.bam ${samtools.exe}
 mkdir -p $(dir $@) && \
 ${samtools.exe} index  $<
#
# prepare final BAM for Sample 'Sample2'
# 
$(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/Proj1_Sample2.bam : $(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/${tmp.prefix}rmdup.bam
 mkdir -p $(dir $@) && \
 cp  $< $@


$(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/${tmp.prefix}rmdup.bam : $(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/${tmp.prefix}1_sorted.bam ${samtools.exe}
 mkdir -p $(dir $@) && \
 ${samtools.exe} rmdup  $<  $@





 
#
# Index BAM $(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/${tmp.prefix}1_sorted.bam
#
$(addsuffix .bai,$(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/${tmp.prefix}1_sorted.bam ): $(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/${tmp.prefix}1_sorted.bam ${samtools.exe}
 ${samtools} index $<

#
# Align test/fastq/sample_2_01_R1.fastq.gz and test/fastq/sample_2_01_R2.fastq.gz
#
$(OUTDIR)/Projects/Proj1/Samples/Sample2/BAM/${tmp.prefix}1_sorted.bam : \
 test/fastq/sample_2_01_R1.fastq.gz  \
 test/fastq/sample_2_01_R2.fastq.gz \
 $(addsuffix .bwt,${REFERENCE}) \
 ${bwa.exe} ${samtools.exe}
 mkdir -p $(dir $@) && \
 ${bwa.exe} mem -R '@RG\tID:idp20681172\tSM:Sample2\tLB:Sample2\tPL:ILLUMINA\tPU:1' \
  ${REFERENCE} \
  test/fastq/sample_2_01_R1.fastq.gz \
  test/fastq/sample_2_01_R2.fastq.gz |\
 ${samtools.exe} view -uS - |\
 ${samtools.exe} sort - $(basename $@) 




$(addsuffix .fai,${REFERENCE}): ${REFERENCE} ${samtools.exe}
 ${samtools.exe} faidx $<

$(addsuffix .bwt,${REFERENCE}): ${REFERENCE} ${bwa.exe}
 ${bwa.exe} index $<


${BINDIR}/bwa-0.7.10/bwa :
 rm -rf $(BINDIR)/bwa-0.7.10/ && \
 mkdir -p $(BINDIR) && \
 curl -o $(BINDIR)/bwa-0.7.10.tar.bz2 -L "http://sourceforge.net/projects/bio-bwa/files/bwa-0.7.10.tar.bz2/download?use_mirror=freefr" && \
 tar xvfj $(BINDIR)/bwa-0.7.10.tar.bz2 -C $(OUTDIR)/bin  && \
 rm $(BINDIR)/bwa-0.7.10.tar.bz2 && \
 make -C $(dir $@)

${BINDIR}/samtools-0.1.19/bcftools/bcftools: ${BINDIR}/samtools-0.1.19/samtools

${BINDIR}/samtools-0.1.19/samtools  : 
 rm -rf $(BINDIR)/samtools-0.1.19/ && \
 mkdir -p $(BINDIR) && \
 curl -o $(BINDIR)/samtools-0.1.19.tar.bz2 -L "http://sourceforge.net/projects/samtools/files/samtools-0.1.19.tar.bz2/download?use_mirror=freefr" && \
 tar xvfj $(BINDIR)/samtools-0.1.19.tar.bz2 -C $(OUTDIR)/bin  && \
 rm $(BINDIR)/samtools-0.1.19.tar.bz2 && \
 make -C $(dir $@)


${BINDIR}/tabix-0.2.6/bgzip : ${BINDIR}/tabix-0.2.6/tabix


${BINDIR}/tabix-0.2.6/tabix  : 
 rm -rf $(BINDIR)/tabix-0.2.6/ && \
 mkdir -p $(BINDIR) && \
 curl -o $(BINDIR)/tabix-0.2.6.tar.bz2 -L "http://sourceforge.net/projects/samtools/files/tabix-0.2.6.tar.bz2/download?use_mirror=freefr" && \
 tar xvfj $(BINDIR)/tabix-0.2.6.tar.bz2 -C $(OUTDIR)/bin  && \
 rm $(BINDIR)/tabix-0.2.6.tar.bz2 && \
 make -C $(dir $@) tabix bgzip

clean:
 rm -rf ${BINDIR}

Drawing the Workflow:
The workflow is drawn with https://github.com/lindenb/makefile2graph.


Running Make:
And here is the output of make:
rm -rf /home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10/ && \
 mkdir -p /home/lindenb/src/ngsxml/OUT/bin && \
 curl -o /home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10.tar.bz2 -L "http://sourceforge.net/projects/bio-bwa/files/bwa-0.7.10.tar.bz2/download?use_mirror=freefr" && \
 tar xvfj /home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10.tar.bz2 -C OUT/bin  && \
 rm /home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10.tar.bz2 && \
 make -C /home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10/
bwa-0.7.10/
bwa-0.7.10/bamlite.c
(...)
gcc -g -Wall -Wno-unused-function -O2 -msse -msse2 -msse3 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS QSufSort.o bwt_gen.o bwase.o bwaseqio.o bwtgap.o bwtaln.o bamlite.o is.o bwtindex.o bwape.o kopen.o pemerge.o bwtsw2_core.o bwtsw2_main.o bwtsw2_aux.o bwt_lite.o bwtsw2_chain.o fastmap.o bwtsw2_pair.o main.o -o bwa -L. -lbwa -lm -lz -lpthread
make[1]: Entering directory `/home/lindenb/src/ngsxml'
/home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10/bwa index test/ref/ref.fa
rm -rf /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/ && \
 mkdir -p /home/lindenb/src/ngsxml/OUT/bin && \
 curl -o /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19.tar.bz2 -L "http://sourceforge.net/projects/samtools/files/samtools-0.1.19.tar.bz2/download?use_mirror=freefr" && \
 tar xvfj /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19.tar.bz2 -C OUT/bin  && \
 rm /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19.tar.bz2 && \
 make -C /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/
samtools-0.1.19/
samtools-0.1.19/.gitignore
samtools-0.1.19/AUTHORS
(...)
gcc -g -Wall -O2 -o bamcheck bamcheck.o -L.. -lm -lbam -lpthread -lz
make[3]: Leaving directory `/home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/misc'
make[2]: Leaving directory `/home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19'
mkdir -p OUT/Projects/Proj1/Samples/Sample1/BAM/ && \
 /home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10/bwa mem -R '@RG\tID:idp11671724\tSM:Sample1\tLB:Sample1\tPL:ILLUMINA\tPU:1' \
  test/ref/ref.fa \
  test/fastq/sample_1_01_R1.fastq.gz \
  test/fastq/sample_1_01_R2.fastq.gz |\
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools view -uS - |\
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools sort - OUT/Projects/Proj1/Samples/Sample1/BAM/__DELETE__1_sorted 
mkdir -p OUT/Projects/Proj1/Samples/Sample1/BAM/ && \
 /home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10/bwa mem -R '@RG\tID:groupid2\tSM:Sample1\tLB:lib1\tPL:ILMN\tPU:2\tPI:98' \
  test/ref/ref.fa \
  test/fastq/sample_1_02_R1.fastq.gz \
  test/fastq/sample_1_02_R2.fastq.gz |\
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools view -uS - |\
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools sort - OUT/Projects/Proj1/Samples/Sample1/BAM/__DELETE__2_sorted 
mkdir -p OUT/Projects/Proj1/Samples/Sample1/BAM/ && \
  /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools merge -f OUT/Projects/Proj1/Samples/Sample1/BAM/__DELETE__merged.bam OUT/Projects/Proj1/Samples/Sample1/BAM/__DELETE__1_sorted.bam OUT/Projects/Proj1/Samples/Sample1/BAM/__DELETE__2_sorted.bam
mkdir -p OUT/Projects/Proj1/Samples/Sample1/BAM/ && \
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools rmdup  OUT/Projects/Proj1/Samples/Sample1/BAM/__DELETE__merged.bam  OUT/Projects/Proj1/Samples/Sample1/BAM/__DELETE__rmdup.bam
mkdir -p OUT/Projects/Proj1/Samples/Sample1/BAM/ && \
 cp  OUT/Projects/Proj1/Samples/Sample1/BAM/__DELETE__rmdup.bam OUT/Projects/Proj1/Samples/Sample1/BAM/Proj1_Sample1.bam
mkdir -p OUT/Projects/Proj1/Samples/Sample1/BAM/ && \
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools index  OUT/Projects/Proj1/Samples/Sample1/BAM/Proj1_Sample1.bam
mkdir -p OUT/Projects/Proj1/Samples/Sample2/BAM/ && \
 /home/lindenb/src/ngsxml/OUT/bin/bwa-0.7.10/bwa mem -R '@RG\tID:idp11673828\tSM:Sample2\tLB:Sample2\tPL:ILLUMINA\tPU:1' \
  test/ref/ref.fa \
  test/fastq/sample_2_01_R1.fastq.gz \
  test/fastq/sample_2_01_R2.fastq.gz |\
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools view -uS - |\
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools sort - OUT/Projects/Proj1/Samples/Sample2/BAM/__DELETE__1_sorted 
mkdir -p OUT/Projects/Proj1/Samples/Sample2/BAM/ && \
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools rmdup  OUT/Projects/Proj1/Samples/Sample2/BAM/__DELETE__1_sorted.bam  OUT/Projects/Proj1/Samples/Sample2/BAM/__DELETE__rmdup.bam
mkdir -p OUT/Projects/Proj1/Samples/Sample2/BAM/ && \
 cp  OUT/Projects/Proj1/Samples/Sample2/BAM/__DELETE__rmdup.bam OUT/Projects/Proj1/Samples/Sample2/BAM/Proj1_Sample2.bam
mkdir -p OUT/Projects/Proj1/Samples/Sample2/BAM/ && \
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools index  OUT/Projects/Proj1/Samples/Sample2/BAM/Proj1_Sample2.bam
/home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools faidx test/ref/ref.fa
rm -rf /home/lindenb/src/ngsxml/OUT/bin/tabix-0.2.6/ && \
 mkdir -p /home/lindenb/src/ngsxml/OUT/bin && \
 curl -o /home/lindenb/src/ngsxml/OUT/bin/tabix-0.2.6.tar.bz2 -L "http://sourceforge.net/projects/samtools/files/tabix-0.2.6.tar.bz2/download?use_mirror=freefr" && \
 tar xvfj /home/lindenb/src/ngsxml/OUT/bin/tabix-0.2.6.tar.bz2 -C OUT/bin  && \
 rm /home/lindenb/src/ngsxml/OUT/bin/tabix-0.2.6.tar.bz2 && \
 make -C /home/lindenb/src/ngsxml/OUT/bin/tabix-0.2.6/ tabix bgzip
tabix-0.2.6/
tabix-0.2.6/ChangeLog
tabix-0.2.6/Makefile
(...)
make[2]: Leaving directory `/home/lindenb/src/ngsxml/OUT/bin/tabix-0.2.6'
mkdir -p OUT/Projects/Proj1/VCF/ && \
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/samtools mpileup -uf test/ref/ref.fa OUT/Projects/Proj1/Samples/Sample1/BAM/Proj1_Sample1.bam OUT/Projects/Proj1/Samples/Sample2/BAM/Proj1_Sample2.bam | \
 /home/lindenb/src/ngsxml/OUT/bin/samtools-0.1.19/bcftools/bcftools view -vcg - > OUT/Projects/Proj1/VCF/Proj1.vcf  && \
 /home/lindenb/src/ngsxml/OUT/bin/tabix-0.2.6/bgzip -f OUT/Projects/Proj1/VCF/Proj1.vcf && \
 /home/lindenb/src/ngsxml/OUT/bin/tabix-0.2.6/tabix -f -p vcf OUT/Projects/Proj1/VCF/Proj1.vcf.gz
make[1]: Leaving directory `/home/lindenb/src/ngsxml'
At the end, a VCF is generated
##fileformat=VCFv4.1
##samtoolsVersion=0.1.19-44428cd
(...)
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2
chr4_gl000194_random 1973 . C G 14.4 . DP=3;VDB=2.063840e-02;AF1=1;AC1=4;DP4=0,0,3,0;MQ=60;FQ=-32.3 GT:PL:GQ 1/1:31,6,0:11 1/1:17,3,0:8
chr4_gl000194_random 2462 . A T 14.4 . DP=3;VDB=2.063840e-02;AF1=1;AC1=4;DP4=0,0,0,3;MQ=60;FQ=-32.3 GT:PL:GQ 1/1:31,6,0:11 1/1:17,3,0:8
chr4_gl000194_random 2492 . G C 14.4 . DP=3;VDB=2.063840e-02;AF1=1;AC1=4;DP4=0,0,0,3;MQ=60;FQ=-32.3 GT:PL:GQ 1/1:31,6,0:11 1/1:17,3,0:8
chr4_gl000194_random 2504 . A T 14.4 . DP=3;VDB=2.063840e-02;AF1=1;AC1=4;DP4=0,0,0,3;MQ=60;FQ=-32.3 GT:PL:GQ 1/1:31,6,0:11 1/1:17,3,0:8
That's it,

Pierre