30 October 2013

GNU Make: saving the versions of the tools using 'order-only-prerequisites' : my notebook

Rule 3 of "Ten Simple Rules for Reproducible Computational Research" is: "Archive the Exact Versions of All External Programs Used".
I work with Makefile-based workflows: how can I save the version of each piece of software used every time I invoke 'make', whatever the target is? A naive solution is to add a dependency to each target. For example, the following makefile takes a simple SAM file, converts it to BAM, then sorts and indexes it. For each target, I've added a dependency named "dump_params" that appends the version of samtools to a file "config.txt".
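The listing embedded in the original page is not available here, so what follows is only a sketch of such a makefile, reconstructed from the commands that make echoes in this post. The variable name SAM, the 'all' target and the exact rule layout are assumptions; the file names and the samtools 0.1.18 command lines are the ones visible in the output.

```makefile
# Sketch of the naive version: dump_params is a NORMAL prerequisite of every target.
# SAM and 'all' are illustrative names, not taken from the original listing.
SAM=samtools-0.1.18/examples/toy.sam

.PHONY: all dump_params

all: sorted.bam.bai

# index the sorted BAM (samtools writes sorted.bam.bai)
sorted.bam.bai: sorted.bam dump_params
	samtools index $<

# samtools 0.1.18 syntax: 'sort <in.bam> <out.prefix>' writes sorted.bam
sorted.bam: unsorted.bam dump_params
	samtools sort $< sorted

# convert SAM to BAM
unsorted.bam: $(SAM) dump_params
	samtools view -Sb $< > $@

# append the date and the samtools version to config.txt
dump_params:
	date >> config.txt && \
	 echo -n "Samtools " >> config.txt && \
	 samtools  2>&1 | grep Version >> config.txt
```

Because no file named dump_params is ever created, its recipe runs on every invocation, and every target that lists it as a normal prerequisite is then treated as out of date.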

But that solution doesn't work: since "dump_params" runs every time, make considers every target that depends on it out of date and re-builds all targets, even if the top target already exists.
$ make
date >> config.txt && \
 echo -n "Samtools " >> config.txt && \
 samtools  2>&1 | grep Version >> config.txt
samtools view -Sb samtools-0.1.18/examples/toy.sam > unsorted.bam
[samopen] SAM header is present: 2 sequences.
samtools sort unsorted.bam sorted
samtools index sorted.bam

$ make
date >> config.txt && \
 echo -n "Samtools " >> config.txt && \
 samtools  2>&1 | grep Version >> config.txt
samtools view -Sb samtools-0.1.18/examples/toy.sam > unsorted.bam
[samopen] SAM header is present: 2 sequences.
samtools sort unsorted.bam sorted
samtools index sorted.bam

The solution I got via Stack Overflow is to use an order-only prerequisite: "Order-only prerequisites can be specified by placing a pipe symbol (|) in the prerequisites list: any prerequisites to the left of the pipe symbol are normal; any prerequisites to the right are order-only... (...) Note that if you declare the same file to be both a normal and an order-only prerequisite, the normal prerequisite takes precedence (...)". The makefile with the order-only prerequisites is now:
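Again, the original listing did not survive on this page; the following is a sketch under the same assumptions as before (SAM and 'all' are illustrative names). The only change from the naive version is that dump_params moves to the right of the pipe symbol in each rule.

```makefile
# Sketch of the order-only version: dump_params sits after '|' in every rule.
# SAM and 'all' are illustrative names, not taken from the original listing.
SAM=samtools-0.1.18/examples/toy.sam

.PHONY: all dump_params

all: sorted.bam.bai

# dump_params must run before this rule, but never forces a rebuild
sorted.bam.bai: sorted.bam | dump_params
	samtools index $<

sorted.bam: unsorted.bam | dump_params
	samtools sort $< sorted

unsorted.bam: $(SAM) | dump_params
	samtools view -Sb $< > $@

# still runs on every invocation, because no file named dump_params exists
dump_params:
	date >> config.txt && \
	 echo -n "Samtools " >> config.txt && \
	 samtools  2>&1 | grep Version >> config.txt
```

An order-only prerequisite is still brought up to date before the target, so the dump_params recipe keeps running, but its having run is ignored when make decides whether the BAM files need to be rebuilt.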




And that works! The final target is generated only once, but the file 'config.txt' is updated on every invocation.
$ make
date >> config.txt && \
 echo -n "Samtools " >> config.txt && \
 samtools  2>&1 | grep Version >> config.txt
samtools view -Sb samtools-0.1.18/examples/toy.sam > unsorted.bam
[samopen] SAM header is present: 2 sequences.
samtools sort unsorted.bam sorted
samtools index sorted.bam

$ make
date >> config.txt && \
 echo -n "Samtools " >> config.txt && \
 samtools  2>&1 | grep Version >> config.txt

$ make
date >> config.txt && \
 echo -n "Samtools " >> config.txt && \
 samtools  2>&1 | grep Version >> config.txt

That's it,
Pierre

Update: another solution

Citing MadScientist's answer on Stack Overflow: Another option is to use immediately expanded shell functions, like:
__dummy := $(shell echo "Makefile was run." >> config.txt)
Since it's immediately expanded the shell script will be invoked once, as the makefile is read in. There's no need to define a dump_params target or include it as a prerequisite. This is more old-school, but has the advantage that it will run for every invocation of make, without having to go through and ensure every target has the proper order-only prerequisite defined.
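Applied to the samtools case of this post, that idea might look like the sketch below. The variable name __dummy comes from the quoted answer; combining it with the commands of the dump_params recipe is my assumption, not something the answer spells out.

```makefile
# Sketch: ':=' is an immediately expanded assignment, so $(shell ...) runs
# while the makefile is being read, before any target is considered.
# The commands write to config.txt, so the value stored in __dummy is empty.
__dummy := $(shell (date; echo -n "Samtools "; samtools 2>&1 | grep Version) >> config.txt)
```

With this line at the top of the makefile, neither the dump_params target nor the '| dump_params' entries in the rules would be needed.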




1 comment:

Christian said...

The way I approach this is by always including the full path to a program (which includes the program version) when running it. Then I version control my makefile and I always know which program versions generated the output.