11 February 2013

Counting the reads in a BAM file using SGE and the Open-MPI library: my notebook.

In the current post, I'll describe my first Open MPI program. Open MPI is "a Message Passing Interface (MPI) library, a standardized and portable message-passing system to function on a wide variety of parallel computers". My C program takes a list of BAMs, distributes some jobs on the SGE (SUN/Oracle Grid Engine) to count the number of reads and returns the results to a master process.

Downloading and installing

The library was downloaded from http://www.open-mpi.org/software/ompi/v1.6/, compiled with the option --with-sge and installed in '/commun/data/packages/openmpi'.
./configure --prefix=/commun/data/packages/openmpi --with-sge
make 
make install

Configuring SGE to run MPI-based program

As described in http://docs.oracle.com/cd/E19080-01/n1.grid.eng6/817-5677/6ml49n2c0/index.html .
$ su
$ source /commun/sge_root/bird/common/settings.csh
$ setenv ARCH lx24-amd64
$ qconf -ap orte
And added:
pe_name            orte
slots              448
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
Add orte to the parallele env list:
qconf -mattr queue pe_list "orte" all.q
Using qconf -mconf I've also set shell_start_mode to 'unix_behavior'.

Algorithm

In the 'main' method, test if this is the master(==0) or a slave process(!=0):
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0)
 {
 master(argc-1,&argv[1]);
 }
else
 {
 slave(argc-1,&argv[1]);
 }

MASTER

The master gets the number of available proc:
MPI_Comm_size(MPI_COMM_WORLD, &proc_count)
Loop over the number of available processors and the BAMS, for each BAM, create a fixed size structure containing the path to the BAM as well as the count of reads initialized to '0':
typedef struct Params_t
 {
 char filename[FILENAME_MAX];
 int is_error;
 long count_total;
 long count_mapped;
 long count_properly_paired;
 long count_dup;
 }Params;
This structure is sent to the slaves:
 MPI_Send(
 &param, /* message buffer */
 sizeof(Params), /* one data item */
 MPI_CHAR, /* data item is an array of bytes */
 rank, /* destination process rank */
 TAG_COUNT_BAM, /* user chosen message tag */
 MPI_COMM_WORLD /* default communicator */
 ); 
and we wait for the slaves to return those 'Params' with the filled values.:
MPI_Recv(&param, /* message buffer */
 sizeof(Params), /* one data item */
 MPI_CHAR, /* data item is an array of bytes */
 MPI_ANY_SOURCE, /* receive from any sender */
 MPI_ANY_TAG, /* any type of message */
 MPI_COMM_WORLD, /* default communicator */
 &status);
And the result is printed:
printf("%s\t%ld\t%ld\t%ld\t%ld\n",
 param.filename,
 param.count_total,
 param.count_mapped,
 param.count_properly_paired,
 param.count_dup
 );
At the end of the master method, when no more BAM is available, the slaves are released (tag is 'TAG_RELEASE_SLAVE') with :
MPI_Send(
 &empty, /* message buffer */
 sizeof(Params), /* one data item */
 MPI_CHAR, /* data item is an array of bytes */
 rank, /* destination process rank */
 TAG_RELEASE_SLAVE, /* SHutdown */
 MPI_COMM_WORLD
 ); /* default communicator */

SLAVE

The slave method receives some messages from the master:
MPI_Recv(
 &param,
 sizeof(Params),
 MPI_CHAR,
 0,
 MPI_ANY_TAG,
 MPI_COMM_WORLD,
 &status);
If the message was 'TAG_RELEASE_SLAVE' we exit the method. Else the BAM file is scanned using the SAmtools API for C.
count_reads(&param);
and the result is sent back to the master:
MPI_Send(
 &param,
 sizeof(Params),
 MPI_CHAR,
 0,
 0,
 MPI_COMM_WORLD
 );
.

Running the Job on SGE

I still don't understand why I need to write the following line:
# qsh -pe orte 4
Your job 84581 ("INTERACTIVE") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 84581 has been successfully scheduled.
Run the analysis:
qrsh -cwd -pe orte 100 /commun/data/packages/openmpi/bin/mpirun \
 /path/to/ompibam \
 `find /commun/data/projects/folder1234/align -name "*.bam"`
qstat -f shows the jobs running:
arch          states
   ---------------------------------------------------------------------------------
   all.q@node01                   BIP   0/15/64        2.76 lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    15
   ---------------------------------------------------------------------------------
   all.q@node02                   BIP   0/14/64        3.89 lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14
   ---------------------------------------------------------------------------------
   all.q@node03                   BIP   0/14/64        3.23 lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14
   ---------------------------------------------------------------------------------
   all.q@node04                   BIP   0/14/64        3.68 lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14
   ---------------------------------------------------------------------------------
   all.q@node05                   BIP   0/15/64        2.91 lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    15
   ---------------------------------------------------------------------------------
   all.q@node06                   BIP   0/14/64        3.91 lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14
   ---------------------------------------------------------------------------------
   all.q@node07                   BIP   0/14/64        3.79 lx24-amd64
      84598 0.55500 mpirun     lindenb      r     02/11/2013 12:03:36    14 
output:
#filename total mapped properly_paired dup
/commun/data/projects/folder1234/11X0589_markdup.bam 1179766 973943 684794 0
/commun/data/projects/folder1234/11X0589_pair1_sorted.bam 1179766 973943 684794 0
/commun/data/projects/folder1234/11X0589_realigned.bam 1179766 973943 684794 0
/commun/data/projects/folder1234/11X0589_recal.bam 1179766 973943 684794 0
/commun/data/projects/folder1234/11X0602_pair1_sorted.bam 982006 815636 566136 0
/commun/data/projects/folder1234/11X0602_realigned.bam 982006 815636 566136 0
/commun/data/projects/folder1234/11X0602_markdup.bam 982006 815636 566136 0
/commun/data/projects/folder1234/11X0602_pair1_unsorted.bam 982006 815636 566136 0
/commun/data/projects/folder1234/11X0375_pair1_sorted.bam 1255772 1031155 735480 0
/commun/data/projects/folder1234/11X0375_realigned.bam 1255772 1031155 735480 0
/commun/data/projects/folder1234/11X0375_markdup.bam 1255772 1031155 735480 0
/commun/data/projects/folder1234/11X0367_pair1_sorted.bam 1422144 1158315 796466 0
/commun/data/projects/folder1234/11X0367_markdup.bam 1422144 1158315 796466 0
/commun/data/projects/folder1234/11X0375_pair1_unsorted.bam 1255772 1031155 735480 0
/commun/data/projects/folder1234/11X0367_realigned.bam 1422144 1158315 796466 0
/commun/data/projects/folder1234/11X0291_pair1_sorted.bam 1296294 1067233 743924 0
/commun/data/projects/folder1234/11X0291_markdup.bam 1296294 1067233 743924 0
/commun/data/projects/folder1234/11X0291_realigned.bam 1296294 1067233 743924 0
/commun/data/projects/folder1234/11X0311_markdup.bam 1353534 1117463 788474 0
/commun/data/projects/folder1234/11X0311_realigned.bam 1353534 1117463 788474 0
/commun/data/projects/folder1234/11X0375_recal.bam 1255772 1031155 735480 0
/commun/data/projects/folder1234/11X0589_pair1_unsorted.bam 1179766 973943 684794 0
/commun/data/projects/folder1234/11X0367_pair1_unsorted.bam 1422144 1158315 796466 0
/commun/data/projects/folder1234/11X0290_pair1_sorted.bam 1436970 1191044 837726 0
/commun/data/projects/folder1234/11X0290_realigned.bam 1436970 1191044 837726 0
/commun/data/projects/folder1234/11X0291_pair1_unsorted.bam 1296294 1067233 743924 0
/commun/data/projects/folder1234/11X0311_pair1_sorted.bam 1353534 1117463 788474 0
/commun/data/projects/folder1234/11X0240_pair1_sorted.bam 1489672 1229654 830042 0
/commun/data/projects/folder1234/11X0240_realigned.bam 1489672 1229654 830042 0
/commun/data/projects/folder1234/11X0240_markdup.bam 1489672 1229654 830042 0
/commun/data/projects/folder1234/XX10272_pair1_sorted.bam 1579608 1296449 876040 0
/commun/data/projects/folder1234/XX10272_markdup.bam 1579608 1296449 876040 0
/commun/data/projects/folder1234/XX10272_realigned.bam 1579608 1296449 876040 0
/commun/data/projects/folder1234/11X0602_recal.bam 982006 815636 566136 0
/commun/data/projects/folder1234/11X0290_markdup.bam 1436970 1191044 837726 0
/commun/data/projects/folder1234/11X0290_pair1_unsorted.bam 1436970 1191044 837726 0
/commun/data/projects/folder1234/11X0247_pair1_sorted.bam 1489832 1233208 862842 0
/commun/data/projects/folder1234/11X0456_pair1_sorted.bam 1486490 1220969 848738 0
/commun/data/projects/folder1234/11X0456_realigned.bam 1486490 1220969 848738 0
/commun/data/projects/folder1234/11X0247_markdup.bam 1489832 1233208 862842 0
/commun/data/projects/folder1234/11X0247_realigned.bam 1489832 1233208 862842 0
/commun/data/projects/folder1234/11X0456_markdup.bam 1486490 1220969 848738 0
/commun/data/projects/folder1234/11X0367_recal.bam 1422144 1158315 796466 0
/commun/data/projects/folder1234/11X0305_pair1_sorted.bam 1818236 1489267 1027458 0
/commun/data/projects/folder1234/11X0305_markdup.bam 1818236 1489267 1027458 0
/commun/data/projects/folder1234/11X0305_realigned.bam 1818236 1489267 1027458 0
/commun/data/projects/folder1234/11X0542_pair1_sorted.bam 1784860 1469183 1016314 0
/commun/data/projects/folder1234/XX07551_recal.bam 1778358 1456518 1018694 0
/commun/data/projects/folder1234/11X0542_realigned.bam 1784860 1469183 1016314 0
/commun/data/projects/folder1234/11X0542_markdup.bam 1784860 1469183 1016314 0
/commun/data/projects/folder1234/11X0311_pair1_unsorted.bam 1353534 1117463 788474 0
/commun/data/projects/folder1234/11X0291_recal.bam 1296294 1067233 743924 0
/commun/data/projects/folder1234/XX07551_pair1_sorted.bam 1778358 1456518 1018694 0
/commun/data/projects/folder1234/11X0240_pair1_unsorted.bam 1489672 1229654 830042 0
/commun/data/projects/folder1234/XX10272_pair1_unsorted.bam 1579608 1296449 876040 0
/commun/data/projects/folder1234/XX07551_markdup.bam 1778358 1456518 1018694 0
/commun/data/projects/folder1234/XX07551_realigned.bam 1778358 1456518 1018694 0
/commun/data/projects/folder1234/11X0305_pair1_unsorted.bam 1818236 1489267 1027458 0
/commun/data/projects/folder1234/11X0311_recal.bam 1353534 1117463 788474 0
/commun/data/projects/folder1234/11X0456_pair1_unsorted.bam 1486490 1220969 848738 0
/commun/data/projects/folder1234/11X0247_pair1_unsorted.bam 1489832 1233208 862842 0
/commun/data/projects/folder1234/11X0542_pair1_unsorted.bam 1784860 1469183 1016314 0
/commun/data/projects/folder1234/XX07551_pair1_unsorted.bam 1778358 1456518 1018694 0
/commun/data/projects/folder1234/11X0433_pair1_sorted.bam 2178798 1800943 1249738 0
/commun/data/projects/folder1234/11X0433_markdup.bam 2178798 1800943 1249738 0
/commun/data/projects/folder1234/XX10272_recal.bam 1579608 1296449 876040 0
/commun/data/projects/folder1234/11X0433_realigned.bam 2178798 1800943 1249738 0
/commun/data/projects/folder1234/11X0240_recal.bam 1489672 1229654 830042 0
/commun/data/projects/folder1234/11X0290_recal.bam 1436970 1191044 837726 0
/commun/data/projects/folder1234/11X0456_recal.bam 1486490 1220969 848738 0
/commun/data/projects/folder1234/11X0247_recal.bam 1489832 1233208 862842 0
/commun/data/projects/folder1234/11X0433_pair1_unsorted.bam 2178798 1800943 1249738 0
/commun/data/projects/folder1234/11X0305_recal.bam 1818236 1489267 1027458 0
/commun/data/projects/folder1234/11X0542_recal.bam 1784860 1469183 1016314 0
/commun/data/projects/folder1234/11X0433_recal.bam 2178798 1800943 1249738 0
However I noticed seems that running this code on the cluster is slower that running it on the master node...

The C code

Note: the C code is compiled with ortecc, not gcc.


I'm still very new to this API and to those concepts. Feel free to suggest anything :-)
That's it,
Pierre

1 comment:

Tifossi said...

Hi Pierre,
Thanks a ton for this post, I'm going to try this out on my cluster here. I have a tool which iterates through a bcf/vcf file and computes a statistic at each locus. I'll let you know if I can parallelize that using OpenMP !
Thanks again,
Avinash