17 December 2009

Playing with Erlang (3): concurrency

In this post I'll show how to write a multithreaded erlang program where each child process will calculate if a given number is a Prime Number (not rocket science, a brute force algorithm will be used).

First this erlang module is called 'prime':

-module(prime).

The method named 'test' with no argument is public
-export([test/0]).

The method named 'start' creates a new thread for the function 'loop'. It returns the ID of this new thread.
start()-> spawn(fun loop/0).

The 'loop' method waits for a message in a child process. When it receives a message matching the pattern '{ _From, {prime,Number},Start }' it invokes the recursive function is_prime and prints the result:
loop()->
receive
{ _From, {prime,Number},Start }->
io:format("result=~p\n Process-ID=~p\n Duration=~p ~n",[
is_prime(2,Number),
self(),
timer:now_diff(now(),Start)/1.0e6
])
;
{_From,_Other} ->
io:format("Booomm~n",[])
end.

The function 'is_prime' is a simple-and-stupid recursive method testing if a number is a prime where we search if the numbers 'Div' lower than 'N' have a modulus of 'Number/Div=0':
is_prime(Div,Number)->
if
Div =:= Number -> {result,{number,Number},{prime,true}};
Number rem Div =:= 0 -> {result,{number,Number},{prime,false}};
true-> is_prime(Div+1,Number)
end.

The method named 'rpc' sends the ID of the current process, the prime number and the current system time to the child process:
rpc(Pid,Request)->
Pid ! { self(), Request , now()}
.

The 'test' method creates a list 'NUMS' of numbers. For each number N in this list the function prime:start is called and it returns a new thread-ID. This Thread-ID is sent with 'N' to the function rpc.
test()->
NUMS=[111, 197951153,197951154,102950143,65537,3,7,8,12],
lists:foreach( fun(N)-> Pid= start(), rpc(Pid,{prime,N}) end, NUMS )
.



Your browser does not support the <CANVAS> element !


All in one, here is the full code:
-module(prime).
-export([test/0]).

start()-> spawn(fun loop/0).

rpc(Pid,Request)->
Pid ! { self(), Request , now()}
.

loop()->
receive
{ _From, {prime,Number},Start }->
io:format("result=~p\n Process-ID=~p\n Duration=~p ~n",[
is_prime(2,Number),
self(),
timer:now_diff(now(),Start)/1.0e6
])
;
{_From,_Other} ->
io:format("Booomm~n",[])
end.

is_prime(Div,Number)->
if
Div =:= Number -> {result,{number,Number},{prime,true}};
Number rem Div =:= 0 -> {result,{number,Number},{prime,false}};
true-> is_prime(Div+1,Number)
end.


test()->
NUMS=[111, 197951153,197951154,102950143,65537,3,7,8,12],
lists:foreach( fun(N)-> Pid= start(), rpc(Pid,{prime,N}) end, NUMS )
.

Compiling:
erlc prime.erl

Running:
erl -noshell -s prime test

Result (as you can see, the biggest prime appears at the end of the list because it took more time for the calculation):
result={result,{number,65537},{prime,true}}
Process-ID=<0.34.0>
Duration=0.006522
result={result,{number,111},{prime,false}}
Process-ID=<0.30.0>
Duration=7.6e-5
result={result,{number,197951154},{prime,false}}
Process-ID=<0.32.0>
Duration=4.57e-4
result={result,{number,3},{prime,true}}
Process-ID=<0.35.0>
Duration=7.25e-4
result={result,{number,7},{prime,true}}
Process-ID=<0.36.0>
Duration=7.23e-4
result={result,{number,8},{prime,false}}
Process-ID=<0.37.0>
Duration=7.22e-4
result={result,{number,12},{prime,false}}
Process-ID=<0.38.0>
Duration=7.2e-4
result={result,{number,102950143},{prime,true}}
Process-ID=<0.33.0>
Duration=8.856508
result={result,{number,197951153},{prime,true}}
Process-ID=<0.31.0>
Duration=49.51542

That's it.
Pierre.

14 December 2009

Parsing a genetic code using flex and bison. Part 2/2

(this post was inspired from this tutorial: "Writing a Reentrant Parser with Flex and Bison").
In the previous post I've shown how to parse a NCBI genetic code in ASN.1 using flex and bison. However the generated code is non reentrant that is to say that some globals variables prevent it to be used in a concurrent environment.

If we want to use this code in a multithread environment, some changes must be added: first we need a new structure GCState where the state of the lexer will be saved between each time bison requests for a new token:
typedef struct gcState
{
/** reference to the flex lexer */
void* handle_to_the_scanner;
}GCStateStruct,*GCState;
Your browser does not support the <CANVAS> element !

The Lexer

In the lexer some options must be added: we tell Flex to create a reentrant parser and that the yylex function generated by flex takes an extra argument yylval
%option bison-bridge
%option reentrant
Prior to be used, this context 'yyscan_t' has to be initialized:
void gcParse(FILE* in)
{
GCStateStruct ctx;
memset(&ctx,0,sizeof(GCStateStruct));
yyscan_t scanner;//structure created by flex

if(yylex_init_extra(&ctx,&scanner)!=0)//initialize the scanner
{
return;
}
ctx.handle_to_the_scanner=&scanner;//will tell bison where is the lexer
yyset_in (in,scanner );//change the input stream

gc_parse(ctx);//generated by bison
yylex_destroy(scanner);
}
.The for each method in yylex, a reference to the state in GCState is added:
[0-9]+ {yyget_lval(yyscanner)->integer=atoi(yyget_text(yyscanner)) ;return INTEGER;}
\"[^\"]+\" {
yyget_lval(yyscanner)->s= malloc(yyget_leng(yyscanner)-1);
if(yyget_lval(yyscanner)->s==NULL) exit(EXIT_FAILURE);
strncpy(yyget_lval(yyscanner)->s,&yytext[1],yyget_leng(yyscanner)-2);
return TEXT;

The Parser

We tell bison that we want a reentrant parser
%pure-parser
, each time the lexer will be asked for a new token, we want to pass an extra GCState parameter:
%parse-param{GCState state}
. We also tell bison that the yylex function generated by flex takes an extra parameter hoding the state of the scanner:
%lex-param{void* handle_to_the_scanner}
.A #define is added to tell bison how to extract the handle to the scanner from the GCState:
#define handle_to_the_scanner (state->handle_to_the_scanner)

All in one

Here is the lexer gc.l:
%option prefix="gc_"
%option noyywrap
%option bison-bridge
%option reentrant

%{
#include <stdio.h>
#include <stdlib.h>
#include "gc.h"
#include "gc.tab.h" //generated by bison
%}

%%
\-\-.*\n ;/* ignore comments */

Genetic\-code\-table return GENETIC_CODE_TABLE;
name return NAME;
id return ID;
ncbieaa return NCBIEAA;
sncbieaa return SNCBIEAA;
[0-9]+ {yyget_lval(yyscanner)->integer=atoi(yyget_text(yyscanner)) ;return INTEGER;}
\:\:= return ASSIGN;
, return COMMA;
\{ return OPEN;
\} return CLOSE;
\"[^\"]+\" {
yyget_lval(yyscanner)->s= malloc(yyget_leng(yyscanner)-1);
if(yyget_lval(yyscanner)->s==NULL) exit(EXIT_FAILURE);
strncpy(yyget_lval(yyscanner)->s,&yytext[1],yyget_leng(yyscanner)-2);
return TEXT;
}
[\n\t ] ;//ignore blanks
. return yytext[0];

%%

void gcParse(FILE* in)
{
GCStateStruct ctx;
memset(&ctx,0,sizeof(GCStateStruct));
yyscan_t scanner;

if(yylex_init_extra(&ctx,&scanner)!=0)
{
return;
}
ctx.handle_to_the_scanner=&scanner;
yyset_in (in,scanner );
yyset_debug(1,scanner);
gc_parse(ctx);
yylex_destroy(scanner);
}
Here is the parser gc.y:

%{
#include <stdio.h>
#include <stdlib.h>
#include "gc.h"
%}

%union {
char* s;
int integer;
GeneticCode gCode;
}

%pure-parser
%name-prefix="gc_"
%defines
%error-verbose
%parse-param{GCState state}
%lex-param{void* handle_to_the_scanner}

%token GENETIC_CODE_TABLE NAME OPEN CLOSE COMMA ASSIGN NCBIEAA SNCBIEAA ID
%token<integer> INTEGER
%token<s> TEXT

%type<gCode> code
%type<gCode> codes
%type<s> optional_name
%start input

%{
void yyerror(GCState state,const char* message)
{
fprintf(stderr,"ERROR:%s\n",message);
}
#define handle_to_the_scanner (state->handle_to_the_scanner)

%}

%%

input: GENETIC_CODE_TABLE ASSIGN OPEN codes CLOSE
{
GeneticCode gc=$4;
fputs("Genetic-code-table ::= {",stdout);
while(gc!=NULL)
{
printf("{name \"%s\",", gc->name1);
if(gc->name2!=NULL) printf("name \"%s\",", gc->name2);
printf("id %d,ncbieaa \"%s\",sncbieaa \"%s\"}",
gc->id, gc->ncbieaa, gc->sncbieaa
);
if(gc->next!=NULL) fputc(',',stdout);
gc=gc->next;
}
fputc('}',stdout);
/** free memory here.... */
};

codes: code
{
$$=$1;
}
| codes COMMA code
{
$$=$3;
$$->next=$1;
}
;


code: OPEN
NAME TEXT COMMA
optional_name
ID INTEGER COMMA
NCBIEAA TEXT COMMA
SNCBIEAA TEXT
CLOSE
{
$$ = malloc(sizeof(GeneticCodeStruct));
if($$==NULL) exit(EXIT_FAILURE);
$$->name1=$3;
$$->name2=$5;
$$->id=$7;
$$->ncbieaa=$10;
$$->sncbieaa=$13;
}
;
optional_name:/* nothing */
{
$$=NULL;
}
| NAME TEXT COMMA
{
$$=$2;
}
%%

extern void gcParse(FILE* in);

int main(int argc,char** argv)
{
gcParse(stdin);
return 0;
}
and the file gc.h
#ifndef GENETIC_CODE_H
#define GENETIC_CODE_H

typedef struct geneticCode
{
char* name1;
char* name2;
int id;
char* ncbieaa;
char* sncbieaa;
struct geneticCode *next;
}GeneticCodeStruct,*GeneticCode;


typedef struct gcState
{
/** reference to the flex lexer */
void* handle_to_the_scanner;
}GCStateStruct,*GCState;

#endif
Compiling...
flex gc.l
bison gc.y
gcc gc.tab.c lex.gc_.c
Testing:
cat gc.ptr |./a.out | ./a.out |./a.out |./a.out

Genetic-code-table ::= {{name "Standard",name "SGC0",id 1,ncbieaa "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",sncbieaa "---M---------------M---------------M----------------------------"},
{name "Vertebrate Mitochondrial",name "SGC1",id 2,ncbieaa "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",sncbieaa "--------------------------------MMMM---------------M------------"},
(...)
,{name "Thraustochytrium Mitochondrial",id 23,ncbieaa "FF*LSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",sncbieaa "--------------------------------M--M---------------M------------"}}

That's it.
Pierre


Parsing a genetic code using flex and bison. Part 1/2

In this post I'll show how to write a lexer and a parser using Flex and Bison for parsing the genetic codes at NCBI. I'll write a non-reentrant parser that is to say that the generated code contains some global variables and cannot be safely executed concurrently. The input file is an ASN1 file that looks like this:
--*************************************************************************
-- This is the NCBI genetic code table
-- (...)
--**************************************************************************
Genetic-code-table ::= {
{
name "Standard" ,
name "SGC0" ,
id 1 ,
ncbieaa "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "---M---------------M---------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
},
{
(...)
} ,
{
name "Thraustochytrium Mitochondrial" ,
id 23 ,
ncbieaa "FF*LSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "--------------------------------M--M---------------M------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
}
}
Each genetic code will be stored as a linked list of C structures called GeneticCode:
file gc.h:
#ifndef GENETIC_CODE_H
#define GENETIC_CODE_H
typedef struct geneticCode
{
char* name1;
char* name2;
int id;
char* ncbieaa;
char* sncbieaa;
struct geneticCode *next;
}GeneticCodeStruct,*GeneticCode;
#endif
.The LEXER generated by flex converts the input into a sequence of tokens, and the parser generated by bison converts a grammar description into a C program.

Your browser does not support the <CANVAS> element !



The Parser

The grammar of Bison says that the lexical context might carry a string (the code ID) , a string (ncbieaa,ncbieaa...) or a GeneticCode:
%union {
char* s;
int integer;
GeneticCode gCode;
}
(...)
%token<integer> INTEGER
%token<s> TEXT
A genetic code is a series of tokens consisting in one or two names, the id, the ncbiaa and the sncbiaa strings. Every time this context is found, a new GeneticCode is allocated and its fields are filled:

%type<gCode> code
(...)
code: OPEN
NAME TEXT COMMA
optional_name
ID INTEGER COMMA
NCBIEAA TEXT COMMA
SNCBIEAA TEXT
CLOSE
{
$$ = malloc(sizeof(GeneticCodeStruct));
if($$==NULL) exit(EXIT_FAILURE);
$$->name1=$3;
$$->name2=$5;
$$->id=$7;
$$->ncbieaa=$10;
$$->sncbieaa=$13;
}
;
An optional name is 'nothing' or the following tokens: 'NAME TEXT COMMA'
optional_name:/* nothing */
{
$$=NULL;
}
| NAME TEXT COMMA
{
$$=$2;
}

A series of genetic codes is one genetic code, or a series of genetic codes followed by one genetic code:
codes: code
{
$$=$1;
}
| codes COMMA code
{
$$=$3;
$$->next=$1;
}
;
At last, the input constists in the tokens 'GENETIC_CODE_TABLE ASSIGN OPEN' and 'CLOSE' , surrounding a series of genetic codes. Each time, an input has been parsed, the linked list is scanned so each genetic code is printed to stdout.
input: GENETIC_CODE_TABLE ASSIGN OPEN codes CLOSE
{
GeneticCode gc=$4;
fputs("Genetic-code-table ::= {",stdout);
while(gc!=NULL)
{
printf("{name \"%s\",", gc->name1);
if(gc->name2!=NULL) printf("name \"%s\",", gc->name2);
printf("id %d,ncbieaa \"%s\",sncbieaa \"%s\"}",
gc->id, gc->ncbieaa, gc->sncbieaa
);
if(gc->next!=NULL) fputc(',',stdout);
gc=gc->next;
}
fputc('}',stdout);
/** free memory here.... */
};

The Lexer

The lexers breaks the input into tokens. In this lexer, comments are ignored:
\-\-.*\n ;/* ignore comments */
... keywords are returned
Genetic\-code\-table return GENETIC_CODE_TABLE;
name return NAME;
id return ID;
ncbieaa return NCBIEAA;
sncbieaa return SNCBIEAA;
\:\:= return ASSIGN;
, return COMMA;
\{ return OPEN;
\} return CLOSE;
... and the tokens carrying a context (integers, strings...) are decoded:
[0-9]+ {gc_lval.integer=atoi(yytext) ;return INTEGER;}
\"[^\"]+\" {
//copy the string without the quotes
gc_lval.s= malloc(yyleng-1);
if(gc_lval.s==NULL) exit(EXIT_FAILURE);
strncpy(gc_lval.s,&yytext[1],yyleng-2);
return TEXT;
}

All in one here is the flex file: gc.l
%option prefix="gc_"
%option noyywrap


%{
#include <stdio.h>
#include <stdlib.h>
#include "gc.h"
#include "gc.tab.h" //generated by bison
%}

%%
\-\-.*\n ;/* ignore comments */

Genetic\-code\-table return GENETIC_CODE_TABLE;
name return NAME;
id return ID;
ncbieaa return NCBIEAA;
sncbieaa return SNCBIEAA;
[0-9]+ {gc_lval.integer=atoi(yytext) ;return INTEGER;}
\:\:= return ASSIGN;
, return COMMA;
\{ return OPEN;
\} return CLOSE;
\"[^\"]+\" {
gc_lval.s= malloc(yyleng-1);
if(gc_lval.s==NULL) exit(EXIT_FAILURE);
strncpy(gc_lval.s,&yytext[1],yyleng-2);
return TEXT;
}
[\n\t ] ;//ignore blanks
. return yytext[0];

%%
... the bison file gc.y:
%{
#include <stdio.h>
#include <stdlib.h>
#include "gc.h"
%}

%union {
char* s;
int integer;
GeneticCode gCode;
}

%name-prefix="gc_"
%defines
%error-verbose

%token GENETIC_CODE_TABLE NAME OPEN CLOSE COMMA ASSIGN NCBIEAA SNCBIEAA ID
%token<integer> INTEGER
%token<s> TEXT

%type<gCode> code
%type<gCode> codes
%type<s> optional_name
%start input

%{
void yyerror(const char* message)
{
fprintf(stderr,"ERROR:%s\n",message);
}

%}

%%

input: GENETIC_CODE_TABLE ASSIGN OPEN codes CLOSE
{
GeneticCode gc=$4;
fputs("Genetic-code-table ::= {",stdout);
while(gc!=NULL)
{
printf("{name \"%s\",", gc->name1);
if(gc->name2!=NULL) printf("name \"%s\",", gc->name2);
printf("id %d,ncbieaa \"%s\",sncbieaa \"%s\"}",
gc->id, gc->ncbieaa, gc->sncbieaa
);
if(gc->next!=NULL) fputc(',',stdout);
gc=gc->next;
}
fputc('}',stdout);
/** free memory here.... */
};

codes: code
{
$$=$1;
}
| codes COMMA code
{
$$=$3;
$$->next=$1;
}
;


code: OPEN
NAME TEXT COMMA
optional_name
ID INTEGER COMMA
NCBIEAA TEXT COMMA
SNCBIEAA TEXT
CLOSE
{
$$ = malloc(sizeof(GeneticCodeStruct));
if($$==NULL) exit(EXIT_FAILURE);
$$->name1=$3;
$$->name2=$5;
$$->id=$7;
$$->ncbieaa=$10;
$$->sncbieaa=$13;
}
;
optional_name:/* nothing */
{
$$=NULL;
}
| NAME TEXT COMMA
{
$$=$2;
}
%%

int main(int argc,char** argv)
{
yyparse();
return 0;
}
Compiling:
flex gc.l
bison gc.y
gcc -o a.out gc.tab.c lex.gc_.c
Testing:
cat gc.ptr |./a.out | ./a.out |./a.out |./a.out

Genetic-code-table ::= {{name "Standard",name "SGC0",id 1,ncbieaa "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",sncbieaa "---M---------------M---------------M----------------------------"},
{name "Vertebrate Mitochondrial",name "SGC1",id 2,ncbieaa "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",sncbieaa "--------------------------------MMMM---------------M------------"},
(...)
,{name "Thraustochytrium Mitochondrial",id 23,ncbieaa "FF*LSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",sncbieaa "--------------------------------M--M---------------M------------"}}

That's it.
Pierre

In the next post I'll write a reentrant parser using flex and bison.

07 December 2009

Playing with SOAP. Implementing a WebService for the LocusTree Server

Image via wikipediaIn a previous post I've described the LocusTree server and showed how the wsimport command can be used to generate the java code that will query a Web-Service. Today, I've implemented a few web services in the LocusTree server but I wrote the entire code generating the SOAP messages rather than using the Java API for Web Services (JAXWS-API) because 1) I wanted to learn about the SOAP internals 2) I wanted to return a big volume of data to the client by writing a stream of data rather than building a xml response and then echoing the xml tree (DOM).
Ok. In this example I'm going to describe a WebService returning a list of chromosomes for a given organism-id:

The WSDL file.

The signature of our function is something like: getChromosomesByOrganismId(int orgId). In the WSDL file (the file describing our web services), the operation is called getChromosomes . The input for this function will be a tns:getChromosomes and the object returned by this function is a tns:GetChromosomesResponse. The prefix 'tns' is a reference to a xml schema that is will to defined later.
<portType name="LocusTree">
<operation name="getChromosomes">
<input message="tns:getChromosomes"/>
<output message="tns:GetChromosomesResponse"/>
</operation>
</portType>

We now define those two messages (input and output parameters) for this web service. The input value (the organism-id) is defined in an external xml schema as an element named 'tns:getChromosomes'. The ouput value (a list of chromosomes) is defined in an external xml schema as an element named 'tns:Chromosomes'.
<message name="getChromosomes">
<part name="parameters" element="tns:getChromosomes"/>
</message>
<message name="GetChromosomesResponse">
<part name="parameters" element="tns:Chromosomes"/>
</message>

But where can we find this external schema ? It is referenced in the WSDL file under the <types> element. The following 'types' says that the schema describing our objects is available at 'http://localhost:8080//locustree/static/ws/schema.xsd'
<types>
<xsd:schema>
<xsd:import namespace="http://webservices.cephb.fr/locustree/" schemaLocation="http://localhost:8080//locustree/static/ws/schema.xsd"/>
</xsd:schema>
</types>
The Http protocol will be used to send and receive the SOAP messages, so we have to bind our method 'getChromosomesByOrganismsId' to this protocol.
<soap:binding transport="http://schemas.xmlsoap.org/soap/http"
style="document"/>
<operation name="getChromosomes">
<soap:operation/>
<input>
<soap:body use="literal"/>
</input>
<output>
<soap:body use="literal"/>
</output>
</operation>
</binding>
Finally, we tell the client where the server is located
<service name="LocusTreeService">
<port name="LocusTreeSePort" binding="tns:LocusTreePortBinding">
<soap:address location="http://localhost:8080//locustree/locustree/soap"/>
</port>
</service>

The XSD Schema

This XML schema describes the structures that will be used and returned by the server.We need to describe what is...

A Chromosome

A Chromosome is a structure holding an ID, a name, a length, an organism-id etc...
<xs:complexType name="Chromosome">
<xs:annotation>
<xs:documentation>A Chromosome</xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:element name="id" type="xs:int" nillable="false" />
<xs:element name="organismId" type="xs:int" nillable="false" />
<xs:element name="name" type="xs:string" nillable="false"/>
<xs:element name="length" type="xs:int" nillable="false"/>
<xs:element name="metadata" type="xs:string"/>
</xs:sequence>
</xs:complexType>

A List of Chromosomes

.. is just a sequence of Chromosomes
<xs:complexType name="Chromosomes">
<xs:annotation>
<xs:documentation>Set of Chromosomes</xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:element ref="tns:Chromosome" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>

GetChromosome

This is the structure that is used as a parameter for our web service. It just holds an organism-id.
<xs:complexType name="getChromosomes">
<xs:annotation>
<xs:documentation>return the chromosomes for a given organism </xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:element name="organismId" type="xs:int" nillable="false">
<xs:annotation>
<xs:documentation>The Organism Id</xs:documentation>
</xs:annotation>
</xs:element>
</xs:sequence>
</xs:complexType>



All in one, here is the WSDL file:
<definitions
xmlns="http://schemas.xmlsoap.org/wsdl/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
xmlns:tns="http://webservices.cephb.fr/locustree/"
targetNamespace="http://webservices.cephb.fr/locustree/"
name="LocusTreeWebServices">

<types>
<xsd:schema>
<xsd:import namespace="http://webservices.cephb.fr/locustree/" schemaLocation="http://localhost:8080//locustree/static/ws/schema.xsd"/>
</xsd:schema>
</types>
<message name="getChromosomes">
<part name="parameters" element="tns:getChromosomes"/>
</message>
<message name="GetChromosomesResponse">
<part name="parameters" element="tns:Chromosomes"/>
</message>
<portType name="LocusTree">
<operation name="getChromosomes">
<input message="tns:getChromosomes"/>
<output message="tns:GetChromosomesResponse"/>
</operation>
</portType>
<binding name="LocusTreePortBinding" type="tns:LocusTree">
<soap:binding transport="http://schemas.xmlsoap.org/soap/http" style="document"/>
<operation name="getChromosomes">
<soap:operation/>
<input>
<soap:body use="literal"/>
</input>
<output>
<soap:body use="literal"/>
</output>
</operation>
</binding>
<service name="LocusTreeService">
<port name="LocusTreeSePort" binding="tns:LocusTreePortBinding">
<soap:address location="http://localhost:8080//locustree/locustree/soap"/>
</port>
</service>
</definitions>

...and the XSD/Schema file:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:tns="http://webservices.cephb.fr/locustree/"
targetNamespace="http://webservices.cephb.fr/locustree/"
elementFormDefault="qualified">


<xs:annotation>
<xs:documentation>XML schema for LocusTreeWebServices</xs:documentation>
</xs:annotation>

<xs:element name="Chromosomes" type="tns:Chromosomes"/>
<xs:element name="Chromosome" type="tns:Chromosome"/>
<xs:element name="getChromosomes" type="tns:getChromosomes"/>

<xs:complexType name="getChromosomes">
<xs:annotation>
<xs:documentation>return the chromosomes for a given organism </xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:element name="organismId" type="xs:int" nillable="false">
<xs:annotation>
<xs:documentation>The Organism Id</xs:documentation>
</xs:annotation>
</xs:element>
</xs:sequence>
</xs:complexType>

<xs:complexType name="Chromosomes">
<xs:annotation>
<xs:documentation>Set of Organisms</xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:element ref="tns:Chromosome" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>

<xs:complexType name="Chromosome">
<xs:annotation>
<xs:documentation>A Chromosome</xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:element name="id" type="xs:int" nillable="false"/>
<xs:element name="organismId" type="xs:int" nillable="false"/>
<xs:element name="name" type="xs:string" nillable="false"/>
<xs:element name="length" type="xs:int" nillable="false"/>
<xs:element name="metadata" type="xs:string"/>
</xs:sequence>
</xs:complexType>


</xs:schema>

Generating the client

The stubs on the client side are generated using the ${JAVA_HOME}/bin/wsimport command:
> wsimport -keep http://localhost:8080/locustree/locustree/soap?wsdl
parsing WSDL...
generating code...
compiling code...
> find fr
fr/cephb/webservices/locustree/Chromosomes.java
fr/cephb/webservices/locustree/ObjectFactory.java
fr/cephb/webservices/locustree/LocusTreeService.java
fr/cephb/webservices/locustree/LocusTree.java
fr/cephb/webservices/locustree/GetChromosomes.java
fr/cephb/webservices/locustree/Chromosome.java
(...)
> more fr/cephb/webservices/locustree/LocusTree.java
package fr.cephb.webservices.locustree;
(...)
public interface LocusTree
{
(...)
public List<Chromosome> getChromosomes(int organismId);
}
Ok, the function was successfully generated. Let's test it with a tiny java program:

file Test.java
import fr.cephb.webservices.locustree.*;

public class Test
{
public static void main(String args[])
{
LocusTreeService service=new LocusTreeService();
LocusTree locustree=service.getLocusTreeSePort();
final int organismId=36;
for(Chromosome chrom:locustree.getChromosomes(organismId))
{
System.out.println(
chrom.getId()+"\t"+
chrom.getName()+"\t"+
chrom.getOrganismId()+"\t"+
chrom.getLength()
);
}

}
}

Compiling and running:
javac -cp . Test.java
java -cp . Test
1 chr1 36 247249719
2 chr2 36 242951149
3 chr3 36 199501827
4 chr4 36 191273063
5 chr5 36 180857866
6 chr6 36 170899992
7 chr7 36 158821424
8 chr8 36 146274826
9 chr9 36 140273252
10 chr10 36 135374737
11 chr11 36 134452384
12 chr12 36 132349534
13 chr13 36 114142980
14 chr14 36 106368585
15 chr15 36 100338915
(...)

SOAP internals


Here is the XML/SOAP query for getChromosomes sent to the sever via a POST query.
<S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/">
<S:Body>
<getChromosomes xmlns='http://webservices.cephb.fr/locustree/'>
<organismId>36</organismId>
</getChromosomes>
</S:Body>
</S:Envelope>
This can be checked using curl:
curl \
-X POST\
-H "Content-Type: text/xml" \
-d '<?xml version="1.0" ?>;<S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/"><S:Body><getChromosomes xmlns="http://webservices.cephb.fr/locustree/"><organismId>36</organismId></getChromosomes></S:Body></S:Envelope>' \
'http://localhost:8080/locustree/locustree/soap'
And here is the (my) response from the server:
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" >
<Body>
<ceph:Chromosomes>
<ceph:Chromosome>
<ceph:id>1</ceph:id>
<ceph:organismId>36</ceph:organismId>
<ceph:name>chr1</ceph:name>
<ceph:length>247249719</ceph:length>
<ceph:metadata>{'type':'autosomal','size':247249719}</ceph:metadata>
</ceph:Chromosome>
<ceph:Chromosome>
<ceph:id>2</ceph:id>
<ceph:organismId>36</ceph:organismId>
<ceph:name>chr2</ceph:name>
<ceph:length>242951149</ceph:length>
<ceph:metadata>{'type':'autosomal','size':242951149}</ceph:metadata>
</ceph:Chromosome>
<ceph:Chromosome>
<ceph:id>3</ceph:id>
<ceph:organismId>36</ceph:organismId>
<ceph:name>chr3</ceph:name>
<ceph:length>199501827</ceph:length>
<ceph:metadata>{'type':'autosomal','size':199501827}</ceph:metadata>
</ceph:Chromosome>
<ceph:Chromosome>
<ceph:id>4</ceph:id>
<ceph:organismId>36</ceph:organismId>
<ceph:name>chr4</ceph:name>
<ceph:length>191273063</ceph:length>
<ceph:metadata>{'type':'autosomal','size':191273063}</ceph:metadata>
</ceph:Chromosome>
(...)
</ceph:Chromosomes>
</Body>
</Envelope>

On the server/servlet side I've decoded the SOAP query using javax.xml.soap.MessageFactory;. It looks like that
(...)
MimeHeaders headers=new MimeHeaders();
Enumeration<?> e=req.getHeaderNames();
(... copy the http headers to 'headers' ...);
SOAPMessage message=getMessageFactory().createMessage(headers,req.getInputStream());
SOAPBody body=message.getSOAPBody();
Iterator<?> iter=body.getChildElements();
while(iter.hasNext())
{
SOAPElement child =SOAPElement.class.cast(iter.next());
Name name= child.getElementName();
if(!name.getURI().equals(child.getNamespaceURI())) continue;
if(name.getLocalName().equals("getChromosomes"))
{
processGetChromosomes(w,message,child,req,res);
return;
}
}
And I'm streaming the response using the XML Streaming API (StaX).
(...)
w.writeStartElement(pfx, "Chromosomes", getTargetNamespace());
w.writeAttribute(XMLConstants.XMLNS_ATTRIBUTE,XMLConstants.XML_NS_URI,pfx,getTargetNamespace());
for(ChromInfo ci:model.getChromsomesByOrganismId(getTransaction(), organismId))
{
w.writeStartElement(pfx, "Chromosome", getTargetNamespace());
w.writeStartElement(pfx, "id", getTargetNamespace());
w.writeCharacters(String.valueOf(ci.getId()));
w.writeEndElement();
w.writeStartElement(pfx, "organismId", getTargetNamespace());
w.writeCharacters(String.valueOf(ci.getOrganismId()));
w.writeEndElement();
(...)
w.writeEndElement();
}
w.writeEndElement();(...)

And as a final note, I'll cite this tweet I received today from Paul Joseph Davis :-)


@yokofakun Everytime someone uses SOAP, an angel cries.
Mon Dec 07 16:50:41



That's it !
Pierre