December 13, 2010, in Features of Taverna

Lists and iterations

Many workflows need to iterate over the items in a list. For instance, if the first service in your workflow connects to BioMart to retrieve all sequences in a certain region of a chromosome, and you want to connect this to a second service that performs a BLAST sequence alignment on each of the sequences, you’ll need Taverna to iterate over the returned sequences.

In the Biomart and BLAST workflow, the hsapiens_gene_ensembl BioMart service returns a list of genome sequences from a region on human chromosome 22. After running the workflow, this list can be inspected on the output port transcript_exon_intron. The next service, blast_ddbj, takes a single input query (plus constants for the BLAST parameters program and database) and returns a BLAST report from a sequence alignment against the DDBJ rodent gene database.

 hsapiens_gene_ensembl to blast_ddbj

By simply connecting these services together, Taverna will recognize the depth mismatch between the list and the expected single argument, and perform implicit iteration. Taverna will execute blast_ddbj multiple times, once for each element in the list, and return a new list of BLAST reports to the text_blast_out output port.
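
The idea of implicit iteration can be sketched in a few lines of Python. This is only an illustration of the depth-mismatch rule described above, not Taverna's actual implementation: the helpers `depth` and `invoke` and the stand-in service `fake_blast` are hypothetical names invented for this example.

```python
from typing import Any, Callable


def depth(value: Any) -> int:
    """Nesting depth: 0 for a single value, 1 for a list, 2 for a list of lists."""
    if isinstance(value, list):
        # In this sketch an empty list counts as depth 1.
        return 1 + max((depth(item) for item in value), default=0)
    return 0


def invoke(service: Callable[[Any], Any], value: Any, expected_depth: int = 0) -> Any:
    """Call the service directly when the depths match, otherwise iterate."""
    if depth(value) <= expected_depth:
        return service(value)
    # Depth mismatch: run the service once per element and collect a new list.
    return [invoke(service, item, expected_depth) for item in value]


def fake_blast(query: str) -> str:
    # Hypothetical stand-in for blast_ddbj; no real BLAST call is made.
    return f"report for {query}"


sequences = ["ATCC", "GGTA", "TTAG"]
reports = invoke(fake_blast, sequences)
```

Here `reports` is a list with one entry per input sequence, in the same order as the inputs, which mirrors the list of BLAST reports that arrives on text_blast_out.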

While the workflow is running, you can inspect the text_blast_out output and see individual BLAST reports appear as soon as they are returned. The Progress report shows how many iterations have been done, and how many are still queued.

Workflow results "text_blast_out" - List with 6 values, Value 5: BLASTN 2.2.4 (AUG-08-2010)  Reference: Altschul, Stephen F...

If provenance capture is enabled, it is possible to inspect the individual iterations by selecting the blast_ddbj service in the progress report. Each iteration is listed with its individual input and output values, together with the time and the duration of the invocation.

Progress report, blast_ddbj. Queued iterations 12, Iterations done 8, Average time 33.4s. Intermediate values for service blast_ddbj, iteration 6 started on 2010-12-13 11:48:19, ended 2010-12-13 11:48:52 (33.7s). Iteration 6 'query' value: "ATCCACTT..."

Pipelining

The list output that is created by this implicit iteration can be used as the basis for further iterations for the next steps in the workflow. Taverna takes advantage of individual service outputs being available before the full iteration is finished, pipelining the list items to start iterations over the next services downstream.

hsapiens_gene_ensembl to blast_ddbj.query. blast_ddbj.result to Concatenate_gene_id.

In the modified Biomart and BLAST with concatenated gene id workflow, the Concatenate_gene_id local worker (which adds the Ensembl gene ID to the BLAST report) therefore iterates at the same time as blast_ddbj, processing each BLAST report as soon as it is available. As a result, larger workflows can execute much faster overall than if each service had to finish all of its iterations before the downstream iterations could start. Pipelining also allows you to see parts of the final results before the workflow is complete.
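
A rough way to picture pipelining is with Python generators, where each item flows to the downstream stage as soon as the upstream stage produces it. This is a simplified, single-threaded sketch: Taverna runs the stages concurrently, and the stage functions and the `events` log below are hypothetical names used only for illustration. The real Concatenate_gene_id service also takes the gene ID as a second input, which is left out here.

```python
from typing import Iterable, Iterator, List

# Records the order in which the two stages handle each item.
events: List[str] = []


def blast_stage(sequences: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in for blast_ddbj: yields one report per sequence."""
    for seq in sequences:
        events.append(f"blast {seq}")
        yield f"report({seq})"


def concat_stage(reports: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in for Concatenate_gene_id: consumes reports as they arrive."""
    for report in reports:
        events.append(f"concat {report}")
        yield f"gene-id + {report}"


results = list(concat_stage(blast_stage(["s1", "s2"])))
```

After this runs, `events` shows the downstream stage handling the first report before the upstream stage has started on the second sequence, rather than waiting for the complete list.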

Configuring list handling

The Biomart and BLAST with concatenation workflow shows how Taverna deals with iterations over multiple input ports. In the Concatenate_gene_id service, both string1 and string2 receive a list while each expects a single value. Taverna's default list handling is the so-called cross product, which combines every string1 with every string2.

Cross product(1234,abc): 1a 1b 1c, 2a 2b 2c, 3a 3b 3c, 4a 4b 4c. Dot product(1234,abcd): 1a, 2b, 3c, 4d

In this workflow that behaviour is not desirable, as it would combine every gene ID with every BLAST report. Instead the list handling on Concatenate_gene_id has been configured to perform a dot product, combining the first element of string1 with the first element of string2, second string1 with second string2, etc.
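
The difference between the two strategies can be sketched in Python. This is an illustration of the concepts only, not Taverna code: the function names are made up for this example, the nested result of the cross product follows the grouping in the figure above, and raising an error on unequal list lengths is a choice made for this sketch rather than documented Taverna behaviour.

```python
from typing import List


def cross_product(xs: List[str], ys: List[str]) -> List[List[str]]:
    """Combine every x with every y, giving one inner list per x."""
    return [[x + y for y in ys] for x in xs]


def dot_product(xs: List[str], ys: List[str]) -> List[str]:
    """Combine elements pairwise: first with first, second with second, etc."""
    if len(xs) != len(ys):
        # Assumption of this sketch: pairwise combination needs equal lengths.
        raise ValueError("dot product needs lists of equal length")
    return [x + y for x, y in zip(xs, ys)]


crossed = cross_product(["1", "2", "3", "4"], ["a", "b", "c"])
dotted = dot_product(["1", "2", "3", "4"], ["a", "b", "c", "d"])
```

With four gene IDs and four BLAST reports, the dot product yields four combined values, whereas the cross product would yield sixteen, most of them pairing a gene ID with the wrong report.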

Details, List handling, Dot product: string2 string1

You can read more about Taverna’s lists and iterations in the Taverna documentation, including how to deal with more complex iterations such as combining cross and dot product.


3 Comments

  1. Looping | The Taverna Knowledge Blog

    December 13, 2010 @ 4:11 pm

    […] languages one often needs to iterate over a set of numbers, objects or strings. Taverna does these iterations implicitly; if you connect a service which outputs a list to a a service which input port expects a single […]

  2. Parallel service invocations | The Taverna Knowledge Blog

    December 13, 2010 @ 7:09 pm

    […] from the hsapiens_gene_ensembl service. As soon as the Ensembl service outputs its values, iterations over both Blast services will start in […]

  3. Analysing Quantitative Trait Loci data | The Taverna Knowledge Blog

    February 6, 2011 @ 2:22 pm

    […] format of an array, or list, of input values. More information on these data types can be found at: Lists and Iterations. The resulting output may be a list of outputs, with each output containing a list of gene or […]
