on December 13, 2010 by Stian Soiland-Reyes in Features of Taverna

Looping

Taverna workflows are inherently data-driven workflows, where data returned from one service is pushed directly to downstream services. A Taverna workflow definition does not linearly say when a service should be invoked, but where its data should come from and go to. This philosophy lets the user focus on how services are connected together, and the Taverna execution takes care of invoking services as soon as the required inputs are ready.

In imperative and object-oriented programming languages one often needs to iterate over a set of numbers, objects or strings. Taverna does these iterations implicitly: if you connect a service which outputs a list to a service whose input port expects a single item, implicit iteration will invoke the second service once for each element of the list, and collect the results into new lists on the outputs.
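As a rough analogy only, implicit iteration can be sketched in Python. This is not how Taverna is implemented; the helper `implicit_iteration` and the toy service `to_upper` are hypothetical names made up for illustration, and the sketch assumes a service with one input port that expects a single item.

```python
def implicit_iteration(service, value):
    """Invoke `service` once per element when `value` is a list.

    Nested lists are handled recursively, so the shape of the output
    mirrors the shape of the input.
    """
    if isinstance(value, list):
        return [implicit_iteration(service, item) for item in value]
    return service(value)


def to_upper(text):
    """Toy service (hypothetical) that expects a single string."""
    return text.upper()


single = implicit_iteration(to_upper, "a")
many = implicit_iteration(to_upper, ["a", "b", "c"])
nested = implicit_iteration(to_upper, [["a"], ["b", "c"]])
```

Here `many` is a list with one result per input element, and `nested` keeps the list-of-lists structure of its input.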

There are, however, situations where you don’t have or know the values to iterate over, but where you want some steps of your workflow to be repeated until a certain condition is true, like a do...while construct in programming.

When could looping be required in a workflow?

One typical use case for looping is invoking asynchronous services, that is, web services or similar where you first submit a job with its input parameters, which returns a job ID, and then check the status of the job using that ID. You need to keep checking the status for as long as the job is in an active state (running). Finally, when the job has reached a final state, you fetch the results for the given job ID. The EBI Interproscan example workflow shows how this can be used in practice for the asynchronous EBI interproscan service.

Simplified, the typical workflow pattern is:

  1. Submit job – returns jobID
  2. Check status of jobID until state is “complete”
  3. Get result of jobID

 

This example workflow simulates this asynchronous service pattern.

Using loops in Taverna

To achieve this pattern in Taverna, a control link is added from getResults to checkStatus. This ensures getResults is not executed until checkStatus has completed, even if no explicit datalink connects the two services.

Secondly, looping has been configured for the checkStatus service so that the service is re-invoked until the output is satisfactory. A delay is normally set to avoid overloading the web service.
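For readers who think in code, the same submit, poll and fetch pattern might look like the following Python sketch. The `FakeJobService` class is a hypothetical stand-in for a real asynchronous service, and the `max_polls` safety limit is an assumption of this sketch rather than something the workflow above configures.

```python
import time


class FakeJobService:
    """Hypothetical asynchronous service that finishes after a few polls."""

    def __init__(self, polls_until_done=3):
        self._polls_until_done = polls_until_done
        self._polls = {}

    def submit_job(self, parameters):
        # Step 1: submitting a job returns a job ID.
        job_id = f"job-{len(self._polls) + 1}"
        self._polls[job_id] = 0
        return job_id

    def check_status(self, job_id):
        # Step 2: the job stays "running" until it has been polled enough.
        self._polls[job_id] += 1
        if self._polls[job_id] >= self._polls_until_done:
            return "complete"
        return "running"

    def get_result(self, job_id):
        # Step 3: results are fetched by job ID.
        return f"result of {job_id}"


def run_job(service, parameters, delay_seconds=0.0, max_polls=100):
    """Submit a job, poll its status with a delay, then fetch the result."""
    job_id = service.submit_job(parameters)
    for _ in range(max_polls):
        if service.check_status(job_id) == "complete":
            # Only fetch results once the status check has finished looping.
            return service.get_result(job_id)
        time.sleep(delay_seconds)  # avoid overloading the service
    raise TimeoutError(f"{job_id} did not complete after {max_polls} polls")


result = run_job(FakeJobService(), {"sequence": "MKV"})
```

The `return` inside the loop plays the role of the control link: the result is only requested after the status check reports a final state.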

Taverna’s documentation on loops details how to use looping in your workflow, including other types of comparisons such as numbers and regular expressions. It is also possible to customize the looping condition with a Beanshell script for more advanced comparisons.

Looping over nested workflows

As nested workflows are also services in Taverna, it is also possible to loop over several services at once. This can be useful for more complex scenarios where multiple services need to be executed again.

Nested workflow looping is also useful for iterative approaches where you want to perform an analysis, modify a parameter and then try again until a quality assessment determines the results to be “good enough”.

As in software programming, it is very important to remember that a loop needs a termination condition to avoid looping forever, so something has to change between iterations: either an external state (like the job status in checkStatus) or modified input parameters which produce different outputs.

You can set up this kind of looping in Taverna by ticking Enable output port to input port feedback in the looping configuration.

In each iteration, the service input ports whose names match a service output port will be populated with the output from the previous iteration. In this example workflow for calculating square roots, the root output from the first iteration will be used as the new root input on the second iteration. The initial_root string constant is required for the initial iteration, while the constant number is used for every invocation, as there is no nested workflow output port called number.

It is important that all service inputs and outputs are connected in the parent workflow in order for the looping mechanism to be able to inspect the values and feed them back.

Nested workflows can also determine for themselves whether they should be looped: add an output port called, say, looping, and test the value of this output port in the loop configuration. Inside the nested workflow, you can connect this output to a Beanshell script or web service which determines whether the nested workflow should be executed again.
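The feedback mechanism and a self-reported looping output can be sketched together in Python. This is an illustration under assumptions, not Taverna's implementation: `square_root_step` is a hypothetical nested workflow doing one Newton step, the "true"/"false" values of its looping output and the `max_iterations` guard are choices made for this sketch, and plain floats stand in for the string constants used in the workflow.

```python
def square_root_step(number, root):
    """Hypothetical nested workflow: one Newton step towards sqrt(number)."""
    new_root = (root + number / root) / 2.0
    # The nested workflow itself reports whether another pass is needed.
    looping = "true" if abs(new_root - root) > 1e-12 else "false"
    return {"root": new_root, "looping": looping}


def run_with_feedback(step, inputs, max_iterations=100):
    """Re-invoke `step`, feeding outputs back into inputs of the same name."""
    current = dict(inputs)
    for _ in range(max_iterations):
        outputs = step(**current)
        if outputs["looping"] != "true":
            return outputs
        # Only inputs with a matching output name are replaced; `number`
        # has no matching output, so its constant value is reused.
        for name in current:
            if name in outputs:
                current[name] = outputs[name]
    raise RuntimeError("loop did not terminate")


# `root` starts from the initial value; `number` stays constant throughout.
result = run_with_feedback(square_root_step, {"number": 2.0, "root": 1.0})
```

The loop runs the step at least once before testing the condition, which mirrors the do...while behaviour described earlier.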
