Provenance and Intermediate Results
Provenance is all about remembering the past: what you did, why you did it and when it happened. In a scientific context, you may need to provide results to back up claims in papers.
Taverna can be used to carry out in silico experiments involving hundreds, thousands or more data items. Keeping track of these by hand is not feasible. The Taverna Provenance system assists you by providing an experimental ‘memory’.
In Taverna this ‘memory’ is recorded automatically for you. Each of the ‘services’, i.e. the little boxes in the workflow diagram, keeps track of what goes through it and remembers the lineage of all its inputs and outputs. As a piece of data passes through a service, it flows through something called the ‘dispatch stack’. This stack controls how the service should behave with regard to the data: whether it should loop or retry, what it should do in the event of a failure and what it should do about provenance capture.
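To make the idea of a layered stack more concrete, here is a minimal conceptual sketch in Java. It is not Taverna's actual dispatch stack API: the names Layer, ProvenanceLayer, RetryLayer and DispatchStack are hypothetical, and a service is modelled simply as a function from one string to another. It only illustrates how each layer can wrap the one below it, so that provenance capture and retry are applied around the service call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of one layer in a dispatch stack; not a Taverna type.
interface Layer {
    // Wraps the step below and returns a new step with extra behaviour.
    Function<String, String> wrap(Function<String, String> next);
}

// Records every input and output that passes through it.
class ProvenanceLayer implements Layer {
    final List<String> log = new ArrayList<>();

    public Function<String, String> wrap(Function<String, String> next) {
        return input -> {
            log.add("in: " + input);
            String output = next.apply(input);
            log.add("out: " + output);
            return output;
        };
    }
}

// Retries the step below up to maxAttempts times if it throws.
class RetryLayer implements Layer {
    private final int maxAttempts;

    RetryLayer(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.maxAttempts = maxAttempts;
    }

    public Function<String, String> wrap(Function<String, String> next) {
        return input -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return next.apply(input);
                } catch (RuntimeException e) {
                    last = e; // remember the failure and try again
                }
            }
            throw last;
        };
    }
}

class DispatchStack {
    // Layers are listed top to bottom; the service sits below the last layer.
    static Function<String, String> build(List<Layer> layers,
                                          Function<String, String> service) {
        Function<String, String> step = service;
        for (int i = layers.size() - 1; i >= 0; i--) {
            step = layers.get(i).wrap(step);
        }
        return step;
    }

    public static void main(String[] args) {
        ProvenanceLayer provenance = new ProvenanceLayer();
        Function<String, String> service = colour -> colour + " cat";
        Function<String, String> run =
                build(Arrays.asList(provenance, new RetryLayer(3)), service);
        System.out.println(run.apply("red"));
        System.out.println(provenance.log);
    }
}
```

Putting the provenance layer above the retry layer in this sketch means only the final, successful result is logged. Reordering the layers would log every attempt instead.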
Accessing previous results is simple. Switch to the results panel, select the workflow run you want to look at and click on the service whose provenance you want to view. Any inputs and outputs are then displayed using the standard results display. Taverna also keeps track of all the workflows you have run and lists them, along with their run times, to the left of the results panel. Clicking on one of them displays its diagram and lets you view the results for any of its services.
Here is the results panel using http://www.myexperiment.org/workflows/1367.html
If we click on the concatenate two string service like this (you can see it highlighted when you click on it):
The provenance is fetched for that service and displayed in the bottom of the results panel:
It tells us what the results are for and when they were recorded. The red triangles mark inputs and the green triangle marks the output. Here we can see that the output for this ‘intermediate service’ was red cat. The results for each iteration are shown; for more info on iterations see here.
To return to displaying the normal results click on the Show Workflow results button:
The Refresh Intermediate results button can be used during a workflow run. Click on a service to fetch the current intermediate results and then click on this button to fetch any more that have been captured by the provenance system.
By default, all data and its lineage are recorded in memory, which means they do not persist between Taverna sessions. If you need them to persist, open the preferences menu and untick the in-memory storage option.
With that option unticked, provenance is persisted to an Apache Derby database on the machine Taverna is running on. You may notice Taverna slowing down, since disk storage is much slower than memory. The database itself can be found in the Taverna home directory. On a Mac it is /Users/your-user-name/Library/Application Support/taverna-2.2.0/t2-database. On rare occasions the database becomes corrupted and this folder needs to be deleted.
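If you want to check from code whether a persistent database folder is present, the following small Java sketch builds the Mac path quoted above from a home directory and a Taverna version. The class and method names are hypothetical, and only the Mac layout given in the text is assumed; other platforms are not covered.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; it assumes the Mac directory layout quoted in the text.
class TavernaDatabaseLocator {
    // Builds <home>/Library/Application Support/taverna-<version>/t2-database.
    static Path macDatabaseDir(String userHome, String tavernaVersion) {
        return Paths.get(userHome, "Library", "Application Support",
                "taverna-" + tavernaVersion, "t2-database");
    }

    // True if a database folder is present at that location.
    static boolean databaseExists(Path databaseDir) {
        return Files.isDirectory(databaseDir);
    }

    public static void main(String[] args) {
        Path dir = macDatabaseDir(System.getProperty("user.home"), "2.2.0");
        System.out.println(dir + (databaseExists(dir) ? " exists" : " not found"));
    }
}
```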
You can delete the intermediate results for one or more workflows by selecting them in the workflow runs box (the diagram of the first workflow selected is then loaded) and clicking remove:
Extending the Provenance system
If you want to store the provenance elsewhere, you can create your own provenance layer and use it to, for example, send results to a blog, tweet about them, send them to a cloud store or do something else entirely. To do this you would have to write your own plugin and create an implementation of net.sf.taverna.t2.provenance.connector.ProvenanceConnector.
Look at this code to see an example of the Taverna default.
This guide has lots of info on developing a Taverna plugin.
More detailed info about the underlying provenance model and APIs is available here.
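As a rough illustration of what sending provenance to more than one destination might look like, here is a self-contained Java sketch. It deliberately does not use the real ProvenanceConnector class or any of its method signatures: ProvenanceRecord, RecordSink, InMemorySink and FanOutSink are all hypothetical names invented for this example. A real plugin would need to implement the actual Taverna connector class instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical record of one value seen at a service port; not a Taverna type.
class ProvenanceRecord {
    final String service;
    final String port;
    final String value;

    ProvenanceRecord(String service, String port, String value) {
        this.service = service;
        this.port = port;
        this.value = value;
    }
}

// Hypothetical destination for records, such as a blog or a cloud store.
interface RecordSink {
    void store(ProvenanceRecord record);
}

// Keeps records in a list, so they are lost when the program exits.
class InMemorySink implements RecordSink {
    final List<ProvenanceRecord> records = new ArrayList<>();

    public void store(ProvenanceRecord record) {
        records.add(record);
    }
}

// Forwards each record to every target, so new destinations can be
// added without changing the code that produces the records.
class FanOutSink implements RecordSink {
    private final List<RecordSink> targets;

    FanOutSink(RecordSink... targets) {
        this.targets = Arrays.asList(targets);
    }

    public void store(ProvenanceRecord record) {
        for (RecordSink target : targets) {
            target.store(record);
        }
    }
}
```

A sink that posts to a blog or a cloud store would implement the same one-method interface, with the network call placed inside store.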