Posted on November 29, 2010 by Katy Wolstencroft in Introduction

Preface to the Taverna Knowledge-Blog

The volume of biological data available in the public domain is growing continuously [1] and has dramatically altered the fields of bioinformatics and biology. Researchers have moved from studying single genes or proteins to studying whole genomes, proteomes and biological systems [2].

To perform experiments in large-scale, high-throughput areas of research, such as proteomics, microarray analysis and Systems Biology, scientists must be able to access, analyse and co-ordinate the available network of data and analysis resources. Co-ordinating these data and analysis tools constitutes an experiment on the data, and such experiments often generate hypotheses through the exploration of distributed and heterogeneous data [3].

Workflow management systems, such as Taverna, Kepler and Pipeline Pilot, support these in silico experiments by allowing access to distributed and heterogeneous resources from the scientist’s desktop, automating the chaining together of complex analyses over complex datasets. In addition, they can support the scientist in locating suitable resources and optimising the efficiency of long-running computationally intensive tasks.

Workflows exist in the wider context of scientific data management. Scientists need to retrospectively analyse workflow results and compare different workflow invocations with one another. Consequently, many workflow management systems also provide provenance collection components, so data carries with it a record of how and why it was produced.
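As a rough illustration of these two ideas (chaining analysis steps and recording provenance), the following minimal Python sketch runs a list of steps in order and keeps a record of what each step consumed and produced. It is not Taverna's API or data model; the names `run_workflow`, `ProvenanceRecord`, `WorkflowRun`, `transcribe` and `gc_content` are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class ProvenanceRecord:
    """One entry in the provenance trace (hypothetical structure)."""
    step: str
    input_value: Any
    output_value: Any


@dataclass
class WorkflowRun:
    """The final result of a run plus the record of how it was produced."""
    result: Any
    provenance: List[ProvenanceRecord] = field(default_factory=list)


def run_workflow(steps: List[Tuple[str, Callable[[Any], Any]]], data: Any) -> WorkflowRun:
    """Chain the steps so each output feeds the next step, logging every hop."""
    run = WorkflowRun(result=data)
    for name, func in steps:
        output = func(run.result)
        run.provenance.append(ProvenanceRecord(name, run.result, output))
        run.result = output
    return run


# Two toy analysis steps, standing in for real services (assumptions).
def transcribe(dna: str) -> str:
    return dna.replace("T", "U")


def gc_content(rna: str) -> float:
    return (rna.count("G") + rna.count("C")) / len(rna)


run = run_workflow([("transcribe", transcribe), ("gc_content", gc_content)], "ATGC")
for record in run.provenance:
    print(record.step, record.input_value, "->", record.output_value)
```

Because every intermediate value is kept alongside the step that produced it, two invocations of the same workflow can be compared step by step rather than only by their final results.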

Several textbooks discuss the use of workflows in science, but they address the subject from a technological point of view, that of the computer scientist. This Knowledge Blog (K-Blog), however, focuses on the use of Taverna workflows from the perspective of the end-users, the bioinformaticians analysing their data. It also focuses on the practical implications of conducting workflow experiments, using concrete examples from the bioinformatics research community.

Inspired by the authors’ own experiences of using workflows and semantic web technologies to manage bioinformatics research, these K-Blogs are a guide to the principles and techniques for building large-scale workflow experiments and for analysing and managing the resulting data.

These K-Blogs are aimed at people working in bioinformatics, but other readers with life-science or computational backgrounds will also find them accessible, thanks to the combination of theory and practical examples throughout.

Using workflow technologies for bioinformatics is an emerging methodology. This K-Blog will serve as a “handbook” or reference manual for people exploring their use in research, and also as a “textbook” for an intermediate-level class in bioinformatics.

This K-Blog is relevant to people designing and executing workflow experiments, to bioinformatics service providers who wish to offer their services as web services, and to people who would like to embed workflows in existing tools or tools in workflows.

How to use this knowledge blog

The Taverna K-Blog forms a collection of short articles that describe:

  • The background to Taverna [4] and related projects such as BioCatalogue and myExperiment;

  • Guides to creating workflows;

  • Case studies in the use of workflows;

  • Features of the Taverna workflow workbench and associated tools.

Each article has a “see also” section that guides the reader to related articles. Related articles should also be inter-linked in other ways. Articles are tagged and searchable to help readers find the information they need.

The Taverna K-Blog is a growing resource. We welcome comments and input, and we welcome offers of articles. As the K-Blog is in ongoing development, we also welcome suggestions and requests for articles that will most help the Taverna K-Blog audience.

[1] G.R. Cochrane and M.Y. Galperin, The 2010 Nucleic Acids Research Database Issue and online Database Collection: a community of data resources. Nucleic Acids Res 38 (2010) D1-4.

[2] P. Kohl, E.J. Crampin, T.A. Quinn, and D. Noble, Systems biology: an approach. Clin Pharmacol Ther 88 (2010) 25-33.

[3] C. Goble, R. Stevens, D. Hull, K. Wolstencroft, and R. Lopez, Data curation + process curation=data integration + science. Brief Bioinform 9 (2008) 506-17.

[4] D. Hull, K. Wolstencroft, R. Stevens, C. Goble, M.R. Pocock, P. Li, and T. Oinn, Taverna: a tool for building and running workflows of services. Nucleic Acids Res 34 (2006) W729-32.
