Abstract
The life sciences have moved rapidly into big data thanks to new parallel methods for gene expression, genome-wide association, proteomics and whole-genome DNA sequencing. The scale of these methods is growing faster than predicted by Moore's law. This has introduced new challenges and a need for methods to specify computation protocols for, e.g., Next-Generation Sequencing (NGS) and genome-wide association study (GWAS) imputation analyses. Running these analyses on a large scale is a complicated task because of the many steps involved, long runtimes, heterogeneous computational resources and large files. The process becomes error-prone when dealing with hundreds of samples, as in genomic analysis facilities, if it is performed without an integrated workflow framework and data management system. From recent projects we learnt that bioinformaticians do not want to invest much time in learning advanced grid or cluster scheduling tools; they prefer to concentrate on their analyses, stay close to old-fashioned shell scripts that they can fully control, and have automatic mechanisms take care of all submission and monitoring details. We present a lightweight workflow declaration and execution system to address these needs, built on top of the MOLGENIS framework for data tracking. We describe lessons learnt when scaling NGS and imputation analyses from computational clusters to grids and show the application of our solution, in particular in the nationwide "Genome of the Netherlands" project (GoNL, 700 TB of data and about 200,000 computing hours).
| Original language | English |
|---|---|
| Journal | CEUR Workshop Proceedings |
| Volume | 993 |
| Publication status | Published - 2013 |
| Externally published | Yes |
| Event | 5th International Workshop on Science Gateways, IWSG 2013, Zurich, Switzerland (3 Jun 2013 → 5 Jun 2013) |
| Title | Scaling bio-analyses from computational clusters to grids |