Scaling bio-analyses from computational clusters to grids

Heorhiy Byelas, Martijn Dijkstra, Pieter Neerincx, Freerk Van Dijk, Alexandros Kanterakis, Patrick Deelen, Morris Swertz

Research output: Contribution to journal, Conference article, peer review

1 Citation (Scopus)

Abstract

Life sciences have moved rapidly into big data thanks to new parallel methods for gene expression, genome-wide association, proteomics and whole-genome DNA sequencing. The scale of these methods is growing faster than predicted by Moore's law. This has introduced new challenges and a need for methods for specifying computation protocols, e.g. for Next-Generation Sequencing (NGS) and genome-wide association study (GWAS) imputation analyses. Running these analyses on a large scale is a complicated task, due to the many steps involved, long runtimes, heterogeneous computational resources and large files. The process becomes error-prone when dealing with hundreds of samples, such as in genomic analysis facilities, if it is performed without an integrated workflow framework and data management system. From recent projects we learnt that bioinformaticians do not want to invest much time in learning advanced grid or cluster scheduling tools; they prefer to concentrate on their analyses, stay close to old-fashioned shell scripts that they can fully control, and have automatic mechanisms take care of all submission and monitoring details. We present a lightweight workflow declaration and execution system to address these needs, built on top of the MOLGENIS framework for data tracking. We describe lessons learnt when scaling NGS and imputation analyses from computational clusters to grids and show the application of our solution, in particular in the nation-wide "Genome of the Netherlands" project (GoNL, 700 TB of data and about 200,000 computing hours).

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 993
Status: Published - 2013
Externally published: Yes
Event: 5th International Workshop on Science Gateways, IWSG 2013 - Zurich, Switzerland
Duration: 3 Jun 2013 to 5 Jun 2013
