Scaling bio-analyses from computational clusters to grids

  • Heorhiy Byelas
  • Martijn Dijkstra
  • Pieter Neerincx
  • Freerk Van Dijk
  • Alexandros Kanterakis
  • Patrick Deelen
  • Morris Swertz

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)

Abstract

Life sciences have moved rapidly into big data thanks to new parallel methods for gene expression, genome-wide association, proteomics and whole-genome DNA sequencing. The scale of these methods is growing faster than predicted by Moore's law. This has introduced new challenges and a need for methods for specifying computation protocols, e.g. for Next-Generation Sequencing (NGS) and genome-wide association study (GWAS) imputation analyses. Running these analyses on a large scale is a complicated task because of the many steps involved, long runtimes, heterogeneous computational resources and large files. The process becomes error-prone when dealing with hundreds of samples, such as in genomic analysis facilities, if it is performed without an integrated workflow framework and data management system. From recent projects we learnt that bioinformaticians do not want to invest much time in learning advanced grid or cluster scheduling tools: they prefer to concentrate on their analyses, stay close to old-fashioned shell scripts that they can fully control, and have automatic mechanisms take care of all submission and monitoring details. We present a lightweight workflow declaration and execution system to address these needs, built on top of the MOLGENIS framework for data tracking. We describe lessons learnt when scaling NGS and imputation analyses from computational clusters to grids and show the application of our solution, in particular in the nation-wide "Genome of the Netherlands" project (GoNL, 700 TB of data and about 200,000 computing hours).

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 993
Publication status: Published - 2013
Externally published: Yes
Event: 5th International Workshop on Science Gateways, IWSG 2013 - Zurich, Switzerland
Duration: 3 Jun 2013 → 5 Jun 2013
