Abstract
Despite the meteoric rise of single-cell RNA-seq, few preprocessing pipelines can perform all steps from the original FASTQ files to a gene expression table ready for further analysis. Here we present Sharq, a versatile preprocessing pipeline designed to work with plate-based 3'-end protocols that include Unique Molecular Identifiers (UMIs). Sharq performs stringent step-wise trimming of reads, assigns them to features according to a flexible hierarchical model, and uses the barcode and UMI information to avoid amplification biases and produce gene expression tables. Additionally, Sharq provides an extensive plate diagnostics report for quality control and troubleshooting, including the troubleshooting of spatial artefacts. The diagnostics report includes measures of the quality of the individual plate wells as well as a robust assessment of which wells contain material from live cells. Collectively, the innovative approaches presented here provide a valuable tool for processing and quality control of single-cell RNA-seq data.
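
To illustrate the UMI-based counting mentioned in the abstract, here is a minimal Python sketch. It is not Sharq's actual implementation. It assumes reads have already been trimmed and assigned to a feature, and that each read is available as a hypothetical (well barcode, UMI, gene) tuple; the function name `umi_count_table` is invented for this example. Reads that share all three values are treated as amplification duplicates and counted as one molecule. The sketch does not attempt to correct sequencing errors in barcodes or UMIs.

```python
from collections import defaultdict


def umi_count_table(assigned_reads):
    """Build a gene-by-well count table from (barcode, umi, gene) tuples.

    Hypothetical helper: reads with the same well barcode, gene, and UMI
    are assumed to be PCR duplicates of one molecule and are counted once.
    """
    # (barcode, gene) -> set of distinct UMIs seen for that pair
    umis_seen = defaultdict(set)
    for barcode, umi, gene in assigned_reads:
        umis_seen[(barcode, gene)].add(umi)

    # gene -> {barcode: number of distinct UMIs}
    table = defaultdict(dict)
    for (barcode, gene), umis in umis_seen.items():
        table[gene][barcode] = len(umis)
    return {gene: dict(wells) for gene, wells in table.items()}


# Toy input; barcodes, UMIs, and gene names are made up for illustration.
reads = [
    ("ACGT", "AAAA", "GeneA"),
    ("ACGT", "AAAA", "GeneA"),  # same well, gene, and UMI: a duplicate
    ("ACGT", "CCCC", "GeneA"),
    ("TTGA", "AAAA", "GeneA"),  # same UMI in another well: a separate molecule
    ("TTGA", "GGGG", "GeneB"),
]
table = umi_count_table(reads)
```

Counting distinct UMIs rather than reads is what keeps an over-amplified molecule from inflating a gene's expression value in a given well.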
Original language | English |
---|---|
Journal | bioRxiv |
Publication status | Published - 2018 |
Keywords
- Barcode
- Bioinformatics
- Biology
- FASTQ format
- Gene expression
- Hierarchical database model
- Identifier
- RNA-Seq
- Trimming
- Troubleshooting