TY - JOUR
T1 - CHETAH: a selective, hierarchical cell type identification method for single-cell RNA sequencing
AU - de Kanter, Jurrian K.
AU - Lijnzaad, Philip
AU - Candelli, Tito
AU - Margaritis, Thanasis
AU - Holstege, Frank C.P.
N1 - Publisher Copyright: © The Author(s) 2019.
PY - 2019/9/19
Y1 - 2019/9/19
AB - Cell type identification is essential for single-cell RNA sequencing (scRNA-seq) studies, which are currently transforming the life sciences. CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is an accurate cell type identification algorithm that is rapid and selective, including the possibility of intermediate or unassigned categories. Evidence for assignment is based on a classification tree of previously available scRNA-seq reference data and includes a confidence score based on the variance in gene expression per cell type. For cell types represented in the reference data, CHETAH's accuracy is as good as that of existing methods. Its specificity is superior when cells of an unknown type are encountered, such as malignant cells in tumor samples, which it pinpoints as intermediate or unassigned. Although designed for tumor samples in particular, the use of unassigned and intermediate types is also valuable in other exploratory studies. This is exemplified in pancreas datasets, where CHETAH highlights cell populations not well represented in the reference dataset, including cells with profiles that lie on a continuum between those of acinar and ductal cell types. Having the possibility of unassigned and intermediate cell types is pivotal for preventing misclassification and can yield important biological information for previously unexplored tissues.
UR - http://www.scopus.com/inward/record.url?scp=85073313004&partnerID=8YFLogxK
U2 - 10.1093/NAR/GKZ543
DO - 10.1093/NAR/GKZ543
M3 - Article
C2 - 31226206
AN - SCOPUS:85073313004
SN - 0305-1048
VL - 47
SP - E95
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 16
ER -