TY - JOUR
T1 - General power and sample size calculations for high-dimensional genomic data
AU - Van Iterson, Maarten
AU - Van De Wiel, Mark A.
AU - Boer, Judith M.
AU - De Menezes, Renée X.
N1 - Funding Information:
Acknowledgments: The work was conducted within the Centre for Medical Systems Biology (CMSB), established by the Netherlands Genomics Initiative/Netherlands Organisation for Scientific Research (NGI/NWO), and is part of the BioRange programme of the Netherlands Bioinformatics Centre (NBIC), which is supported by a BSIK grant through the Netherlands Genomics Initiative (NGI).
PY - 2013/8
Y1 - 2013/8
N2 - In the design of microarray or next-generation sequencing experiments it is crucial to choose the appropriate number of biological replicates. Often the number of differentially expressed genes and their effect sizes are small, so too few replicates will lead to insufficient power to detect them. On the other hand, too many replicates lead to unnecessarily high experimental costs. Power and sample size analysis can guide experimentalists in choosing the appropriate number of biological replicates. Several methods for power and sample size analysis have recently been proposed for microarray data. However, most of these are restricted to two-group comparisons and require user-defined effect sizes. Here we propose a pilot-data-based method for power and sample size analysis which can handle more general experimental designs and uses pilot data to obtain estimates of the effect sizes. The method can also handle χ²-distributed test statistics, which enables power and sample size calculations for a much wider class of models, including high-dimensional generalized linear models, which are used, e.g., for RNA-seq data analysis. The performance of the method is evaluated using simulated and experimental data from several microarray and next-generation sequencing experiments. Furthermore, we compare our proposed method for estimation of the density of effect sizes from pilot data with a recently proposed method specific to two-group comparisons.
AB - In the design of microarray or next-generation sequencing experiments it is crucial to choose the appropriate number of biological replicates. Often the number of differentially expressed genes and their effect sizes are small, so too few replicates will lead to insufficient power to detect them. On the other hand, too many replicates lead to unnecessarily high experimental costs. Power and sample size analysis can guide experimentalists in choosing the appropriate number of biological replicates. Several methods for power and sample size analysis have recently been proposed for microarray data. However, most of these are restricted to two-group comparisons and require user-defined effect sizes. Here we propose a pilot-data-based method for power and sample size analysis which can handle more general experimental designs and uses pilot data to obtain estimates of the effect sizes. The method can also handle χ²-distributed test statistics, which enables power and sample size calculations for a much wider class of models, including high-dimensional generalized linear models, which are used, e.g., for RNA-seq data analysis. The performance of the method is evaluated using simulated and experimental data from several microarray and next-generation sequencing experiments. Furthermore, we compare our proposed method for estimation of the density of effect sizes from pilot data with a recently proposed method specific to two-group comparisons.
KW - Density of effect-sizes
KW - Discrete inverse problem
KW - High-dimensional generalized linear models
KW - Non-negative Conjugate Gradients algorithm
UR - http://www.scopus.com/inward/record.url?scp=84881507695&partnerID=8YFLogxK
U2 - 10.1515/sagmb-2012-0046
DO - 10.1515/sagmb-2012-0046
M3 - Article
C2 - 23934609
AN - SCOPUS:84881507695
SN - 1544-6115
VL - 12
SP - 449
EP - 467
JO - Statistical Applications in Genetics and Molecular Biology
JF - Statistical Applications in Genetics and Molecular Biology
IS - 4
ER -