TY - JOUR
T1 - Filtering, FDR and power
AU - van Iterson, Maarten
AU - Boer, Judith M.
AU - Menezes, Renée X.
N1 - Funding Information:
We thank P.A.C. ‘t Hoen, J.J. Goeman, J.T. den Dunnen, M. van de Wiel and W. van Wieringen for fruitful discussions, and comments and suggestions of two anonymous reviewers which helped clarify the text. This study was supported by a grant from the Centre for Medical Systems Biology within the framework of the Netherlands Genomics Initiative (NGI)/Netherlands Organisation for Scientific Research (NWO). This work is part of the BioRange bioinformatics research program supported by the Netherlands Bioinformatics Centre (NBIC).
PY - 2010/9/7
Y1 - 2010/9/7
N2 - Background: In high-dimensional data analysis such as differential gene expression analysis, people often use filtering methods like fold-change or variance filters in an attempt to reduce the multiple testing penalty and improve power. However, filtering may introduce a bias on the multiple testing correction. The precise amount of bias depends on many quantities, such as fraction of probes filtered out, filter statistic and test statistic used.Results: We show that a biased multiple testing correction results if non-differentially expressed probes are not filtered out with equal probability from the entire range of p-values. We illustrate our results using both a simulation study and an experimental dataset, where the FDR is shown to be biased mostly by filters that are associated with the hypothesis being tested, such as the fold change. Filters that induce little bias on the FDR yield less additional power of detecting differentially expressed genes. Finally, we propose a statistical test that can be used in practice to determine whether any chosen filter introduces bias on the FDR estimate used, given a general experimental setup.Conclusions: Filtering out of probes must be used with care as it may bias the multiple testing correction. Researchers can use our test for FDR bias to guide their choice of filter and amount of filtering in practice.
AB - Background: In high-dimensional data analysis such as differential gene expression analysis, people often use filtering methods like fold-change or variance filters in an attempt to reduce the multiple testing penalty and improve power. However, filtering may introduce a bias on the multiple testing correction. The precise amount of bias depends on many quantities, such as fraction of probes filtered out, filter statistic and test statistic used.Results: We show that a biased multiple testing correction results if non-differentially expressed probes are not filtered out with equal probability from the entire range of p-values. We illustrate our results using both a simulation study and an experimental dataset, where the FDR is shown to be biased mostly by filters that are associated with the hypothesis being tested, such as the fold change. Filters that induce little bias on the FDR yield less additional power of detecting differentially expressed genes. Finally, we propose a statistical test that can be used in practice to determine whether any chosen filter introduces bias on the FDR estimate used, given a general experimental setup.Conclusions: Filtering out of probes must be used with care as it may bias the multiple testing correction. Researchers can use our test for FDR bias to guide their choice of filter and amount of filtering in practice.
UR - http://www.scopus.com/inward/record.url?scp=77956479195&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-11-450
DO - 10.1186/1471-2105-11-450
M3 - Article
C2 - 20822518
AN - SCOPUS:77956479195
SN - 1471-2105
VL - 11
JO - BMC Bioinformatics
JF - BMC Bioinformatics
M1 - 450
ER -