TY - JOUR
T1 - Optimized design and assessment of whole genome tiling arrays
AU - Gräf, Stefan
AU - Nielsen, Fiona G.G.
AU - Kurtz, Stefan
AU - Huynen, Martijn A.
AU - Birney, Ewan
AU - Stunnenberg, Henk
AU - Flicek, Paul
N1 - Funding Information: The authors would like to thank Paul Bertone, Kyle Munn, Todd Richmond, Xinmin Zhang, Srinka Ghosh, Chris Davies, Vera van Noort and Lene M. Favrholdt for helpful discussions at several points during the project. This work is partially supported by the EU FP6 HEROIC project.
PY - 2007/7/1
Y1 - 2007/7/1
N2 - Motivation: Recent advances in microarray technologies have made it feasible to interrogate whole genomes with tiling arrays and this technique is rapidly becoming one of the most important high-throughput functional genomics assays. For large mammalian genomes, analyzing oligonucleotide tiling array data is complicated by the presence of non-unique sequences on the array, which increases the overall noise in the data and may lead to false positive results due to cross-hybridization. The ability to create custom microarrays using maskless array synthesis has led us to consider ways to optimize array design characteristics for improving data quality and analysis. We have identified a number of design parameters to be optimized including uniqueness of the probe sequences within the whole genome, melting temperature and self-hybridization potential. Results: We introduce the uniqueness score, U, a novel quality measure for oligonucleotide probes and present a method to quickly compute it. We show that U is equivalent to the number of shortest unique substrings in the probe and describe an efficient greedy algorithm to design mammalian whole genome tiling arrays using probes that maximize U. Using the mouse genome, we demonstrate how several optimizations influence the tiling array design characteristics. With a sensible set of parameters, our designs cover 78% of the mouse genome including many regions previously considered 'untilable' due to the presence of repetitive sequence. Finally, we compare our whole genome tiling array designs with commercially available designs.
UR - http://www.scopus.com/inward/record.url?scp=34547840248&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btm200
DO - 10.1093/bioinformatics/btm200
M3 - Article
C2 - 17646297
AN - SCOPUS:34547840248
SN - 1367-4803
VL - 23
SP - i195
EP - i204
JO - Bioinformatics
JF - Bioinformatics
IS - 13
ER -