TY - JOUR
T1 - Evaluating experimental bias and completeness in comparative phosphoproteomics analysis
AU - Boekhorst, Jos
AU - Boersema, Paul J.
AU - Tops, Bastiaan B.J.
AU - van Breukelen, Bas
AU - Heck, Albert J.R.
AU - Snel, Berend
N1 - Funding Information:
This work was carried out within the research programme of the Netherlands Consortium for Systems Biology (NCSB), which is part of the Netherlands Genomics Initiative/Netherlands Organization for Scientific Research, and within the research programme of the Centre of BioSystems Genomics (CBSG), which is part of the Netherlands Genomics Initiative/Netherlands Organization for Scientific Research.
PY - 2011
Y1 - 2011
N2 - Unraveling the functional dynamics of phosphorylation networks is a crucial step in understanding the way in which biological networks form a living cell. Recently there has been an enormous increase in the number of measured phosphorylation events. Nevertheless, comparative and integrative analysis of phosphoproteomes is confounded by incomplete coverage and biases introduced by different experimental workflows. As a result, we cannot differentiate whether phosphosites identified in only one or two samples are the result of condition- or species-specific phosphorylation, or reflect missing data. Here, we evaluate the impact of incomplete phosphoproteomics datasets on comparative analysis, and we present bioinformatics strategies to quantify the impact of different experimental workflows on measured phosphoproteomes. We show that plotting the saturation in observed phosphosites in replicates provides a reproducible picture of the extent of a particular phosphoproteome. Still, we are far from a complete picture of the total human phosphoproteome. The impact of different experimental techniques on the similarity between phosphoproteomes can be estimated by comparing datasets from different experimental pipelines to a common reference. Our results show that comparative analysis is most powerful when datasets have been generated using the same experimental workflow. We show this experimentally by measuring the tyrosine phosphoproteome from Caenorhabditis elegans and comparing it to the tyrosine phosphoproteome of HeLa cells, resulting in an overlap of about 4%. This overlap between very different organisms represents a three-fold increase when compared to datasets from older studies, wherein different workflows were used. The strategies we suggest enable an estimation of the impact of differences in experimental workflows on the overlap between datasets.
This will allow us to perform comparative analyses not only on datasets specifically generated for this purpose, but also to extract insights through comparative analysis of the ever-increasing wealth of publicly available phosphorylation data.
AB - Unraveling the functional dynamics of phosphorylation networks is a crucial step in understanding the way in which biological networks form a living cell. Recently there has been an enormous increase in the number of measured phosphorylation events. Nevertheless, comparative and integrative analysis of phosphoproteomes is confounded by incomplete coverage and biases introduced by different experimental workflows. As a result, we cannot differentiate whether phosphosites identified in only one or two samples are the result of condition- or species-specific phosphorylation, or reflect missing data. Here, we evaluate the impact of incomplete phosphoproteomics datasets on comparative analysis, and we present bioinformatics strategies to quantify the impact of different experimental workflows on measured phosphoproteomes. We show that plotting the saturation in observed phosphosites in replicates provides a reproducible picture of the extent of a particular phosphoproteome. Still, we are far from a complete picture of the total human phosphoproteome. The impact of different experimental techniques on the similarity between phosphoproteomes can be estimated by comparing datasets from different experimental pipelines to a common reference. Our results show that comparative analysis is most powerful when datasets have been generated using the same experimental workflow. We show this experimentally by measuring the tyrosine phosphoproteome from Caenorhabditis elegans and comparing it to the tyrosine phosphoproteome of HeLa cells, resulting in an overlap of about 4%. This overlap between very different organisms represents a three-fold increase when compared to datasets from older studies, wherein different workflows were used. The strategies we suggest enable an estimation of the impact of differences in experimental workflows on the overlap between datasets.
This will allow us to perform comparative analyses not only on datasets specifically generated for this purpose, but also to extract insights through comparative analysis of the ever-increasing wealth of publicly available phosphorylation data.
UR - http://www.scopus.com/inward/record.url?scp=80051516310&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0023276
DO - 10.1371/journal.pone.0023276
M3 - Article
C2 - 21853102
AN - SCOPUS:80051516310
SN - 1932-6203
VL - 6
JO - PLoS ONE
JF - PLoS ONE
IS - 8
M1 - e23276
ER -