TY - JOUR

T1 - optGpSampler

T2 - An improved tool for uniformly sampling the solution-space of genome-scale metabolic networks

AU - Megchelenbrink, Wout

AU - Huynen, Martijn

AU - Marchiori, Elena

PY - 2014/2/14

Y1 - 2014/2/14

N2 - Constraint-based models of metabolic networks are typically underdetermined, because they contain more reactions than metabolites. Therefore, the solutions to this system do not consist of unique flux rates for each reaction, but rather a space of possible flux rates. By uniformly sampling this space, an estimated probability distribution for each reaction's flux in the network can be obtained. However, sampling a high-dimensional network is time-consuming. Furthermore, the constraints imposed on the network give rise to an irregularly shaped solution space. Therefore, more tailored, efficient sampling methods are needed. We propose an efficient sampling algorithm (called optGpSampler), which implements the Artificial Centering Hit-and-Run algorithm in a different manner than the sampling algorithm implemented in the COBRA Toolbox for metabolic network analysis, here called gpSampler. Results of extensive experiments on different genome-scale metabolic networks show that optGpSampler is up to 40 times faster than gpSampler. Application of existing convergence diagnostics on small network reconstructions indicates that optGpSampler converges roughly ten times faster than gpSampler towards similar sampling distributions. For networks of higher dimension (i.e., containing more than 500 reactions), we observed significantly better convergence of optGpSampler and a large deviation between the samples generated by the two algorithms. Availability: optGpSampler for Matlab and Python is available for non-commercial use at: http://cs.ru.nl/~wmegchel/optGpSampler/.

AB - Constraint-based models of metabolic networks are typically underdetermined, because they contain more reactions than metabolites. Therefore, the solutions to this system do not consist of unique flux rates for each reaction, but rather a space of possible flux rates. By uniformly sampling this space, an estimated probability distribution for each reaction's flux in the network can be obtained. However, sampling a high-dimensional network is time-consuming. Furthermore, the constraints imposed on the network give rise to an irregularly shaped solution space. Therefore, more tailored, efficient sampling methods are needed. We propose an efficient sampling algorithm (called optGpSampler), which implements the Artificial Centering Hit-and-Run algorithm in a different manner than the sampling algorithm implemented in the COBRA Toolbox for metabolic network analysis, here called gpSampler. Results of extensive experiments on different genome-scale metabolic networks show that optGpSampler is up to 40 times faster than gpSampler. Application of existing convergence diagnostics on small network reconstructions indicates that optGpSampler converges roughly ten times faster than gpSampler towards similar sampling distributions. For networks of higher dimension (i.e., containing more than 500 reactions), we observed significantly better convergence of optGpSampler and a large deviation between the samples generated by the two algorithms. Availability: optGpSampler for Matlab and Python is available for non-commercial use at: http://cs.ru.nl/~wmegchel/optGpSampler/.

UR - http://www.scopus.com/inward/record.url?scp=84895896805&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0086587

DO - 10.1371/journal.pone.0086587

M3 - Article

C2 - 24551039

AN - SCOPUS:84895896805

SN - 1932-6203

VL - 9

JO - PLoS ONE

JF - PLoS ONE

IS - 2

M1 - e86587

ER -