TY - JOUR
T1 - A computational framework to explore large-scale biosynthetic diversity
AU - Navarro-Muñoz, Jorge C.
AU - Selem-Mojica, Nelly
AU - Mullowney, Michael W.
AU - Kautsar, Satria A.
AU - Tryon, James H.
AU - Parkinson, Elizabeth I.
AU - De Los Santos, Emmanuel L.C.
AU - Yeong, Marley
AU - Cruz-Morales, Pablo
AU - Abubucker, Sahar
AU - Roeters, Arne
AU - Lokhorst, Wouter
AU - Fernandez-Guerra, Antonio
AU - Cappelini, Luciana Teresa Dias
AU - Goering, Anthony W.
AU - Thomson, Regan J.
AU - Metcalf, William W.
AU - Kelleher, Neil L.
AU - Barona-Gomez, Francisco
AU - Medema, Marnix H.
N1 - Publisher Copyright: © 2019, The Author(s), under exclusive licence to Springer Nature America, Inc.
PY - 2020/1/1
Y1 - 2020/1/1
N2 - Genome mining has become a key technology to exploit natural product diversity. Although initially performed on a single-genome basis, the process is now being scaled up to mine entire genera, strain collections and microbiomes. However, no bioinformatic framework is currently available for effectively analyzing datasets of this size and complexity. In the present study, a streamlined computational workflow is provided, consisting of two new software tools: the ‘biosynthetic gene similarity clustering and prospecting engine’ (BiG-SCAPE), which facilitates fast and interactive sequence similarity network analysis of biosynthetic gene clusters and gene cluster families; and the ‘core analysis of syntenic orthologues to prioritize natural product gene clusters’ (CORASON), which elucidates phylogenetic relationships within and across these families. BiG-SCAPE is validated by correlating its output to metabolomic data across 363 actinobacterial strains and the discovery potential of CORASON is demonstrated by comprehensively mapping biosynthetic diversity across a range of detoxin/rimosamide-related gene cluster families, culminating in the characterization of seven detoxin analogues.
KW - Actinobacteria/genetics
KW - Algorithms
KW - Biological Products
KW - Biosynthetic Pathways/genetics
KW - Cluster Analysis
KW - Computational Biology/methods
KW - Data Mining/methods
KW - Genome, Bacterial
KW - Genomics
KW - Metabolomics
KW - Microbiota
KW - Multigene Family
KW - Phylogeny
KW - Reproducibility of Results
KW - Software
UR - http://www.scopus.com/inward/record.url?scp=85075417325&partnerID=8YFLogxK
U2 - 10.1038/s41589-019-0400-9
DO - 10.1038/s41589-019-0400-9
M3 - Article
C2 - 31768033
SN - 1552-4450
VL - 16
SP - 60
EP - 68
JO - Nature Chemical Biology
JF - Nature Chemical Biology
IS - 1
ER -