
Machine Learning and Complex Networks

Head of Group: Dr Gitanjali Yadav


The current information revolution provides an unprecedented opportunity for open source biology. We use comparative genomics and stochastic simulations as virtual experiments, mimicking selected biological processes in silico to study their emergent properties. We apply this approach to complex problems in synthetic phytochemistry and food security, as explained below.

 

Knowledge Based Synthetic Phytochemistry

The Plant Terpenome: Plants react to environmental challenges with a sophisticated and flexible chemical production line, the so-called 'Terpenome', which is unique in its ability to create an enormously diverse range of bioactive compounds from a finite set of scaffolds, much like a chemical Lego kit. We are exploring the biosynthetic potential within plant genomes in order to train a simulator for the combinatorial complexity associated with the production of secondary metabolites. This will be achieved through a combination of genomics and structural bioinformatics, followed by the development of machine learning approaches to predict the diversity of combinatorial products that can potentially be generated. This automated 'genotype to chemotype' simulator would provide insights into the evolution of phytochemical diversity in the plant kingdom, and would feed into new models of chemical modularity and phytomedicine via synthetic biology.

Invasive Species Genomics: We are particularly interested in the role played by volatile organic compounds in mediating biological invasions by alien plant species, an idea better known as the 'Novel Weapons Hypothesis'. Invasion genomics is a relatively new discipline that investigates patterns of genetic variation in invasive species and their ecological and evolutionary consequences. In particular, the alteration in species-specific phytochemical fingerprints between native and invaded habitats, although documented, has neither been quantitatively assessed nor linked to the ecological sophistication of invasive species. The role of a species' heritable genomic indices in invasion success has yet to be explored systematically. In collaboration with field ecologists, we are approaching invasion biology from a genomics standpoint, comparing phytochemical dynamics within and across habitats in order to build models for understanding invasion success at both the above-ground and the plant-soil interaction levels. The Cambridge University Herbarium offers an opportunity to expand this study beyond the British landscape, covering very wide taxonomic, geographical and temporal scales through its extensive herbarium collections.

Complex Network Analysis

Biological Network Perturbations: We have recently developed a method to explore the resilience and susceptibility of complex biological systems by inducing a variety of random and targeted perturbations in hundreds of social, ecological and molecular networks, and then investigating their responses under attack. The method currently simulates single and grouped knock-outs of nodes on undirected networks, and computes several topological centrality measures after each perturbation. We are working to extend this algorithm to simulate knock-downs on directed and weighted networks, in order to predict network robustness or vulnerability in real-world scenarios.
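
The knock-out procedure described above can be illustrated with a minimal sketch. This is not the group's implementation: the toy network, the function names (`knock_out`, `degree_centrality`, `largest_component_size`) and the choice of degree centrality and largest connected component as response measures are all assumptions made for illustration. It removes one node, or a group of nodes, from a small undirected network and recomputes the measures on what remains.

```python
from collections import deque


def knock_out(adj, removed):
    """Return a copy of an undirected graph with the given nodes deleted.

    `adj` maps each node to the set of its neighbours. The input is not
    modified, so several perturbations can be run from the same baseline.
    """
    removed = set(removed)
    return {
        node: {nbr for nbr in nbrs if nbr not in removed}
        for node, nbrs in adj.items()
        if node not in removed
    }


def degree_centrality(adj):
    """Degree of each node divided by (n - 1), the usual normalisation."""
    n = len(adj)
    if n <= 1:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def largest_component_size(adj):
    """Size of the largest connected component, found by breadth-first search."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


# Hypothetical five-node undirected network (illustrative only).
toy = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}

# Single knock-out of "a", then a grouped knock-out of "a" and "d".
for targets in (["a"], ["a", "d"]):
    perturbed = knock_out(toy, targets)
    print(targets, largest_component_size(perturbed), degree_centrality(perturbed))
```

In this toy case, removing "a" alone isolates "b" but leaves "c", "d" and "e" connected, whereas removing "a" and "d" together leaves every remaining node isolated. Choosing targets at random or by rank on a centrality measure would correspond to the random and targeted perturbations mentioned above.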

Gene Regulatory Networks: In collaboration with the Griffiths and Hibberd groups, under the TIGR2ESS program, we are exploring the connection between photosynthetic efficiency and sustainable crop productivity. The aim is to enhance photosynthesis by identifying components of the algal Carbon Concentrating Mechanism (CCM) for expression in higher plants. We are investigating this through comparative genomic analysis of the biophysical CCM in the model green alga Chlamydomonas reinhardtii, generating gene regulatory networks (GRNs) from available high-throughput datasets such as sub-cellular proteomes, DNA-protein interactomes, and high-resolution light and dark stage transcriptomes along the Chlamydomonas diurnal cycle.