Viral genomic surveillance in the era of million-sequence trees
Year of award: 2024
Grantholders
Theo Sanderson
London School of Hygiene & Tropical Medicine, United Kingdom
Project summary
Pathogen genomes begin to reveal insights into spread and evolution when assembled into a phylogenetic tree. The SARS-CoV-2 pandemic has led to a new scale of viral genomic data that has challenged traditional computational tools for analysing and exploring trees, but which, when harnessed, offers striking new avenues for understanding pathogen biology: these virus sequences are the data from a natural experiment that reveal the fitness of tens of thousands of virus genotypes, and provide insight into the forces shaping viral evolution. In this project, we will expand a toolkit I have developed for large-scale phylogenetics in this new era of sequencing, and apply it to a range of viral pathogens. We will build regularly updated large trees for these species, both serving the research community and providing a platform for our own work. We will use deep learning to transform these large-scale trees into maps of viral fitness landscapes, including modelling epistasis. Additionally, we will use the trees to monitor the forces shaping viral evolution, including the impact of external factors like mutagenic drugs and host shifts. This will enable systematic surveillance of viral evolution, offering early warnings of significant changes that could affect global public health.