Our group has had a long association with the MRC/UVRI Uganda Research Unit on AIDS. Several years ago this resulted in a collaborative publication that estimated when the two main HIV subtypes in that country, A and D, were introduced (Yebra et al. 2015). The collaboration has since yielded publications on transmission networks in fishing villages (Bbosa et al. 2019), as well as on the General Population Cohort in Masaka District (Ssemwanga et al. 2020).
This collaboration received a significant boost when, together with UCL, Imperial College, the Sanger Institute and the Africa Centre, we established the PANGEA-HIV Consortium ("PANGEA_HIV"), funded by the Bill and Melinda Gates Foundation with the specific aim of characterising HIV epidemics in sub-Saharan Africa in the same depth as we had been able to for the UK. This led to the generation of extensive datasets of near full-genome sequences. Heather Grant has been analysing the sequences generated in PANGEA-HIV from samples submitted by UVRI. As expected, they were a mix of subtypes A and D, but there were far more recombinants than in the earlier studies based on the clinical pol fragment. Analysis of this large, complex dataset occupied Heather for the first two years of her PhD studies, and a major publication from her work focusing on HIV recombinants appeared in 2020 (Grant et al. 2020).
Concurrently with the work on the recombinants, a collection of samples obtained from the very early stages of the HIV epidemic in Uganda by Dr J Wilson Carswell re-appeared (thanks to Dr Pat Cane) after decades of storage at Porton Down. Initial attempts to generate full-length genomes within the PANGEA-HIV programme were largely unsuccessful owing to the duration of storage, but the capture-based sequencing approach of our colleague Prof Judy Breuer's laboratory at UCL has been highly successful, generating over 100 full-length genomes from these 35-year-old samples from across Uganda. This work has been greatly aided by Heather's fruitful collaboration with Dr Carswell himself. These samples will help to address the question of how HIV subtype D invaded an already infected population. First results suggest that early subtype D sequences do not leave descendants among the subsequent samples but form an almost separate clade, unlike early subtype A strains, which mix among recent strains in a phylogenetic analysis. It is possible that early subtype D strains were particularly virulent and that this has moderated over time.
We have also been collaborating with MRC/UVRI and UCSF to analyse sequences generated from dried blood spots collected during the "SEARCH" trial. This work is being undertaken by Emma Pujol Hodge, a student on the "Hosts, Pathogens and Global Health" Doctoral Training Programme funded by the Wellcome Trust, now in the first year of her PhD studies. In later years we expect Emma will use modelling approaches to further investigate the impact of ART rollout in structured populations, making use of another output of the PANGEA-HIV programme for that purpose: the DSPS-HIV simulator.
One of the main goals of PANGEA_HIV was to use phylogenetic and molecular epidemiology techniques to better characterise HIV epidemics in sub-Saharan Africa. In order to evaluate the performance of current phylogenetic analyses at estimating epidemiological parameters, a comparison exercise was devised, based on computer simulations of HIV evolution within epidemics.
Using two separate models to simulate realistic HIV epidemics in an African setting, phylogenetic and sequence data were simulated under a variety of parameter settings, such as the number of infections imported from surrounding villages, the infectiousness during the acute stage, and the speed of treatment roll-out. This dataset was made available for research groups to analyse using their chosen method, and the resulting estimates were compared against the true values from the simulation (Ratmann et al. 2016).
The Leigh Brown group provided one set of simulated HIV data for the comparison exercise. Samantha Lycett's stochastic, agent-based model, the Discrete Spatial Phylo Simulator (DSPS), was extensively modified by Emma Hodcroft to enable it to simulate realistic HIV epidemics, producing the Discrete Spatial Phylo Simulator-HIV (DSPS-HIV). The model calculates disease progression and transmission risk based on viral load, population growth has been incorporated, and contact networks and treatment are highly customizable.
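To give a flavour of what a stochastic, agent-based model with viral-load-dependent transmission looks like, here is a minimal toy sketch. It is not the DSPS-HIV code and does not reproduce its structure: the function names, the random-mixing contact pattern, and every numerical value (baseline probability, fold-increase per log10 of viral load, set-point distribution) are hypothetical and chosen purely for illustration.

```python
import random


def transmission_prob(log10_vl, base=0.05, fold_per_log=2.5, ref=4.0):
    """Hypothetical per-contact transmission probability.

    Rises by `fold_per_log` for each log10 increase in viral load above
    the reference value `ref`; all three defaults are illustrative only.
    """
    return min(1.0, base * fold_per_log ** (log10_vl - ref))


def simulate(n_agents=200, n_steps=50, contacts_per_step=2, seed=1):
    """Toy epidemic with random mixing (no contact network, no treatment)."""
    rng = random.Random(seed)
    # None means susceptible; otherwise the agent's set-point log10 viral load.
    viral_load = [None] * n_agents
    viral_load[0] = rng.gauss(4.5, 0.7)  # index case
    transmissions = []  # (step, source, recipient) records
    for step in range(n_steps):
        # Agents infected during this step start transmitting next step.
        infected = [i for i, v in enumerate(viral_load) if v is not None]
        for src in infected:
            for _ in range(contacts_per_step):
                tgt = rng.randrange(n_agents)
                if tgt == src or viral_load[tgt] is not None:
                    continue  # self-contact or already infected
                if rng.random() < transmission_prob(viral_load[src]):
                    viral_load[tgt] = rng.gauss(4.5, 0.7)
                    transmissions.append((step, src, tgt))
    return viral_load, transmissions


if __name__ == "__main__":
    vl, tx = simulate()
    print(f"{sum(v is not None for v in vl)} infected, {len(tx)} transmissions")
```

The list of (step, source, recipient) records is the "who infected whom" history from which a simulator of this kind could derive a transmission tree; a realistic model would add disease progression, population growth, structured contact networks and treatment on top of this skeleton.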
We have used the large-scale population database of HIV sequences maintained by the UK Collaborative Group on HIV Drug Resistance to estimate the patterns of HIV transmission among different communities in the UK. The structure of the sexual contact network is a key issue in the epidemiology of sexually transmitted infections. As HIV is transmitted with low efficiency compared to many STIs, the transmission network structure is more readily reconstructed from the viral genotypes than from interview data. Using the approach of molecular phylodynamics to analyse anonymized HIV genotypes from MSM (men who have sex with men) in a London clinic, we originally found that 25% of patients with a link to any other were linked to 6 or more individuals. In these clusters, almost 25% of transmissions occurred within 6 months of first infection (Lewis et al. 2008).
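The kind of summary quoted above (the share of linked patients who have 6 or more links) can be illustrated with a simplified sketch. Note the assumptions: the published analysis relied on phylogenetic reconstruction, whereas this example simply links two sequences when their pairwise distance falls below a cut-off. The function names, the 1.5% threshold and the use of a raw p-distance are all hypothetical choices for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguity code are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        return 1.0  # nothing comparable: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def link_counts(seqs, threshold=0.015):
    """Count, for each patient, how many others fall within `threshold`."""
    degree = {name: 0 for name in seqs}
    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        if p_distance(s1, s2) <= threshold:
            degree[n1] += 1
            degree[n2] += 1
    return degree


def fraction_highly_linked(degree, min_links=6):
    """Among patients with at least one link, the share with >= min_links."""
    linked = [d for d in degree.values() if d > 0]
    if not linked:
        return 0.0
    return sum(d >= min_links for d in linked) / len(linked)
```

Given a dictionary mapping anonymized patient identifiers to aligned sequences, `fraction_highly_linked(link_counts(seqs))` returns the proportion of linked patients with 6 or more links under these simplified assumptions.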
We extended this work using a phylodynamic approach to estimate the parameters of the network structure within which HIV is spreading among MSM, exploring the well-known "power law" effect in greater detail (Leigh Brown et al. 2011). We found that the distribution of cluster size ("degree distribution") is such that a randomly distributed intervention would never stop the epidemic. This level of detailed knowledge can provide important insights into delivery of interventions such as pre-exposure prophylaxis.
Moving to HIV-1 subtypes A and C, which in the UK are predominantly associated with heterosexual transmission, the picture was quite different. Large clusters were far less frequent and there was very little evidence of transmission in acute infection (Hughes et al. 2009).
The early analyses were labour-intensive and, to accommodate increasing dataset size, Sam Lycett and Manon Ragonnet automated the process by developing the Cluster Picker tool (Ragonnet et al. 2013). Applying this tool, now cited over 150 times, Manon found that the few large clusters previously found in these subtypes had arisen through "crossover" of these strains into MSM (Ragonnet-Cronin et al. 2015). In a later investigation of the MSM HIV epidemic in the UK, Manon discovered a distinct subset of individuals who self-identified as heterosexual but whose virus clustered exclusively with men. Her detailed analysis of the properties of these transmission clusters showed that these individuals ("potentially non-disclosed MSM", or pnMSM) were never central, but were frequently found on the periphery of their clusters (Ragonnet-Cronin et al. 2018b). Finally, the largest single HIV outbreak in the UK for decades, which occurred among persons who inject drugs in Glasgow from 2014 onwards, was demonstrated by Manon to have a point-source origin and very rapid dynamics, in a collaboration with colleagues from Glasgow NHS Board and Health Protection Scotland (Ragonnet-Cronin et al. 2018a).
There has been significant recent debate about the role of the viral genome in determining the rate of progression to AIDS and death. This has been studied using plasma viral load, which provides a convenient and robust surrogate marker, leading to the claim that the "heritability" of virulence is high. Emma Hodcroft has been addressing this question using the sequences collected through the UK Collaborative Group on HIV Drug Resistance together with linked viral load data collected through the UK Collaborative HIV Cohort Study (UK CHIC). Emma has exploited the expertise on quantitative genetics available in Edinburgh to apply a novel approach to the question, allowing her to simultaneously analyse the genetic contribution of over 8000 viral genotypes to plasma viral load based on the relationships revealed by phylogenetic analysis of their sequences. This work, published in PLoS Pathogens (Hodcroft et al. 2014), showed that in fact viral genotype contributes relatively little to the variation in plasma viral load among infected individuals.
After publishing her study, Emma took part in the 2014 "3 Minute Thesis" challenge and won the Edinburgh University competition, from which she went on to take part in the UK national final and the Universitas 21 international final. Emma's 3 Minute Thesis presentation is available to view online.