The group actively develops novel statistical and computational methods for analysing large datasets ("Big Data") with a particular focus on genomics and the use of Bayesian methods.
Tumour heterogeneity describes the genetic diversity both within tumours (intra-tumour heterogeneity) and between them (inter-tumour heterogeneity). Genetic differences within and between tumours give rise to different disease outcomes and different patient responses to therapy. Understanding and characterising tumour heterogeneity is therefore important for developing clinical approaches tailored to the individual patient (individualised medicine).
Our group is interested in developing statistical methods, built on advanced machine learning, to analyse genome sequencing data from heterogeneous tumour samples.
In particular, we have been working extensively with the Ovarian Cancer Laboratory at the University of Oxford, headed by Professor Ahmed Ahmed.
Single Cell Genomics
Advances in single cell technology now allow large-scale, high-throughput experimentation on single cells, providing new insights into cellular function.
Heterogeneity, both biological and technical, confounds the simple interpretation of single cell data, and sophisticated statistical methods are required to disentangle the different sources of noise and signal.
Our group is working on approaches using Bayesian hierarchical modelling to integrate information from multiple sources across different spatiotemporal scales in order to better understand cellular function and dynamics.
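As a minimal sketch of the kind of pooling a hierarchical model performs, the toy example below fits a two-level Gaussian model by Gibbs sampling, shrinking noisy per-source estimates towards a shared population mean. The model, the simulated data and the fixed variance parameters are illustrative assumptions, not a description of our actual analyses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: J "sources" (e.g. batches or experiments), each giving
# noisy measurements of a source-specific mean theta_j.
J, n = 8, 30
true_mu, tau, sigma = 2.0, 1.0, 0.5   # population mean, between/within sd (assumed known)
theta_true = rng.normal(true_mu, tau, J)
y = rng.normal(theta_true[:, None], sigma, (J, n))
ybar = y.mean(axis=1)

# Gibbs sampler for the two-level model:
#   y_ij ~ N(theta_j, sigma^2),  theta_j ~ N(mu, tau^2),  flat prior on mu.
mu, draws = 0.0, []
for it in range(2000):
    # theta_j | mu: precision-weighted combination of the data mean and mu
    prec = n / sigma**2 + 1 / tau**2
    mean = (n * ybar / sigma**2 + mu / tau**2) / prec
    theta = rng.normal(mean, np.sqrt(1 / prec))
    # mu | theta: posterior under a flat prior
    mu = rng.normal(theta.mean(), tau / np.sqrt(J))
    draws.append(theta)

# Posterior means are shrunk from the raw per-source means towards mu.
print(np.mean(draws[500:], axis=0).round(2))
print(ybar.round(2))
```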
Statistical methodology for Big Data
Bayesian methods are now ubiquitous in statistical modelling across a wide range of disciplines. A major challenge for Bayesian approaches is the significant computation required for exact inference when large models are applied to big datasets (terabyte scale and beyond). For data of this size, Markov chain Monte Carlo (MCMC) simulation is not feasible, despite recent advances in massively parallel computational hardware such as graphics processing units (GPUs). Instead, it is necessary to develop approximate methods that give "good" answers: answers that, whilst not guaranteed to be exact or optimal, are sufficient for downstream decision processes and further scientific inquiry.
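As one concrete illustration of this trade-off (not a description of our own methods), stochastic gradient Langevin dynamics (Welling and Teh, 2011) replaces the full-data gradient in a Langevin MCMC step with a rescaled minibatch gradient, so the dataset is never scanned in full at any single iteration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "large" dataset for Bayesian logistic regression.
N, D, batch = 100_000, 5, 256
w_true = rng.normal(size=D)
X = rng.normal(size=(N, D))
y = rng.binomial(1, 1 / (1 + np.exp(-X @ w_true)))

# Stochastic gradient Langevin dynamics: each step uses a minibatch
# gradient of the log posterior plus injected Gaussian noise, so the
# per-iteration cost is independent of N.
w, samples = np.zeros(D), []
eps = 1e-5   # small constant step size for simplicity (decayed in practice)
for t in range(5000):
    idx = rng.integers(0, N, batch)
    p = 1 / (1 + np.exp(-X[idx] @ w))
    grad_lik = (N / batch) * X[idx].T @ (y[idx] - p)   # rescaled minibatch gradient
    grad_prior = -w                                    # N(0, I) prior
    w = w + 0.5 * eps * (grad_lik + grad_prior) \
          + rng.normal(scale=np.sqrt(eps), size=D)
    if t > 2500:
        samples.append(w)

print(np.mean(samples, axis=0).round(2), w_true.round(2))
```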
Our group is developing approximate inference methods to fit complex statistical models to large datasets that respect the practical computational and time limitations that govern real-life scientific studies. We are also interested in using decision-theoretic ideas to produce meaningful results for scientists via loss functions that are tailored to the task at hand.
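A toy illustration of the decision-theoretic point: given the same set of posterior draws, different loss functions lead to different reported estimates. The gamma "posterior" below is a stand-in for a real analysis; the quantile identity for the asymmetric linear loss is standard.

```python
import numpy as np

rng = np.random.default_rng(2)
post = rng.gamma(2.0, 1.5, size=10_000)   # stand-in posterior draws for some quantity

# The Bayes estimate depends on the loss: squared error -> posterior mean,
# absolute error -> median, and an asymmetric linear loss that penalises
# under-estimation k times more than over-estimation -> the k/(k+1) quantile.
k = 4
print("squared loss :", post.mean().round(2))
print("absolute loss:", np.median(post).round(2))
print("asymmetric   :", np.quantile(post, k / (k + 1)).round(2))
```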
Our recent work in this area has featured in leading statistics and machine learning journals and conferences:
Summary: We developed a statistical method for the characterisation of genomic aberrations in single nucleotide polymorphism microarray data acquired from cancer genomes. Our approach allows us to model the joint effect of polyploidy, normal DNA contamination and intra-tumour heterogeneity within a single unified Bayesian framework. We demonstrate the efficacy of our method on numerous datasets, including laboratory-generated mixtures of normal and cancer cell lines and real primary tumours. Joint work with Oliver Sieber.
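To give a flavour of the modelling problem (this is a deliberately simplified calculation, not the paper's full likelihood), the expected B-allele frequency at a germline-heterozygous SNP depends jointly on the fraction of contaminating normal cells and the tumour copy-number genotype:

```python
# Toy illustration: expected B-allele frequency (BAF) at a
# germline-heterozygous SNP, as a function of the fraction rho of
# contaminating normal cells and the tumour genotype (b of c copies are B).
def expected_baf(rho, c, b):
    return (rho * 1 + (1 - rho) * b) / (rho * 2 + (1 - rho) * c)

# A copy-neutral LOH event (c=2, b=0) in a 60%-pure tumour sample:
print(expected_baf(rho=0.4, c=2, b=0))              # 0.2, shifted from 0.5 towards 0
# A single-copy gain of the B allele (c=3, b=2) in the same sample:
print(round(expected_baf(rho=0.4, c=3, b=2), 3))    # ~0.615
```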
Summary: This paper presents a case study on the utility of graphics cards for massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple graphics processing units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers, and can be thought of as prototypes of the next generation of many-core processors. Joint work with Anthony Lee.
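The core pattern is to phrase many independent chains or particles as a single array program, so that each Monte Carlo step becomes one vectorised operation across all of them. The sketch below uses NumPy; an array library with a GPU backend (for example CuPy or JAX) runs essentially the same code on a graphics card. The target and tuning constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Massively parallel Monte Carlo: run many independent Metropolis chains
# as one array program, so each step is a single vectorised operation.
# This is the pattern that maps naturally onto the many cores of a GPU.
chains, steps = 10_000, 1_000
log_target = lambda x: -0.5 * x**2          # standard normal target

x = rng.normal(size=chains)
for _ in range(steps):
    prop = x + 0.8 * rng.normal(size=chains)            # propose for all chains at once
    accept = np.log(rng.random(chains)) < log_target(prop) - log_target(x)
    x = np.where(accept, prop, x)

print(x.mean().round(3), x.std().round(3))   # ~0 and ~1
```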
Summary: In this paper we proposed a decision-theoretic approach for identifying optimal segmentations in hidden Markov models under a novel class of Markov loss functions. The result is generic and applicable to any probabilistic model on a sequence, such as hidden Markov models, change point models or product partition models. Joint work with Chris Holmes.
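The following toy example (our own illustration, not code from the paper) makes the loss-dependence concrete for a two-state HMM: the Viterbi path is Bayes-optimal under 0-1 loss on the whole sequence, while pointwise posterior decoding is Bayes-optimal under Hamming loss, and the two can disagree.

```python
import numpy as np

# Two-state HMM with Gaussian emissions; data simulated for illustration.
rng = np.random.default_rng(4)
A = np.array([[0.95, 0.05], [0.10, 0.90]])   # transition matrix
mu = np.array([0.0, 2.0])                    # emission means, unit variance
z = [0]
for _ in range(199):
    z.append(rng.choice(2, p=A[z[-1]]))
y = rng.normal(mu[np.array(z)], 1.0)

logB = -0.5 * (y[:, None] - mu[None, :]) ** 2   # log emission densities (up to a constant)
logA = np.log(A)
T, K = logB.shape

# Forward-backward recursions give the pointwise posteriors p(z_t | y).
la, lb = np.zeros((T, K)), np.zeros((T, K))
la[0] = np.log(0.5) + logB[0]
for t in range(1, T):
    la[t] = logB[t] + np.logaddexp.reduce(la[t-1][:, None] + logA, axis=0)
for t in range(T - 2, -1, -1):
    lb[t] = np.logaddexp.reduce(logA + logB[t+1] + lb[t+1], axis=1)
post = la + lb
post = np.exp(post - np.logaddexp.reduce(post, axis=1, keepdims=True))

# Bayes estimate under Hamming loss: maximise each pointwise posterior.
pointwise = post.argmax(axis=1)

# Bayes estimate under 0-1 loss on the whole path: the Viterbi algorithm.
delta, back = np.zeros((T, K)), np.zeros((T, K), dtype=int)
delta[0] = np.log(0.5) + logB[0]
for t in range(1, T):
    scores = delta[t-1][:, None] + logA
    back[t] = scores.argmax(axis=0)
    delta[t] = logB[t] + scores.max(axis=0)
path = [delta[-1].argmax()]
for t in range(T - 1, 0, -1):
    path.append(back[t][path[-1]])
viterbi = np.array(path[::-1])

print("disagreements:", (pointwise != viterbi).sum())
```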
Summary: In this work, we proposed a flexible non-parametric specification of the emission distribution for a hidden Markov model and introduced a novel methodology for carrying out the computations. Whereas current approaches use a finite mixture model, we argue in favour of an infinite mixture model, given by a mixture of Dirichlet processes, to provide increased robustness to noise distributions that are not adequately captured by standard parametric distributions. Joint work with Omiros Papaspiliopoulos.
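For intuition, the snippet below draws from a Dirichlet process mixture via a truncated stick-breaking construction. The truncation level and all numerical settings are illustrative, and the paper's actual computational methodology is more involved.

```python
import numpy as np

rng = np.random.default_rng(5)

# Truncated stick-breaking construction of a Dirichlet process mixture.
alpha, trunc = 1.0, 50
v = rng.beta(1.0, alpha, trunc)                            # stick-breaking proportions
w = v * np.concatenate(([1.0], np.cumprod(1 - v)[:-1]))    # mixture weights
mu = rng.normal(0.0, 3.0, trunc)                           # atoms from the base measure

# Draw observations from the resulting infinite (here truncated) mixture,
# of the kind used as a flexible emission distribution.
k = rng.choice(trunc, size=1000, p=w / w.sum())
x = rng.normal(mu[k], 0.5)
print("components actually used:", len(np.unique(k)))
```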