Our research is primarily concerned with the development of statistical and machine learning methodologies with a particular focus on applications in the biomedical sciences.

A complete list of publications can be found on Google Scholar, and our research software is available on GitHub.


Tumour heterogeneity describes the genetic diversity both within tumours (intra-tumour heterogeneity) and between tumours (inter-tumour heterogeneity). Genetic differences within and between tumours give rise to different disease outcomes and different patient responses to therapies. Understanding and characterising tumour heterogeneity is therefore important for developing clinical approaches that are tailored to the individual patient (individualised medicine).

Our group is interested in developing advanced statistical and machine learning methods to analyse genome sequencing data from heterogeneous tumour samples.
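As a toy illustration of the kind of model this involves (not our actual methods), the sketch below fits a binomial mixture to variant allele counts with expectation-maximisation, where each cluster loosely plays the role of a subclonal population. All names and parameters here are hypothetical and chosen for clarity.

```python
import numpy as np

def fit_vaf_mixture(alt, depth, n_clusters=2, n_iter=200, seed=0):
    """Fit a binomial mixture to variant allele fractions with EM.

    Mutation i has alt[i] variant reads out of depth[i] total reads;
    each cluster k has its own allele-fraction parameter phi_k and
    mixing weight pi_k (illustrative only).
    """
    rng = np.random.default_rng(seed)
    alt, depth = np.asarray(alt), np.asarray(depth)
    phi = rng.uniform(0.05, 0.95, n_clusters)    # cluster allele fractions
    pi = np.full(n_clusters, 1.0 / n_clusters)   # mixing weights
    for _ in range(n_iter):
        # E-step: responsibilities from binomial log-likelihoods
        logp = (alt[:, None] * np.log(phi)
                + (depth - alt)[:, None] * np.log1p(-phi)
                + np.log(pi))
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights and cluster allele fractions
        pi = resp.mean(axis=0)
        phi = (resp * alt[:, None]).sum(0) / (resp * depth[:, None]).sum(0)
        phi = np.clip(phi, 1e-6, 1 - 1e-6)
    return phi, pi, resp
```

On simulated reads from two well-separated subclones (allele fractions 0.1 and 0.4), the recovered `phi` values land close to the truth; real tumour data additionally require modelling copy number, purity, and sequencing noise.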

Read more about our cancer research and media coverage of our work:

Single Cell Sequencing

Advances in single cell technology now allow large-scale, high-throughput experimentation on single cells, providing new insight into cellular function. However, heterogeneity, both biological and technical, confounds the simple interpretation of single cell data, and sophisticated statistical methods are required to separate the different sources of noise and signal.

Our group is working on approaches using Bayesian hierarchical modelling to integrate information from multiple sources across different spatiotemporal scales in order to better understand cellular function and dynamics.
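To give a flavour of why hierarchical models help here, the minimal sketch below shows partial pooling in a normal-normal hierarchy: per-group estimates (e.g. per-gene or per-cell-type means) are shrunk towards a shared prior mean, with less shrinkage for groups backed by more data. This is a textbook toy with known variances, not one of our published models.

```python
import numpy as np

def partial_pool(group_means, group_sizes, sigma2, tau2, mu0):
    """Posterior means of group effects in a normal-normal hierarchy:
    theta_g ~ N(mu0, tau2), observations y | theta_g ~ N(theta_g, sigma2).

    Each group's sample mean is shrunk towards the shared mean mu0 in
    proportion to how little data the group has.
    """
    group_means = np.asarray(group_means, float)
    n = np.asarray(group_sizes, float)
    precision_data = n / sigma2        # information from the group's data
    precision_prior = 1.0 / tau2       # information from the shared prior
    w = precision_data / (precision_data + precision_prior)
    return w * group_means + (1 - w) * mu0
```

A group observed 100 times keeps an estimate close to its own sample mean, while a group observed once is pulled halfway to the shared mean (when `sigma2 == tau2`); this borrowing of strength across groups is what lets hierarchical models stabilise noisy single cell measurements.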

Recent publications:

Bayesian Statistical Machine Learning

Bayesian methods are now ubiquitous in statistical modelling applications across a wide range of disciplines. A major challenge for Bayesian approaches is the significant computation required for exact inference in large models applied to big datasets (terabyte scale or more). For this type of data, Markov chain Monte Carlo (MCMC) simulation approaches are not feasible despite recent advances in massively parallel computational hardware (e.g. graphics processing units, or GPUs). Instead, it is necessary to develop approximate methods that are able to give “good” answers that, whilst not guaranteed to be exact or optimal, are sufficient for downstream decision processes and further scientific inquiry.
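One family of such approximate methods is variational inference, which replaces MCMC sampling with optimisation. The sketch below, purely for illustration, fits a Gaussian approximation to the posterior over a Gaussian mean by stochastic gradient ascent on a Monte Carlo ELBO estimate with the reparameterisation trick; this toy posterior is available in closed form, which is what makes the approximation checkable.

```python
import numpy as np

def vi_gaussian_mean(y, sigma=1.0, prior_mu=0.0, prior_sd=10.0,
                     steps=3000, lr=0.01, n_mc=10, seed=0):
    """Toy variational inference for the mean of a Gaussian.

    Approximates p(mu | y) with q(mu) = N(m, exp(log_s)^2), maximising a
    Monte Carlo estimate of the ELBO via the reparameterisation trick.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, float)
    n = y.size
    m, log_s = 0.0, 0.0
    for _ in range(steps):
        eps = rng.standard_normal(n_mc)
        s = np.exp(log_s)
        mu = m + s * eps                    # reparameterised samples from q
        # gradient of the log joint wrt mu: likelihood term + prior term
        dlogp = (y.sum() - n * mu) / sigma**2 + (prior_mu - mu) / prior_sd**2
        m += lr * dlogp.mean()
        # chain rule through mu = m + exp(log_s)*eps; +1 from q's entropy
        log_s += lr * ((dlogp * eps).mean() * s + 1.0)
    return m, np.exp(log_s)
```

On a small dataset the fitted `(m, s)` sit close to the exact conjugate posterior mean and standard deviation; the appeal of this style of inference is that the same optimisation loop scales to models and datasets where exact computation or MCMC is out of reach.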

Our group is developing approximate inference methods to fit complex statistical models to large datasets that respect the practical computational and time limitations that govern real-life scientific studies. We are also interested in using decision-theoretic ideas to produce meaningful results for scientists via loss functions that are tailored to the task at hand.
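As a small illustration of the decision-theoretic idea (a generic textbook example, not our methodology), the sketch below picks the point estimate minimising posterior expected loss over a grid of candidate actions. With an asymmetric loss in which under-estimation costs nine times more than over-estimation, the optimal report is no longer the posterior mean but (for this linear loss) the posterior 0.9 quantile.

```python
import numpy as np

def optimal_action(posterior_samples, loss):
    """Choose the action minimising posterior expected loss, approximated
    over a grid of candidate actions (illustrative only)."""
    samples = np.asarray(posterior_samples, float)
    grid = np.linspace(samples.min(), samples.max(), 512)
    risk = np.array([loss(a, samples).mean() for a in grid])
    return grid[np.argmin(risk)]

def asymmetric(a, theta):
    """Under-estimating theta is 9x as costly as over-estimating it."""
    return np.where(theta > a, 9.0 * (theta - a), a - theta)
```

Tailoring the loss function in this way lets the same posterior support different reports for different scientific tasks, rather than defaulting to a single summary statistic.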

Our recent work in this area has been featured in leading statistical and machine learning journals and conferences: