Proposal
Project Title: clinAIcan - developing clinical applications of artificial intelligence for cancer
Track Record [max 2 pages]
I am an Artificial Intelligence (AI) researcher with a focus on developing computational and statistical methodologies for problems within the biomedical and health sciences.
Research Contributions. In Statistics and Machine Learning, my major contributions have included i) expanding the statistical toolkit for Markov models, ii) Bayesian nonparametric latent variable modelling for high-dimensional data and iii) novel methodology for bioinformatics. A significant feature of my work is the synthesis of statistical and machine learning principles. I have routinely published both in the major machine learning conferences (NeurIPS, ICML, AISTATS) [@aueb2014hamming; @law2017testing; @rukat2017bayesian; @rukat2018probabilistic; @martens2019decomposing; @martens2019ens; @martens2020nd; @martens2020trans] and in the most prominent statistical journals (JASA, JRSSB) [@yau2011bayesian; @titsias2016statistical; @titsias2017hamming], cross-fertilising ideas between the two domains. These accomplishments have been made while my primary role has been as a Principal Investigator within medical faculties, with responsibility for a genomic medicine research programme that studies human disease, in particular cancer. This has involved supervising experimental scientists, designing experiments and directly funding these efforts. I have therefore developed an array of integrated collaborations with biological and clinical research groups, within which my team has been embedded to carry out this work.
Academic Citizenship. For the last five years, I have focused on my young family, which has limited my availability for international travel and collaborations. During this period I have concentrated on supporting early career researchers in my group and on building my national profile through participation in a number of UK initiatives. This platform gives me a strong basis from which to use this Fellowship to accelerate my progress towards becoming an international leader in AI.
Genomics England. I founded and co-lead the Quantitative Methods, Machine Learning and Functional Genomics Clinical Interpretation Partnership (GeCIP) for the Genomics England (GEL) 100,000 Genomes Project (100KGP) with Professor Martin Tobin (Leicester). This involves coordinating and reviewing 100KGP research activities across a national network of researchers on behalf of GEL, as well as membership of the scientific steering committee, which gives me deep insight into the developing genomic medicine landscape in England. My work also involves advocacy for genomic medicine; for example, I recently contributed to the PHG Foundation report on "Artificial intelligence for genomic medicine" and am organising an upcoming workshop on Machine Learning and AI with GSK (due to be held in July 2020).
Health Data Research UK/Alan Turing Institute. In 2019, I co-led (with Professor Peter Diggle) a successful application on behalf of Health Data Research UK (HDRUK) and The Alan Turing Institute (Turing) for a Wellcome PhD Programme in Health Data Science. This is a unique multi-institutional PhD programme founded on HDRUK’s One Institute vision of collective and collaborative working in health data science. I am work package co-lead for artificial intelligence and multimodal data integration in the HDRUK Multi-omics Consortium with Dr Michael Inouye (Cambridge). I am also Principal Investigator of a Turing Health Programme project, funded by a Strategic Priorities Fund AI for Science and Government award, which provides matched support for the work.
FLIER. I am a member of the Taskforce, consisting of senior academic, government, NHS and industry representatives, that developed the Academy of Medical Sciences (AMS) Future Leaders in Enterprise and Research (FLIER) Leadership Programme. Here, I was able to use my analytical and influencing skills to shape the format of the programme, challenging our senior leaders to improve their understanding of the needs of those at mid-career stages and to strengthen the inclusivity and diversity ambitions of the programme, in particular by advocating for the needs of women in science.
Other Activities. I am a member of the steering group for the international CONSORT/SPIRIT-AI consortium, where we are developing new guidelines for the evaluation of AI-driven health interventions in randomised controlled trials [@liu2019reporting]. I am applying knowledge from this initiative as part of my contribution to the Phase 4 review panel for the AI in Health and Care Award run by the AAC in partnership with NHSX and NIHR. I also sit on the MRC Methodology Research Panel, where I have contributed to the development of the MRC’s data science strategy for its next funding cycle.
Researcher Development. I have been fortunate to train and mentor a fantastic group of students and postdoctoral fellows over the last 8 years. Many have gone on to careers substantially related to AI and/or data science, including in industry (2), as independent research fellows (4) or in academic faculty-level appointments (2). I am a strong believer in developing early independence opportunities for my research team. I also sit on the Turing Institute Research Training and Development Panel and Advisory Group and act as a regular Academic Mentor for Doctoral Students.
Research environment. I was recently appointed as Chair in Artificial Intelligence in Manchester as part of a university initiative to substantially increase AI capability. This growth plan has also seen the parallel recruitment of Professor Samuel Kaski, who has joined Manchester as part of a joint professorship with Aalto University. Prof Kaski is Director of the Artificial Intelligence Center of Finland, which hosts a European Laboratory for Learning and Intelligent Systems (ELLIS) unit. Together with Professor Magnus Rattray (academic mentor, see Institutional Support Letter), Director of the Manchester Institute for Data Science and AI and ELLIS Health Programme Fellow, we have ambitious plans to make Manchester one of the most exciting UK hubs for AI activity, particularly in Health. For example, the Christabel Pankhurst Institute for Health Technology is a new £25M investment between the University and regional partners to provide a technology translation and development hub for digital health, and will be led by Prof Kaski.
In cancer, the Manchester Cancer Research Centre (MCRC), whose Director, Professor Rob Bristow, will be an academic mentor for my fellowship (see Institutional Support Letter), is a joint initiative between The University of Manchester, Cancer Research UK and The Christie NHS Foundation Trust. It has since been established as the cancer research arm of the Manchester Academic Health Science Centre (MAHSC), a strategic partnership between the University and six NHS Trusts across Greater Manchester. At the heart of the MCRC are the Cancer Research UK Manchester Institute (one of only two “major” CRUK centres) and The Christie Hospital, the largest single-site cancer centre in Europe, which treats more than 44,000 patients a year and was the first UK centre to be accredited as a comprehensive cancer centre. The Paterson Redevelopment Project will lead to the construction of a new world-leading cancer research institute by 2030. Manchester is also part of the CRUK International Alliance for Cancer Early Detection (ACED). There are few locations in the UK with this combination of AI ambition and core strengths in cancer and, despite the immense challenges posed by COVID-19 for the university sector, the University of Manchester has reaffirmed its institutional support for this fellowship proposal (1x 3yr Postdoc, 1x PhD, 0.5x RSE).
Proposed Research and Context
Background
[“In the spring of 2012, I received the devastating diagnosis of ovarian cancer - the ‘silent killer’ Why me? Why anyone? Like thousands of women before me, I searched for reasons WHY whilst trying to come to terms with the illness.”]{.roman} — Patient 11152
Ovarian Cancer. Ovarian cancer (OvCa) affects approximately 7,000 women per year in the UK, half of whom are over the age of 65. OvCa is hard to detect and often presents at an advanced stage, making it extremely difficult to treat. There has been little progress in improving mortality rates in the last 20 years, with 5-year survival rates as low as 10-15% for the most common subtype, high-grade serous ovarian cancer (HGSOC). To better understand the disease, international efforts have focused on characterising its molecular determinants using new technologies, including -omics [@bowtell2010genesis].
My contributions have included being amongst the first to use multi-region whole genome sequencing (WGS) to extensively characterise the genetic heterogeneity within a single OvCa patient (Patient 11152), showing that an individual can harbour both chemotherapy-sensitive and chemotherapy-resistant cell populations (Figure [fig:patient_11152]{reference-type=“ref” reference=“fig:patient_11152”}(a)). We also identified an early premalignant mutation in the SOX2 gene, which was found to be frequently overexpressed in the fallopian tubes (FT) of OvCa patients [@hellner2016premalignant]. We have used single cell RNA sequencing to survey the cellular landscape of the FT in OvCa and showed that gene expression patterns in the tumours can be traced back to those of specific FT cell types [@hu2020repertoire]. This has allowed us to postulate that OvCa can arise from different cell types within the FT and that the cell of origin can drive significant differences in survival outcomes. More recently, we have developed novel sequencing technologies to examine minimal residual disease (MRD), i.e. tumour that persists after first-line treatment and repopulates the disease, leading to recurrence [@karaminejadranjbar2020highly]. This led us to characterise genomic alterations, specific to one metastatic tumour sample from Patient 11152, that had not been detected by standard sequencing, providing evidence of the ongoing evolution of the disease within the patient.
[[fig:patient_11152]]{#fig:patient_11152 label=“fig:patient_11152”}
Vision. Throughout this work, I have been struck by the many layers of molecular complexity underlying OvCa that have been revealed by advanced ‘omics technologies (multi-omics). We have begun to painstakingly unravel this complexity, enabling clinicians and patients, like Patient 11152, to better understand why their disease happened and how it could be treated (Figure [fig:patient_11152]{reference-type=“ref” reference=“fig:patient_11152”}(b)). If this could be done routinely, not just for one patient but for every patient, we would begin to realise the ambition of cancer precision medicine: to treat patients at the right time in the right way. In this fellowship proposal, I aim to investigate the fundamental AI technologies that would make this possible.
Objectives
“An understanding of the dynamics of cancer evolution might lead to improvement in clinical outcomes, as it enables prognoses to be accurately determined and ‘evolution-aware’ patient management to be applied.” — Turajlic et al. (2019), Nature Reviews Genetics.
Overview. Central to the realisation of this vision is the need for a model of molecular cancer progression. Given a patient’s data, we would like to reconstruct the series of molecular events that led to their current disease state and use this model to predict the future course of their disease under different treatment regimes (Figure [fig:overview]{reference-type=“ref” reference=“fig:overview”}). The research team assembled through this fellowship will develop such models. My intention is not to study cancer evolution in itself but to leverage the extensive pool of knowledge derived over the last decade of cancer ‘omics research to build computable models that could enable us to apply this knowledge in a clinical setting [@swanton2020take]. I will centre my thinking around ovarian cancer, but the methodologies will be applicable to all cancers, and potentially to other diseases, which will be explored with the broad body of project partners that I have assembled for this proposal.
[[fig:overview]]{#fig:overview label=“fig:overview”}
Existing work. Longitudinal sampling, in which the same patient is sampled repeatedly over time, is the gold standard for measuring disease progression. In cancer, however, this is rarely possible: patients typically present for diagnosis only once signs and symptoms of the disease have appeared, and subsequent treatment immediately perturbs the disease, preventing direct observation of its natural evolution. We have overcome this limitation by constructing models that exploit the asynchrony of patients in cross-sectional cohorts, each of whom is sampled at a different stage of their disease, to recover pseudotemporal information about the dynamics of cancer evolution [@campbell2018uncovering].
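To make the pseudotemporal idea concrete, the following is a deliberately crude sketch and not the Bayesian method of [@campbell2018uncovering]. It simulates a hypothetical cross-sectional cohort in which each patient is observed once at an unknown stage $\tau_n$, then recovers an ordering of patients by projecting onto the first principal component. The simulation settings and all function names are illustrative assumptions.

```python
import numpy as np

def simulate_cross_section(n_patients=100, n_features=20, noise=0.1, seed=0):
    """Toy cohort: each patient is observed once, at a hidden progression time tau_n."""
    rng = np.random.default_rng(seed)
    tau = rng.uniform(0.0, 1.0, size=n_patients)   # unsynchronised disease stages
    w = rng.normal(size=n_features)                # hypothetical per-feature trend with progression
    Y = np.outer(tau, w) + noise * rng.normal(size=(n_patients, n_features))
    return tau, Y

def pca_pseudotime(Y):
    """Score each patient by its projection on the first principal component (sign is arbitrary)."""
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    return Yc @ Vt[0]

def spearman(a, b):
    """Rank correlation; assumes no ties, which holds for continuous simulated data."""
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

tau, Y = simulate_cross_section()
t_hat = pca_pseudotime(Y)
print(abs(spearman(tau, t_hat)))  # near 1 when the ordering is recovered up to sign
```

A linear projection only works when progression is roughly linear in feature space; the non-linear manifolds described next are what motivate latent variable models instead.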
[[fig:pst]]{#fig:pst label=“fig:pst”}
Latent Variable Models. In terms of modelling, given a collection of high-dimensional observation vectors $\{ {\bf y}_1, \dots, {\bf y}_N \}$ for $N$ individuals (Figure [fig:pst]{reference-type=“ref” reference=“fig:pst”}(a)), we assume that the data are noisy observations of points lying on embedded non-linear manifolds within the high-dimensional observation space. Each manifold corresponds to a different disease trajectory, and distance along a manifold relates to the degree of molecular progression (Figure [fig:pst]{reference-type=“ref” reference=“fig:pst”}(b)). This can be captured by a hierarchical latent variable model: $$\begin{aligned} {\bf y}_n &= {\bf x}_n + {\bm \epsilon}_n, ~ n=1,\dots,N, \\ {\bf x}_n &= {\bf f}({\bf z}_n(\tau_n), {\bf c}_n),\end{aligned}$$ where ${\bf x}_n$ is the noise-free observation, ${\bm \epsilon}_n$ is observation noise, ${\bf z}_n$ are the coordinates in a low-dimensional latent space that is mapped to the high-dimensional observation space via ${\bf f}$, $\tau_n$ is a unidimensional latent quantity denoting a notion of “distance” along the manifold, which we use as a measure of disease progression, and ${\bf c}_n$ captures additional covariates of relevance. Our work has shown that such measures highlight molecular heterogeneity amongst cancer patients who would be considered identical under standard clinicopathological variables (Figure [fig:existing]{reference-type=“ref” reference=“fig:existing”}(c)).
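A minimal generative sketch of this model, under illustrative assumptions that are not taken from the proposal: $\tau_n \sim \mathrm{Uniform}(0,1)$, a hand-chosen two-dimensional latent curve ${\bf z}(\tau)$, a single binary covariate $c_n$, a random $\tanh$ mapping standing in for ${\bf f}$, and Gaussian noise ${\bm \epsilon}_n \sim \mathcal{N}({\bf 0}, \sigma^2 I)$. The function names are hypothetical.

```python
import numpy as np

def latent_curve(tau):
    """Hypothetical 2-D latent trajectory z(tau): a half circle traced out as tau goes 0 -> 1."""
    return np.stack([np.cos(np.pi * tau), np.sin(np.pi * tau)], axis=-1)

def simulate(N=200, P=50, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    tau = rng.uniform(0.0, 1.0, size=N)        # latent progression tau_n
    c = rng.integers(0, 2, size=N)             # binary covariate c_n (illustrative)
    W = rng.normal(size=(2, P))                # latent-to-feature weights
    v = rng.normal(size=P)                     # per-feature covariate effect
    Z = latent_curve(tau)                      # (N, 2) latent coordinates z_n(tau_n)
    X = np.tanh(Z @ W + np.outer(c, v))        # noise-free x_n = f(z_n(tau_n), c_n)
    Y = X + sigma * rng.normal(size=(N, P))    # y_n = x_n + eps_n
    return tau, c, X, Y

tau, c, X, Y = simulate()
```

Fitting the model means inverting this process: inferring $\tau_n$, ${\bf z}_n$ and ${\bf f}$ from the observations ${\bf y}_n$ alone.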
Decomposable nonlinear dimensionality reduction. In statistical machine learning, where ${\bf f}$ is treated as an unknown function, popular models include Gaussian Process Latent Variable Models [@lawrence2004gaussian] and deep neural networks, particularly in the form of Variational Autoencoders (VAEs) [@kingma2014stochastic]. We have recently extended both frameworks [@martens2019decomposing; @martens2020nd] to include explicit additive structure in the mappings: $$\begin{aligned} x_{pn} = f_{c}^p({\bf c}_n) + f_{z}^p({\bf z}_n(\tau_n)) + f_{cz}^p({\bf z}_n(\tau_n), {\bf c}_n), ~ p=1,\dots,P,\end{aligned}$$ which allows us to define more precisely the relationships between each feature, the latent quantities and the covariates. This provides interpretable structure, so that ${\bf f}$ is not simply a black box (Figure [fig:pst]{reference-type=“ref” reference=“fig:pst”}(c)). This feature-level interpretability means we can understand the model in terms of how the original features vary along trajectories, allowing experimental validation and the possible development of biomarker tests to measure and confirm patient disease progression (Figure [fig:existing]{reference-type=“ref” reference=“fig:existing”}(a)). We have also extended these models to incorporate scale- and translation-invariance, allowing us to group features exhibiting similar functional behaviour [@martens2020trans] (Figure [fig:existing]{reference-type=“ref” reference=“fig:existing”}(b)). This enables the identification of “pathways”, or coordinated biological behaviour across many different genes or proteins. These constraints have the benefit of promoting adherence to biological reality whilst vastly reducing the feasible model space, meaning it is possible to learn effectively from the relatively small data sets common in cancer studies, such as the OXO-PCR clinical trial (Figure [fig:existing]{reference-type=“ref” reference=“fig:existing”}(c)).
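One standard way to make an additive decomposition like this identifiable is a functional-ANOVA-style constraint requiring each term to average to zero over each of its arguments. We assume that constraint here purely for illustration; it should not be read as the exact construction used in [@martens2019decomposing]. The sketch decomposes a known toy function of one latent coordinate and one binary covariate evaluated on a grid, and it adds a constant term $f_0$ that the equation above would absorb into the other components. The function name is hypothetical.

```python
import numpy as np

def anova_decompose(F):
    """Split F[i, j] = f(z_i, c_j) on a grid into f0 + f_z(z_i) + f_c(c_j) + f_zc(z_i, c_j),
    where every non-constant term averages to zero over each argument (uniform grid weights)."""
    f0 = F.mean()
    f_z = F.mean(axis=1) - f0                      # main effect of the latent coordinate
    f_c = F.mean(axis=0) - f0                      # main effect of the covariate
    f_zc = F - f0 - f_z[:, None] - f_c[None, :]    # remainder is the interaction
    return f0, f_z, f_c, f_zc

z = np.linspace(-2.0, 2.0, 101)                    # grid over a 1-D latent coordinate
c = np.array([0.0, 1.0])                           # binary covariate (illustrative)
F = np.sin(z)[:, None] + 0.5 * c[None, :] + 0.3 * z[:, None] * c[None, :]
f0, f_z, f_c, f_zc = anova_decompose(F)
```

Here the interaction recovered is $0.3\,z\,(c - 0.5)$: the part of the feature’s trajectory that the covariate modifies, separated from the shared trend and the covariate offset.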
[[fig:existing]]{#fig:existing label=“fig:existing”}
Work Packages. Building upon this existing work, I propose two AI-focused work packages to undertake detailed and novel research into models to account for (i) treatment intervention and (ii) genetic evolution. I then outline a decision making process in the third work package to determine the specific clinical applications that we will build using these technologies leveraging the interest and in-kind contributions (total value: > £8M) of the large body of project partners assembled in support. Note, for conciseness, I will not specifically describe challenges related to the use of specific data modalities (e.g. bulk vs single cell data). I believe my track record provides evidence of my capability to address these issues.
*WP1: Mechanistically-driven latent variable models of treatment interventions [Years 1-3]*. Our existing methodology has been beneficial for modelling the “natural evolution” of untreated cancers but is not directly applicable to data collected from post-treatment samples, where the treatment may have perturbed the progression of the disease (Figure [fig:treatment]{reference-type=“ref” reference=“fig:treatment”}(a)). Our goal is to construct models that allow interventional effects to be quantified at the molecular level. However, a priori knowledge of the absolute molecular effects of an intervention is often poor and may only be available in qualitative terms, e.g. a drug acts to down-regulate the expression of a particular pathway. Such knowledge can be expressed in terms of derivatives, motivating a mechanistic approach: $$\begin{aligned} x_{pn}(\tau_n) &= \int_{-\infty}^{\tau_n} g^p({\bf x}_n(t), {\bf z}_n(t), t) \, dt, ~ p=1,\dots,P,\end{aligned}$$ where each observed feature is governed by a system of ordinary (or potentially stochastic) differential equations (ODEs) $g$, whose upper integration limit is the unobserved molecular time $\tau_n$. I am therefore seeking to develop a novel class of mechanistically-driven dimensionality reduction methods (Figure [fig:treatment]{reference-type=“ref” reference=“fig:treatment”}(c)) that identify a low-dimensional latent embedding of the data in which the mapping from the latent to the observed space is governed by a dynamic model whose form is not necessarily known but can be parameterised by deep neural networks [@chen2018neural].
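The integral form can be illustrated with a small Neural-ODE-style sketch under stated assumptions: the lower limit $-\infty$ is replaced by a finite start time $t_0 = 0$ with a known initial state, $g$ is a randomly initialised (untrained) one-hidden-layer network, the latent path ${\bf z}(t)$ is fixed by hand, and a fixed-step Runge-Kutta (RK4) solver stands in for the adaptive, adjoint-based solvers of [@chen2018neural]. All function names are hypothetical. Each patient is the same dynamical system read out at a different molecular time $\tau_n$.

```python
import numpy as np

def make_mlp_field(P, Q, hidden=16, seed=0):
    """Hypothetical one-hidden-layer network g(x, z, t) -> dx/dt, with random untrained weights."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(P + Q + 1, hidden))
    W2 = rng.normal(scale=0.5, size=(hidden, P))
    def g(x, z, t):
        h = np.tanh(np.concatenate([x, z, [t]]) @ W1)
        return h @ W2
    return g

def integrate_to(g, x0, z_fn, tau, t0=0.0, n_steps=200):
    """Fixed-step RK4 solution of dx/dt = g(x, z(t), t) from t0 up to the molecular time tau."""
    x = np.asarray(x0, dtype=float)
    h = (tau - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        k1 = g(x, z_fn(t), t)
        k2 = g(x + 0.5 * h * k1, z_fn(t + 0.5 * h), t + 0.5 * h)
        k3 = g(x + 0.5 * h * k2, z_fn(t + 0.5 * h), t + 0.5 * h)
        k4 = g(x + h * k3, z_fn(t + h), t + h)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

P = 5
g = make_mlp_field(P, Q=2)
z_fn = lambda t: np.array([np.cos(t), np.sin(t)])   # hypothetical latent path z(t)
taus = [0.2, 0.5, 1.0]                               # per-patient molecular times tau_n
X = np.stack([integrate_to(g, np.zeros(P), z_fn, tau) for tau in taus])
```

Training would fit the network weights, the latent paths and each $\tau_n$ jointly; that inference step is the research problem and is not shown.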
[[fig:treatment]]{#fig:treatment label=“fig:treatment”}
We will initially focus on ODEs of the form: $$\begin{aligned} \frac{ \partial x_p }{ \partial \tau } &= g_p^p(x_p(\tau), {\bf z}(\tau)) + \sum_{q \neq p} g_q^p(x_q(\tau), {\bf z}(\tau)) + \sum_{q} g_{pq}^p(x_p(\tau), x_q(\tau), {\bf z}(\tau)) + \underbrace{I_p(x_p(\tau), {\bf z}(\tau))}_{\mbox{\scriptsize Treatment}},\end{aligned}$$ which relates the time evolution of each feature to itself, to influences from other features, to feature-feature interactions and, finally, to an intervention term that accounts for treatment effects. The treatment term is of particular interest because current cancer treatments are administered without regard to high-resolution molecular factors, so characterising their molecular effects is valuable in its own right. This setup will allow the proposed work to build upon and integrate with hybrid mechanistic-machine learning models of drug perturbations in cancer cell lines, which provide established network models of molecular interactions. These networks can be used as prior information to regularise our models [@oates2014causal; @korkut2015perturbation; @stewart2017integrating; @frohlich2018efficient; @yuan2019interpretable]. Note that these previous efforts were framed under tightly controlled experimental conditions. Our objective is to address patient disease evolution and heterogeneity, factors not covered by previous studies.
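A minimal sketch of this four-term structure for two features, with hand-chosen (hypothetical) functional forms and parameters, integrated by a fixed-step RK4 solver. For brevity, only the self term depends on the latent driver $z(\tau)$. The two runs differ only in the treatment term acting on feature 0, yet feature 1 is also lowered through the cross term: an indirect molecular effect of the kind the model is meant to expose.

```python
import numpy as np

def make_rhs(r, d, A, B, u, z_fn):
    """dx/dtau as the sum of the four terms above, with illustrative forms:
    self r_p z - d_p x_p, cross A[p, q] x_q (q != p), interactions B[p, q] x_p x_q,
    and treatment I_p = -u_p x_p."""
    off_diag = 1.0 - np.eye(len(r))            # removes q == p from the cross sum
    def rhs(x, tau):
        z = z_fn(tau)
        self_term = r * z - d * x              # g_p^p: latent-driven production and decay
        cross_term = (A * off_diag) @ x        # sum_{q != p} g_q^p
        pair_term = x * (B @ x)                # sum_q g_pq^p
        treatment = -u * x                     # I_p: drug-induced down-regulation
        return self_term + cross_term + pair_term + treatment
    return rhs

def rk4(rhs, x0, tau_end, n_steps=1000):
    """Fixed-step fourth-order Runge-Kutta integration from tau = 0 to tau_end."""
    x, tau = np.asarray(x0, dtype=float), 0.0
    h = tau_end / n_steps
    for _ in range(n_steps):
        k1 = rhs(x, tau)
        k2 = rhs(x + 0.5 * h * k1, tau + 0.5 * h)
        k3 = rhs(x + 0.5 * h * k2, tau + 0.5 * h)
        k4 = rhs(x + h * k3, tau + h)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        tau += h
    return x

r, d = np.array([1.0, 0.5]), np.array([0.5, 0.5])
A = np.array([[0.0, 0.2], [0.1, 0.0]])          # features 0 and 1 up-regulate each other
B = np.array([[0.0, -0.05], [0.0, 0.0]])        # weak inhibitory interaction on feature 0
z_fn = lambda tau: 1.0 - np.exp(-tau)           # hypothetical scalar latent driver z(tau)
untreated = rk4(make_rhs(r, d, A, B, np.array([0.0, 0.0]), z_fn), [0.0, 0.0], 5.0)
treated = rk4(make_rhs(r, d, A, B, np.array([1.0, 0.0]), z_fn), [0.0, 0.0], 5.0)
```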
This work will also integrate causal approaches, establishing connections between our mechanistic models and structural causal models in order to describe the behaviour of systems under intervention [@mooij2013ordinary; @pfister2019learning; @peters2020causal]. We will develop novel variants of these concepts specifically for understanding causal relations when the molecular effects of interventions are uncertain. We will work with academic partners (see Data and Experimental support) to experimentally validate computational predictions.
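The link between ODEs and structural causal models can be sketched, in the spirit of [@mooij2013ordinary], for the simplest case of a stable linear system $d{\bf x}/d\tau = A{\bf x} + {\bf b}$ at equilibrium: an intervention $do(x_k = v)$ replaces the equation for feature $k$ by $x_k = v$ and lets the other features relax to a new fixed point. The matrix, vector and function names are illustrative assumptions. For the hypothetical chain $x_0 \to x_1 \to x_2$, intervening downstream leaves upstream features unchanged, an asymmetry that the observational equilibrium alone does not reveal.

```python
import numpy as np

def equilibrium(A, b):
    """Fixed point of dx/dtau = A x + b (A assumed stable, hence invertible)."""
    return np.linalg.solve(A, -b)

def intervene(A, b, k, v):
    """Equilibrium after do(x_k = v): x_k is clamped and the remaining features relax."""
    rest = [i for i in range(len(b)) if i != k]
    x = np.empty(len(b))
    x[k] = v
    # Solve A_rr x_r + A_rk v + b_r = 0 for the unclamped features
    x[rest] = np.linalg.solve(A[np.ix_(rest, rest)], -(b[rest] + A[rest, k] * v))
    return x

# Hypothetical chain x0 -> x1 -> x2, with self-decay on every feature
A = np.array([[-1.0, 0.0, 0.0],
              [0.8, -1.0, 0.0],
              [0.0, 0.5, -1.0]])
b = np.array([1.0, 0.0, 0.0])
print(equilibrium(A, b), intervene(A, b, 1, 0.0), intervene(A, b, 2, 5.0))
```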
*WP2: Reinforcement learning of cancer evolution [Years 1-3]*. Genome instability is a hallmark of many cancers, including OvCa, which is said to be “copy number driven” [@macintyre2018copy; @berger2018comprehensive; @the2020pan]. Loss of genomic integrity in cancer cells leads to the accumulation of chromosomal abnormalities: segments of DNA that are lost or duplicated (Figure [fig:rl]{reference-type=“ref” reference=“fig:rl”}(a)). Remarkably, although copy number evolution has been extensively studied [@de2014spatial; @wang2014clonal; @gao2016punctuated] and genomic instability is strongly linked to cancer outcomes [@hieronymus2018tumor], there are currently no generative probabilistic models that describe this phenomenon. Closely related work includes the identification of tumour-specific phylogenies [@caravagna2018detecting; @satas2020scarlet] and the use of Gillespie simulations [@mcfarland2014tug; @lopez2020interplay], but these approaches do not provide the low-dimensional representations we require to encapsulate complex evolutionary processes, which might give rise to a so-called evo/eco-index [@maley2017classifying].
The modelling challenge lies in the fact that each copy number alteration event is described by a triplet of information (start position, end position, alteration type), which is not a simple vectorisable data type. Successive copy number alterations can also overlap. This leads to different modelling requirements from those of WP1, whose methods are better suited to (near-)continuous data types (gene expression, proteins, metabolomics) than to discrete DNA sequence data, which requires its own “arithmetic”.
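A small sketch makes the point about “arithmetic” concrete. Events are stored as (start, end, type) triplets over a binned genome, and we assume two simplifying rules: DNA that has been completely lost cannot later be duplicated, and copy number cannot fall below zero. Under these rules the same two events applied in different orders give different profiles, so a profile is not simply a sum of event vectors. The class and function names are hypothetical.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class CNAEvent:
    start: int   # first bin affected (inclusive)
    end: int     # bin after the last one affected (exclusive)
    kind: str    # "gain" or "loss"

def apply_event(profile, event):
    """Apply one alteration under the assumed rules: absent DNA (copy number 0)
    cannot be duplicated, and copy number never drops below 0."""
    out = np.array(profile, dtype=int)        # copy, so the input is never modified
    seg = out[event.start:event.end]          # a view: edits to seg change out
    present = seg > 0
    if event.kind == "gain":
        seg[present] += 1
    elif event.kind == "loss":
        seg[present] -= 1
    else:
        raise ValueError(f"unknown alteration type: {event.kind}")
    return out

def apply_events(profile, events):
    for event in events:
        profile = apply_event(profile, event)
    return profile

profile = [1, 1, 1, 1]                        # a region present at one copy
loss_then_gain = apply_events(profile, [CNAEvent(0, 2, "loss"), CNAEvent(0, 4, "gain")])
gain_then_loss = apply_events(profile, [CNAEvent(0, 4, "gain"), CNAEvent(0, 2, "loss")])
print(loss_then_gain, gain_then_loss)
```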
Recently, my group has begun to develop novel deep reinforcement learning (Deep RL) approaches for learning probabilistic models of genomic evolution [@feng2020]. The principle is that evolutionary pressures can be captured through a policy governing the Markov transitions between genomic states (Figure [fig:rl]{reference-type=“ref” reference=“fig:rl”}(b)). Evolution is treated as an agent that takes actions against the genome, causing it to mutate. The genome returns rewards related to the compatibility of each action with real-world tumour copy number profiles. Using RL, we learn policies that generate evolutionary event sequences giving rise to biologically realistic genomic copy number profiles (Figure [fig:rl]{reference-type=“ref” reference=“fig:rl”}(c)). These trained models can then be used to retrospectively infer evolutionary histories given a tumour profile, or to project the evolution of the genome forward for predictive purposes. Importantly, for such a model to be tractable, we need to incorporate prior knowledge of actual biological mechanisms and constraints. For example, copy numbers must remain non-negative, whole genome duplications (WGD) [@lopez2020interplay; @cross2020evolutionary] must be treated as a special action that doubles genomic content, and the model should be invariant to arbitrary permutations of the chromosome labelling. Interestingly, to overcome the limited number of real tumour profiles available (e.g. 300 from the TCGA OvCa dataset), we have adopted notions of self-play, made prominent by recent high-profile applications such as AlphaGo, to train our models [@silver2017mastering].
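A toy version of this formulation, offered as a sketch under simplifying assumptions rather than the model of [@feng2020]: the state is an integer copy number profile over a handful of bins starting from a diploid genome, actions are segment gains and losses plus a whole genome duplication, non-negativity holds by construction, and a sparse terminal reward scores closeness (negative L1 distance) to an observed tumour profile. Permutation invariance and the learned policy itself are omitted, and a random policy is used only to exercise the environment. The class and its parameters are hypothetical.

```python
import numpy as np

class CopyNumberEnv:
    """Toy MDP: state = integer copy number profile over n_bins; actions edit it."""
    def __init__(self, target, max_steps=10, ploidy=2):
        self.target = np.asarray(target)
        self.n_bins = len(self.target)
        self.max_steps = max_steps
        self.ploidy = ploidy
        # Action list: WGD, plus every contiguous segment [s, e) paired with gain or loss
        self.actions = [("wgd", 0, 0)] + [
            (kind, s, e)
            for s in range(self.n_bins)
            for e in range(s + 1, self.n_bins + 1)
            for kind in ("gain", "loss")]
        self.reset()

    def reset(self):
        self.state = np.full(self.n_bins, self.ploidy)
        self.t = 0
        return self.state.copy()

    def step(self, a):
        kind, s, e = self.actions[a]
        if kind == "wgd":
            self.state = 2 * self.state          # whole genome duplication doubles every bin
        else:
            seg = self.state[s:e]                # view into the state
            present = seg > 0                    # absent DNA is neither copied nor lost again
            seg[present] += 1 if kind == "gain" else -1
        self.t += 1
        done = self.t >= self.max_steps
        # Sparse reward, given only at the end: closeness to the observed tumour profile
        reward = -float(np.abs(self.state - self.target).sum()) if done else 0.0
        return self.state.copy(), reward, done

rng = np.random.default_rng(0)
env = CopyNumberEnv(target=[4, 4, 2, 0], max_steps=5)
state, done, total = env.reset(), False, 0.0
while not done:                                  # one episode under a random policy
    state, reward, done = env.step(rng.integers(len(env.actions)))
    total += reward
```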
[[fig:rl]]{#fig:rl label=“fig:rl”}
The goal of this work package is to further develop a reinforcement learning approach for genomic evolution with the scope and scalability needed for later clinical applications. Since the number of possible mutation types is large and the genome is enormous, we will investigate scalable approaches that allow high-resolution whole genome analysis (our current implementation operates only at a coarse resolution of 1-5 megabases). This will include, for instance, investigating deep representation approaches for describing large action and state spaces [@chandak2019learning], which can be used to derive evolutionary progression time measures analogous to $\tau$ in WP1. We will also explore novel variations on ideas such as action elimination [@zahavy2018learn] or attention mechanisms [@vaswani2017attention] to home in on copy number hotspots and reduce the computational burden. Further, to capture evolutionary divergence, we will extend the approach to allow tree-structured mixtures of policies that model tumour heterogeneity, integrating with the extensive pre-existing work in this area. This will sit within a framework that also incorporates other mutation types, notably point mutations (single nucleotide variants, SNVs), giving rise to an ambitious attempt to jointly model these dual aspects of the evolving cancer genome.
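To see why resolution matters, a back-of-the-envelope count helps. Assume (for illustration only) that a segment action is any contiguous run of equal-sized bins, that chromosome boundaries are ignored, that there are two alteration types, and that the genome is roughly 3 Gb. With $B$ bins there are $B(B+1)/2$ segments, hence $B(B+1)$ segment actions. The constant and function name are hypothetical.

```python
GENOME_BP = 3_000_000_000   # approximate haploid human genome size, rounded (assumption)

def n_segment_actions(bin_bp, genome_bp=GENOME_BP, n_types=2):
    """Count (start bin, end bin, alteration type) actions over a genome split into equal bins,
    treating the genome as a single sequence (chromosome boundaries ignored)."""
    n_bins = genome_bp // bin_bp
    return n_types * n_bins * (n_bins + 1) // 2

for bin_bp in (5_000_000, 1_000_000, 100_000, 10_000):
    print(f"{bin_bp // 1000:>5} kb bins: {n_segment_actions(bin_bp):,} actions")
```

Under these assumptions, moving from the current 1 Mb bins (about $9 \times 10^6$ actions) to 10 kb bins (about $9 \times 10^{10}$) multiplies the action space by roughly $10^4$, which is why learned action representations or action elimination are needed rather than a flat enumeration.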
*WP3: Developing clinical applications [Years 4-5]*. This work package will be broken down into distinct phases: an initial scoping exercise followed by three cycles of development (see Work Plan). During the scoping exercise, the capabilities offered by the outputs of WP1 and WP2 will be introduced to the potential user groups (clinicians, patients, pharma) via a series of engagement events. We will then identify a small number of realisable exemplar applications which, in consultation with the Steering Group (see Impact), will be developed during the remaining project period. This work may be carried out wholly by my research team or in collaboration with the project partners, leveraging in-kind support. Each development phase will have an agreed set of objectives intended to be realised over a 3-6 month period and will end with an evaluation period in which next steps are agreed. A final series of prototype evaluations will take place at the end of the Fellowship. This approach is specifically designed to exploit the flexible resource utilisation offered by this fellowship scheme in Years 3-5, and draws on the project partners to help define the most impactful applications of our technologies and to actively steer the project in Years 4-5.
There are, however, two possible applications that I can anticipate. First, a clinical decision support (CDS) system to support clinicians involved in Molecular Tumour Boards (MTBs) in the interpretation of complex ‘omics data. This would be supported within my own institution by the digital Experimental Cancer Medicine Team, who have an ongoing programme of constructing decision support systems to facilitate local MTB activities. We would leverage this existing infrastructure to fast-track the translation of our research into clinical applications. Second, working with pharmaceutical partners, we could integrate our methods into advanced clinical trial designs that combine ‘omics profiling and AI-based progression modelling to account for heterogeneity in treatment effects and outcomes. This would be an advance on current genomics-based cancer clinical trials, which are typically based on targeted gene panels or genomic regions and lack the resolution to account for molecular heterogeneity within and between patients.
Data and Experimental support. This work will be conducted in collaboration with a number of clinical/academic labs (see Letters of Support). All laboratories have offered substantial in-kind support in the form of access to pre- and post-treatment molecular data sets from model systems (cell lines, organoids, mice) and human patients, generated using a range of technologies (e.g. WGS, RNAseq, IMC, methylation, nanopore, single-cell). In addition, bioinformatics support and laboratory technician time have been included to generate novel data sets where investigations require additional data or where novel biological insights are uncovered.
National Importance
The importance of this proposed work is underlined by the NHS Long Term Plan, which states that “linking and correlating genomics, clinical data and data from patients provides routes to new treatments, diagnostic patterns and information to help patients make informed decisions about their care” [@england2019nhs; @health2019topol]. Toward this, national initiatives such as the GEL 100KGP, the NHS Genomic Laboratory Hubs and the ISCF Accelerating Detection of Disease project are now beginning to embed physical infrastructure within the health service to provide routine -omics analysis capability. Within 10-15 years, we expect to see a revolution in the way the molecular characterisation of disease, particularly cancer, impacts routine clinical practice, with a matching requirement for intelligent analytics to support its implementation. Further, a large number of clinical cancer multi-omics initiatives are under way internationally (e.g. The Tumor Profiler Study [@irmisch2020tumor]), so the UK is not alone in this endeavour. Given this background, I believe this is the right time to shape the analytical frameworks that will accompany the molecular technologies and make use of the emerging data from this modernisation. My combination of interests, expertise and integration into some of the aforementioned initiatives therefore presents an opportunity for me to become a world leader in AI for clinical ‘omics.
Impact [max 2 pages]
This research aims to advance a body of core AI technologies for clinical applications of multi-omics in cancer progression modelling. To maximise the impact of these developments, I have assembled a body of charity (Ovarian Cancer Action, OCA), academic (Birmingham, Oxford, Kings/Crick, Karolinska, UC Davis) and industry-based project partners (Roche, AstraZeneca, NEC Labs, Hummingbird Diagnostics) who will support the translation of these ideas. The partners’ commitment is evidenced by over £8M of specified in-kind contributions to the proposed project. I will maintain relationships with project partners via three main mechanisms:
(i) Stakeholder Events. I will host annual stakeholder engagement meetings involving all project partners to review the progress of the fellowship. Each meeting will focus on a specific theme (Patient Engagement, Artificial Intelligence, Pharmaceuticals, Clinicians and Patients) and will be timed to match project milestones. External participants will be invited to ensure a broad representation of stakeholders beyond those directly supporting this application.
(ii) Secondments and Exchanges. I have incorporated up to six months of potential secondment/exchange time for members of my research team to visit project partners. These opportunities will provide career-enhancing experiences for the researchers themselves, whilst also offering a closer setting in which to exchange ideas and share progress.
(iii) Co-design and Steering Group. A steering group will be assembled from patient and project partner representatives. It will oversee the project start, mid-term and completion reviews, and will provide guidance and advice to me and the project team on the development of the project, particularly on the areas of impact to focus on in the later stages of the project (WP3). The Steering Group will be chaired by my institutional mentors, Professors Rattray and Bristow.
In addition to impacts on clinical applications of cancer modelling as outlined in the Work Packages, there are further specific areas of wider engagement that I will develop:
Patient and public engagement. With Ovarian Cancer Action, I will engage with patient stakeholders to raise awareness of the importance of AI in interpreting complex ‘omics data and of its role within an emerging paradigm for how cancer is diagnosed and treated. I will also work with OCA to engage patient groups and understand their perspective on our work. Does the ability to provide vastly more detailed information about an individual’s disease lead to mental overload? Does it help to remove the mystique surrounding a diagnosis? How can we better support clinician-patient interactions? These questions will form the basis of one of the themed stakeholder events described above and will build on my existing work with OCA and the Oxford Ovarian Cancer Lab.
Early Detection. The generative models that we will develop will simplify the process of “rolling back” to an earlier point in tumour development, enabling putative early molecular changes to be identified. This is of particular interest in Manchester, which is part of the CRUK-supported £55M International Alliance for Cancer Early Detection (ACED).⁴ All research conducted during this fellowship will seek to align with ACED and to exploit further funding opportunities and partnerships within it. I have also engaged with industry partner Hummingbird Diagnostics, which is developing liquid biopsy diagnostics for early disease detection based on highly disease-specific miRNA biomarkers. This partnership could see our AI technologies directly supporting the development of novel biomarker tests.
Other Data Modalities. Two areas that this proposal does not address are imaging and real-world evidence (RWE) from large-scale clinical cancer data sets. Recent examples have highlighted the importance of novel AI in these areas [@alaa2018autoprognosis; @shen2020learning; @bica2020real; @abduljabbar2020geospatial]. As a member of the Strategy Advisory Group for the Health Programme at The Alan Turing Institute, I am currently advising on a potential strategic partnership between the Turing and a global pharmaceutical company. This partnership would provide opportunities for research collaboration, access to domain expertise and valuable sources of RWE data linking ‘omics to clinical trials. If this application is successful, I would use the fellowship to engage with this partnership if and when the Turing concludes negotiations.⁵ In medical imaging, I have engaged with the EU ERA “Advancing Breast Cancer histopathology towards AI-based Personalised medicine” consortium (ABCAP), led by the Karolinska Institutet (see Support Letter), which is developing AI methodology for automating the screening of routine histopathology images. Together, we would co-develop joint models for the molecular and image-based characterisation of disease progression.
Equality, diversity and inclusion
Career development. The project has been designed to give all researchers equal opportunity to take leadership of work components, as well as training to further their career potential. Each work package will be led by one of the three PDRAs (see Work Plan). PhD thesis projects will be shaped to span work packages, giving students the opportunity to develop a unique synthesis of ideas whilst allowing PDRAs to share in mentorship and supervisory responsibilities. All PDRAs and students will have the opportunity to spend up to six months outside the main project developing collaborations with external stakeholders. Research team members will lead the organisation of the themed stakeholder events.
Ovarian Cancer and Gender. For a disease such as ovarian cancer (OvCa), which predominantly affects women, a female voice should be integral to every aspect of this project’s delivery.⁶ Working with Ovarian Cancer Action and our institutional Athena SWAN team, we will actively promote the research opportunities to female researchers (noting that women are also under-represented in AI). Every team member will receive training from the charity to ensure they develop a deep understanding of the disease and its wider impacts. Each team member will then act as an ambassador for the project and be central to our dissemination plans, both internally and externally.
Integration with external activities. As co-Director of the HDRUK-Turing PhD Programme, I am actively involved in developing innovative approaches to improving research culture, including EDI. This is a requirement of Wellcome’s award for the programme and also aligns with the missions of the two national institutes. Within my fellowship, I will leverage the larger-scale EDI activities associated with this programme to achieve the critical mass needed to inspire real change.
Responsible research and innovation. Genomic medicine is motivated by the understanding that human disease is driven by molecular factors, including genetics, and that treatments should be adapted to these features, which may differ from patient to patient. Stratification by molecular features can lead to differential treatment strategies that could incidentally correlate with population sub-groups, including those defined by sex or ethnicity. There is increasing evidence that the molecular traits of cancer are related to ancestry [@li2020genomic]. Sensitivity to, and an understanding of, the wider implications of these issues must be at the heart of all work in this area, and the development of artificial intelligence techniques is no exception. Towards this goal, I have initiated a specific collaboration with Professor Carvajal-Carmona at the UC Davis Comprehensive Cancer Centre (see Support Letter), whose group has an extensive track record of work in genetics and cancer research in ethnic minority groups and on other causes of disparity in California and Latin America. The group has recently received NIH support to establish patient-derived animal and cellular cancer models for ‘omics profiling, specifically from ethnic minority groups. There are no equivalent projects in the UK, so this exemplar offers my team and partners a valuable opportunity to learn best practice and considerations for responsible engagement across communities.
Impact of covid-19
Due to the impact of covid-19 on the UK health system and economy, I have, understandably, not sought direct support from NHS organisations at this challenging time. In addition, OCA are currently unable to commit the financial support for this research that was discussed at the outline stage. If the fellowship is awarded, I will reopen these discussions at a later date. Finally, over the last 10 weeks I have spent the majority of my time home schooling and caring for my daughter, leaving only limited time for this application alongside regular academic responsibilities. I am not alone in facing these challenges, but I wish to highlight the impact of covid-19 on the productivity of all applicants with caring responsibilities during these challenging and unprecedented times.
\printbibliography 0.75
\endgroup
- Consisting of Queen’s Belfast, Birmingham, Cambridge, Edinburgh, Manchester, Oxford and University College London
- URL: https://www.cancerresearchuk.org/about-cancer/find-a-clinical-trial/a-study-help-doctors-predict-who-will-respond-well-paclitaxel-chemotherapy-cancer-ovary-oxo-pcr-01
- ACED: https://www.cancerresearchuk.org/funding-for-researchers/research-opportunities-in-early-detection-and-diagnosis/international-alliance-for-cancer-early-detection
- Anticipated agreement date: Q4 2020
- Men can carry heritable genetic mutations that lead to increased risk of ovarian cancer.