Fundamentals of AI CDT

Development site for the EIT FOAI CDT

View the Project on GitHub cwcyau/foai-cdt

Title

Multiscale Foundation Models

Challenge

Many domains exhibit long-range, hierarchical dependencies (for example, genes within DNA or chapters within books) that existing autoregressive foundation models capture poorly.

Description

Develop and evaluate hierarchical or multiscale masked autoencoder (MAE) architectures for learning rich representations from large biomolecular datasets (e.g., genomics, proteomics), focusing on capturing long-range dependencies and functional motifs.
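As a rough illustration of what "multiscale" could mean on the masking side of an MAE, the sketch below masks a toy DNA sequence at several block sizes, so that reconstruction would have to rely on context at different ranges. The function names (`block_mask`, `multiscale_masks`), the block sizes, the 50% masking ratio, and the use of "N" as a mask symbol are all illustrative assumptions, not part of the project specification. No encoder or decoder is shown.

```python
import numpy as np

def block_mask(seq_len, block_size, mask_ratio, rng):
    """Return a boolean token mask that hides whole contiguous blocks.

    Hypothetical helper: block_size=1 reduces to ordinary random token
    masking; larger blocks hide longer spans (e.g. motif- or gene-sized).
    """
    n_blocks = -(-seq_len // block_size)  # ceiling division
    n_masked = int(round(mask_ratio * n_blocks))
    chosen = rng.choice(n_blocks, size=n_masked, replace=False)
    block_flags = np.zeros(n_blocks, dtype=bool)
    block_flags[chosen] = True
    # Expand block-level flags to token level and trim any overhang.
    return np.repeat(block_flags, block_size)[:seq_len]

def multiscale_masks(seq_len, block_sizes=(1, 8, 64), mask_ratio=0.5, seed=0):
    """Build one mask per scale; scales and ratio are assumed values."""
    rng = np.random.default_rng(seed)
    return {b: block_mask(seq_len, b, mask_ratio, rng) for b in block_sizes}

# Toy DNA sequence of 128 nucleotides.
sequence = np.array(list("ACGT" * 32))
masks = multiscale_masks(len(sequence))

# Replace masked positions with "N" (an assumed placeholder symbol).
corrupted = {b: np.where(m, "N", sequence) for b, m in masks.items()}

for b, seq in corrupted.items():
    print(f"block size {b:>2}: {''.join(seq[:48])}...")
```

With a fixed masking ratio, every scale hides the same number of tokens but in differently sized spans, which is one simple way to compare how well a model recovers short-range versus long-range structure.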

Skills Required

Deep learning, sequence modelling, MAEs

Skills to be Developed

Multiscale model design, application to biological data

Relevant Background Reading

  1. Diffusion-LM
  2. Structured Denoising Diffusion Models in Discrete State-Spaces
  3. Latent Diffusion Model for DNA Sequence Generation
  4. Masked Autoencoders Are Scalable Vision Learners (https://arxiv.org/abs/2111.06377)