Deep learning suggests that gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure

Jan Zrimec1, Christoph S Börlin1,2, Filip Buric1, Azam Sheikh Muhammad3, Rhongzen Chen3, Verena Siewers1,2, Vilhelm Verendel3, Jens Nielsen1,2, Mats Töpel4,5, Aleksej Zelezniak6,7

  1. Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden.
  2. Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden.
  3. Computer Science and Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden.
  4. Department of Marine Sciences, University of Gothenburg, Box 461, SE-405 30, Gothenburg, Sweden.
  5. Gothenburg Global Biodiversity Center (GGBC), Box 461, SE-405 30, Gothenburg, Sweden.
  6. Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden. aleksej.zelezniak@chalmers.se.
  7. Science for Life Laboratory, Tomtebodavägen 23a, SE-171 65, Stockholm, Sweden. aleksej.zelezniak@chalmers.se.

Abstract

Understanding the genetic regulatory code governing gene expression is an important challenge in molecular biology. However, how individual coding and non-coding regions of the gene regulatory structure interact and contribute to mRNA expression levels remains unclear. Here we apply deep learning to over 20,000 mRNA datasets to examine the genetic regulatory code controlling mRNA abundance in 7 model organisms ranging from bacteria to human. In all organisms, we can predict mRNA abundance directly from DNA sequence, with up to 82% of the variation in transcript levels encoded in the gene regulatory structure. By searching for DNA regulatory motifs across the gene regulatory structure, we find that motif interactions can explain the whole dynamic range of mRNA levels. Co-evolution across coding and non-coding regions suggests that it is not single motifs or regions, but the entire gene regulatory structure and specific combinations of regulatory elements, that define gene expression levels.
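The statement that up to 82% of the variation in transcript levels is encoded in the gene regulatory structure refers to how much of the variance in measured mRNA abundance a sequence-based model's predictions account for. A minimal sketch, assuming this is quantified as the coefficient of determination R^2 = 1 - SS_res / SS_tot on held-out genes (the abstract does not name the metric; some studies instead report a squared correlation), is shown below. The function name `variance_explained` and the toy values are hypothetical and are not data from the study.

```python
from typing import Sequence


def variance_explained(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Hypothetical helper: coefficient of determination R^2 = 1 - SS_res / SS_tot.

    Assumes 'variation explained' means R^2 between observed and predicted
    mRNA abundances; this is an illustrative assumption, not the study's code.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Total variation of the observed abundances around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    # Variation left unexplained by the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


# Toy (made-up) log-abundances for four genes and a model's predictions.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 3.0, 3.5]
print(variance_explained(obs, pred))  # ≈ 0.9, i.e. 90% of the toy variation explained
```

Under this reading, "up to 82%" would correspond to R^2 ≈ 0.82 for the best-performing organism, while predicting the mean abundance for every gene would give R^2 = 0.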

Presented by Jan Zrimec