Exons, introns, UTR, CDS:

  • A gene contains many sequence segments that do not map to amino acids (they are not codons). The segments kept in the mature mRNA are called exons; they are typically short and are separated by non-coding sequences called introns. The protein-coding sequence lies within exons, although exons also carry the untranslated regions. (Figure: exons and pre-mRNA. Image source: https://www.genome.gov/sites/default/files/media/images/2022-05/Exon.jpg)
  • A gene is transcribed (via 1:1 mapping) into a pre-mRNA. The pre-mRNA still contains all the non-coding intron sequences.
  • Splicing: the introns are then cut out of the pre-mRNA, and only the exons remain in the mature messenger RNA (mRNA).
  • Mature mRNA layout: 5’ untranslated region (5’ UTR) → coding sequence (CDS) → 3’ UTR → poly-A tail. The UTRs are part of the exons but are not translated into protein.
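  • Alternative splicing: exons can be selectively included or skipped, so different mRNAs can be produced from the same pre-mRNA. The exons that are kept generally stay in the same order as in the original DNA sequence; they are not shuffled. This allows one gene to code for multiple proteins.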
  • Over 90% of human multi-exon genes undergo alternative splicing, so knowing a gene’s exons does not by itself determine the protein sequence; you also need to know which exons a given transcript includes.
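
The transcription → splicing → alternative splicing steps above can be sketched in a few lines of Python. This is a toy illustration only: the sequences are made up, the names `GENE`, `transcribe`, and `splice` are hypothetical, and transcription is simplified to replacing T with U on the coding strand (real transcription reads the template strand, and real splicing depends on splice-site signals not modeled here).

```python
# Toy gene as alternating exon/intron segments (made-up sequences, not real data).
GENE = [
    ("exon", "ATGGCC"),
    ("intron", "GTAAGT"),
    ("exon", "GATTAC"),
    ("intron", "GTCCAG"),
    ("exon", "AAATGA"),
]


def transcribe(segments):
    """Gene -> pre-mRNA: every segment is kept (introns included), T becomes U."""
    return [(kind, seq.replace("T", "U")) for kind, seq in segments]


def splice(pre_mrna, skip=()):
    """Pre-mRNA -> mature mRNA: drop all introns and join the exons in order.

    `skip` holds 0-based exon indices to leave out, which mimics exon
    skipping in alternative splicing. Exon order is never changed.
    """
    exons = [seq for kind, seq in pre_mrna if kind == "exon"]
    return "".join(seq for i, seq in enumerate(exons) if i not in skip)


pre_mrna = transcribe(GENE)
print(splice(pre_mrna))            # all three exons: AUGGCCGAUUACAAAUGA
print(splice(pre_mrna, skip={1}))  # middle exon skipped: AUGGCCAAAUGA
```

The two printed mRNAs come from the same gene but differ in exon content, which is why the exon list alone does not tell you which protein is made.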