BSMS205 · Genetics

Gene Regulation

Chapter 29 · Part V · Functional Genetics

Today's central question

How do we measure gene regulation?

Roadmap

  1. RNA-seq · gene expression
  2. Single-cell RNA-seq · cellular heterogeneity
  3. Histone marks · the chromatin code
  4. ChIP-seq · protein-DNA interactions
  5. ATAC-seq · open chromatin
  6. CUT&RUN · high resolution
  7. Integrated epigenomics · putting it together
§ 1

RNA-seq

RNA-seq workflow

  1. Extract total RNA from cells or tissue · enrich mRNA (poly-A selection or rRNA depletion)
  2. Reverse transcribe → cDNA · fragment
  3. Add adapters · sequence (50–150 bp reads)
  4. Align to genome · count reads per gene
  5. Normalise · differential expression analysis

Bulk RNA-seq · the big picture

Bulk RNA-seq workflow
RNA from many cells averaged together · output: counts per gene.
Source: Microbe Notes.

Normalisation · why we need it

Method                 What it corrects
CPM                    Sequencing depth
FPKM / RPKM            Depth + gene length
TPM                    Depth + length (better cross-sample)
DESeq2 normalisation   Depth + RNA composition for DE testing
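
The corrections in the table can be sketched in a few lines of plain Python. This is a minimal illustration, not the code of any particular package: the function names (`cpm`, `rpkm`, `tpm`, `size_factors`) and the toy counts are assumptions made for the example, and `size_factors` only mimics the median-of-ratios idea behind DESeq2 normalisation, not the package's actual implementation.

```python
import math
from statistics import median


def cpm(counts):
    """Counts per million: corrects for sequencing depth only."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def rpkm(counts, lengths_bp):
    """Reads per kilobase per million: CPM divided by gene length in kb."""
    return [x / (length / 1e3) for x, length in zip(cpm(counts), lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-correct first, then rescale to 1e6."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def size_factors(samples):
    """Median-of-ratios size factors, one per sample (DESeq2-style idea)."""
    n_genes = len(samples[0])
    # Use only genes detected in every sample so the logarithms are defined.
    keep = [g for g in range(n_genes) if all(s[g] > 0 for s in samples)]
    log_geo_mean = {
        g: sum(math.log(s[g]) for s in samples) / len(samples) for g in keep
    }
    return [
        math.exp(median(math.log(s[g]) - log_geo_mean[g] for g in keep))
        for s in samples
    ]


# Toy data (hypothetical): three genes, sample_b sequenced twice as deeply.
lengths_bp = [1000, 2000, 3000]
sample_a = [100, 300, 600]
sample_b = [200, 600, 1200]

print([round(x) for x in cpm(sample_a)])
print([round(x) for x in tpm(sample_a, lengths_bp)])
print([round(f, 3) for f in size_factors([sample_a, sample_b])])
```

In this toy case the two samples differ only in depth, so their CPM values agree and the second size factor is twice the first. TPM values always sum to one million within a sample, which is why they are easier to compare across samples than RPKM.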
§ 2

Single-Cell RNA-seq

The droplet trick

  • Encapsulate each cell in a tiny oil droplet
  • Each droplet has a gel bead with a unique barcode
  • Cell lyses in droplet · mRNA tagged with cell barcode during RT
  • Pool · sequence · de-multiplex by barcode
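
The de-multiplexing step can be sketched as a simple grouping of reads by cell barcode. This is a toy illustration under stated assumptions: each read is taken to be already aligned and reduced to a `(barcode, gene)` pair, the function name `build_cell_gene_matrix` and the barcodes are hypothetical, and the UMI-based collapsing of PCR duplicates used by real droplet pipelines is left out.

```python
from collections import defaultdict


def build_cell_gene_matrix(reads):
    """Group (cell_barcode, gene) reads into a cell x gene count table.

    Returns a dict of dicts: matrix[barcode][gene] -> read count.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for barcode, gene in reads:
        # Every read carrying the same barcode came from the same droplet,
        # so it is credited to the same cell.
        matrix[barcode][gene] += 1
    # Convert to plain dicts so missing keys raise instead of silently
    # creating empty entries.
    return {bc: dict(genes) for bc, genes in matrix.items()}


# Hypothetical pooled reads from two droplets.
pooled_reads = [
    ("AAACCTGA", "CD4"),
    ("TTTGGTCA", "MS4A1"),
    ("AAACCTGA", "CD4"),
    ("AAACCTGA", "GAD1"),
    ("TTTGGTCA", "CD19"),
]

matrix = build_cell_gene_matrix(pooled_reads)
for barcode, genes in sorted(matrix.items()):
    print(barcode, genes)
```

Each outer key is one cell and each inner dictionary is that cell's row of the cell × gene matrix.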

Bulk vs single-cell

Single-cell RNA-seq workflow
Each cell barcoded individually · output: cell × gene matrix.
Source: Microbe Notes.

What scRNA-seq reveals · cell identity

Cell type            Marker genes
T cells              CD3 · CD4 · CD8
B cells              CD19 · MS4A1
Excitatory neurons   SLC17A7
Inhibitory neurons   GAD1
Astrocytes           GFAP
Microglia            CX3CR1
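
A marker table like this can be turned into a crude annotation rule: score each cell type by the mean expression of its markers and pick the highest. The sketch below is only an illustration. The marker labels are copied from the table as written (real gene symbols such as those for the CD3 and CD8 subunits differ), while the function name `annotate_cell`, the expression values, and the `min_score` cut-off are assumptions for the example.

```python
MARKERS = {
    "T cells": ["CD3", "CD4", "CD8"],
    "B cells": ["CD19", "MS4A1"],
    "Excitatory neurons": ["SLC17A7"],
    "Inhibitory neurons": ["GAD1"],
    "Astrocytes": ["GFAP"],
    "Microglia": ["CX3CR1"],
}


def annotate_cell(expression, markers=MARKERS, min_score=1.0):
    """Return the cell type whose markers have the highest mean expression.

    expression: dict mapping gene -> normalised expression for one cell.
    min_score is a hypothetical cut-off below which the cell stays unlabelled.
    """
    scores = {
        cell_type: sum(expression.get(g, 0.0) for g in genes) / len(genes)
        for cell_type, genes in markers.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else "unassigned"


# Hypothetical cells (one row each of a cell x gene matrix).
print(annotate_cell({"GAD1": 5.2, "GFAP": 0.3}))
print(annotate_cell({"CD19": 4.0, "MS4A1": 6.0, "CD4": 0.5}))
print(annotate_cell({"ACTB": 9.0}))
```

In practice cells are usually clustered first and markers are checked per cluster, since single-cell counts are sparse and any one marker can drop out of an individual cell.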
§ 3

The Histone Code

The chromatin code

Mark       Meaning
H3K4me3    Active promoter
H3K27ac    Active enhancer or promoter
H3K4me1    Enhancer (active or poised)
H3K36me3   Active gene body · elongation
H3K27me3   Polycomb repression
H3K9me3    Constitutive heterochromatin

Combining marks · chromatin states

  • Active promoter · H3K4me3 + H3K27ac
  • Active enhancer · H3K4me1 + H3K27ac
  • Poised enhancer · H3K4me1 (no H3K27ac)
  • Polycomb-repressed · H3K27me3
  • Heterochromatin · H3K9me3
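
The combinations above read naturally as an ordered set of rules. A minimal sketch, assuming each region has already been reduced to the set of marks called as present there (the function name `chromatin_state` and the rule order are choices made for this example, not a published segmentation method):

```python
def chromatin_state(marks):
    """Map the set of histone marks present at a region to a state label."""
    marks = set(marks)
    # Rules are checked from most to least specific.
    if {"H3K4me3", "H3K27ac"} <= marks:
        return "active promoter"
    if {"H3K4me1", "H3K27ac"} <= marks:
        return "active enhancer"
    if "H3K4me1" in marks:
        # H3K4me1 without H3K27ac
        return "poised enhancer"
    if "H3K27me3" in marks:
        return "Polycomb-repressed"
    if "H3K9me3" in marks:
        return "heterochromatin"
    return "unassigned"


for region_marks in (
    {"H3K4me3", "H3K27ac"},
    {"H3K4me1", "H3K27ac"},
    {"H3K4me1"},
    {"H3K27me3"},
    {"H3K9me3"},
):
    print(sorted(region_marks), "->", chromatin_state(region_marks))
```

Tools that learn chromatin states from data infer such combinations statistically rather than from fixed rules, but the labels they produce follow the same logic.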
§ 4

ChIP-seq

ChIP-seq protocol

  1. Crosslink proteins to DNA with formaldehyde
  2. Fragment chromatin (~200–500 bp)
  3. Add antibody · pull down protein-DNA complexes
  4. Reverse crosslinks · purify DNA · sequence
  5. Map reads · call peaks
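
Step 5 can be illustrated with a toy peak caller: bin the genome, compare ChIP read counts with an input control, and merge neighbouring enriched bins. This is a sketch under stated assumptions, not how production peak callers work (they fit a statistical background model). The function name `call_peaks`, the 200 bp bins, the 4-fold threshold, and the pseudocount are all hypothetical, and the two tracks are assumed to be already scaled to the same sequencing depth.

```python
def call_peaks(chip, control, bin_size=200, min_fold=4.0, pseudocount=1.0):
    """Return (start_bp, end_bp) intervals where ChIP is enriched over control.

    chip, control: read counts per bin, same length, same depth scale.
    """
    peaks = []
    start = None
    for i, (c, k) in enumerate(zip(chip, control)):
        fold = (c + pseudocount) / (k + pseudocount)
        if fold >= min_fold:
            if start is None:
                start = i  # open a new peak
        elif start is not None:
            peaks.append((start * bin_size, i * bin_size))  # close the peak
            start = None
    if start is not None:
        peaks.append((start * bin_size, len(chip) * bin_size))
    return peaks


# Hypothetical binned read counts along 1.6 kb of one chromosome.
chip_counts = [2, 3, 40, 55, 3, 2, 30, 2]
input_counts = [2, 2, 3, 3, 2, 2, 2, 2]

print(call_peaks(chip_counts, input_counts))
```

Comparing against an input control matters because some regions attract more reads than others regardless of the antibody.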

The result · peak profiles

ChIP-seq workflow
Antibody enriches DNA fragments bound by target protein → peaks at binding sites.
Source: Microbe Notes.
§ 5

ATAC-seq

ATAC-seq · the Tn5 trick

  • Tn5 transposase loaded with sequencing adapters
  • Cuts DNA preferentially in accessible chromatin
  • Inserts adapters in the same step (tagmentation)
  • Result: open regions sequenced · closed regions largely invisible

ATAC-seq output

ATAC-seq workflow
Tn5 cuts preferentially at open chromatin · sharp peaks at accessible regulatory regions.
Source: Microbe Notes.
§ 6

CUT&RUN

CUT&RUN · targeted cleavage

  • No crosslinking · no sonication
  • Antibody binds target · pA-MNase fusion binds antibody
  • Calcium activates MNase → cuts DNA on either side of the bound protein
  • Short fragments diffuse out of the nucleus · sequenced

Why CUT&RUN beats ChIP-seq

CUT&RUN workflow
Sharper peaks · lower background · 100–10,000 cells (vs millions for ChIP-seq).
Source: BioRender.
§ 7

Integrated Epigenomics

Each method · one layer

Method      What it measures
RNA-seq     Gene expression
scRNA-seq   Single-cell expression
ChIP-seq    Protein-DNA binding
ATAC-seq    Chromatin accessibility
CUT&RUN     High-res protein binding
Hi-C        3D contacts

An active enhancer · multi-track signature

  • ATAC-seq: open
  • H3K27ac ChIP-seq: active mark
  • H3K4me1 ChIP-seq: enhancer mark
  • RNA-seq nearby gene: highly expressed

A repressed gene · the opposite signature

  • ATAC-seq: closed
  • H3K27me3 ChIP-seq: repressive mark
  • H3K4me3: absent
  • RNA-seq: no expression
§ 8

Why It Matters for Genetics

Most disease variants are regulatory

  • ~93% of GWAS hits are non-coding
  • Often disrupt enhancer activity
  • Small expression changes → real disease risk

Type 2 diabetes example

  • Most T2D risk variants in pancreatic islet enhancers
  • Reduce insulin gene expression by 10–20%
  • Each variant: tiny effect
  • ~600 variants combined → significant disease risk
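
How many tiny effects add up can be shown with a short worked calculation. The sketch below assumes a simple additive model on the log-odds scale. The per-allele odds ratio of 1.05 and the allele counts are hypothetical numbers chosen for illustration, not estimates from T2D studies, and the function name `polygenic_log_odds` is invented for the example.

```python
import math


def polygenic_log_odds(dosages, odds_ratios):
    """Sum of (risk-allele count x log odds ratio) over all variants."""
    return sum(d * math.log(o) for d, o in zip(dosages, odds_ratios))


N_VARIANTS = 600
PER_ALLELE_OR = 1.05  # hypothetical: each risk allele raises odds by 5%
odds_ratios = [PER_ALLELE_OR] * N_VARIANTS

# Hypothetical individuals: risk-allele count (0, 1 or 2) at each variant.
typical = [1] * N_VARIANTS              # 600 risk alleles in total
high_risk = [2] * 30 + [1] * 570        # 30 more risk alleles than typical

relative_odds = math.exp(
    polygenic_log_odds(high_risk, odds_ratios)
    - polygenic_log_odds(typical, odds_ratios)
)
print(f"One extra allele: odds x {PER_ALLELE_OR:.2f}")
print(f"Thirty extra alleles: odds x {relative_odds:.2f}")
```

Because the effects multiply on the odds scale, a few dozen extra risk alleles move an individual's odds far more than any single variant does.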

Therapeutic targets

  • Insufficient expression → CRISPRa (Ch. 27)
  • Excess expression → CRISPRi
  • Wrong epigenetic state → epigenetic editors
  • Therapy follows the regulatory architecture
§ 9

Summary

What to take away

  • RNA-seq — gene expression genome-wide
  • scRNA-seq — cell-type heterogeneity
  • ChIP-seq — protein-DNA interactions
  • ATAC-seq — open chromatin · fast · low input
  • CUT&RUN — high resolution protein binding
  • Together → complete regulatory architecture · disease interpretation · therapy design

Next lecture · the final chapter

Connecting variants to molecular phenotypes

Chapter 30 · QTLs · Connecting Alleles to Molecular Traits