BSMS205 · Genetics

Forward Genetics

Chapter 25 · Part V · Functional Genetics
The detective question

You have 4 million variants
in one patient's genome.
Which one is the culprit?

Forward vs reverse genetics

Forward

  • Phenotype → gene
  • Start with disease
  • Search for cause

Reverse

  • Gene → phenotype
  • Start with gene
  • Manipulate · observe

Today: forward. Next chapter: reverse.

Roadmap for today

  1. Evolution of gene hunting · 1860s to today
  2. GWAS · finding common variants
  3. What GWAS tells you · and what it doesn't
  4. Burden tests · finding rare variants
  5. GWAS vs burden tests · two sides of one coin
  6. Case study · schizophrenia architecture
§ 1

The Evolution
of Gene Hunting

1860s – 1900s · classical genetics

  • Mendel: pedigrees and breeding
  • Morgan: chromosome mapping in flies
  • Tools: cross-breeding · pedigree analysis
  • Output: inheritance patterns

1980s – 1990s · linkage analysis

  • Track chromosomal regions through families
  • Microsatellite markers · cM resolution
  • Major successes: CFTR · HTT · DMD
  • Limited to rare, high-penetrance Mendelian disease

2000s – 2010s · the GWAS era

  • Microarrays: genotype millions of SNPs at scale
  • 10,000 to 1,000,000 individuals
  • Common variants only · MAF > 1%
  • 100+ loci per complex trait

2010s – present · sequencing era

  • Whole-exome and whole-genome sequencing
  • Burden tests: aggregate rare variants per gene
  • Reveals rare-variant architecture
  • 10,000 to 100,000 individuals
§ 2

GWAS · Genome-Wide
Association Studies

The simple question

Across many people, are certain genetic variants more common in individuals with a disease than in those without?

The procedure

  1. Recruit cases (e.g. 50,000 with diabetes) and matched controls
  2. Genotype ~1 million SNPs per person
  3. For every SNP: test frequency in cases vs controls
  4. Apply genome-wide significance threshold: p < 5 × 10⁻⁸
  5. Report list of associated loci
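
Worked sketch · step 3 for a single SNP. This is a minimal illustration, not the pipeline used in any cited study: it assumes a plain allele-count chi-square test (1 degree of freedom) with no covariates, and the counts are made up. The function name `allelic_test` is hypothetical. Real GWAS typically use regression with ancestry covariates.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # threshold quoted in step 4


def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (odds_ratio, chi2, p_value).
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_case = case_alt + case_ref
    row_ctrl = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    # Standard shortcut formula for the 2x2 chi-square statistic
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / (
        row_case * row_ctrl * col_alt * col_ref
    )
    # Upper tail of a chi-square with 1 df: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi2, p_value


# Made-up counts: 50,000 cases and 50,000 controls = 100,000 alleles each.
# Alternate allele frequency is 32% in cases and 30% in controls.
odds_ratio, chi2, p_value = allelic_test(32_000, 68_000, 30_000, 70_000)
is_hit = p_value < GENOME_WIDE_ALPHA
```

With samples this large, a two-point frequency difference (odds ratio near 1.1) clears the threshold comfortably, which is why GWAS can detect small effects at common variants.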

Real example · type 2 diabetes

  • Suzuki et al. 2024, Nature
  • 1.4 million participants
  • 600+ risk loci identified
  • Each individual locus shifts risk by ~5–10%
  • Together: pathways for insulin secretion, beta-cell function

The summary statistics

Field   Meaning
BETA    Effect size · log odds ratio
SE      Standard error
P       P-value · genome-wide threshold 5e-8
FREQ    Allele frequency
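
Worked sketch · reading one row of summary statistics. A hedged example of how the fields relate, assuming P comes from a two-sided Wald test (z = BETA / SE); individual studies may use other tests. The function name `summarize` and the input values are hypothetical.

```python
import math


def summarize(beta, se, alpha=5e-8):
    """Derive odds ratio, z-score and two-sided p-value from BETA and SE."""
    z = beta / se
    # Two-sided normal tail probability: P(|Z| > |z|)
    p = math.erfc(abs(z) / math.sqrt(2))
    return {
        "odds_ratio": math.exp(beta),  # BETA is a log odds ratio
        "ci95": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
        "z": z,
        "p": p,
        "genome_wide_significant": p < alpha,
    }


# Hypothetical row: BETA = 0.08, SE = 0.01
row = summarize(0.08, 0.01)
```

Here the odds ratio is about 1.08, an 8% increase in odds per allele, yet the small standard error makes it genome-wide significant.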

Five applications of summary statistics

  • Polygenic risk scores — combine thousands of variants into one number
  • Genetic correlation — find shared architecture between traits
  • Mendelian randomisation — natural experiments for causation
  • Fine-mapping — narrow loci to credible causal variants
  • Gene prioritisation — combine with eQTL data
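
Worked sketch · polygenic risk score. The first application above reduces to a weighted sum: each person's count of effect alleles (0, 1 or 2) times the GWAS BETA for that variant. The SNP names and weights below are hypothetical, and treating a missing genotype as 0 is a simplifying assumption. Real scores also handle LD between variants, which this sketch ignores.

```python
def polygenic_score(dosages, betas):
    """Sum of BETA x effect-allele count over all scored variants.

    dosages: dict of SNP id -> 0, 1 or 2 copies of the effect allele
    betas:   dict of SNP id -> GWAS effect size (log odds ratio)
    Variants missing from `dosages` contribute 0 (an assumption).
    """
    return sum(beta * dosages.get(snp, 0) for snp, beta in betas.items())


# Hypothetical weights and one person's genotypes
betas = {"snp1": 0.10, "snp2": -0.05, "snp3": 0.02}
person = {"snp1": 2, "snp2": 1, "snp3": 0}
score = polygenic_score(person, betas)  # 0.10*2 + (-0.05)*1 + 0.02*0
```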
§ 3

What GWAS
Doesn't Tell You

Limitation 1 · most hits are not in genes

93% of GWAS hits are in non-coding regions
  • Variant might be 500 kb away from the nearest gene
  • Which gene does it regulate? Need extra experiments

Limitation 2 · linkage disequilibrium

  • Multiple variants are correlated at any locus
  • Statistical hit ≠ causal variant
  • Need fine-mapping to narrow down
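
Worked sketch · measuring LD between two SNPs. One common summary is r², the squared correlation between genotypes. The version below uses genotype dosages (0/1/2) rather than phased haplotypes, which is an approximation, and the genotype vectors are made up. The function name `ld_r2` is hypothetical.

```python
def ld_r2(geno_a, geno_b):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(geno_a)
    mean_a = sum(geno_a) / n
    mean_b = sum(geno_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(geno_a, geno_b))
    var_a = sum((a - mean_a) ** 2 for a in geno_a)
    var_b = sum((b - mean_b) ** 2 for b in geno_b)
    return cov * cov / (var_a * var_b)


# Made-up genotypes for six people at three SNPs
snp_causal = [0, 1, 2, 1, 0, 2]
snp_tag = [0, 1, 2, 1, 0, 2]      # always inherited with the causal SNP
snp_unlinked = [2, 0, 1, 1, 2, 0]

r2_tag = ld_r2(snp_causal, snp_tag)
r2_unlinked = ld_r2(snp_causal, snp_unlinked)
```

When r² is close to 1, the two SNPs give nearly identical association signals, so the statistics alone cannot say which one is causal.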

Limitation 3 · common variants only

  • GWAS tests one variant at a time
  • A variant carried by one or a few people cannot reach genome-wide significance
  • Misses rare, high-impact mutations entirely
  • For rare variants, switch to burden tests
§ 4

Burden Tests ·
Hunting Rare Variants

The rare-variant problem

  • A variant is found in 1 person out of 50,000
  • A single carrier · too few observations for a per-variant test
  • But many such variants together may converge on one gene
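
Worked sketch · why a single carrier is not enough. Assume the 50,000 people split into 25,000 cases and 25,000 controls (an assumption for illustration). The most favourable outcome for a variant with k carriers is that all k are cases; the one-sided Fisher exact p-value for that outcome is C(n_cases, k) / C(n_total, k). The function names are hypothetical.

```python
import math


def best_case_p(k, n_cases, n_controls):
    """One-sided Fisher exact p-value when all k carriers are cases."""
    return math.comb(n_cases, k) / math.comb(n_cases + n_controls, k)


def min_carriers(alpha, n_cases, n_controls):
    """Smallest carrier count whose best-case p-value falls below alpha."""
    k = 1
    while best_case_p(k, n_cases, n_controls) >= alpha:
        k += 1
    return k


p_singleton = best_case_p(1, 25_000, 25_000)       # 0.5
k_needed = min_carriers(5e-8, 25_000, 25_000)      # 25
```

A singleton can never do better than p = 0.5 in this design, and even a variant seen only in cases needs 25 carriers to pass 5 × 10⁻⁸. Pooling many rare variants within a gene is what supplies those counts.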

The burden test idea

Don't test variants. Test genes.
  • For each gene · count damaging variants in cases vs controls
  • Aggregate across the gene
  • Excess in cases → gene is a risk factor

How burden tests work

  1. Sequence cases and controls (~10,000 + 10,000)
  2. Filter to damaging variants — LoF, predicted-deleterious missense
  3. Filter to rare — MAF < 0.1%
  4. Per gene · count carriers in cases vs controls
  5. Statistical test · genome-wide threshold p < 2.5e-6 (~20,000 genes)
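
Worked sketch · steps 2 to 5 on toy data. This is one simple way to run a burden test, assuming a one-sided Fisher exact test on carrier counts; published studies often use more elaborate models. The consequence labels, the tuple layout of a variant call, the gene name and all counts are hypothetical.

```python
import math

DAMAGING = {"LoF", "damaging_missense"}  # hypothetical annotation labels
MAX_MAF = 0.001                          # rare = MAF < 0.1%
GENE_ALPHA = 0.05 / 20_000               # Bonferroni over ~20,000 genes = 2.5e-6


def count_carriers(variant_calls):
    """Steps 2-4: filter calls, then count distinct carriers per gene.

    variant_calls: iterable of (gene, consequence, maf, sample_id, is_case).
    Returns {gene: (case_carriers, control_carriers)}.
    """
    carriers = {}
    for gene, consequence, maf, sample_id, is_case in variant_calls:
        if consequence in DAMAGING and maf < MAX_MAF:
            cases, ctrls = carriers.setdefault(gene, (set(), set()))
            (cases if is_case else ctrls).add(sample_id)
    return {g: (len(c), len(k)) for g, (c, k) in carriers.items()}


def fisher_one_sided(case_carriers, n_cases, ctrl_carriers, n_ctrls):
    """Step 5: P(at least this many carriers are cases) under the null."""
    carriers = case_carriers + ctrl_carriers
    upper = min(carriers, n_cases)
    # Exact hypergeometric tail, summed with integer arithmetic
    tail = sum(
        math.comb(n_cases, k) * math.comb(n_ctrls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / math.comb(n_cases + n_ctrls, carriers)


# Toy calls: a duplicate, a benign variant and a common variant are dropped
calls = [
    ("GENE1", "LoF", 0.0001, "s1", True),
    ("GENE1", "LoF", 0.0001, "s1", True),
    ("GENE1", "synonymous", 0.0001, "s3", True),
    ("GENE1", "LoF", 0.01, "s4", True),
    ("GENE1", "damaging_missense", 0.0005, "s2", False),
]
counts = count_carriers(calls)  # {"GENE1": (1, 1)}

# Made-up gene-level counts in 10,000 cases and 10,000 controls
p_hit = fisher_one_sided(30, 10_000, 2, 10_000)
p_null = fisher_one_sided(5, 10_000, 5, 10_000)
```

In this toy setting, 30 case carriers against 2 control carriers passes the gene-level threshold, while 5 against 5 does not.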

Real example · autism risk genes

Gene      Cases LoF   Controls LoF   Odds ratio
CHD8      35          2              18.5
SCN2A     28          3              9.8
SYNGAP1   22          1              23.1

Satterstrom et al. 2020, Cell

Where burden tests shine

  • De novo mutations — recurrent across unrelated cases
  • Ultra-rare private variants — aggregated across a gene
  • Recessive diseases — compound heterozygotes / homozygotes
  • Gene intolerance — depleted in healthy controls (gnomAD pLI)
§ 5

GWAS vs Burden ·
Two Sides of One Coin

Two methods · two regimes

Feature             GWAS                  Burden
Variant frequency   Common (>1%)          Rare (<0.1%)
Effect size         Small (OR 1.05–1.3)   Large (OR 2–50)
Variant location    Mostly non-coding     Mostly coding
Mechanism           Regulation            Protein disruption
Sample size         50K – 1M              5K – 50K

Volume knob vs light switch

GWAS · volume knob

  • Many small adjustments
  • Doesn't break the system
  • Tunes expression up or down

Burden · light switch

  • Single dramatic flip
  • Knocks out the protein
  • System changes drastically
§ 6

Case Study ·
Schizophrenia

GWAS findings

  • 287 genome-wide significant loci
  • Each variant: OR 1.03 – 1.15 (3 – 15% higher odds)
  • Together: ~20% of liability variance
  • 95% in non-coding regions
  • Trubetskoy et al. 2022, Nature

Burden findings

  • 10 high-confidence risk genes
  • Each LoF mutation: OR 3 – 50 (3- to 50-fold higher odds)
  • Frequency: ~0.01 – 0.1% of cases per gene
  • Top genes: SETD1A · GRIN2A · GRIA3 · TRIO
  • Singh et al. 2022, Nature

Biological convergence

Both common and rare variants converge on the same pathways.
  • Synaptic function — glutamate signalling
  • Chromatin regulation — gene expression control
  • Neurodevelopment — early cortical patterning
§ 7

From Association
to Causation

Association is not causation

  • GWAS or burden hit → candidate gene
  • Need to verify that the gene actually causes the phenotype
  • What does it do? How does it act? Can we rescue it?
  • The next step is reverse genetics
§ 8

Summary

What to take away

  • Forward genetics: phenotype → gene
  • GWAS finds common, small-effect, mostly non-coding variants
  • Burden tests find rare, large-effect, protein-coding variants
  • The two methods complement each other · they do not compete
  • Together → full architecture of disease (e.g. schizophrenia)
  • Association alone does not give mechanism · that needs reverse genetics
Next lecture

From candidate gene
to mechanism

Chapter 26 · Reverse Genetics — From Gene to Function