BSMS205 · Genetics
Forward Genetics
Chapter 25 · Part V · Functional Genetics
The detective question
You have 4 million variants
in one patient's genome.
Which one is the culprit?
Forward vs reverse genetics
Forward
- Phenotype → gene
- Start with disease
- Search for cause
Reverse
- Gene → phenotype
- Start with gene
- Manipulate · observe
Today: forward. Next chapter: reverse.
Roadmap for today
- Evolution of gene hunting · 1860s to today
- GWAS · finding common variants
- What GWAS tells you · and what it doesn't
- Burden tests · finding rare variants
- GWAS vs burden tests · two sides of one coin
- Case study · schizophrenia architecture
§ 1
The Evolution
of Gene Hunting
1860s – 1910s · classical genetics
- Mendel: pedigrees and breeding
- Morgan: chromosome mapping in flies
- Tools: cross-breeding · pedigree analysis
- Output: inheritance patterns
1980s – 1990s · linkage analysis
- Track chromosomal regions through families
- Microsatellite markers · cM resolution
- Major successes: CFTR · HTT · DMD
- Limited to rare, high-penetrance Mendelian disease
2000s – 2010s · the GWAS era
- Microarrays: genotype millions of SNPs at scale
- 10,000 to 1,000,000 individuals
- Common variants only · MAF > 1%
- 100+ loci per complex trait
2010s – present · sequencing era
- Whole-exome and whole-genome sequencing
- Burden tests: aggregate rare variants per gene
- Reveals rare-variant architecture
- 10,000 to 100,000 individuals
§ 2
GWAS · Genome-Wide
Association Studies
The simple question
Across many people, are certain genetic variants more common in individuals with a disease than in those without?
The procedure
- Recruit cases (e.g. 50,000 with diabetes) and matched controls
- Genotype ~1 million SNPs per person
- For every SNP: compare allele frequency in cases vs controls
- Apply genome-wide significance threshold: p < 5 × 10⁻⁸
- Report list of associated loci
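The per-SNP step can be sketched as a chi-square test on a 2×2 table of allele counts. This is a minimal illustration, not the pipeline of any particular study: real GWAS typically fit regression models with covariates such as ancestry components. The function name and the example counts are hypothetical; the threshold is the one quoted above, which matches a Bonferroni correction of 0.05 for roughly one million independent tests.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # threshold quoted on the slide (about 0.05 / 1,000,000)


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p). Hypothetical helper for illustration only.
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    num = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    den = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
           * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    chi2 = num / den
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Made-up SNP: 50,000 cases and 50,000 controls (100,000 alleles per group),
# alternate-allele frequency 32% in cases vs 30% in controls.
chi2, p = allelic_chi2(32_000, 68_000, 30_000, 70_000)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}, significant = {p < GENOME_WIDE_ALPHA}")
```

A two-point frequency difference is invisible in a small cohort but highly significant at this sample size, which is why GWAS needs tens of thousands of participants.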
Real example · type 2 diabetes
- Suzuki et al. 2024, Nature
- ~2.5 million participants
- 600+ risk loci identified
- Each individual locus shifts risk by ~5–10%
- Together: pathways for insulin secretion, beta-cell function
The summary statistics
| Field | Meaning |
| BETA | Effect size · log odds ratio |
| SE | Standard error |
| P | P-value · genome-wide threshold 5e-8 |
| FREQ | Allele frequency |
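The fields in the table are linked by simple arithmetic, sketched below. The function name and the example values are hypothetical; the sketch assumes BETA is a log odds ratio and that P comes from a two-sided Wald test, which is common but not universal across GWAS software.

```python
import math


def describe_hit(beta, se, alpha=5e-8):
    """Derive odds ratio, Z-score and two-sided p-value from BETA and SE."""
    z = beta / se
    # Two-sided p-value for a standard normal Z: P(|N| > |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return {
        "odds_ratio": math.exp(beta),  # BETA is a log odds ratio
        "z": z,
        "p": p,
        "significant": p < alpha,
    }


# Made-up row: BETA = 0.07, SE = 0.01
print(describe_hit(0.07, 0.01))
```

The same BETA with a larger SE (a smaller study) would not pass the threshold, so effect size and significance should be read together.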
Five applications of summary statistics
- Polygenic risk scores — combine thousands of variants into one number
- Genetic correlation — find shared architecture between traits
- Mendelian randomisation — natural experiments for causation
- Fine-mapping — narrow loci to credible causal variants
- Gene prioritisation — combine with eQTL data
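As an illustration of the first application, a polygenic risk score in its simplest form is a weighted sum: each risk-allele dosage (0, 1 or 2) multiplied by that variant's BETA. The SNP identifiers and weights below are invented, and the sketch skips steps that real scores need, such as LD-aware weight adjustment and ancestry matching.

```python
def polygenic_score(dosages, betas):
    """Sum of BETA x dosage over all scored SNPs.

    dosages: dict mapping SNP id to 0, 1 or 2 copies of the effect allele
    betas:   dict mapping SNP id to its GWAS effect size (log odds ratio)
    SNPs missing from `dosages` are treated as 0 copies (an assumption).
    """
    return sum(beta * dosages.get(snp, 0) for snp, beta in betas.items())


# Hypothetical weights and one hypothetical person
betas = {"snp_a": 0.10, "snp_b": -0.05, "snp_c": 0.02}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(polygenic_score(person, betas))
```

A raw score only becomes interpretable once it is compared with the score distribution of a reference population.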
§ 3
What GWAS
Doesn't Tell You
Limitation 1 · most hits are not in genes
93%
of GWAS hits are in non-coding regions
- Variant might be 500 kb away from the nearest gene
- Which gene does it regulate? Need extra experiments
Limitation 2 · linkage disequilibrium
- Multiple variants are correlated at any locus
- Statistical hit ≠ causal variant
- Need fine-mapping to narrow down
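How strongly two variants are correlated is usually summarised as r². The sketch below computes the squared Pearson correlation of genotype dosages, a common approximation to haplotype-based r²; the function name and the toy genotypes are hypothetical.

```python
def ld_r2(g1, g2):
    """Squared Pearson correlation between two genotype vectors (0/1/2)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        raise ValueError("r2 is undefined for a monomorphic variant")
    return cov ** 2 / (v1 * v2)


# Toy genotypes for eight people at three nearby SNPs
lead_snp = [0, 1, 2, 1, 0, 2, 1, 0]
twin_snp = [0, 1, 2, 1, 0, 2, 1, 0]   # identical pattern: perfect LD
near_snp = [0, 1, 2, 1, 0, 2, 0, 1]   # differs in two people: partial LD

print(ld_r2(lead_snp, twin_snp), ld_r2(lead_snp, near_snp))
```

When r² is close to 1, association statistics alone cannot say which of the two variants is causal, which is the problem fine-mapping tries to address.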
Limitation 3 · common variants only
- GWAS tests one variant at a time
- A variant carried by one person cannot reach significance
- Misses rare, high-impact mutations entirely
- For rare variants, switch to burden tests
§ 4
Burden Tests ·
Hunting Rare Variants
The rare-variant problem
- A variant is found in 1 person out of 50,000
- Sample size of one · no per-variant statistics
- But many such variants together may converge on one gene
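The arithmetic behind this problem can be made concrete. Assume a hypothetical cohort of 25,000 cases and 25,000 controls and the most favourable outcome possible, in which every carrier of a variant is a case. The one-sided Fisher exact p-value for that best case is a ratio of binomial coefficients, and the sketch asks how many carriers are needed before it can drop below 5 × 10⁻⁸.

```python
import math


def best_case_p(k, n_cases, n_controls):
    """One-sided Fisher exact p-value when all k carriers are cases.

    Under the null, the chance that k carriers drawn from the whole cohort
    all fall among the cases is C(n_cases, k) / C(n_cases + n_controls, k).
    """
    return math.comb(n_cases, k) / math.comb(n_cases + n_controls, k)


N_CASES = N_CONTROLS = 25_000  # assumed cohort, 50,000 people in total

print(best_case_p(1, N_CASES, N_CONTROLS))  # a single carrier

k = 1
while best_case_p(k, N_CASES, N_CONTROLS) >= 5e-8:
    k += 1
print("carriers needed in the best case:", k)
```

A single carrier gives p = 0.5 at best, so no single ultra-rare variant can be tested on its own. Pooling carriers across a gene is what makes a test possible.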
The burden test idea
Don't test variants. Test genes.
- For each gene · count damaging variants in cases vs controls
- Aggregate across the gene
- Excess in cases → gene is a risk factor
How burden tests work
- Sequence cases and controls (~10,000 + 10,000)
- Filter to damaging variants — LoF, predicted-deleterious missense
- Filter to rare — MAF < 0.1%
- Per gene · count carriers in cases vs controls
- Statistical test · genome-wide threshold p < 2.5e-6 (~20,000 genes)
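The last two steps can be sketched with a one-sided Fisher exact test on per-gene carrier counts. This is one simple choice of test, not necessarily the one used in any cited study; published analyses often use regression-based or variance-component methods. The gene names and counts below are invented, and the threshold is a Bonferroni correction for about 20,000 genes.

```python
import math

N_GENES = 20_000
EXOME_WIDE_ALPHA = 0.05 / N_GENES  # 2.5e-6, as on the slide


def fisher_one_sided(case_carriers, n_cases, ctrl_carriers, n_controls):
    """P(at least `case_carriers` of all carriers are cases) under the null.

    Hypergeometric upper tail, computed exactly with integer arithmetic.
    """
    total = n_cases + n_controls
    carriers = case_carriers + ctrl_carriers
    upper = min(carriers, n_cases)
    tail = sum(
        math.comb(n_cases, x) * math.comb(n_controls, carriers - x)
        for x in range(case_carriers, upper + 1)
    )
    return tail / math.comb(total, carriers)


# Hypothetical carrier counts: gene -> (carriers in cases, carriers in controls)
counts = {"GENE_A": (30, 2), "GENE_B": (6, 5)}
for gene, (cases, ctrls) in counts.items():
    p = fisher_one_sided(cases, 10_000, ctrls, 10_000)
    print(gene, f"p = {p:.2e}", "significant" if p < EXOME_WIDE_ALPHA else "not significant")
```

Each gene contributes one test regardless of how many distinct rare variants it carries, which is why the multiple-testing burden is so much lighter than in GWAS.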
Real example · autism risk genes
| Gene | Cases LoF | Controls LoF | Odds ratio |
| CHD8 | 35 | 2 | 18.5 |
| SCN2A | 28 | 3 | 9.8 |
| SYNGAP1 | 22 | 1 | 23.1 |
Satterstrom et al. 2020, Cell
Where burden tests shine
- De novo mutations — recurrent across unrelated cases
- Ultra-rare private variants — aggregated across a gene
- Recessive diseases — compound heterozygotes / homozygotes
- Gene intolerance — depleted in healthy controls (gnomAD pLI)
§ 5
GWAS vs Burden ·
Two Sides of One Coin
Two methods · two regimes
| Feature | GWAS | Burden |
| Variant frequency | Common (>1%) | Rare (<0.1%) |
| Effect size | Small (OR 1.05–1.3) | Large (OR 2–50) |
| Variant location | Mostly non-coding | Mostly coding |
| Mechanism | Regulation | Protein disruption |
| Sample size | 50K – 1M | 5K – 50K |
Volume knob vs light switch
GWAS · volume knob
- Many small adjustments
- Doesn't break the system
- Tunes expression up or down
Burden · light switch
- Single dramatic flip
- Knocks out the protein
- System changes drastically
§ 6
Case Study ·
Schizophrenia
GWAS findings
- 287 genome-wide significant loci
- Each variant: OR 1.03 – 1.15 (3 – 15% higher odds)
- Together: ~20% of liability variance
- 95% in non-coding regions
- Trubetskoy et al. 2022, Nature
Burden findings
- 10 high-confidence risk genes
- Each LoF mutation: OR 3 – 50 (3- to 50-fold higher odds)
- Frequency: ~0.01 – 0.1% of cases per gene
- Top genes: SETD1A · GRIN2A · GRIA3 · TRIO
- Singh et al. 2022, Nature
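To see what these odds ratios mean for one person, they can be converted to absolute risk. The sketch below assumes a baseline lifetime risk of about 1%, a commonly quoted round figure for schizophrenia that is not taken from these slides; the function name is hypothetical.

```python
def risk_from_or(baseline_risk, odds_ratio):
    """Absolute risk for a carrier, given baseline risk and an odds ratio."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    carrier_odds = baseline_odds * odds_ratio
    return carrier_odds / (1 + carrier_odds)


BASELINE = 0.01  # assumed lifetime risk, for illustration only

# Range of a common GWAS variant, then range of a rare LoF variant
for label, odds_ratio in [("common, OR 1.03", 1.03), ("common, OR 1.15", 1.15),
                          ("rare LoF, OR 3", 3), ("rare LoF, OR 50", 50)]:
    print(f"{label}: {risk_from_or(BASELINE, odds_ratio):.1%}")
```

Under this assumption a common risk allele barely moves an individual's risk, while a high-impact LoF variant can raise it to roughly one in three. Note that an odds ratio of 50 is not a 50-fold increase in risk once the resulting risk is no longer small.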
Biological convergence
Both common and rare variants converge on the same pathways.
- Synaptic function — glutamate signalling
- Chromatin regulation — gene expression control
- Neurodevelopment — early cortical patterning
§ 7
From Association
to Causation
Association is not causation
- GWAS or burden hit → candidate gene
- Need to verify that the gene actually causes the phenotype
- What does it do? How does it act? Can we rescue it?
- The next half is reverse genetics
What to take away
- Forward genetics: phenotype → gene
- GWAS finds common, small-effect, mostly non-coding variants
- Burden tests find rare, large-effect, protein-coding variants
- The two methods complement each other rather than compete
- Together → full architecture of a disease (e.g. schizophrenia)
- Association requires reverse genetics for mechanism
Next lecture
From candidate gene
to mechanism
Chapter 26 · Reverse Genetics — From Gene to Function