BSMS205 · Genetics

Haploinsufficiency

Chapter 11 · Part II · Variation

A question to start with

When is half
not enough?

Two copies · why we usually have a buffer

Diploid: one allele from Mom, one from Dad
For most genes, one copy covers normal demand
The other copy = biological backup
But not always: some genes are dose-sensitive

The cake analogy

A recipe needs two cups of sugar.
With one, you get something cake-like.
It just doesn't taste right.

Haploinsufficient genes follow the two-cup recipe.

From Chapter 10 → here

Last time

Dominant alleles · one bad copy is enough
Many mechanisms

Today

One mechanism in detail:
Haploinsufficiency — dose-sensitive genes

Roadmap for today

Definitions · haploinsufficient vs haplosufficient
Why selection keeps harmful PTVs rare
Measuring intolerance · pLI
The next-gen score · LOEUF
Case study · SCN2A
Why this matters in the clinic
Summary & what comes next

§ 1

Defining
the Terms

Haploinsufficient · one copy is not enough

One allele knocked out (often a PTV)
Remaining copy cannot keep up
Result: disease or abnormal trait
These genes are dose-sensitive

Cut the protein in half → the cell can't compensate.

Haplosufficient · one copy is fine

One working copy → enough protein
The cell tolerates the missing allele
Most genes behave this way
This is the default state for diploids

LoF-tolerant · even both copies can go

Some genes are even more forgiving
You can lose one or both copies without harm
Often have backup paralogs elsewhere
Or simply not critical for fitness

The dose-sensitivity spectrum

Class	One copy lost	Two copies lost	Examples
Haploinsufficient	Disease	Often lethal	Transcription factors, channels
Haplosufficient	No phenotype	Recessive disease	Most disease genes
LoF-tolerant	No phenotype	Often no phenotype	Olfactory receptors

A surprising number

~100

PTVs · in every healthy person

Healthy adults each carry about 100 PTVs
Most sit in LoF-tolerant genes
A PTV by itself is not a death sentence
What matters is which gene got hit

§ 2

Selection
Removes the Worst

Natural selection · in one sentence

Variants that reduce fitness
get filtered out across generations.

Fitness = ability to survive and reproduce
Helpful variants spread; harmful ones vanish
Slow-motion quality control on the genome

Purifying selection · the cleanup crew

Type of natural selection that removes harmful variants
Most active on essential genes
PTVs in haploinsufficient genes → strongly disfavored
PTVs in LoF-tolerant genes → no pressure to remove

The signal we exploit

If a gene has far fewer PTVs
than expected by chance,
selection has been removing them.

Far fewer PTVs than expected in healthy people → likely intolerant gene
About as many PTVs as expected → tolerant gene
This is the basis of every constraint metric

§ 3

Measuring with
pLI

The ExAC study · 2016

60,706 adult exomes · the largest at the time
Adults without severe developmental disorders
Healthy population = baseline
Lek et al. 2016, Nature

What pLI means

pLI = probability that a gene is loss-of-function intolerant

Score from 0 to 1
pLI ≥ 0.9 → highly intolerant · likely haploinsufficient
pLI ≈ 0 → tolerates PTVs · LoF-tolerant
A binary-style readout: intolerant or not

How pLI is calculated

compare observed PTVs ↔ expected PTVs

Expected: from gene length × background mutation rate
Observed: actual PTVs found in 60,706 exomes
Big shortfall → strong selection against PTVs
Big shortfall → high pLI
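
The observed-versus-expected comparison above can be sketched in a few lines. This is a simplified illustration, not the published pipeline: it assumes three gene classes with fixed o/e ratios (1.0, 0.463, 0.089, the values reported for the ExAC model as recalled here, so treat them as assumptions), a Poisson model for the observed PTV count, and equal class priors. The real method estimates the class proportions across all genes; the function name `pli_sketch` is hypothetical.

```python
import math

# Assumed o/e ratio per gene class (null, recessive-like, haploinsufficient-like).
CLASS_RATIOS = {"null": 1.0, "recessive": 0.463, "haploinsufficient": 0.089}


def poisson_log_pmf(k, lam):
    """Log of the Poisson probability of seeing k events when lam are expected."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def pli_sketch(observed, expected, priors=None):
    """Posterior probability that a gene belongs to the haploinsufficient class."""
    if priors is None:
        # Assumption: equal priors; the published model fits these genome-wide.
        priors = {name: 1.0 / len(CLASS_RATIOS) for name in CLASS_RATIOS}
    log_post = {
        name: math.log(priors[name]) + poisson_log_pmf(observed, expected * ratio)
        for name, ratio in CLASS_RATIOS.items()
    }
    peak = max(log_post.values())  # subtract the peak to avoid underflow
    weights = {name: math.exp(v - peak) for name, v in log_post.items()}
    return weights["haploinsufficient"] / sum(weights.values())


if __name__ == "__main__":
    print(pli_sketch(observed=18, expected=165.8))  # large shortfall
    print(pli_sketch(observed=50, expected=50.0))   # no shortfall
```

A gene with a large shortfall lands near 1 and a gene with no shortfall lands near 0, which is why pLI behaves almost like a yes/no readout.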

3,230 highly intolerant genes

3,230

genes with pLI ≥ 0.9

~16% of all human genes
Enriched in ribosome assembly
Enriched in chromatin regulation
Enriched in cell cycle control

pLI flags known disease genes

pLI distributions across functional gene categories — **Figure 1.** Known haploinsufficient (dominant) disease genes pile up at high pLI; recessive disease genes spread more broadly. pLI captures real biology. Source: Lek et al. 2016, *Nature*. CC-BY 4.0.

Why use healthy adults?

Haploinsufficient genes → developmental disorders
Healthy adults already survived and may have reproduced
Selection has already worked on this sample
The signal you see = what survived selection

PTVs are mostly singletons

Allele frequency distribution of PTVs in ExAC — **Figure 2.** Most PTVs in ExAC appear in only one person — the signature of purifying selection keeping harmful variants rare. Source: Lek et al. 2016, *Nature*. CC-BY 4.0.

§ 4

The Refined Score:
LOEUF

The gnomAD study · 2020

141,456 individuals · more than 2× ExAC
Both exomes and whole genomes
Adults without severe developmental disorders
Karczewski et al. 2020, Nature

What LOEUF means

LOEUF = Loss-of-function Observed/Expected Upper-bound Fraction

Continuous score (no hard threshold)
Lower = more intolerant
LOEUF < 0.35 ≈ likely haploinsufficient
Adjusted for statistical uncertainty
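
One way to picture "adjusted for statistical uncertainty" is to take the upper end of a confidence interval around o/e rather than the point estimate. The sketch below assumes a Poisson likelihood evaluated over a grid of candidate o/e ratios and returns the ratio where the normalized cumulative likelihood reaches 95% (the upper bound of a 90% interval). The grid range, step size, and function name `loeuf_sketch` are illustrative assumptions, not the published code.

```python
import math


def loeuf_sketch(observed, expected, alpha=0.10, step=0.001, max_ratio=2.0):
    """Upper bound of a (1 - alpha) interval around observed/expected."""
    n_points = int(round(max_ratio / step))
    ratios = [step * i for i in range(1, n_points + 1)]
    # Poisson log-likelihood of the observed count at each candidate ratio.
    log_like = [
        observed * math.log(expected * r) - expected * r - math.lgamma(observed + 1)
        for r in ratios
    ]
    peak = max(log_like)  # subtract the peak to avoid underflow
    weights = [math.exp(v - peak) for v in log_like]
    total = sum(weights)
    cumulative = 0.0
    for r, w in zip(ratios, weights):
        cumulative += w / total
        if cumulative >= 1 - alpha / 2:
            return r
    return ratios[-1]


if __name__ == "__main__":
    # Same o/e point estimate (0.1), very different amounts of evidence.
    print(loeuf_sketch(observed=20, expected=200.0))
    print(loeuf_sketch(observed=1, expected=10.0))
```

The two calls share the same point estimate, but the small gene gets a much higher upper bound. That is the design choice: a gene only earns a low score when there is enough data to be confident about the depletion.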

pLI vs LOEUF · what changed

pLI

Yes / no test
Cutoff at 0.9
Misses moderate intolerance

LOEUF

Sliding scale
Lower = more intolerant
Captures shades of gray

The LOFTEE upgrade

LOFTEE = LoF Transcript Effect Estimator
Filters out likely false-positive PTVs · sequencing errors, alt transcripts
Output: 443,769 high-confidence PTVs
Cleaner input → more reliable score

LOEUF lines up with biology

LOEUF distributions across gene categories — **Figure 3.** Genes with low LOEUF are essential in mouse knockouts and enriched for human disease. External validation that LOEUF captures real biological importance. Source: Karczewski et al. 2020, *Nature*. CC-BY 4.0.

§ 5

Case Study:
SCN2A

SCN2A · the dosage profile

165.8

expected PTVs

18

observed PTVs

o/e = 0.11 · pLI = 1.0 · LOEUF very low

SCN2A on gnomAD · the actual page

**Figure 4.** The gnomAD constraint table for SCN2A. pLoF row: expected 165.8, observed 18, o/e = 0.11, pLI = 1.0. Missense Z-score = 8.73 also shows depletion. Source: gnomAD Browser, `ENSG00000136531`.
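
The o/e value in the table is plain division, and working it through shows how large the shortfall is. A minimal check using only the numbers quoted in the caption above (variable names are illustrative):

```python
expected_ptvs = 165.8  # expected pLoF count from the constraint table
observed_ptvs = 18     # observed pLoF count from the constraint table

oe = observed_ptvs / expected_ptvs
missing_fraction = 1 - oe

print(f"o/e = {oe:.2f}")                                     # o/e = 0.11
print(f"share of expected PTVs missing = {missing_fraction:.0%}")  # 89%
```

Roughly nine out of ten expected PTVs are absent from the sample.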

The clinical reality of SCN2A

~1,500–2,000 known affected individuals worldwide
Severe epilepsy and neurodevelopmental disorders
De novo PTVs (not inherited) → fitness essentially zero
That is why the constraint signal is so strong

§ 6

Why It Matters
in the Clinic

Variant prioritization in diagnosis

Patient has a rare PTV in Gene X
pLI = 1.0 / LOEUF = 0.1 → very suspicious
pLI = 0 / LOEUF = 1.5 → probably not the cause
Constraint scores = front-line filter for clinicians
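
As a rough illustration of the front-line filter, the sketch below applies the cutoffs quoted in this lecture (pLI ≥ 0.9, LOEUF < 0.35) to a gene's scores. The function name and the labels are hypothetical, and a real diagnostic workflow weighs far more evidence than two numbers.

```python
def constraint_flag(pli=None, loeuf=None):
    """Coarse label for a rare PTV, based only on the gene's constraint scores."""
    if pli is None and loeuf is None:
        return "no constraint data"
    intolerant_by_pli = pli is not None and pli >= 0.9
    intolerant_by_loeuf = loeuf is not None and loeuf < 0.35
    if intolerant_by_pli or intolerant_by_loeuf:
        return "very suspicious"
    return "probably not the cause"


if __name__ == "__main__":
    print(constraint_flag(pli=1.0, loeuf=0.1))  # very suspicious
    print(constraint_flag(pli=0.0, loeuf=1.5))  # probably not the cause
```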

Drug target prioritization

Drug target = often a gene we want to turn off
Lower LOEUF → likely more side effects if inhibited
Higher LOEUF → maybe safer to inhibit
Constraint helps rank candidates early

The PCSK9 twist · LoF can be good

Some humans are born with broken PCSK9.
They have low cholesterol and less heart disease.

Drug companies designed PCSK9 inhibitors
Essentially: "PTVs in a pill"
Constraint data → therapeutic opportunities

§ 7

Summary

What to take away

Haploinsufficiency = one copy is not enough · dose-sensitive
Healthy people carry ~100 PTVs · mostly in tolerant genes
Purifying selection removes harmful PTVs over generations
pLI ≥ 0.9 or LOEUF < 0.35 → likely haploinsufficient
SCN2A: 165.8 expected, 18 observed, pLI 1.0
Constraint scores → diagnosis · drug targets · therapy

Next lecture

Now what if
one bad copy is tolerated,
and it takes two to cause disease?

Chapter 12 · Recessive Alleles