BSMS205 · Genetics

Structural Variations
in Human Genomes

Chapter 13 · Part II · Variation

A question to start with

What if a whole
paragraph moves?

From single letters to whole chunks

SNVs & indels

  • 1 base changed
  • "cat" → "bat"
  • Easy to detect

Structural variants

  • ≥ 50 bp rearranged
  • Whole paragraph moves
  • Hard to detect

The number that should surprise you
~7,439
structural variants per genome (median)
  • From gnomAD-SV · 14,891 genomes
  • More than twice the 1000 Genomes estimate
  • Most are silent — but some cause disease

Roadmap for today

  1. What is an SV — definition & types
  2. The gnomAD-SV catalog · 14,891 genomes
  3. How many SVs & what sizes
  4. Five mechanisms of SV formation
  5. Functional impact & selection
  6. Clinical syndromes from SVs
  7. Summary & bridge to Part III

§ 1

What Is a
Structural Variant?

The 50-bp cutoff

  • Below 50 bp → indel
  • 50 bp or larger → structural variant
  • SVs can stretch to millions of base pairs
  • Cutoff is operational — reflects detection limits
SNV = typo  ·  indel = word edit  ·  SV = paragraph rearrangement
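
The size rule above can be written as a small classifier. This is a minimal sketch, not a production variant caller: the function name and the "other" fallback are illustrative assumptions, and only the 50 bp threshold comes from the slide.

```python
SV_MIN_BP = 50  # operational cutoff stated on the slide


def classify_variant(ref: str, alt: str) -> str:
    """Label a variant as SNV, indel, or SV from its REF and ALT alleles."""
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNV"
    size = abs(len(alt) - len(ref))  # net bases gained or lost
    if size >= SV_MIN_BP:
        return "SV"
    if size > 0:
        return "indel"
    return "other"  # e.g. multi-base substitutions, not covered by the slide


print(classify_variant("A", "G"))             # single base changed
print(classify_variant("A", "A" + "T" * 49))  # 49 bp inserted
print(classify_variant("A", "A" + "T" * 50))  # 50 bp inserted
```

The cutoff is a convention, so the same event can land on either side of it depending on how a caller measures size.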

Two big categories

Copy number altering

  • Deletions (DEL)
  • Duplications (DUP)
  • Insertions (INS)
  • MCNVs · multi-allelic CNVs

Copy number neutral

  • Inversions (INV)
  • Translocations (BND · "breakend" in VCF notation)
  • Complex SVs

The seven SV types · cheat sheet

Code  Type               What it does                       Copy number
DEL   Deletion           Removes a segment                  Altering
DUP   Duplication        Extra copy of a segment            Altering
INS   Insertion          New DNA added (often Alu/L1/SVA)   Altering
MCNV  Multi-allelic CNV  Variable copy count across people  Altering
INV   Inversion          Segment flipped 180°               Neutral
BND   Translocation      Swap between chromosomes           Neutral
CPX   Complex            Multiple rearrangements together   Either
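
To make three of the types in the cheat sheet concrete, here is a toy sketch that applies DEL, DUP, and INV to a short DNA string. The function names and the tandem placement of the duplicated copy are illustrative assumptions; real SVs are described by genomic coordinates, not Python slices.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def apply_sv(seq: str, sv_type: str, start: int, end: int) -> str:
    """Apply a DEL, DUP, or INV to seq[start:end] (0-based, end-exclusive)."""
    segment = seq[start:end]
    if sv_type == "DEL":
        return seq[:start] + seq[end:]            # segment removed
    if sv_type == "DUP":
        return seq[:end] + segment + seq[end:]    # extra tandem copy (assumed)
    if sv_type == "INV":
        return seq[:start] + reverse_complement(segment) + seq[end:]
    raise ValueError(f"unsupported SV type: {sv_type}")


genome = "AAACCGTTT"
for sv_type in ("DEL", "DUP", "INV"):
    print(sv_type, apply_sv(genome, sv_type, 3, 6))
```

The deletion and duplication change the length of the sequence, while the inversion leaves it unchanged, which is the copy-number distinction in the last column of the cheat sheet.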

The Collins 2020 SV catalog

Classification of structural variant types from gnomAD-SV
Figure 1. Classification of SV types. Top row: copy number altering (DEL, DUP, INS, MCNV). Bottom row: copy number neutral and complex (INV, BND, CPX). Each type has distinct molecular signatures and functional consequences. Source: Collins et al. 2020, Nature.

Mobile elements · the inserted sequences

Element  Length     Note
Alu      ~300 bp    Most abundant mobile element
SVA      ~2,100 bp  SINE-VNTR-Alu composite
LINE1    ~6,000 bp  Long Interspersed Nuclear Element 1

These three create distinctive size peaks in the SV distribution.

§ 2

The gnomAD-SV
Catalog

Building gnomAD-SV

  • 14,891 genomes analyzed
  • 54% non-European ancestry
  • African, Latino, East Asian, European groups
  • Diverse cohort → SV patterns differ across populations
The most comprehensive SV reference at the time of publication.
Collins et al. 2020, Nature.

How many SVs in the catalog?

433,371
distinct SVs across all genomes
7,439
median per individual

Earlier estimate: 3,441 per genome (1000 Genomes), fewer than half as many.
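
The comparison is simple arithmetic on the three numbers from this slide. The variable names below are illustrative.

```python
catalog_total = 433_371      # distinct SVs in gnomAD-SV
median_per_genome = 7_439    # median SVs per individual
earlier_estimate = 3_441     # 1000 Genomes per-genome estimate

fold_increase = median_per_genome / earlier_estimate
share_of_catalog = median_per_genome / catalog_total

print(f"{fold_increase:.2f}x the earlier estimate")
print(f"{share_of_catalog:.1%} of the catalog in a typical genome")
```

A typical person carries under 2% of the catalog, which fits with most cataloged SVs being rare.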

Size matters · three telltale peaks

  • Most SVs are under 10 kb
  • Distinct peaks at 300 · 2,100 · 6,000 bp
  • These are Alu · SVA · LINE1 insertions
  • Mobile elements still actively reshape the genome
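
The three peaks suggest a rough heuristic: guess the mobile element family from insertion length alone. This is a sketch under stated assumptions. The 20% tolerance is an arbitrary illustrative choice, and length is weak evidence on its own (LINE1 insertions, for example, are often truncated), so real annotation compares the inserted sequence itself.

```python
# Peak lengths taken from the slide (bp)
MOBILE_ELEMENT_BP = {"Alu": 300, "SVA": 2100, "LINE1": 6000}


def guess_mobile_element(insert_len_bp: int, tolerance: float = 0.2):
    """Return the family whose peak is nearest, or None if none is close."""
    name, expected = min(
        MOBILE_ELEMENT_BP.items(),
        key=lambda item: abs(item[1] - insert_len_bp),
    )
    if abs(insert_len_bp - expected) <= tolerance * expected:
        return name
    return None


for length in (310, 2000, 6100, 1000):
    print(length, guess_mobile_element(length))
```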

Half of all SVs are singletons

49.8%
of SVs found in only ONE person out of the 14,237 genomes that passed quality control
  • Many are recent mutations
  • Many disrupt important genes → kept rare by selection
  • Larger SVs → rarer than smaller ones
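
A singleton fraction is computed over variant sites, not over people. The sketch below uses an invented toy list of carrier counts, not gnomAD-SV data, and the function name is illustrative.

```python
def singleton_fraction(carrier_counts):
    """Fraction of observed SV sites carried by exactly one person."""
    observed = [count for count in carrier_counts if count > 0]
    if not observed:
        return 0.0
    singletons = sum(1 for count in observed if count == 1)
    return singletons / len(observed)


# Toy data: number of carriers for each of six SV sites (made up)
toy_counts = [1, 1, 5, 120, 1, 2]
print(singleton_fraction(toy_counts))
```

Because the denominator is sites, a catalog can be half singletons while each individual genome still consists mostly of common SVs.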

Population structure · same as SNVs

  • PCA on 15,395 common SVs separates ancestry groups
  • African, European, East Asian, Latino — all distinct clusters
  • SV patterns reflect demographic history
  • Some SVs are common in one population, absent in another

§ 3

Five Mechanisms
of SV Formation

Five mechanisms · one figure

Five mechanisms of SV formation
Figure 2. Five mechanisms generate SVs. NAHR (homologous misalignment), NHEJ (broken-end gluing), MEI (copy-paste via RNA), FoSTeS (replication template switching), Chromothripsis (catastrophic shattering). Each leaves distinct breakpoint signatures.

Mechanism 1 · NAHR

  • Non-Allelic Homologous Recombination
  • Two similar repeats misalign during meiosis
  • Recombination between them → deletion + duplication
  • Common in segmental duplications
Homologous sequences that shouldn't recombine — do.
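
The reciprocal outcome of NAHR can be shown with strings. This toy sketch assumes two identical direct repeats on one chromosome and a crossover at the end of the misaligned copies; the function name and sequences are illustrative.

```python
def nahr_products(chrom: str, repeat: str):
    """Return (deletion, duplication) products of unequal crossing over."""
    first = chrom.find(repeat)
    second = chrom.find(repeat, first + len(repeat)) if first != -1 else -1
    if second == -1:
        raise ValueError("need two non-overlapping copies of the repeat")
    end_of_first = first + len(repeat)
    end_of_second = second + len(repeat)
    # Copy 1 on one chromatid pairs with copy 2 on the other.
    deletion = chrom[:end_of_first] + chrom[end_of_second:]
    duplication = chrom[:end_of_second] + chrom[end_of_first:]
    return deletion, duplication


chrom = "AAA" + "GATTACA" + "CCCC" + "GATTACA" + "TTT"
deletion, duplication = nahr_products(chrom, "GATTACA")
print(deletion)
print(duplication)
```

The two products together contain exactly twice the original sequence: what one chromatid loses, the other gains.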

Mechanism 2 · NHEJ

  • Non-Homologous End Joining
  • Repair of double-strand breaks
  • Glue ends together without a template
  • Signature: microhomology (1–10 bp) or blunt junctions
Quick & dirty repair. Often joins the wrong ends.
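
The junction signature can be measured directly. This is a minimal sketch for a simple deletion, assuming the reference sequence and exact breakpoints are known; the function name is illustrative. It counts bases that are identical at both breakpoints, so the junction could be placed at several equivalent positions.

```python
def deletion_microhomology(ref: str, del_start: int, del_end: int) -> int:
    """Bases of microhomology at the junction of deleting ref[del_start:del_end]."""
    if not 0 <= del_start < del_end <= len(ref):
        raise ValueError("expected 0 <= del_start < del_end <= len(ref)")
    deleted_len = del_end - del_start
    # Start of the deleted segment matching the sequence just after it
    right = 0
    while (right < deleted_len and del_end + right < len(ref)
           and ref[del_start + right] == ref[del_end + right]):
        right += 1
    # Sequence just before the deletion matching the end of the deleted segment
    left = 0
    while (left < deleted_len and del_start - 1 - left >= 0
           and ref[del_start - 1 - left] == ref[del_end - 1 - left]):
        left += 1
    return min(left + right, deleted_len)


print(deletion_microhomology("TTTTACGGGGACTTCC", 4, 10))  # shared "AC"
print(deletion_microhomology("TTTTACGGGGCATTCC", 4, 10))  # blunt junction
```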

Mechanism 3 · Mobile Element Insertion

  1. Mobile element transcribed to RNA
  2. RNA reverse-transcribed back to DNA
  3. New copy inserts at a new location

Signature: target-site duplications (2–20 bp) flanking the insert.
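
A target-site duplication shows up as the same short sequence immediately on both sides of the insert. The sketch below assumes the insert coordinates are already known and ignores complications such as poly-A tails; the function name and the toy sequence are illustrative.

```python
def target_site_duplication(seq: str, ins_start: int, ins_end: int,
                            max_len: int = 20) -> str:
    """Longest sequence repeated directly before and after seq[ins_start:ins_end]."""
    longest = min(max_len, ins_start, len(seq) - ins_end)
    for k in range(longest, 0, -1):
        left_flank = seq[ins_start - k:ins_start]
        right_flank = seq[ins_end:ins_end + k]
        if left_flank == right_flank:
            return left_flank
    return ""


insert = "ACGTACGTAC"  # stand-in for a mobile element
seq = "GGCC" + "AAGTTC" + insert + "AAGTTC" + "TTAA"
print(target_site_duplication(seq, 10, 20))
```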

Mechanism 4 · FoSTeS

  • Fork Stalling and Template Switching
  • Replication fork stalls on difficult sequence
  • Machinery switches to a different template
  • Result: complex SVs with multiple breakpoints

Common in fragile sites — regions prone to replication stress.

Mechanism 5 · Chromothripsis

"Chromosome shattering."
  • Chromosome shatters into many fragments at once
  • Cell tries to reassemble — often chaotically
  • Result: many breakpoints clustered in one region
  • Rare in healthy people · common in cancer

Mechanisms · at a glance

Mechanism       Trigger                        Typical SV
NAHR            Misaligned repeats in meiosis  DEL · DUP · INV
NHEJ            Double-strand break            BND · CPX
MEI             Active retrotransposon         INS (Alu / SVA / L1)
FoSTeS          Stalled replication fork       CPX with microhomology
Chromothripsis  Catastrophic shattering        Clustered CPX

§ 4

Impact &
Selection

Gene dosage · the dominant effect

25–29%
of rare protein-truncating events come from SVs
  • Comparable to the impact of nonsense SNVs
  • Delete an exon → loss of function, same as a stop codon
  • Duplicate a gene → too much protein product

Selection acts hardest on coding SVs

Selection coefficients for SVs by genomic context
Figure 3. Selection coefficients for SVs across genomic contexts. CN-altering SVs in protein-coding genes (especially high-pLI genes) face strong negative selection. Noncoding SVs face weaker selection. Inversions sit in between. Source: Collins et al. 2020, Nature.

The same selection logic as SNVs

Genes intolerant to loss-of-function SNVs
are also intolerant to copy number changes.
  • High-pLI genes resist both nonsense SNVs and deletions
  • Whether by point mutation or by deletion — loss is loss
  • Selection does not care about the molecular mechanism

Structural rearrangements without dosage change

  • Inversions: gene broken at breakpoint
  • Translocations: regulatory regions moved away
  • Position effects: gene next to wrong enhancer
  • Effects subtler than complete loss — but real

§ 5

Clinical
Syndromes

DiGeorge syndrome · 22q11.2 deletion

  • Recurrent deletion at 22q11.2 · ~3 Mb
  • Deletes ~30–40 genes at once
  • Caused by NAHR between flanking segmental duplications
  • Heart defects · immune deficiency · facial features · learning differences
One of the most common microdeletion syndromes (~1 in 4,000 births).

Williams syndrome · 7q11.23 deletion

  • Recurrent deletion at 7q11.23 · ~1.5 Mb
  • Removes ~26 genes including ELN (elastin)
  • Same mechanism: NAHR between segmental duplications
  • Cardiovascular defects · cognitive profile · hypersociability

Why are SVs hard to detect?

  • Short reads can't span long repeats
  • Breakpoints often fall in repetitive regions
  • Duplications and inversions can hide in copy-rich regions
  • Long-read sequencing changed the game
The same problem left about 8% of the Human Genome Project reference unfinished, until long reads resolved it.
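
Why read length matters can be reduced to one inequality: a read resolves a repeat only if it covers the whole repeat plus some unique sequence on each side. This is a sketch; the 20 bp anchor and the example read lengths are illustrative assumptions, and the function name is hypothetical.

```python
def read_spans_repeat(read_len_bp: int, repeat_len_bp: int,
                      min_anchor_bp: int = 20) -> bool:
    """True if one read can cover the repeat plus unique flanks on both sides."""
    return read_len_bp >= repeat_len_bp + 2 * min_anchor_bp


LINE1_BP = 6000  # approximate full-length LINE1 from the mobile element table
print(read_spans_repeat(150, LINE1_BP))     # typical short read
print(read_spans_repeat(15_000, LINE1_BP))  # example long read
```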

§ 6

Summary

What to take away

  • SV = rearrangement of ≥ 50 bp · 7 types
  • ~7,439 SVs per genome · half of all cataloged SVs are singletons
  • Five mechanisms: NAHR · NHEJ · MEI · FoSTeS · chromothripsis
  • SVs cause 25–29% of rare protein-truncating events
  • Same selection logic as SNVs — loss is loss

Bridge · from single genes to many variants

Part II · finishing here

  • SNVs · indels · SVs
  • Mendelian disorders
  • One variant → one disease

Part III · next

  • Many variants per trait
  • Complex traits · GWAS
  • Polygenic risk

Next lecture

What if a trait is shaped by
thousands of variants?

Part III · Complex Traits & GWAS