BSMS205 · Genetics

Structural Variations
in Human Genomes

Chapter 13 · Part II · Variation

A question to start with

What if a whole
paragraph moves?

From single letters to whole chunks

SNVs & indels

1 base changed
"cat" → "bat"
Easy to detect

Structural variants

≥ 50 bp rearranged
Whole paragraph moves
Hard to detect

The number that should surprise you

~7,439

structural variants per genome (median)

From gnomAD-SV · 14,891 genomes
More than twice the 1000 Genomes estimate
Most are silent — but some cause disease

Roadmap for today

What is an SV — definition & types
The gnomAD-SV catalog · 14,891 genomes
How many SVs & what sizes
Five mechanisms of SV formation
Functional impact & selection
Clinical syndromes from SVs
Summary & bridge to Part III

§ 1

What Is a
Structural Variant?

The 50-bp cutoff

Below 50 bp → indel
50 bp or larger → structural variant
SVs can stretch to millions of base pairs
Cutoff is operational — reflects detection limits

SNV = typo · indel = word edit · SV = paragraph rearrangement
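
The cutoff can be written as a one-line rule. The sketch below is illustrative only: the function name `classify_by_length` is hypothetical, and it assumes a variant is described purely by how many base pairs it adds or removes (0 for a substitution).

```python
SV_MIN_BP = 50  # operational cutoff from the slide above

def classify_by_length(length_change_bp: int) -> str:
    """Label a variant by the number of base pairs it adds or removes."""
    size = abs(length_change_bp)  # deletions are passed as negative numbers
    if size == 0:
        return "SNV"      # one base swapped, no length change
    if size < SV_MIN_BP:
        return "indel"
    return "SV"

for change in (0, -3, 49, 50, -3_000_000):
    print(change, classify_by_length(change))
```

Because the cutoff is operational, changing `SV_MIN_BP` changes the labels but not the biology.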

Two big categories

Copy number altering

Deletions (DEL)
Duplications (DUP)
Insertions (INS)
MCNVs · multi-allelic CNVs

Copy number neutral

Inversions (INV)
Translocations (BND)
Complex SVs (CPX) · may also alter copy number

The seven SV types · cheat sheet

Code	Type	What it does	CN?
DEL	Deletion	Removes a segment	Altering
DUP	Duplication	Extra copy of a segment	Altering
INS	Insertion	New DNA added (often Alu/L1/SVA)	Altering
MCNV	Multi-allelic CNV	Variable copy count across people	Altering
INV	Inversion	Segment flipped 180°	Neutral
BND	Translocation (breakend)	Segments exchanged between chromosomes	Neutral
CPX	Complex	Multiple rearrangements together	Either
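
The "What it does" column can be tried out on a toy sequence. This is a minimal sketch, not a variant-calling tool: `apply_sv` and its half-open coordinate convention are assumptions made for illustration, duplications are assumed to be tandem, and an inversion is modeled as the reverse complement of the segment, which is how a 180° flip of double-stranded DNA reads on the reference strand.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def apply_sv(seq: str, sv_type: str, start: int, end: int, insert: str = "") -> str:
    """Apply one simple SV to seq, using a 0-based half-open interval [start, end)."""
    left, segment, right = seq[:start], seq[start:end], seq[end:]
    if sv_type == "DEL":
        return left + right                      # segment removed
    if sv_type == "DUP":
        return left + segment + segment + right  # tandem extra copy
    if sv_type == "INV":
        return left + reverse_complement(segment) + right
    if sv_type == "INS":
        return left + insert + seq[start:]       # new DNA added at start; end is ignored
    raise ValueError(f"unsupported SV type: {sv_type}")

ref = "AAACCGTTT"
print(apply_sv(ref, "DEL", 3, 6))        # AAATTT
print(apply_sv(ref, "DUP", 3, 6))        # AAACCGCCGTTT
print(apply_sv(ref, "INV", 3, 6))        # AAACGGTTT
print(apply_sv(ref, "INS", 3, 3, "GG"))  # AAAGGCCGTTT
```

Note that DEL, DUP, and INS change the length of the sequence while INV does not, which is the copy-number column of the table in miniature.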

The Collins 2020 SV catalog

**Figure 1.** Classification of SV types in gnomAD-SV. Top row: copy number altering (DEL, DUP, INS, MCNV). Bottom row: copy number neutral and complex (INV, BND, CPX). Each type has distinct molecular signatures and functional consequences. Source: Collins et al. 2020, *Nature*.

Mobile elements · the inserted sequences

Element	Length	Note
Alu	~300 bp	Most abundant mobile element
SVA	~2,100 bp	SINE-VNTR-Alu composite
LINE1	~6,000 bp	Long Interspersed Nuclear Element 1

These three create distinctive size peaks in the SV distribution.
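
To see why insert length alone already hints at the element family, here is a rough sketch that matches an insertion length to the nearest peak in the table. The function name and the 20% tolerance are assumptions chosen for illustration; real annotation compares the inserted sequence itself, not just its length.

```python
# Approximate full-length sizes from the table above
MOBILE_ELEMENT_PEAKS_BP = {"Alu": 300, "SVA": 2_100, "LINE1": 6_000}

def guess_mobile_element(insert_len_bp: int, tolerance: float = 0.2):
    """Return the element whose typical length is within tolerance, else None."""
    for name, peak in MOBILE_ELEMENT_PEAKS_BP.items():
        if abs(insert_len_bp - peak) <= tolerance * peak:
            return name
    return None

for length in (310, 2_000, 6_100, 1_000):
    print(length, guess_mobile_element(length))
```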

§ 2

The gnomAD-SV
Catalog

Building gnomAD-SV

14,891 genomes analyzed
54% non-European ancestry
African, Latino, East Asian, European groups
Diverse cohort → SV patterns differ across populations

The most comprehensive SV reference to date.
Collins et al. 2020, Nature.

How many SVs in the catalog?

433,371

distinct SVs across all genomes

7,439

median per individual

Earlier estimate: 3,441 per genome (1000 Genomes), fewer than half as many.
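
The comparison is simple arithmetic on the two per-genome figures quoted on these slides; the variable names below are just labels.

```python
GNOMAD_SV_MEDIAN = 7_439      # median SVs per genome in gnomAD-SV
THOUSAND_GENOMES_EST = 3_441  # earlier per-genome estimate

fold_increase = GNOMAD_SV_MEDIAN / THOUSAND_GENOMES_EST
print(f"gnomAD-SV reports {fold_increase:.2f}x as many SVs per genome")
```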

Size matters · three telltale peaks

Most SVs are under 10 kb
Distinct peaks at 300 · 2,100 · 6,000 bp
These are Alu · SVA · LINE1 insertions
Mobile elements still actively reshape the genome

Half of all SVs are singletons

49.8%

of SVs found in only ONE person out of the 14,237 genomes that passed quality control

Many are recent mutations
Many disrupt important genes → kept rare by selection
Larger SVs → rarer than smaller ones
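
A singleton fraction is a count over sites, not over people. The sketch below assumes each SV site is summarized by its allele count (the number of copies seen in the whole cohort) and treats a count of exactly 1 as a singleton; the toy counts are invented for illustration.

```python
def singleton_fraction(allele_counts):
    """Fraction of observed variant sites whose allele count is exactly 1."""
    observed = [ac for ac in allele_counts if ac > 0]  # ignore sites never seen
    if not observed:
        return 0.0
    singletons = sum(1 for ac in observed if ac == 1)
    return singletons / len(observed)

toy_allele_counts = [1, 1, 5, 120, 1, 2, 1, 30]  # made-up data, 8 sites
print(singleton_fraction(toy_allele_counts))     # 0.5
```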

Population structure · same as SNVs

PCA on 15,395 common SVs separates ancestry groups
African, European, East Asian, Latino — all distinct clusters
SV patterns reflect demographic history
Some SVs are common in one population, absent in another

§ 3

Five Mechanisms
of SV Formation

Five mechanisms · one figure

Mechanism 1 · NAHR

Non-Allelic Homologous Recombination
Two similar repeats misalign during meiosis
Recombination between them → deletion + duplication
Common in segmental duplications

Homologous sequences that shouldn't recombine — do.
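
The two products of a misaligned crossover can be sketched with a chromosome written as a list of labeled blocks. Everything here is a simplified model: the block labels, the function name, and the assumption that the first repeat on one homolog pairs with the second repeat on the other are illustrative choices.

```python
def nahr_unequal_crossover(chrom, repeat="R"):
    """Return (deletion_product, duplication_product) after the first copy of
    the repeat on one homolog recombines with the second copy on the other."""
    positions = [i for i, block in enumerate(chrom) if block == repeat]
    if len(positions) < 2:
        raise ValueError("NAHR needs two copies of the repeat")
    first, second = positions[0], positions[1]
    # One homolog up to repeat 1, joined to the other homolog after repeat 2
    deletion = chrom[: first + 1] + chrom[second + 1 :]
    # The reciprocal product: up to repeat 2, joined to everything after repeat 1
    duplication = chrom[: second + 1] + chrom[first + 1 :]
    return deletion, duplication

chrom = ["left", "R", "gene", "R", "right"]
deleted, duplicated = nahr_unequal_crossover(chrom)
print(deleted)     # ['left', 'R', 'right']
print(duplicated)  # ['left', 'R', 'gene', 'R', 'gene', 'R', 'right']
```

One event yields both a gene-loss chromosome and a gene-gain chromosome, which is why recurrent deletions and their reciprocal duplications share the same breakpoints.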

Mechanism 2 · NHEJ

Non-Homologous End Joining
Repair of double-strand breaks
Glue ends together without a template
Signature: microhomology (1–10 bp) or blunt junctions

Quick & dirty repair. Can join the wrong ends.
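
The junction signature can be checked on a toy deletion. This sketch assumes the deleted interval is known in 0-based half-open reference coordinates and only looks for microhomology in one direction (bases at the start of the deleted segment that repeat right after it); the function name and the 10 bp cap are illustrative.

```python
def junction_microhomology(ref: str, del_start: int, del_end: int, max_len: int = 10) -> str:
    """Return the bases shared by the start of the deleted segment and the
    sequence immediately after it; an empty string means a blunt junction."""
    k = 0
    while (
        k < max_len
        and del_start + k < del_end
        and del_end + k < len(ref)
        and ref[del_start + k] == ref[del_end + k]
    ):
        k += 1
    return ref[del_start : del_start + k]

# left flank ACGTT | deleted GATCCCTTA | right flank GATGG
print(junction_microhomology("ACGTTGATCCCTTAGATGG", 5, 14))  # GAT
# left flank ACGTT | deleted CCC | right flank GATGG
print(repr(junction_microhomology("ACGTTCCCGATGG", 5, 8)))   # ''
```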

Mechanism 3 · Mobile Element Insertion

Mobile element transcribed to RNA
RNA reverse-transcribed back to DNA
New copy inserts at a new location

Signature: target-site duplications (2–20 bp) flanking the insert.
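
A target-site duplication shows up as the same short sequence immediately before and immediately after the insert. The sketch below searches for the longest such match within the 2 to 20 bp range quoted above; the function name and the toy flanks are assumptions for illustration.

```python
def find_tsd(upstream: str, downstream: str, min_len: int = 2, max_len: int = 20) -> str:
    """Longest sequence that ends the upstream flank and also starts the
    downstream flank; an empty string means no duplication was found."""
    longest_possible = min(max_len, len(upstream), len(downstream))
    for k in range(longest_possible, min_len - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream[-k:]
    return ""

# flank ...CCTT[AAGTCAG] + insert + [AAGTCAG]TTGC...
print(find_tsd("CCTTAAGTCAG", "AAGTCAGTTGC"))  # AAGTCAG
print(repr(find_tsd("ACGTACGT", "GGGGCCCC")))  # ''
```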

Mechanism 4 · FoSTeS

Fork Stalling and Template Switching
Replication fork stalls on difficult sequence
Machinery switches to a different template
Result: complex SVs with multiple breakpoints

Common in fragile sites — regions prone to replication stress.

Mechanism 5 · Chromothripsis

"Chromosome shattering."

Chromosome shatters into many fragments at once
Cell tries to reassemble — often chaotically
Result: many breakpoints clustered in one region
Rare in healthy people · common in cancer
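
A toy simulation makes the "many breakpoints at once" signature concrete. This is a deliberately crude model with assumed names: the chromosome is a list of labeled segments, reassembly is a random shuffle with random orientation, and no fragments are lost.

```python
import random

def shatter_and_reassemble(segments, seed=0):
    """Shuffle the fragments and give each a random orientation (+ or -)."""
    rng = random.Random(seed)
    shuffled = list(segments)
    rng.shuffle(shuffled)
    return [(name, rng.choice("+-")) for name in shuffled]

def count_novel_junctions(original, derived):
    """Count junctions in derived that do not join original neighbours
    in a consistent orientation."""
    neighbours = set(zip(original, original[1:]))
    novel = 0
    for (x, ox), (y, oy) in zip(derived, derived[1:]):
        forward = ox == oy == "+" and (x, y) in neighbours
        reverse = ox == oy == "-" and (y, x) in neighbours
        if not (forward or reverse):
            novel += 1
    return novel

original = list("ABCDEFGHIJ")
derived = shatter_and_reassemble(original, seed=1)
print(derived)
print("novel junctions:", count_novel_junctions(original, derived))
```

For comparison, a single simple inversion of two adjacent segments produces exactly two novel junctions under the same counting rule.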

Mechanisms · at a glance

Mechanism	Trigger	Typical SV
NAHR	Misaligned repeats in meiosis	DEL · DUP · INV
NHEJ	Double-strand break	BND · CPX
MEI	Active retrotransposon	INS (Alu / SVA / L1)
FoSTeS	Stalled replication fork	CPX with microhomology
Chromothripsis	Catastrophic shattering	Clustered CPX

§ 4

Impact &
Selection

Gene dosage · the dominant effect

25–29%

of rare protein-truncating events come from SVs

Comparable to the impact of nonsense SNVs
Delete an exon → loss of function, same as a stop codon
Duplicate a gene → too much protein product
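
The dosage logic reduces to interval overlap. The sketch below uses hypothetical labels and 0-based half-open coordinates, and covers only the two cases on this slide: a deletion that touches any exon, and a duplication that spans the whole gene. It is not the annotation scheme used by gnomAD-SV.

```python
def predict_dosage_effect(sv_type, sv_start, sv_end, gene_start, gene_end, exons):
    """Very rough consequence call for one SV against one gene."""
    if sv_type == "DUP" and sv_start <= gene_start and sv_end >= gene_end:
        return "copy gain"  # whole gene duplicated
    overlaps_exon = any(sv_start < e_end and e_start < sv_end for e_start, e_end in exons)
    if sv_type == "DEL" and overlaps_exon:
        return "predicted loss of function"
    return "no predicted dosage effect"

exons = [(100, 200), (300, 400)]  # toy gene spanning 100-400
print(predict_dosage_effect("DEL", 150, 250, 100, 400, exons))  # exon hit
print(predict_dosage_effect("DEL", 200, 300, 100, 400, exons))  # intron only
print(predict_dosage_effect("DUP", 50, 500, 100, 400, exons))   # whole gene
```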

Selection acts hardest on coding SVs

**Figure 3.** Selection coefficients for SVs across genomic contexts. CN-altering SVs in protein-coding genes (especially high-pLI genes) face strong negative selection. Noncoding SVs face weaker selection. Inversions sit in between. Source: Collins et al. 2020, *Nature*.

The same selection logic as SNVs

Genes intolerant to loss-of-function SNVs
are also intolerant to copy number changes.

High-pLI genes resist both nonsense SNVs and deletions
Whether by point mutation or by deletion — loss is loss
Selection does not care about the molecular mechanism

Structural rearrangements without dosage change

Inversions: gene broken at breakpoint
Translocations: regulatory regions moved away
Position effects: gene next to wrong enhancer
Effects subtler than complete loss — but real

§ 5

Clinical
Syndromes

DiGeorge syndrome · 22q11.2 deletion

Recurrent deletion at 22q11.2 · ~3 Mb
Deletes ~30–40 genes at once
Caused by NAHR between flanking segmental duplications
Heart defects · immune deficiency · facial features · learning differences

One of the most common microdeletion syndromes (~1 in 4,000 births).

Williams syndrome · 7q11.23 deletion

Recurrent deletion at 7q11.23 · ~1.5 Mb
Removes ~26 genes including ELN (elastin)
Same mechanism: NAHR between segmental duplications
Cardiovascular defects · cognitive profile · hypersociability

Why are SVs hard to detect?

Short reads can't span long repeats
Breakpoints often fall in repetitive regions
Duplications and inversions can hide in copy-rich regions
Long-read sequencing changed the game

The same problem left about 8% of the human genome unfinished after the HGP; long reads closed those gaps.

§ 6

Summary

What to take away

SV = rearrangement of ≥ 50 bp · 7 types
Median 7,439 SVs per genome · half of all catalog SVs are singletons
Five mechanisms: NAHR · NHEJ · MEI · FoSTeS · chromothripsis
SVs cause 25–29% of rare protein-truncating events
Same selection logic as SNVs — loss is loss

Five things to take away from Chapter 13. One: a structural variant is any rearrangement of 50 base pairs or more, and there are seven types: deletion, duplication, insertion, multi-allelic CNV, inversion, translocation, and complex. Two: each of us carries about 7,400 SVs, and roughly half of all SVs in the gnomAD-SV catalog are singletons found in only one person. Three: five molecular mechanisms produce the vast majority of SVs: NAHR, NHEJ, mobile element insertion, fork stalling and template switching, and chromothripsis. Four: SVs are functionally important, accounting for 25 to 29 percent of rare protein-truncating events, comparable to nonsense SNVs. Five: selection acts on SVs the same way it acts on SNVs: loss-of-function-intolerant genes resist both. Hold those five points; the rest is detail.

Bridge · from single genes to many variants

Part II · finishing here

SNVs · indels · SVs
Mendelian disorders
One variant → one disease

Part III · next

Many variants per trait
Complex traits · GWAS
Polygenic risk

Next lecture

What if a trait is shaped by
thousands of variants?

Part III · Complex Traits & GWAS