BSMS205 · Genetics

The CHM13
Cell Line

Chapter 3 · Part I · The Human Genome
A puzzle to start with

Why couldn't they just
sequence a normal person?

The two-puzzle problem

Normal diploid

  • Two different chromosome copies
  • ~4–5 million heterozygous variants
  • Repeats: which copy did the read come from?

What an assembler wants

  • Both copies identical
  • No ambiguity in repeats
  • One puzzle, not two interleaved ones
Where the gaps lived
151,000,000
base pairs of unknown sequence in GRCh38
  • Mostly in repetitive regions
  • Centromeres · rDNA arrays · segmental duplications
  • Heterozygosity made them impossible to assemble

Roadmap for today

  1. Why heterozygosity breaks assembly
  2. What is a complete hydatidiform mole
  3. "Functionally haploid" · the magic property
  4. The hTERT trick · making cells immortal
  5. Quality control · ancestry · ethics
  6. Why CHM13 specifically · and its limits
  7. Summary & what comes next
§ 1

Why Heterozygosity
Breaks Assembly

One difference per thousand bases

1 / 1,000
bases differ between maternal and paternal copies
  • = about 0.1% of the genome
  • = ~4–5 million heterozygous variants
  • = the normal, healthy human state

Where the read came from · matters

  • Sequencing breaks DNA into millions of fragments
  • Computer reassembles by finding overlaps
  • In repeats: many fragments look nearly identical
  • Maternal? Paternal? Cannot tell → gaps or errors
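
The placement ambiguity can be sketched with a toy example. This is a minimal illustration, not how a real assembler works: the sequences, the `count_placements` helper, and the use of exact string matching are all assumptions made for the demonstration.

```python
def count_placements(read: str, haplotypes: dict[str, str]) -> dict[str, int]:
    """Count exact-match start positions of `read` in each haplotype."""
    counts = {}
    for name, seq in haplotypes.items():
        n = 0
        start = seq.find(read)
        while start != -1:
            n += 1
            # Step by one base so overlapping placements are also counted.
            start = seq.find(read, start + 1)
        counts[name] = n
    return counts


# Hypothetical toy data: a flank, a 3-copy tandem repeat, another flank.
REPEAT = "ACGTTGCA"
maternal = "GGATCC" + REPEAT * 3 + "TTAAGG"
paternal = "GGATAC" + REPEAT * 3 + "TTAAGG"  # one base differs in the flank
haplotypes = {"maternal": maternal, "paternal": paternal}

unique_read = "GGATCCACGT"    # covers the flank variant
repeat_read = "ACGTTGCAACGT"  # lies entirely inside the repeat

print("unique read:", count_placements(unique_read, haplotypes))
print("repeat read:", count_placements(repeat_read, haplotypes))
```

The read that covers the flank variant has a single home. The read inside the repeat fits several positions on both copies, so nothing in the read itself says where it belongs or which parent it came from.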

The puzzle problem · in one figure

Heterozygosity makes assembly of repetitive regions ambiguous
Figure 1. Why heterozygosity breaks genome assembly. Reads from two similar-but-not-identical chromosome copies cannot be confidently assigned in repeats — leaving gaps or errors.

The GRCh38 compromise

  • Built from BAC clones · 100–200 kb pieces in bacteria
  • Pieces came from multiple individuals
  • Result: a mosaic of haplotypes
  • ~151 Mb of N's in repetitive regions
Mosaic + heterozygosity + short reads = unfinishable.
§ 2

What Is a
Hydatidiform Mole?

Normal fertilization · 23 + 23

  • Egg contributes 23 chromosomes from mom
  • Sperm contributes 23 chromosomes from dad
  • Zygote: diploid, 46 chromosomes
  • Two different versions of every chromosome

What goes wrong in a CHM

  • Egg loses its nucleus — no maternal DNA
  • Sperm fertilizes the empty egg
  • Sperm DNA duplicates itself (endoreduplication)
  • Result: 46 chromosomes — but all from dad
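
The difference between the two outcomes can be sketched in a few lines. This is a toy model under stated assumptions: each gamete is reduced to a list of 0/1 alleles at 1,000 hypothetical variable sites, and the helper names are made up for illustration.

```python
import random


def random_haplotype(n_sites: int, rng: random.Random) -> list[int]:
    """One allele (0 or 1) per variable site; purely illustrative."""
    return [rng.randint(0, 1) for _ in range(n_sites)]


def het_sites(copy_a: list[int], copy_b: list[int]) -> int:
    """Number of sites where the two chromosome copies differ."""
    return sum(a != b for a, b in zip(copy_a, copy_b))


rng = random.Random(13)  # fixed seed so the run is repeatable
n_sites = 1000
egg = random_haplotype(n_sites, rng)
sperm = random_haplotype(n_sites, rng)

normal_zygote = (egg, sperm)  # one maternal set + one paternal set
chm = (sperm, list(sperm))    # empty egg; the sperm set is duplicated

print("normal zygote het sites:", het_sites(*normal_zygote))
print("CHM het sites:", het_sites(*chm))
```

Duplicating a single haplotype leaves no site where the two copies can differ, which is the property the rest of the lecture relies on. The model ignores the rare duplication errors and culture mutations that leave real CHM cells with some residual variants.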

Normal vs CHM · in one figure

Normal fertilization vs complete hydatidiform mole formation
Figure 2. Normal fertilization (top) creates a diploid heterozygous genome. A complete hydatidiform mole (bottom) is diploid but homozygous — two identical paternal chromosome sets.

Why 46,XX · not 46,YY

  • Most CHMs have 46,XX karyotype
  • Sperm carrying X duplicates → XX
  • Or two X-bearing sperm fertilize one empty egg
  • YY is not viable — humans need at least one X
  • CHM13 is 46,XX · all from one father
A serious moment

The biological source matters

  • A CHM is a failed pregnancy
  • Tissue donated from a real patient in 1993
  • Used with consent and ethical review
  • The data set has a human story behind it
§ 3

"Functionally
Haploid"

Three states · clearly distinguished

Term                 | Chromosomes | Het variants  | Example
True haploid         | 23          | 0 (no pair)   | Sperm, Egg
Diploid (normal)     | 46          | ~4–5 million  | You, me
Functionally haploid | 46          | ~few thousand | CHM13

Diploid in number. Haploid in information.

The "functionally haploid" picture

Functionally haploid explained: diploid in number, homozygous in sequence
Figure 3. CHM13 has 46 chromosomes (diploid number) but both copies are nearly identical (homozygous) — behaving like a haploid genome for assembly purposes.

Reduction in heterozygosity

Normal diploid

~4,500,000
het variants

CHM13

~few thousand
het variants

Reduction: roughly 99.9% (from ~4.5 million to a few thousand). Residual: well below 0.01% of the genome.

Not perfect · but close enough

  • A few thousand residual heterozygous variants
  • One megabase-scale deletion in chr15 rDNA array
  • From rare endoreduplication errors + culture mutations
  • < 0.01% of the genome — assembly stays simple
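
As a rough check on the "< 0.01% of the genome" figure, the arithmetic can be written out. This is a back-of-the-envelope sketch: `CHM13_HET = 3_000` is an assumed stand-in for "a few thousand", and each variant is counted as a single base, which understates indels and structural variants.

```python
GENOME_BP = 3.055e9      # T2T-CHM13 assembly length quoted in this lecture
DIPLOID_HET = 4_500_000  # typical diploid figure quoted in this lecture
CHM13_HET = 3_000        # ASSUMPTION: stand-in for "a few thousand"


def het_fraction(n_het: int, genome_bp: float = GENOME_BP) -> float:
    """Fraction of genome positions that are heterozygous (1 bp per variant)."""
    return n_het / genome_bp


diploid_pct = 100 * het_fraction(DIPLOID_HET)
chm13_pct = 100 * het_fraction(CHM13_HET)
fold = DIPLOID_HET / CHM13_HET

print(f"normal diploid: {diploid_pct:.3f}% of positions")
print(f"CHM13:          {chm13_pct:.5f}% of positions")
print(f"fold reduction: {fold:.0f}x")
```

Under these assumptions the residual heterozygosity sits far below the 0.01% threshold, and the exact value chosen for "a few thousand" does not change that conclusion.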

Two puzzles vs one

Normal diploid: two similar puzzles
with pieces mixed in one box.
CHM13: one puzzle, twice over.
  • Example centromere array: 2 kb repeat unit × ~1,400 copies
  • Diploid: mom 1,387 copies vs dad 1,421 copies, all slightly different
  • CHM13: same count, same sequence — solvable
§ 4

The hTERT Trick
Making Cells Immortal

The Hayflick limit

  • Normal somatic cells divide 40–60 times
  • Then they stop · senescence
  • Limit set by telomeres · TTAGGG caps
  • Each division: telomere shortens by 50–200 bp
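
The link between telomere loss and the division limit can be sketched with a simple linear model. The starting length (10 kb) and the critical length (5 kb) are assumptions chosen for illustration, not values from this lecture; only the 50–200 bp loss per division comes from the bullet above.

```python
def divisions_until_senescence(
    start_bp: int, critical_bp: int, loss_per_division_bp: int
) -> int:
    """Count divisions possible before the telomere would drop below critical_bp."""
    if loss_per_division_bp <= 0:
        raise ValueError("loss_per_division_bp must be positive")
    length = start_bp
    divisions = 0
    while length - loss_per_division_bp >= critical_bp:
        length -= loss_per_division_bp
        divisions += 1
    return divisions


START_BP = 10_000    # ASSUMPTION: illustrative starting telomere length
CRITICAL_BP = 5_000  # ASSUMPTION: illustrative senescence threshold

for loss in (50, 100, 200):
    n = divisions_until_senescence(START_BP, CRITICAL_BP, loss)
    print(f"{loss} bp lost per division -> {n} divisions")
```

With these assumed lengths, a mid-range loss of 100 bp per division lands inside the 40–60 division window. Real cells vary in starting length and loss rate, so the model shows the scaling rather than a prediction.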

Why a limit even exists

Senescence is a tumor-suppressor mechanism:
damaged cells cannot divide forever.
  • Healthy in the body · protects against cancer
  • Bad for research · cells run out mid-project
  • T2T needed billions of identical cells over years

Telomerase to the rescue

  • Telomerase = enzyme that adds TTAGGG back
  • Two parts:
    TERT (catalytic protein)
    TERC (RNA template)
  • Naturally on in germ cells, stem cells
  • Off in most adult somatic tissues · one contributor to cellular aging

Cancer's shortcut

85–95%
of cancers reactivate telomerase
  • The other 5–15% use ALT
  • Both bypass the Hayflick limit
  • Cancer = immortalized + dysregulated

Adding hTERT · the engineered fix

  • Introduce human TERT gene via viral vector
  • Cells make telomerase continuously
  • Telomeres stay long and stable
  • Cells can divide indefinitely
  • Crucially: chromosomes don't change sequence

What hTERT bought T2T

  • Unlimited DNA across the whole project
  • Genetic stability through many passages
  • DNA from year 1 = DNA from year 5
  • Multiple sequencing platforms · same source material
Same cells. Same DNA. For years.
§ 5

Quality Control
Ancestry · Ethics

Karyotyping · two methods

G-banding

  • Stain → light/dark band patterns
  • Reveals chromosome structure
  • Detects translocations, deletions

Spectral karyotyping (SKY)

  • Each chromosome → unique color
  • Detects chromosome swaps at a glance
  • Confirms 46,XX with no abnormalities

The actual karyotype

CHM13 karyotype confirmed by SKY and G-banding
Figure 4. CHM13 karyotyping. (a) Spectral karyotyping (SKY) — each chromosome a different color. (b) G-banding — staining patterns confirm normal 46,XX. · Miga et al., Nature 2020, Ext. Data Fig. 1 (CC-BY 4.0).

Sequence-level QC

  • Uniform read coverage · no big deletions or duplications
  • Low heterozygosity confirmed everywhere
  • Same DNA sequence across multiple years
  • Stable across many passages
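
A coverage-uniformity check of the kind listed above might look like the following sketch. The window depths, the `flag_windows` helper, and the 0.5× / 1.5× thresholds are assumptions for illustration, not the consortium's actual QC procedure.

```python
from statistics import median


def flag_windows(
    depths: list[float], low: float = 0.5, high: float = 1.5
) -> list[tuple[int, str]]:
    """Flag windows whose depth is far from the median depth."""
    med = median(depths)
    if med <= 0:
        raise ValueError("median depth must be positive")
    flags = []
    for i, depth in enumerate(depths):
        ratio = depth / med
        if ratio < low:
            flags.append((i, "possible deletion"))
        elif ratio > high:
            flags.append((i, "possible duplication"))
    return flags


# Hypothetical mean read depth in ten consecutive genomic windows.
depths = [30, 31, 29, 30, 14, 13, 30, 62, 30, 28]
flags = flag_windows(depths)
for index, label in flags:
    print(f"window {index}: depth {depths[index]} -> {label}")
```

Windows near half the median depth are candidates for a deletion on one of the two copies, and windows near double are candidates for a duplication. A flat profile is what "uniform read coverage" means in practice.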

Ancestry · what's in the genome

  • Analyzed via maximum likelihood admixture
  • ~70–80% European ancestry
  • Small admixture: South Asian, East Asian, Native American
  • ~1–2% Neanderthal DNA · like most non-Africans

The admixture plot

Maximum likelihood admixture analysis of CHM13 ancestry
Figure 5. Maximum-likelihood admixture analysis. CHM13 (highlighted bar) is predominantly European, with smaller contributions from other reference populations. Reading the figure: each vertical bar = one individual; each color = one ancestral population component. · Miga et al., Nature 2020, Ext. Data Fig. 2 (CC-BY 4.0).

Does ancestry matter for T2T?

What CHM13 gives

  • Complete structural template
  • No heterozygosity confusion
  • Gap-free assembly

What it doesn't give

  • Population-specific variants
  • Diversity across humans
  • Heterozygous structure
A point worth making explicit

No single genome is "humanity"

  • The limit isn't CHM13's European ancestry per se
  • It's that any single genome is partial
  • An African or East Asian CHM would have the same issue
  • Solution: pangenome · many genomes together
§ 6

Why CHM13?
And Its Limits

Six reasons CHM13 won

  1. Already well-characterized since the 1990s
  2. Stable 46,XX karyotype across passages
  3. X-bearing → could finish X chromosome first
  4. hTERT immortalization worked cleanly
  5. High-molecular-weight DNA → ultra-long reads
  6. Community-supported: Genome in a Bottle, shared protocols

What T2T actually used CHM13 for

  • 30× PacBio HiFi · ~20 kb high-accuracy reads
  • 120× Oxford Nanopore ultra-long · > 100 kb
  • 100× Illumina short reads · for polishing
  • Plus Hi-C, Bionano, Strand-seq
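
To get a feel for what a coverage figure implies, the read count can be estimated from coverage × genome size ÷ read length. The sketch below uses the HiFi numbers from the list above and the 3.055 Gbp assembly length; treating every read as exactly 20 kb is a simplifying assumption, and `reads_needed` is a made-up helper.

```python
import math


def reads_needed(coverage: float, genome_bp: int, mean_read_bp: int) -> int:
    """Approximate number of reads required to reach a target mean coverage."""
    if mean_read_bp <= 0:
        raise ValueError("mean_read_bp must be positive")
    return math.ceil(coverage * genome_bp / mean_read_bp)


GENOME_BP = 3_055_000_000  # T2T-CHM13 assembly length
hifi_reads = reads_needed(coverage=30, genome_bp=GENOME_BP, mean_read_bp=20_000)
print(f"PacBio HiFi at 30x, ~20 kb reads: about {hifi_reads:,} reads")
```

The same function works for the other platforms once a mean read length is assumed for each.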

CHM13 vs GRCh38 · the contrast

Feature          | CHM13                        | GRCh38
Source           | One CHM, duplicated paternal | Mosaic, multiple donors
Het variants     | ~few thousand                | ~4–5 M per individual
Gaps             | Zero (3.055 Bbp)             | ~151 Mb
Centromeres      | All 24 complete              | Mostly absent
Acrocentric arms | 66.1 Mb resolved             | Almost entirely missing
Y chromosome     | From HG002 in v2.0           | > 50% missing

What CHM13 cannot tell you

  • Population diversity · only one source genome
  • Phasing · which variants travel together on a chromosome
  • Compound heterozygotes · two different alleles per gene
  • Allele-specific expression · maternal vs paternal output
  • Heterozygous structural variants · common in real people
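
Why phasing and compound heterozygosity need a diploid sample can be sketched with two variants in one gene. The encoding (0 = reference allele, 1 = damaging allele) and both helper functions are hypothetical, chosen only to make the point concrete.

```python
def unphased(hap1: list[int], hap2: list[int]) -> list[tuple[int, ...]]:
    """Per-site genotype with the parental origin discarded."""
    return [tuple(sorted(pair)) for pair in zip(hap1, hap2)]


def has_intact_copy(hap1: list[int], hap2: list[int]) -> bool:
    """True if at least one gene copy carries no damaging allele."""
    return any(all(allele == 0 for allele in hap) for hap in (hap1, hap2))


# Two heterozygous sites in the same gene, arranged two ways.
cis = ([1, 1], [0, 0])    # both variants on one copy; other copy intact
trans = ([1, 0], [0, 1])  # one variant on each copy: compound heterozygote

print("same unphased genotype:", unphased(*cis) == unphased(*trans))
print("cis has intact copy:   ", has_intact_copy(*cis))
print("trans has intact copy: ", has_intact_copy(*trans))
```

The two arrangements give identical unphased genotypes but opposite biological outcomes. In a functionally haploid genome the question never arises, because both copies are the same.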

The chimeric reference problem

  • CHM13 is 46,XX · no Y
  • T2T-CHM13 v2.0 stitches in HG002 Y
  • The reference is now chimeric
  • Two different genetic backgrounds in one file
§ 7

Summary

What to take away

  • Heterozygosity in repeats prevented complete assembly
  • Complete hydatidiform mole = empty egg + duplicated sperm DNA
  • CHM13 is functionally haploid: 46 chromosomes, < 0.01% het
  • hTERT = unlimited, stable cells across years and platforms
  • One source genome → structural template, not full diversity
Next lecture

One genome is the map.
How do we capture everyone?

Chapter 4 · The Human Pangenome