BSMS205 · Genetics

Population Structure

Chapter 21 · Part IV · Population Genetics

Today's central question

Why do the same variants
have different frequencies
in different populations?

Variation is not randomly distributed

  • Same variant · common in one group · rare in another
  • Patterns are gradients, not sharp boundaries
  • History has left traces in our genomes
  • And we can read them

Why this matters

Scientifically

  • GWAS false positives if ancestry is not controlled
  • Pathogenic vs benign calls depend on a variant's baseline frequency in the relevant population

Ethically

  • Population ≠ "race"
  • Differences are gradients, not divisions

Roadmap for today

  1. What is population structure?
  2. Race · ethnicity · ancestry · population
  3. How structure arises (five mechanisms)
  4. Detecting it with PCA and UMAP
  5. Concrete selection examples
  6. Quantifying with FST
  7. Why it matters in practice

§ 1

What Is
Population Structure?

A precise definition

Population structure = systematic differences in allele frequencies among groups within a species.
  • Detectable, but subtle
  • Most variation is within populations, not between
  • Not about categorising humans — about reading history

The out-of-Africa context

  • Humans evolved in Africa
  • Migration out began roughly sixty to one hundred thousand years ago
  • Geographic separation limited gene flow
  • Once gene flow is limited, populations diverge

Three forces drive divergence

  • Mutation — new variants arise independently in each population
  • Drift — random frequency changes, stronger in smaller populations
  • Selection — different environments favour different alleles
The result: smooth gradients · never sharp boundaries.

§ 2

Race · Ethnicity ·
Ancestry · Population

Four terms · very different meanings

Concept    | What it is                               | Nature
Race       | Socially defined by perceived traits     | Social construct
Ethnicity  | Cultural identity (language, tradition)  | Cultural
Ancestry   | Genetic lineage · where ancestors lived  | Biological · probabilistic
Population | Group with shared gene flow              | Biological · operational

Why "race" is not a genetic category

  • No set of genes cleanly separates humans into "races"
  • Skin colour is controlled by a handful of genes
  • Two people of the same "race" can be more different genetically
    than either is from someone of a different "race"

What geneticists use instead

Ancestry

  • Probabilistic
  • Example estimate: 50% European / 30% East Asian / 20% African
  • From allele frequency patterns

Population

  • Operational
  • Defined by gene flow
  • Used to correct GWAS bias

§ 3

How Structure
Arises

Geographic isolation and drift

  • Oceans, deserts, mountains limit gene flow
  • Once gene flow drops, drift accumulates differences
  • Longer isolation → more divergence
  • Even identical starting populations drift apart
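The last point can be illustrated with a minimal Wright-Fisher sketch: two isolated populations start at the same allele frequency and change only through random sampling. The population size, generation count, and seed below are illustrative assumptions, and the model ignores mutation, selection, and migration.

```python
import random

def wright_fisher(p0, n_individuals, generations, rng):
    """Return the allele-frequency trajectory of one isolated population."""
    n_copies = 2 * n_individuals  # diploid: two allele copies per individual
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Every copy in the next generation is drawn at random from the
        # current allele pool (binomial sampling), which is all drift is.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        trajectory.append(p)
    return trajectory

rng = random.Random(1)  # fixed seed so the run is repeatable
pop_a = wright_fisher(0.5, 50, 100, rng)
pop_b = wright_fisher(0.5, 50, 100, rng)
print(f"start: A = {pop_a[0]:.2f}, B = {pop_b[0]:.2f}")
print(f"after 100 generations: A = {pop_a[-1]:.2f}, B = {pop_b[-1]:.2f}")
```

Rerunning with a larger `n_individuals` should give flatter trajectories, which matches the earlier point that drift is stronger in smaller populations.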

Founder effects

  • New population from a small founding group
  • Carries only a fraction of ancestral diversity
  • Some alleles absent · others over-represented by chance
  • Reduced diversity persists as the population grows
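A small simulation sketch of the founder effect, under the assumption that founders are a random sample from the ancestral population. The allele frequencies, founder count, and seed are made up for illustration.

```python
import random

def found_population(ancestral_freqs, n_founders, rng):
    """Allele frequencies among the 2 * n_founders chromosomes that found a population."""
    n_copies = 2 * n_founders
    founder_freqs = []
    for p in ancestral_freqs:
        # Each founder chromosome carries the allele with probability p.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        founder_freqs.append(count / n_copies)
    return founder_freqs

rng = random.Random(7)
# Hypothetical ancestral population: 200 rare alleles (1%) and 200 common ones (30%).
ancestral = [0.01] * 200 + [0.30] * 200
founders = found_population(ancestral, 20, rng)

rare = founders[:200]
lost = sum(1 for f in rare if f == 0.0)        # absent among the founders
enriched = sum(1 for f in rare if f >= 0.05)   # at least five-fold over-represented
print(f"rare alleles lost: {lost} of 200")
print(f"rare alleles at >= 5% in founders: {enriched} of 200")
```

With only 40 founder chromosomes, most of the 1% alleles are expected to disappear, while a few jump to several times their ancestral frequency purely by chance.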

Case study · Finnish Disease Heritage

  • Population descended from ~four thousand founders, around four thousand years ago
  • Seventeenth-century famine → one-third of population lost
  • Thirty-six recessive disorders enriched in Finland
  • Each disease: a single founder mutation drifted to high frequency

Peltonen et al. 1999 · Norio 2003

Population bottlenecks

  • Population size drops sharply — disease, famine, migration
  • Rare alleles lost by chance during the squeeze
  • Recovered population has reduced diversity
  • Genomic signature: fewer rare variants, longer LD blocks
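The loss of rare alleles during the squeeze can be approximated with one line of probability. The sketch assumes the N survivors are a random sample of diploid individuals and looks at a single generation only; the frequencies and sizes are illustrative.

```python
def prob_allele_lost(p, n_survivors):
    """Chance that an allele at frequency p is missing from all 2N surviving copies."""
    return (1.0 - p) ** (2 * n_survivors)

for n in (10, 100, 1000):
    rare = prob_allele_lost(0.01, n)
    common = prob_allele_lost(0.30, n)
    print(f"N = {n:4d}: P(lose 1% allele) = {rare:.3f}, P(lose 30% allele) = {common:.3g}")
```

Under these assumptions a severe bottleneck removes rare alleles far more readily than common ones, which is one way to see why the recovered population carries fewer rare variants.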

Admixture · the mosaic genome

  • Previously separated populations interbreed
  • Offspring inherit blocks of each ancestry
  • Modern examples: African Americans, Latino populations
  • A recent signature · easily detected in modern genomes
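As a rough sketch of how admixture shows up in allele frequencies: in a simple one-pulse model, the admixed population's frequency is a weighted average of the two source frequencies, and the weighting can be read back from an informative locus. The function names and numbers are hypothetical, and real ancestry inference combines many loci.

```python
def admixed_frequency(p_source1, p_source2, m):
    """Expected allele frequency when a fraction m of ancestry comes from source 1."""
    return m * p_source1 + (1.0 - m) * p_source2

def estimate_mixture(p_admixed, p_source1, p_source2):
    """Invert the weighted average to estimate m from a single locus."""
    if p_source1 == p_source2:
        raise ValueError("locus is uninformative: source frequencies are equal")
    return (p_admixed - p_source2) / (p_source1 - p_source2)

# Hypothetical locus: 80% in source 1, 10% in source 2, 25% source-1 ancestry.
p_mix = admixed_frequency(0.80, 0.10, 0.25)
m_hat = estimate_mixture(p_mix, 0.80, 0.10)
print(f"admixed frequency = {p_mix:.3f}, recovered ancestry fraction = {m_hat:.2f}")
```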

Selection and local adaptation

  • Different environments favour different alleles
  • Strong local selection → sharp frequency differences
  • Three iconic examples today:
    • Lactose tolerance in dairying populations
    • Sickle cell in malaria zones
    • Altitude adaptation in Tibetans

§ 4

Detecting Structure
with PCA and UMAP

The data problem

  • Each person: millions of SNPs
  • Each SNP is a dimension
  • We cannot visualise millions of dimensions
  • Need to compress to two or three dimensions

PCA · Principal Component Analysis

  • Finds new axes that capture the most variance
  • PC1: direction of largest genetic spread
  • PC2: next largest, perpendicular to PC1
  • Linear — fast, interpretable, widely used
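A minimal, dependency-free sketch of the idea: simulate 0/1/2 genotypes for two hypothetical populations, centre each SNP, and find PC1 by power iteration. Real analyses use optimised libraries and usually scale each SNP by its expected variance; the frequencies, sample sizes, and seed here are assumptions chosen to make the separation visible.

```python
import random

def simulate_genotypes(freqs, n_individuals, rng):
    """Genotypes coded 0/1/2 = copies of the alternate allele at each SNP."""
    return [[sum(rng.random() < p for _ in range(2)) for p in freqs]
            for _ in range(n_individuals)]

def pc1_scores(genotypes, n_iter=50):
    """Project individuals onto the first principal component (power iteration)."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    v = [1.0 / (j + 1) for j in range(m)]  # arbitrary non-zero start vector
    for _ in range(n_iter):
        # One step of v <- X^T (X v), then renormalise to unit length.
        scores = [sum(x * w for x, w in zip(row, v)) for row in centred]
        v = [sum(centred[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = sum(w * w for w in v) ** 0.5
        v = [w / norm for w in v]
    return [sum(x * w for x, w in zip(row, v)) for row in centred]

rng = random.Random(0)
# Hypothetical allele frequencies at 200 SNPs; population B is a shifted copy of A.
freqs_a = [rng.uniform(0.1, 0.9) for _ in range(200)]
freqs_b = [min(0.95, max(0.05, p + rng.uniform(-0.4, 0.4))) for p in freqs_a]
genotypes = simulate_genotypes(freqs_a, 30, rng) + simulate_genotypes(freqs_b, 30, rng)

scores = pc1_scores(genotypes)
mean_a = sum(scores[:30]) / 30
mean_b = sum(scores[30:]) / 30
print(f"mean PC1 score: population A = {mean_a:.2f}, population B = {mean_b:.2f}")
```

The sign of a principal component is arbitrary, so only the gap between the two groups is meaningful, not which group lands on the positive side.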

What PCA shows in humans

  • PC1 typically separates African vs non-African ancestry
    (the oldest and deepest human split)
  • PC2 typically separates European vs East Asian
  • PC3+ captures finer substructure within continents

UMAP · when PCA is too linear

  • PCA is linear · finds straight-line directions
  • Real structure is sometimes curved
  • UMAP is non-linear · prioritises local neighbourhoods, keeps global layout only approximately
  • Better for seeing fine-scale sub-populations

gnomAD · human diversity on one figure

[Figure: gnomAD UMAP showing global human genetic diversity]
gnomAD v3.1 · ~141,000 genomes · UMAP projection. Clusters are distinct but blend at the edges — continuous gradients, not sharp boundaries.
Source: gnomAD Broad Institute.

Why this tooling matters

  • Used as GWAS covariates — corrects ancestry bias
  • Reveals admixture and migration patterns
  • Demonstrates genetic variation is continuous, not categorical

§ 5

Selection in Action ·
Three Examples

Example 1 · Lactose tolerance

  • Variant rs4988235 upstream of LCT gene
  • Keeps lactase production active into adulthood
  • Northern Europe: seventy to ninety percent
  • East Asia: under ten percent
  • Under selection for ~seven thousand five hundred years

Tishkoff et al. 2007 · Bersaglieri et al. 2004

Example 2 · Sickle cell and malaria

  • HbS allele in HBB gene · single amino acid change
  • Sub-Saharan Africa (malaria zones): ten to twenty percent
  • Outside malaria zones: nearly absent
  • Heterozygotes: protection against falciparum malaria
  • Homozygotes: sickle cell disease
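Why HbS settles at an intermediate frequency instead of disappearing or taking over can be sketched with the standard heterozygote-advantage model. The fitness costs `s` (non-carriers, from malaria) and `t` (HbS homozygotes, from sickle cell disease) are illustrative assumptions, not measured values.

```python
def next_q(q, s, t):
    """One generation of selection on the HbS allele frequency q."""
    p = 1.0 - q
    w_aa, w_as, w_ss = 1.0 - s, 1.0, 1.0 - t  # relative fitness of each genotype
    mean_w = p * p * w_aa + 2 * p * q * w_as + q * q * w_ss
    # HbS copies come from heterozygotes (half their alleles) and homozygotes.
    return (p * q * w_as + q * q * w_ss) / mean_w

s, t = 0.15, 0.85  # hypothetical fitness costs
q = 0.01           # HbS starts rare
for _ in range(500):
    q = next_q(q, s, t)

print(f"simulated equilibrium q = {q:.4f}")
print(f"predicted s / (s + t)   = {s / (s + t):.4f}")
```

In this model the equilibrium is reached where both alleles have the same average fitness, which gives q = s / (s + t). With the values assumed here that lands inside the ten to twenty percent range quoted for malaria zones, though that agreement reflects the chosen inputs.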

Piel et al. 2010 · Gong et al. 2015

Example 3 · Tibetan altitude adaptation

  • Living above four thousand metres — severe oxygen scarcity
  • Variants in EPAS1 and EGLN1 under strong selection
  • Lower haemoglobin → avoids polycythemia
  • EPAS1 haplotype: ~87% Tibetans vs ~9% Han Chinese
  • EPAS1 haplotype inherited from Denisovans · archaic admixture

Beall 2010 · Yi 2010 · Huerta-Sánchez 2014

§ 6

Quantifying Difference:
FST

The Fixation Index

FST measures how much allele frequencies
differ between populations relative to total variation.
  • FST = 0 · populations identical
  • FST = 1 · populations fixed for different alleles (no shared variation)
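One common way to compute this for a single biallelic SNP is FST = (HT - HS) / HT, where HS is the average expected heterozygosity within populations and HT is the expected heterozygosity of the pooled population. The sketch below assumes equal population sizes and known allele frequencies, and ignores the sample-size corrections that real estimators apply.

```python
def fst(freqs):
    """FST at one biallelic locus from per-population allele frequencies."""
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)                            # HT: pooled heterozygosity
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # HS: mean within-population
    if h_total == 0:
        return 0.0  # allele fixed or absent everywhere: no variation to partition
    return (h_total - h_within) / h_total

print(fst([0.30, 0.30]))            # identical frequencies
print(fst([0.00, 1.00]))            # fixed for different alleles
print(round(fst([0.80, 0.05]), 3))  # strongly differentiated locus
```

The last call uses frequencies in the range quoted for the lactase persistence variant (roughly 80% vs 5%), and shows how a locus under local selection can sit far above the genome-wide background.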

The striking human number

0.05 – 0.15
typical FST between continental groups
85 – 95% of human genetic variation exists within populations,
not between them.

FST varies across the genome

  • Most regions: low FST — similar across populations
  • Some regions: high FST — differentiated by selection
  • Example: skin pigmentation genes between African and European populations
  • High-FST scans → candidate regions of local adaptation

§ 7

Why It Matters
in Practice

Correcting GWAS · the main practical use

  • Cases and controls with different ancestry ratios → false positives
  • Any ancestry-differentiated variant can look associated, even with no causal effect
  • Fix: include leading PCs as covariates
  • Without correction: GWAS is flooded with spurious hits
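The confounding can be shown with expected counts alone. In this sketch a variant has no effect on disease, but two hypothetical populations differ in both allele frequency and disease prevalence. Analysing each population separately stands in for the PC-covariate correction used in real GWAS; all numbers are assumptions.

```python
# Two hypothetical populations; the variant has no effect on disease in either.
populations = [
    {"name": "pop1", "size": 10_000, "allele_freq": 0.80, "prevalence": 0.20},
    {"name": "pop2", "size": 10_000, "allele_freq": 0.05, "prevalence": 0.05},
]

def case_control_freqs(pops):
    """Expected allele frequency among cases and among controls."""
    cases = sum(p["size"] * p["prevalence"] for p in pops)
    controls = sum(p["size"] * (1 - p["prevalence"]) for p in pops)
    case_alleles = sum(p["size"] * p["prevalence"] * p["allele_freq"] for p in pops)
    control_alleles = sum(p["size"] * (1 - p["prevalence"]) * p["allele_freq"] for p in pops)
    return case_alleles / cases, control_alleles / controls

case_freq, control_freq = case_control_freqs(populations)
print(f"pooled: cases = {case_freq:.3f}, controls = {control_freq:.3f}")

for pop in populations:
    cf, ctf = case_control_freqs([pop])
    print(f"{pop['name']} only: cases = {cf:.3f}, controls = {ctf:.3f}")
```

Pooled, the allele looks strongly enriched in cases simply because most cases come from the population where it is common. Within each population the case and control frequencies match, so the apparent association disappears once ancestry is accounted for.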

Other applications

  • Tracing human history — migration, admixture, ancient splits
  • Variant interpretation — common-in-ancestry = likely benign
  • Equity — polygenic scores trained on one ancestry fail in others
  • Multi-ancestry cohorts → accuracy and fairness

§ 8

Summary

What to take away

  • Population structure = systematic frequency differences among groups
  • It arises from drift, founder effects, bottlenecks, admixture, selection
  • PCA / UMAP show variation is continuous, not categorical
  • FST for continental groups: 0.05 – 0.15 — most variation is within
  • Race ≠ population. Ancestry ≠ race.

Next lecture

How do variants travel together
on chromosomes?

Chapter 22 · From Mendel to Morgan — Discovery of Linkage