Coding vs Regulatory Variants: Why Most GWAS Hits Don’t Break Proteins

This article is part II of the Regulatory Variation & Functional Prediction series.

What GWAS is designed to detect

Genome-wide association studies (GWAS) test, variant by variant, whether genotype is statistically associated with a trait across a population. The output is a set of loci at which genetic variation correlates with phenotypic differences.
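In practice, "statistically associated" means running one test per variant. The snippet below is a minimal sketch of the simplest version: an allelic chi-square test on case and control allele counts. The function name and the counts are hypothetical. Real analyses typically fit regression models with covariates such as ancestry principal components. The 5e-8 cutoff is the conventional genome-wide significance threshold.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 table of
    allele counts. Rows are cases/controls, columns are alt/ref alleles.
    Hypothetical helper for illustration."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of chi-square with 1 df: P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Hypothetical counts for one variant (1,000 alleles per group).
chi2, p, or_ = allelic_chi_square(case_alt=300, case_ref=700, ctrl_alt=250, ctrl_ref=750)
print(f"chi2={chi2:.2f}  p={p:.3g}  OR={or_:.2f}  genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

The test output is a number summarizing correlation between allele and phenotype. Nothing in it refers to what the variant does molecularly.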

Importantly, GWAS does not measure molecular function directly. It identifies associations, not mechanisms.

Interpreting GWAS results therefore requires additional biological and statistical context.

The early focus on protein-coding variation

Early human genetics was shaped by the study of protein-coding mutations. Variants that altered amino acid sequences or disrupted protein structure often produced clear and reproducible phenotypic effects.

This made coding variants a natural starting point for interpretation. Proteins offered a direct link between genotype and function.

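Many foundational variant annotation tools were built around this perspective, classifying variants chiefly by their predicted consequence for the encoded protein: synonymous, missense, or nonsense.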
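The sketch below illustrates this protein-centric view: it maps a single-nucleotide change in a coding sequence to its effect on the codon. It is a minimal sketch under several assumptions. The sequence is spliced, on the forward strand, read in frame from its first base, and translated with the standard genetic code (NCBI table 1). `classify_coding_snv` is a hypothetical name, not the API of any real annotation tool. Real tools also handle strand, alternative transcripts, splice sites, and indels.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_snv(cds, pos, alt):
    """Classify a substitution at 0-based position `pos` of an in-frame CDS."""
    cds = cds.upper()
    start = pos - pos % 3  # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls in an incomplete codon")
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # premature stop codon
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


cds = "ATGGAATGG"  # hypothetical sequence: Met-Glu-Trp
for pos, alt in [(5, "G"), (3, "A"), (8, "A")]:
    print(pos, alt, classify_coding_snv(cds, pos, alt))
```

For coding variants, this classification gives a concrete, mechanistic hypothesis. The same reasoning has nothing to work with when a variant sits outside any codon.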

An empirical observation from GWAS

As GWAS datasets grew in size and scope, a consistent pattern emerged. Most variants associated with complex traits were not located in protein-coding regions.

Instead, association signals were enriched in introns, intergenic regions, and known regulatory elements.

This observation has been replicated across traits, populations, and study designs.
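"Enriched" here means that more hits fall in a category than its share of the genome would predict. The simplest way to quantify that is fold enrichment with a one-sided binomial tail. This is a toy sketch that assumes hits are independent and would otherwise be spread uniformly across the genome. Published enrichment analyses control for linkage disequilibrium, allele frequency, and local gene density, typically by using matched background sets. The function names and numbers below are hypothetical.

```python
import math


def fold_enrichment(hits_in_category, total_hits, category_fraction):
    """Observed share of hits in a category divided by the share expected
    if hits were spread uniformly (category_fraction = fraction of genome covered)."""
    return (hits_in_category / total_hits) / category_fraction


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    return sum(math.exp(t) for t in log_terms)


# Toy numbers (hypothetical): 1,000 independent hits, 40 of which fall in a
# category covering 1% of the genome, where about 10 would be expected.
fe = fold_enrichment(40, 1000, 0.01)
p_tail = binomial_upper_tail(40, 1000, 0.01)
print(f"fold enrichment={fe:.1f}, one-sided binomial p={p_tail:.2e}")
```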

Why coding variants are uncommon among GWAS hits

Variants with large effects on protein function are often subject to strong purifying selection. As a result, they tend to be rare in the population.

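Standard GWAS designs rely on genotyping arrays and imputation built around common variation. They are best powered to detect common variants with modest effects and are poorly suited to rare variants, even those with large effects.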

The relative scarcity of coding variants in GWAS signals is therefore consistent with both evolutionary constraints and study design.
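A back-of-the-envelope power calculation shows why allele frequency matters as much as effect size. This sketch assumes a quantitative trait in standard-deviation units, an additive model, and Hardy-Weinberg genotype frequencies. Under those assumptions, a variant with minor allele frequency p and per-allele effect beta explains about 2p(1-p)·beta² of trait variance. The test's non-centrality parameter is then roughly N·q²/(1-q²). Sample size and effect sizes below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def variance_explained(maf, beta):
    """Trait variance explained by one biallelic variant (additive model, HWE)."""
    return 2 * maf * (1 - maf) * beta ** 2


def approx_power(n, maf, beta, alpha=GENOME_WIDE_ALPHA):
    """Approximate power of a 1-df association test, two-sided at `alpha`."""
    q2 = variance_explained(maf, beta)
    ncp = n * q2 / (1 - q2)                 # non-centrality of the chi-square statistic
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = sqrt(ncp)                       # mean of the z statistic under the alternative
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


N = 100_000  # hypothetical sample size
common = approx_power(N, maf=0.30, beta=0.05)
rare = approx_power(N, maf=0.0005, beta=0.30)
print(f"common variant, modest effect:    power={common:.3f}")
print(f"rare variant, 6x larger effect:   power={rare:.3f}")
```

Because variance explained scales with 2p(1-p), a sixfold larger per-allele effect does not compensate for a 600-fold lower allele frequency. This holds even before accounting for how poorly arrays and imputation capture very rare alleles.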

Regulatory variants and effect size

Regulatory variants typically influence gene expression levels, timing, or cellular specificity rather than protein sequence.

These effects are often subtle: a single variant may shift expression modestly rather than switching a gene on or off.

When many such variants act together, their combined influence can shape complex traits.

Proximity does not imply mechanism

GWAS hits are frequently assigned to the nearest gene as a practical heuristic. However, regulatory elements can act over long genomic distances.

Enhancers may regulate genes hundreds of kilobases away, bypassing closer genes, and can act on different or multiple targets depending on cellular context.

As a result, physical proximity alone is not sufficient to identify the causal gene or mechanism.
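The nearest-gene heuristic is easy to state in code, which also makes its blind spot easy to see. This sketch uses hypothetical genes, a hypothetical variant, and a hypothetical table of enhancer-target links per cell type. In practice, such links might come from sources like chromatin-contact data. The variant's nearest gene need not be among the genes its enhancer actually contacts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    tss: int  # transcription start site coordinate


def nearest_gene(chrom, pos, genes):
    """The common heuristic: pick the gene whose TSS is closest to the variant."""
    candidates = [g for g in genes if g.chrom == chrom]
    if not candidates:
        return None
    return min(candidates, key=lambda g: abs(g.tss - pos))


# Hypothetical genes and a hypothetical variant lying inside an enhancer.
GENES = [Gene("GENE_A", "chr1", 1_020_000), Gene("GENE_B", "chr1", 1_350_000)]
VARIANT = ("chr1", 1_000_000)

# Hypothetical enhancer-target links, which can differ by cell type.
ENHANCER_TARGETS = {
    "cell_type_1": {"GENE_B"},
    "cell_type_2": {"GENE_A", "GENE_B"},
}

assigned = nearest_gene(*VARIANT, GENES)
for cell_type, targets in ENHANCER_TARGETS.items():
    linked = {g.name: abs(g.tss - VARIANT[1]) for g in GENES if g.name in targets}
    print(cell_type, "| nearest:", assigned.name, "| linked targets (bp away):", linked)
```

In `cell_type_1`, the heuristic picks a gene 20 kb away, while the only linked target lies 350 kb away.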

The role of functional annotation

Functional annotations provide valuable context by identifying overlaps with known regulatory features such as enhancers, promoters, and chromatin marks.

These annotations help prioritize variants but do not directly predict how a sequence change alters regulatory activity.

Many variants overlap multiple annotated features, complicating interpretation.
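At its core, annotation overlap is an interval query. This minimal sketch assumes BED-style coordinates, which are 0-based and half-open, and uses a small, hypothetical annotation track. A plain linear scan is used for clarity; dedicated interval tools use indexed structures for genome-scale data. The result is a list of labels, not a prediction of how the variant changes regulatory activity.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED-style half-open interval)
    end: int    # 0-based, exclusive
    label: str


def overlapping_features(chrom, pos0, features):
    """Labels of all features containing the 0-based position `pos0`."""
    return [f.label for f in features
            if f.chrom == chrom and f.start <= pos0 < f.end]


# Hypothetical annotation track for one region.
FEATURES = [
    Feature("chr1", 999_000, 1_001_000, "promoter"),
    Feature("chr1", 999_500, 1_000_500, "enhancer"),
    Feature("chr1", 999_800, 1_000_200, "H3K27ac peak"),
    Feature("chr1", 1_300_000, 1_301_000, "enhancer (distal)"),
]

hits = overlapping_features("chr1", 1_000_000, FEATURES)
print(hits)
```

A single position lands in three overlapping features here. This is exactly the ambiguity described above: the overlaps narrow the search but do not say which feature, if any, the variant disrupts.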

Coding and regulatory variation are complementary

The prominence of regulatory variants in GWAS does not diminish the importance of coding variation.

Coding variants remain central to the study of rare disease, Mendelian traits, and pharmacogenomics.

Different classes of variation address different biological and clinical questions.

GWAS as a starting point

GWAS identifies regions of interest rather than complete mechanistic explanations.

For noncoding associations, understanding how variants influence regulation becomes a necessary downstream step.

Functional interpretation builds on GWAS rather than replacing it.

Looking ahead

For noncoding signals, that next step starts with understanding how regulation is organized and how genetic variation perturbs it.

Understanding why most GWAS signals fall outside coding regions reframes the problem of disease biology. The question becomes not “Which protein is broken?” but “Which regulatory process is altered?”


Next: From Variants to Effects