Fixture-first validation strategy

pylimma is a faithful port of R limma, not a re-implementation. The primary validation mechanism is a corpus of pre-computed CSV fixtures, generated by a single R script, that define the expected output for every ported function on a range of inputs. A passing test is “pylimma’s output matches R limma’s fixture value within documented tolerance.” A failing test is a parity regression.

This document explains where fixtures live, what R/limma version they were generated against, and how to regenerate them.

Why fixture-first

The principle is simple: the R source explains how limma works; the fixtures define what its output must be. When porting a new function:

1. Add R code to tests/fixtures/generate_all_fixtures.R that produces the expected output.
2. Run the R script to write fresh CSVs into tests/fixtures/.
3. Write the Python implementation.
4. Add a test in tests/test_r_parity.py that loads the CSV and compares.
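
As a rough illustration of the last step, a parity test could look like the sketch below. It is not taken from tests/test_r_parity.py: the inline CSV, its column names, and the helpers `load_fixture`, `hypothetical_port`, and `assert_parity` are all invented for this example, and it uses only the standard library so that it runs without any dependencies. A NumPy-based suite would more likely reach for `numpy.testing.assert_allclose`.

```python
import csv
import io

# Stand-in for a committed fixture CSV; the column names and values are
# invented for illustration and do not come from the real fixture corpus.
FIXTURE_CSV = """gene,coef,t
g1,0.512345678901,2.31
g2,-1.25,-4.75
"""


def load_fixture(text):
    """Parse a fixture CSV into {gene: {column: float}}."""
    rows = csv.DictReader(io.StringIO(text))
    return {
        row["gene"]: {k: float(v) for k, v in row.items() if k != "gene"}
        for row in rows
    }


def hypothetical_port(expected):
    """Placeholder for the ported function under test (not a pylimma API)."""
    # A real test would run the Python implementation on the same inputs
    # the R script used. Here we perturb the fixture at the 1e-12 level.
    return {
        gene: {k: v * (1 + 1e-12) for k, v in cols.items()}
        for gene, cols in expected.items()
    }


def assert_parity(actual, expected, rtol):
    """Fail if any value differs from the fixture by more than rtol."""
    for gene, cols in expected.items():
        for name, want in cols.items():
            got = actual[gene][name]
            # Same shape as an rtol check with zero absolute tolerance.
            assert abs(got - want) <= rtol * abs(want), (gene, name, got, want)


def test_matches_fixture():
    expected = load_fixture(FIXTURE_CSV)
    actual = hypothetical_port(expected)
    # 1e-8 is the tolerance the Tolerances section lists for
    # coefficients and t-statistics.
    assert_parity(actual, expected, rtol=1e-8)
```

The test never inspects how the values were computed. It only asks whether they agree with the fixture, which is the point of treating the fixture as the contract.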

The fixture is the contract. An implementation is done when its output matches the fixture within tolerance, and by no other criterion. This prevents drift, where an implementation “interprets” or “improves” the R algorithm.

Where fixtures live

CSV outputs: pylimma/tests/fixtures/*.csv, one or more per function.
Generator: pylimma/tests/fixtures/generate_all_fixtures.R, one top-level script that dispatches to per-module sub-scripts (generate_lmfit_fixtures.R, generate_ebayes_fixtures.R, generate_squeeze_var_fixtures.R, etc.).
Python parity tests: pylimma/tests/test_r_parity.py and the per-module test_*.py files.

Tolerances

| Output | Tolerance |
| --- | --- |
| Expression matrices, design matrices | `rtol=1e-10` |
| Precision weights (voom, vooma) | `rtol=1e-8` |
| Quality weights (arrayWeights, arrayWeightsQuick) | `rtol=1e-8` |
| Coefficients, t-statistics, sigma | `rtol=1e-8` |
| P-values | log10 scale, max absolute difference 1.0 |
| `normexp_fit(method="saddle")` parameters | `rtol=1e-3` (see Known differences from R limma) |
| Rotation-test Monte-Carlo p-values | log10 scale, max absolute difference 0.5 (see Known differences from R limma) |
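
The two kinds of tolerance in the table behave quite differently, and a small sketch may make that concrete. The helpers `within_rtol` and `within_log10` below are hypothetical names, not pylimma functions, and the sketch assumes that "max diff" on the log10 scale means the absolute difference between the base-10 logarithms of the two p-values.

```python
import math


def within_rtol(actual, expected, rtol):
    """Relative check: |actual - expected| <= rtol * |expected|."""
    return abs(actual - expected) <= rtol * abs(expected)


def within_log10(p_actual, p_expected, max_diff):
    """Compare two p-values by the gap between their base-10 logarithms."""
    # Assumes both p-values are strictly positive; log10(0) is undefined.
    return abs(math.log10(p_actual) - math.log10(p_expected)) <= max_diff


# Illustrative numbers only, not taken from any fixture.
print(within_rtol(2.000000001, 2.0, rtol=1e-8))   # relative gap is about 5e-10
print(within_log10(3e-12, 1e-12, max_diff=1.0))   # log10 gap is about 0.48
print(within_log10(1e-9, 1e-12, max_diff=1.0))    # log10 gap is 3
```

Under this reading, a log10 tolerance of 1.0 accepts p-values that agree to within one order of magnitude, which is far looser than `rtol=1e-8` but stays meaningful for extremely small p-values.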

R and limma versions

Fixtures are tied to specific R and Bioconductor versions. The generator script prints the versions it ran against in its first lines of output. Running the Python parity tests does not require R or limma.

Target versions for the v0.1.0 fixture set:

R: current release (run R.version.string to confirm)
Bioconductor limma: 3.66.0

Regenerating fixtures

You only need R installed if you are porting a new function or upgrading the pinned limma version. The committed fixtures are sufficient for CI and for downstream users.

To regenerate:

# In R:
install.packages("BiocManager")
BiocManager::install("limma")

# In shell, from the pylimma repo root:
cd pylimma/tests/fixtures
Rscript generate_all_fixtures.R

The script overwrites every CSV it generates. Review the diff before committing: an unexpected change is a signal that either R or pylimma has drifted, and both deserve an audit before the fixtures are updated.

Versioning fixtures alongside code

Fixture CSVs are committed to the repository. When regeneration changes a fixture's numbers by more than round-trip precision, the commit that lands the new CSV must also update whatever pylimma code is necessary to keep tests passing, or document the gap in Known differences from R limma. Never commit updated fixtures without explaining why they changed.
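
To separate round-trip noise from a real change when reviewing a regenerated fixture, something like the following sketch could help. It is not part of pylimma: `read_cells` and `meaningful_changes` are hypothetical helpers, the inline CSVs are invented, and the `1e-12` threshold is only an assumed stand-in for "round-trip precision".

```python
import csv
import io


def read_cells(text):
    """Flatten a CSV into {(row_index, column): raw string}."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        (i, col): val for i, row in enumerate(reader) for col, val in row.items()
    }


def meaningful_changes(old_text, new_text, rtol=1e-12):
    """List cells whose values moved by more than rtol (an assumed threshold)."""
    old, new = read_cells(old_text), read_cells(new_text)
    changed = []
    for key in sorted(old.keys() | new.keys()):
        a, b = old.get(key), new.get(key)
        if a == b:
            continue
        try:
            x, y = float(a), float(b)
        except (TypeError, ValueError):
            # Added, removed, or non-numeric cells always count as changes.
            changed.append((key, a, b))
            continue
        if abs(x - y) > rtol * max(abs(x), abs(y)):
            changed.append((key, a, b))
    return changed


# Invented example: g1 differs only in its last digit, g2 really changed.
OLD = "gene,coef\ng1,0.1234567890123456\ng2,2.5\n"
NEW = "gene,coef\ng1,0.1234567890123457\ng2,2.6\n"

for cell, before, after in meaningful_changes(OLD, NEW):
    print(cell, before, "->", after)
```

Only the g2 cell is reported here, since the g1 difference falls below the threshold. In practice the old text could come from the committed version of the file (for example via `git show HEAD:<path>`) and the new text from the working tree.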