Parity with R limma

pylimma is a faithful Python port of R limma, built against the R source as the specification rather than as a clean-room reimplementation. Every module is annotated with per-file SPDX attribution to the R source files it ports from, and each function’s numerical output is validated against R limma 3.66.0 via fixtures regenerated by tests/fixtures/generate_all_fixtures.R. The regression suite fails loudly on any drift from the R reference values.

Worked R-vs-Python notebooks

Four of the five worked-example datasets under examples/ have an R-vs-Python parity notebook that runs the same pipeline in R and pylimma side-by-side, compares the topTable output, and plots R-vs-pylimma scatters:

examples/all_chiaretti/all_R_vs_Python.ipynb - classical microarray pipeline (lm_fit + e_bayes) on the Chiaretti et al. (2004) ALL cohort.
examples/gse60450/gse60450_R_vs_Python.ipynb - bulk RNA-seq with voom on the Fu et al. (2015) mouse mammary data.
examples/pasilla/pasilla_R_vs_Python.ipynb - voom_with_quality_weights + duplicate_correlation on the Drosophila Pasilla data.
examples/yoruba/yoruba_R_vs_Python.ipynb - voom + gene-set testing (camera / roast) on Yoruba HapMap LCLs.

The fifth worked example, examples/kang_pbmc/, is an AnnData-native scRNA-seq pseudobulk demonstration and does not have an R-parity notebook by design - it showcases the scverse integration layer (get_eawp / put_eawp) rather than numerical parity.

Tolerances

Per-function-family tolerances are tabulated in Fixture-first validation strategy. Summary: expression and design matrices match R at rtol=1e-10; voom / vooma precision weights and lm_fit / contrasts_fit statistics at rtol=1e-8; p-values compared on the log10 scale with a max-diff tolerance of 1.0. Two known-divergence families (normexp_fit(method="saddle") and rotation-based Monte-Carlo gene-set tests) use a looser tolerance documented in Known differences from R limma.

Known differences from R limma

Four families of small, non-bug-for-bug divergences are documented in Known differences from R limma:

normexp_fit(method="saddle") drifts up to ~2e-4 from R because scipy’s Nelder-Mead and R’s nmmin share the algorithm but use different termination rules on the flat saddle-likelihood plateau. method="mle" remains bit-exact; the rma / rma75 / closed-form variants match at floating-point precision.
normalize_vsn drifts up to ~2.4e-4 from R/vsn 3.66.0 because the vsn profile log-likelihood is asymptotically flat in the per-column scale parameter. The transform formula is bit-faithful at R’s converged parameters; the residual disagreement is the L-BFGS-B path divergence within that flat valley.
Rotation-based gene-set tests (roast, mroast, romer, gene_set_test) differ by the Monte-Carlo sampling error of independently-seeded RNG streams (NumPy PCG64 vs R’s Mersenne Twister); deterministic summary statistics match R to machine precision.
mrlm stdev_unscaled drifts up to ~15% from R on perfectly-fit synthetic rows (residuals at machine epsilon) because LINPACK DQRDC2 and LAPACK SVD produce different noise patterns. Real-world residuals are 6+ orders of magnitude above the trigger condition.

Publication figures

Publication-quality figures (R-vs-pylimma log-FC scatter, moderated-t scatter, -log10(p) scatter, rank-concordance curve, runtime bars) are planned for the four parity notebooks and will be embedded here once generated.