Parity with R limma
===================

pylimma is a faithful Python port of R limma, built against the R
source as the specification rather than as a clean-room
reimplementation. Every module is annotated with per-file SPDX
attribution to the R source files it ports from, and each function's
numerical output is validated against R limma 3.66.0 via fixtures
regenerated by ``tests/fixtures/generate_all_fixtures.R``. The
regression suite fails loudly on any drift from the R reference
values.

Worked R-vs-Python notebooks
----------------------------

Four of the five worked-example datasets under ``examples/`` have an
R-vs-Python parity notebook that runs the same pipeline in R and
pylimma side-by-side, compares the ``topTable`` output, and plots
R-vs-pylimma scatters:

- ``examples/all_chiaretti/all_R_vs_Python.ipynb`` - classical
  microarray pipeline (``lm_fit`` + ``e_bayes``) on the
  Chiaretti *et al.* (2004) ALL cohort.
- ``examples/gse60450/gse60450_R_vs_Python.ipynb`` - bulk RNA-seq
  with voom on the Fu *et al.* (2015) mouse mammary data.
- ``examples/pasilla/pasilla_R_vs_Python.ipynb`` -
  ``voom_with_quality_weights`` + ``duplicate_correlation`` on the
  *Drosophila* Pasilla data.
- ``examples/yoruba/yoruba_R_vs_Python.ipynb`` - voom + gene-set
  testing (``camera`` / ``roast``) on Yoruba HapMap LCLs.

The fifth worked example, ``examples/kang_pbmc/``, is an
AnnData-native scRNA-seq pseudobulk demonstration and does not have
an R-parity notebook by design - it showcases the scverse integration
layer (``get_eawp`` / ``put_eawp``) rather than numerical parity.

Tolerances
----------

Per-function-family tolerances are tabulated in
:doc:`fixtures`. Summary: expression and design matrices match R at
``rtol=1e-10``; voom / vooma precision weights and lm_fit /
contrasts_fit statistics at ``rtol=1e-8``; p-values compared on the
log10 scale with a max-diff tolerance of 1.0. Two known-divergence
families (``normexp_fit(method="saddle")`` and rotation-based
Monte-Carlo gene-set tests) use a looser tolerance documented in
:doc:`known_differences`.

Known differences from R limma
------------------------------

Four families of small, non-bug-for-bug divergences are documented in
:doc:`known_differences`:

- ``normexp_fit(method="saddle")`` drifts up to ~2e-4 from R because
  scipy's Nelder-Mead and R's ``nmmin`` share the algorithm but use
  different termination rules on the flat saddle-likelihood plateau.
  ``method="mle"`` remains bit-exact; the ``rma`` / ``rma75`` /
  closed-form variants match at floating-point precision.
- ``normalize_vsn`` drifts up to ~2.4e-4 from R/vsn 3.66.0 because
  the vsn profile log-likelihood is asymptotically flat in the
  per-column scale parameter. The transform formula is bit-faithful
  at R's converged parameters; the residual disagreement is the
  L-BFGS-B path divergence within that flat valley.
- Rotation-based gene-set tests (``roast``, ``mroast``, ``romer``,
  ``gene_set_test``) differ by the Monte-Carlo sampling error of
  independently-seeded RNG streams (NumPy PCG64 vs R's Mersenne
  Twister); deterministic summary statistics match R to machine
  precision.
- ``mrlm`` ``stdev_unscaled`` drifts up to ~15% from R on
  perfectly-fit synthetic rows (residuals at machine epsilon)
  because LINPACK ``DQRDC2`` and LAPACK SVD produce different
  noise patterns. Real-world residuals are 6+ orders of magnitude
  above the trigger condition.

Publication figures
-------------------

Publication-quality figures (R-vs-pylimma log-FC scatter, moderated-t
scatter, -log10(p) scatter, rank-concordance curve, runtime bars) are
planned for the four parity notebooks and will be embedded here once
generated.