Benchmarks

This page reports pylimma’s runtime and memory performance against R limma on four reference datasets drawn from the limma User’s Guide. The methodology is documented in benchmarks/README.md in the source tree; the raw JSON results are committed under benchmarks/results/.

Dataset set

Slot	Dataset	Shape	Upstream paper
Microarray	ALL (Chiaretti)	12,625 x 128	Chiaretti 2004, Blood
RNA-seq, two-group	GSE60450 (mammary)	27,179 x 12	Fu 2015, NCB
RNA-seq, scaling	Yoruba HapMap	38,415 x 69	Pickrell 2010, Nature
Splicing	Pasilla	14,599 x 7	Brooks 2011, GR

Full provenance, licence terms, and the extraction scripts for each dataset are documented in data/DATA_PROVENANCE.md in the source tree. All four upstream licences (Artistic-2.0, NCBI public data, LGPL, MIT) are compatible with pylimma’s GPL-3.0-or-later. Users running the benchmarks do not need R or any Bioconductor data packages - the CSVs are committed to the repo.

Pipelines

A (core): lm_fit -> contrasts_fit -> e_bayes -> top_table. Run on all four datasets, plus a 50-gene “overhead floor” subset of Estrogen.
B (voom): voom -> lm_fit -> contrasts_fit -> e_bayes -> top_table. Run on GSE60450 and Yoruba.
C (gene-set testing): pipeline B + camera against 50 curated gene sets. Run on GSE60450.
D (splicing): diff_splice + top_splice on Pasilla.

Reproducibility

All runs pin single-threaded BLAS:

export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
export MKL_NUM_THREADS=1

and use n_reps >= 5 repetitions per dataset-pipeline combination. Reported values are medians with 25th/75th percentile error bars; the single fastest and slowest measurement per group are trimmed to control for OS noise and warm-up cost.

Numerical parity

pylimma reproduces R limma’s top-table output to floating-point precision on every pipeline in the benchmark suite. Median relative differences across all genes, computed on R 4.5.2 + limma 3.66.0:

Dataset	Pipeline	Max relative difference	Median relative difference
ALL	core	6.4e-10	1.6e-14
GSE60450	voom	3.5e-7	6.9e-12
Yoruba	voom	3.8e-6	7.8e-11
Pasilla	splicing	3.2e-10	6.3e-14

The four examples/<dataset>/<dataset>_R_vs_Python.ipynb notebooks recompute these numbers live and additionally plot R-vs-pylimma scatter plots per column. Running them locally will reproduce the parity table with your own R and Python installs.

Runtime comparison

Headline timing (median of 5 reps, single-threaded BLAS, Apple M-series macOS, R 4.5.2 + limma 3.66.0, pylimma 0.1.0):

Dataset	Pipeline	pylimma (s)	R limma (s)	R / pylimma
ALL (full)	`pipeline_a`	0.036	0.032	0.89
GSE60450	`pipeline_a`	0.032	0.030	0.94
GSE60450	`pipeline_b`	1.54	0.72	0.47
GSE60450	`pipeline_c`	1.50	0.72	0.48
Yoruba	`pipeline_a`	0.069	0.069	1.00
Yoruba	`pipeline_b`	3.07	1.67	0.54
Pasilla	`pipeline_d`	0.047	0.042	0.89

Interpretation

Core pipeline (lm_fit -> contrasts_fit -> e_bayes -> top_table): pylimma is within 15% of R across all datasets.
voom pipeline: pylimma is ~2x slower than R on both GSE60450 and Yoruba. The bottleneck is the LOWESS fit inside voom’s mean-variance modelling - R limma uses hand-written C (weighted_lowess.c) while pylimma uses statsmodels.nonparametric.lowess.
Splicing (diff_splice + top_splice) on Pasilla: pylimma is within 15% of R.

pylimma is within 2x of R limma across every pipeline-dataset combination in the reference suite.

Embedded notebook

[ ]:

benchmarks/run_benchmarks.ipynb

Profile output for pipeline_a on Yoruba is committed to benchmarks/results/profile_pipeline_a_yoruba.txt.

Reruns and updates

To rerun benchmarks on your own machine:

cd pylimma/benchmarks
export OMP_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 MKL_NUM_THREADS=1
python run_python.py
Rscript run_r.R
jupyter nbconvert --to notebook --execute run_benchmarks.ipynb \
    --output run_benchmarks.ipynb

Results are written to benchmarks/results/. Commit the JSON files alongside the notebook if you want them included in the docs.