Benchmarks

This page reports pylimma’s runtime and memory performance against R limma on four reference datasets drawn from the limma User’s Guide. The methodology is documented in benchmarks/README.md in the source tree; the raw JSON results are committed under benchmarks/results/.

Dataset set

Slot

Dataset

Shape

Upstream paper

Microarray

ALL (Chiaretti)

12,625 x 128

Chiaretti 2004, Blood

RNA-seq, two-group

GSE60450 (mammary)

27,179 x 12

Fu 2015, NCB

RNA-seq, scaling

Yoruba HapMap

38,415 x 69

Pickrell 2010, Nature

Splicing

Pasilla

14,599 x 7

Brooks 2011, GR

Full provenance, licence terms, and the extraction scripts for each dataset are documented in data/DATA_PROVENANCE.md in the source tree. All four upstream licences (Artistic-2.0, NCBI public data, LGPL, MIT) are compatible with pylimma’s GPL-3.0-or-later. Users running the benchmarks do not need R or any Bioconductor data packages - the CSVs are committed to the repo.

Pipelines

  • A (core): lm_fit -> contrasts_fit -> e_bayes -> top_table. Run on all four datasets, plus a 50-gene “overhead floor” subset of Estrogen.

  • B (voom): voom -> lm_fit -> contrasts_fit -> e_bayes -> top_table. Run on GSE60450 and Yoruba.

  • C (gene-set testing): pipeline B + camera against 50 curated gene sets. Run on GSE60450.

  • D (splicing): diff_splice + top_splice on Pasilla.

Reproducibility

All runs pin single-threaded BLAS:

export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
export MKL_NUM_THREADS=1

and use n_reps >= 5 repetitions per dataset-pipeline combination. Reported values are medians with 25th/75th percentile error bars; the single fastest and slowest measurement per group are trimmed to control for OS noise and warm-up cost.

Numerical parity

pylimma reproduces R limma’s top-table output to floating-point precision on every pipeline in the benchmark suite. Median relative differences across all genes, computed on R 4.5.2 + limma 3.66.0:

Dataset

Pipeline

Max relative difference

Median relative difference

ALL

core

6.4e-10

1.6e-14

GSE60450

voom

3.5e-7

6.9e-12

Yoruba

voom

3.8e-6

7.8e-11

Pasilla

splicing

3.2e-10

6.3e-14

The four examples/<dataset>/<dataset>_R_vs_Python.ipynb notebooks recompute these numbers live and additionally plot R-vs-pylimma scatter plots per column. Running them locally will reproduce the parity table with your own R and Python installs.

Runtime comparison

Headline timing (median of 5 reps, single-threaded BLAS, Apple M-series macOS, R 4.5.2 + limma 3.66.0, pylimma 0.1.0):

Dataset

Pipeline

pylimma (s)

R limma (s)

R / pylimma

ALL (full)

pipeline_a

0.036

0.032

0.89

GSE60450

pipeline_a

0.032

0.030

0.94

GSE60450

pipeline_b

1.54

0.72

0.47

GSE60450

pipeline_c

1.50

0.72

0.48

Yoruba

pipeline_a

0.069

0.069

1.00

Yoruba

pipeline_b

3.07

1.67

0.54

Pasilla

pipeline_d

0.047

0.042

0.89

Interpretation

  • Core pipeline (lm_fit -> contrasts_fit -> e_bayes -> top_table): pylimma is within 15% of R across all datasets.

  • voom pipeline: pylimma is ~2x slower than R on both GSE60450 and Yoruba. The bottleneck is the LOWESS fit inside voom’s mean-variance modelling - R limma uses hand-written C (weighted_lowess.c) while pylimma uses statsmodels.nonparametric.lowess.

  • Splicing (diff_splice + top_splice) on Pasilla: pylimma is within 15% of R.

pylimma is within 2x of R limma across every pipeline-dataset combination in the reference suite.

Embedded notebook

[ ]:
benchmarks/run_benchmarks.ipynb

Profile output for pipeline_a on Yoruba is committed to benchmarks/results/profile_pipeline_a_yoruba.txt.

Reruns and updates

To rerun benchmarks on your own machine:

cd pylimma/benchmarks
export OMP_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 MKL_NUM_THREADS=1
python run_python.py
Rscript run_r.R
jupyter nbconvert --to notebook --execute run_benchmarks.ipynb \
    --output run_benchmarks.ipynb

Results are written to benchmarks/results/. Commit the JSON files alongside the notebook if you want them included in the docs.