Benchmarks
This page reports pylimma’s runtime and memory performance against R
limma on four reference datasets drawn from the limma User’s Guide.
The methodology is documented in benchmarks/README.md in the
source tree; the raw JSON results are committed under
benchmarks/results/.
Dataset set
Slot |
Dataset |
Shape |
Upstream paper |
|---|---|---|---|
Microarray |
ALL (Chiaretti) |
12,625 x 128 |
Chiaretti 2004, Blood |
RNA-seq, two-group |
GSE60450 (mammary) |
27,179 x 12 |
Fu 2015, NCB |
RNA-seq, scaling |
Yoruba HapMap |
38,415 x 69 |
Pickrell 2010, Nature |
Splicing |
Pasilla |
14,599 x 7 |
Brooks 2011, GR |
Full provenance, licence terms, and the extraction scripts for each
dataset are documented in data/DATA_PROVENANCE.md in the source
tree. All four upstream licences (Artistic-2.0, NCBI
public data, LGPL, MIT) are compatible with pylimma’s
GPL-3.0-or-later. Users running the benchmarks do not need R or any
Bioconductor data packages - the CSVs are committed to the repo.
Pipelines
A (core):
lm_fit->contrasts_fit->e_bayes->top_table. Run on all four datasets, plus a 50-gene “overhead floor” subset of Estrogen.B (voom):
voom->lm_fit->contrasts_fit->e_bayes->top_table. Run on GSE60450 and Yoruba.C (gene-set testing): pipeline B +
cameraagainst 50 curated gene sets. Run on GSE60450.D (splicing):
diff_splice+top_spliceon Pasilla.
Reproducibility
All runs pin single-threaded BLAS:
export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
export MKL_NUM_THREADS=1
and use n_reps >= 5 repetitions per dataset-pipeline combination.
Reported values are medians with 25th/75th percentile error bars;
the single fastest and slowest measurement per group are trimmed to
control for OS noise and warm-up cost.
Numerical parity
pylimma reproduces R limma’s top-table output to floating-point precision on every pipeline in the benchmark suite. Median relative differences across all genes, computed on R 4.5.2 + limma 3.66.0:
Dataset |
Pipeline |
Max relative difference |
Median relative difference |
|---|---|---|---|
ALL |
core |
6.4e-10 |
1.6e-14 |
GSE60450 |
voom |
3.5e-7 |
6.9e-12 |
Yoruba |
voom |
3.8e-6 |
7.8e-11 |
Pasilla |
splicing |
3.2e-10 |
6.3e-14 |
The four examples/<dataset>/<dataset>_R_vs_Python.ipynb notebooks
recompute these numbers live and additionally plot R-vs-pylimma
scatter plots per column. Running them locally will reproduce the
parity table with your own R and Python installs.
Runtime comparison
Headline timing (median of 5 reps, single-threaded BLAS, Apple M-series macOS, R 4.5.2 + limma 3.66.0, pylimma 0.1.0):
Dataset |
Pipeline |
pylimma (s) |
R limma (s) |
R / pylimma |
|---|---|---|---|---|
ALL (full) |
|
0.036 |
0.032 |
0.89 |
GSE60450 |
|
0.032 |
0.030 |
0.94 |
GSE60450 |
|
1.54 |
0.72 |
0.47 |
GSE60450 |
|
1.50 |
0.72 |
0.48 |
Yoruba |
|
0.069 |
0.069 |
1.00 |
Yoruba |
|
3.07 |
1.67 |
0.54 |
Pasilla |
|
0.047 |
0.042 |
0.89 |
Interpretation
Core pipeline (
lm_fit->contrasts_fit->e_bayes->top_table): pylimma is within 15% of R across all datasets.voom pipeline: pylimma is ~2x slower than R on both GSE60450 and Yoruba. The bottleneck is the LOWESS fit inside
voom’s mean-variance modelling - R limma uses hand-written C (weighted_lowess.c) while pylimma usesstatsmodels.nonparametric.lowess.Splicing (
diff_splice+top_splice) on Pasilla: pylimma is within 15% of R.
pylimma is within 2x of R limma across every pipeline-dataset combination in the reference suite.
Embedded notebook
[ ]:
benchmarks/run_benchmarks.ipynb
Profile output for pipeline_a on Yoruba is committed to
benchmarks/results/profile_pipeline_a_yoruba.txt.
Reruns and updates
To rerun benchmarks on your own machine:
cd pylimma/benchmarks
export OMP_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 MKL_NUM_THREADS=1
python run_python.py
Rscript run_r.R
jupyter nbconvert --to notebook --execute run_benchmarks.ipynb \
--output run_benchmarks.ipynb
Results are written to benchmarks/results/. Commit the JSON
files alongside the notebook if you want them included in the docs.