pylimma documentation
pylimma is a faithful Python port of R limma, one of the most widely
used Bioconductor packages for differential expression analysis. It
provides the full linear-modelling pipeline (lm_fit, contrasts_fit,
e_bayes, top_table), voom for RNA-seq, gene-set testing
(camera, roast, fry, romer), normalisation, batch correction, and
differential splicing. All of these are validated against R limma
output to within rtol=1e-6 on fixture parity tests.
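To make the tolerance concrete: a relative tolerance of 1e-6 means each Python value must agree with its R counterpart to roughly six significant figures. The sketch below shows the kind of check this implies, using NumPy's assert_allclose. The arrays are made-up stand-ins for a stored R fixture and a Python result; they are not values from the actual pylimma test suite.

```python
import numpy as np

# Hypothetical stand-ins: values saved from R limma and values computed in Python.
r_fixture = np.array([2.5134871, -0.8821043, 11.2040019])
py_result = r_fixture * (1 + 5e-7)  # relative error of 5e-7 on every element

# Passes silently: |py - r| <= rtol * |r| holds for every element.
np.testing.assert_allclose(py_result, r_fixture, rtol=1e-6)

# A relative error of 5e-6 exceeds the tolerance (atol=0 keeps the test purely relative).
within_tol = np.allclose(r_fixture * (1 + 5e-6), r_fixture, rtol=1e-6, atol=0)
print(within_tol)
```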
pylimma accepts numpy arrays, pandas DataFrames, AnnData objects, or
limma-style EList dict subclasses, with centralised polymorphic
dispatch so the same code works across the scverse ecosystem and
R-style workflows.
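As a rough illustration of what centralised dispatch on input type can look like, the sketch below uses functools.singledispatch from the standard library. It is not pylimma's internal code: the EList class and the get_expression helper are hypothetical names chosen for this example, and only the NumPy and EList cases are shown. pandas and AnnData inputs could be registered in the same way.

```python
from functools import singledispatch

import numpy as np


class EList(dict):
    """Hypothetical stand-in for a limma-style container; 'E' holds expression values."""


@singledispatch
def get_expression(obj):
    # Fallback for input types that have no registered handler.
    raise TypeError(f"unsupported input type: {type(obj).__name__}")


@get_expression.register
def _(obj: np.ndarray):
    # Plain arrays are already in matrix form.
    return obj


@get_expression.register
def _(obj: EList):
    # Pull the expression matrix out of the container.
    return np.asarray(obj["E"])


mat = np.arange(6.0).reshape(3, 2)
print(get_expression(mat).shape)            # (3, 2)
print(get_expression(EList(E=mat)).shape)   # (3, 2)
```

Downstream code then calls get_expression once and works on a plain matrix, regardless of which container the caller supplied.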
Quickstart
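import numpy as np
from pylimma import lm_fit, contrasts_fit, e_bayes, top_table

# Simulated data: 100 genes (rows) x 6 samples (columns)
expr = np.random.normal(size=(100, 6))
# Design: intercept plus an indicator for the second group of three samples
design = np.column_stack([np.ones(6), [0, 0, 0, 1, 1, 1]])

fit = lm_fit(expr, design)
# Single contrast that selects the group coefficient
fit = contrasts_fit(fit, contrasts=np.array([[0], [1]]))
fit = e_bayes(fit)
print(top_table(fit, coef=0).head())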
See Quickstart for the full walk-through.
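For intuition about the linear-model step in the snippet above, the following NumPy-only sketch fits the same design to every gene by ordinary least squares. It illustrates the underlying model and is not pylimma's implementation; the empirical Bayes moderation applied by e_bayes is not reproduced here. With an intercept-plus-indicator design, the second coefficient equals the difference in group means, which is the quantity the [[0], [1]] contrast selects.

```python
import numpy as np

rng = np.random.default_rng(0)
expr = rng.normal(size=(100, 6))                            # genes x samples
design = np.column_stack([np.ones(6), [0, 0, 0, 1, 1, 1]])  # intercept + group

# Least-squares fit for all genes at once: solve design @ beta ~= expr.T
beta, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
coef = beta.T                                               # genes x coefficients

# Cross-check: the group coefficient is the difference between group means.
diff = expr[:, 3:].mean(axis=1) - expr[:, :3].mean(axis=1)
print(np.allclose(coef[:, 1], diff))
```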
Contents
Worked examples
- Differential Expression Analysis of the ALL Chiaretti Microarray Dataset with pylimma
  - Dataset
  - Pipeline
    - 1. Load the expression matrix + phenotype targets
    - 2. Per-sample distribution QC
    - 3. log-expression density (KDE)
    - 4. MDS (coloured by BT)
    - 5. Design matrix
    - 6. Fit linear models and contrasts (no voom)
    - 7. Empirical Bayes moderation
    - 8. Top table
    - 9. MD + volcano plots
    - 10. Heatmap of top 50 DE probes
- Differential Expression Analysis of Mouse Mammary RNA-seq with pylimma
  - Dataset
  - Pipeline
    - 1. Load counts and sample metadata
    - 2. Library sizes
    - 3. Filter low-expression genes
    - 4. log-CPM density before / after filter
    - 5. MDS (coloured by group)
    - 6. Design matrix
    - 7. voom
    - 8. Fit linear models and contrasts
    - 9. Empirical Bayes moderation
    - 10. Top table
    - 11. MD plot
    - 12. Volcano plot
    - 13. Heatmap of top 50 DE genes
- Differential Splicing Analysis with pylimma (pasilla dataset)
- Differential Expression Analysis of Yoruba HapMap RNA-seq with pylimma
  - Dataset
  - Pipeline
    - 1. Load counts and sample metadata
    - 2. Library sizes
    - 3. Filter low-expression genes
    - 4. log-CPM density before / after filter
    - 5. MDS (coloured by group)
    - 6. Design matrix
    - 7. voom
    - 8. Fit linear models and contrasts
    - 9. Empirical Bayes moderation
    - 10. Top table
    - 11. MD plot
    - 12. Volcano plot
    - 13. Heatmap of top 50 DE genes
- Single-cell pseudobulk differential expression with pylimma (Kang 2018 PBMC)
- Differential abundance analysis of proteomics data with pylimma
Citation
Please cite the original limma papers when using pylimma:
Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3, Article 3.
Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., and Smyth, G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47.
Law, C. W., Chen, Y., Shi, W., and Smyth, G. K. (2014). voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29.
A pylimma preprint / Zenodo DOI will be listed here once published. Until then, cite the GitHub repository and the version tag.