pylimma.lm_fit

pylimma.lm_fit(data, design=None, ndups=None, spacing=None, block=None, correlation=None, weights=None, method='ls', key='pylimma', layer=None, weights_layer=None, **mrlm_kwargs)

Fit linear models to expression data.

This is the main entry point for fitting gene-wise linear models. Accepts either an AnnData object or a NumPy array / pandas DataFrame.

Parameters:
  • data (AnnData, ndarray, or DataFrame) –

    Expression data. If AnnData, values are read from adata.X (samples x genes) or from the specified layer, and results are stored in adata.uns[key]. If ndarray or DataFrame, the matrix must have shape (n_genes, n_samples), i.e. genes as rows, and results are returned as a dict.

    Important: Expression values must be normalised and log-transformed before calling this function.

  • design (ndarray or str, optional) – Design matrix (n_samples, n_coefficients) or formula string like “~ group + batch”. If None, uses intercept-only model.

  • ndups (int, default 1) – Number of within-array duplicate spots.

  • spacing (int, default 1) – Spacing between duplicate spots, measured in rows of the expression matrix (1 means duplicates occupy consecutive rows).

  • block (array_like, optional) – Block indicator for correlated samples. When provided, samples within the same block are assumed to be correlated.

  • correlation (float, optional) – Intra-block or intra-duplicate correlation. Required when ndups > 1 or block is provided. Use duplicate_correlation() to estimate this value.

  • weights (ndarray, optional) – Observation weights, given as either:
      - a 1D array of length n_samples (array weights), or
      - a 2D array of shape (n_genes, n_samples) (gene-specific weights).

  • method (str, default "ls") – Fitting method. Options:
      - “ls”: least squares (default)
      - “robust”: robust regression using M-estimation

  • key (str, default "pylimma") – Key for storing results in adata.uns (AnnData input only).

  • layer (str, optional) – Layer to use for expression data (AnnData input only). If None, uses adata.X.

  • weights_layer (str, optional) – AnnData-only. Layer to read as observation weights. When None (default) and layer ends in "_E", the companion layer {layer[:-2]}_weights is auto-loaded if present (voom / vooma convention). Set this when voom() or vooma() was called with a non-default weights_layer, so that lm_fit() reads the matching layer.
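
The duplicate-spot parameters (ndups, spacing) and the blocking parameters (block, correlation) both describe which observations are treated as correlated. The sketch below assumes pylimma follows limma's conventions: copies of one probe sit `spacing` rows apart, and `correlation` is a single common within-block correlation. It shows which rows form a duplicate set and the working correlation matrix implied by `block` and `correlation`. The helpers `duplicate_row_groups` and `block_correlation_matrix` are hypothetical and are not part of pylimma.

```python
import numpy as np


def duplicate_row_groups(n_rows, ndups=1, spacing=1):
    """Group the rows of an (n_rows, n_samples) matrix into duplicate sets.

    Hypothetical helper. Assumes limma's layout: copies of one probe are
    `spacing` rows apart, and this pattern repeats every ndups * spacing rows.
    """
    period = ndups * spacing
    if n_rows % period != 0:
        raise ValueError("n_rows must be a multiple of ndups * spacing")
    groups = []
    for start in range(0, n_rows, period):
        for offset in range(spacing):
            # Rows start + offset, start + offset + spacing, ... hold one probe.
            groups.append([start + offset + d * spacing for d in range(ndups)])
    return groups


def block_correlation_matrix(block, correlation):
    """Working correlation matrix implied by sample block labels.

    Hypothetical helper: entries are `correlation` for two samples in the same
    block, 0 for samples in different blocks, and 1 on the diagonal.
    """
    block = np.asarray(block)
    same_block = block[:, None] == block[None, :]
    corr = np.where(same_block, float(correlation), 0.0)
    np.fill_diagonal(corr, 1.0)
    return corr


print(duplicate_row_groups(6, ndups=2, spacing=3))
print(block_correlation_matrix(["a", "a", "b"], 0.3))
```

For example, duplicate_row_groups(6, ndups=2, spacing=3) returns [[0, 3], [1, 4], [2, 5]]: with spacing=3, the second copy of each probe starts three rows after the first.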

Returns:

If the input is an ndarray or DataFrame, returns a dict of fit results. If the input is AnnData, stores the results in adata.uns[key] and returns None.

Return type:

dict or None

Notes

The function dispatches to different fitting algorithms based on parameters:

  • If method="robust", uses mrlm() for robust M-estimation

  • Otherwise, if ndups < 2 and block is None, uses lm_series() for simple OLS/WLS

  • Otherwise (ndups >= 2 or block is provided), uses gls_series() for GLS
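
The robust branch downweights samples with large residuals instead of discarding them. The sketch below fits one gene by iteratively reweighted least squares (IRLS) with Huber weights, using the tuning constant k = 1.345 and a scale taken from the median absolute residual. These are the defaults of R's MASS::rlm, which limma's mrlm() builds on. Whether pylimma's mrlm() uses the same psi function, constant, or scale estimate is an assumption. `huber_irls` is a hypothetical helper, not part of pylimma.

```python
import numpy as np


def huber_irls(X, y, k=1.345, max_iter=50, tol=1e-8):
    """Huber M-estimate of coefficients for one gene via IRLS.

    Hypothetical helper. X: (n_samples, n_coefs) design; y: (n_samples,) values.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares starting point
    for _ in range(max_iter):
        resid = y - X @ beta
        # Robust scale: median absolute residual, rescaled to match a normal SD.
        scale = max(np.median(np.abs(resid)) / 0.6745, 1e-12)
        u = np.abs(resid) / scale
        # Huber weights: 1 for small standardized residuals, k / |u| beyond k.
        w = np.where(u <= k, 1.0, k / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        # Weighted least-squares step: scale rows of X and y by sqrt(weight).
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta


# One gene, ten samples on a linear trend, with one gross outlier.
rng = np.random.default_rng(0)
x = np.arange(10.0)
X = np.column_stack([np.ones(10), x])
y = 2.0 + 0.5 * x + rng.normal(scale=0.1, size=10)
y[9] += 20.0
print("least squares slope:", np.linalg.lstsq(X, y, rcond=None)[0][1])
print("Huber slope:", huber_irls(X, y)[1])
```

The least-squares slope is pulled well above the true value of 0.5 by the single outlier. The Huber fit caps that sample's influence at k times the residual scale, so its slope stays close to 0.5.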

The fit results include:
  • coefficients: estimated coefficients (n_genes, n_coefs)
  • stdev_unscaled: unscaled standard errors (n_genes, n_coefs)
  • sigma: residual standard deviation (n_genes,)
  • df_residual: residual degrees of freedom (n_genes,)
  • cov_coefficients: coefficient covariance matrix (n_coefs, n_coefs)
  • Amean: mean expression per gene (n_genes,)
  • design: the design matrix used
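
In the simplest lm_series() case (no weights, no missing values, full-rank design), these fields follow directly from ordinary least squares. The NumPy sketch below computes them for a genes x samples matrix. It assumes, as in limma, that cov_coefficients is the unscaled covariance (X^T X)^{-1} and that the standard error of coefficient j for gene g is sigma[g] * stdev_unscaled[g, j]. `ols_fit` is a hypothetical helper, not pylimma's implementation.

```python
import numpy as np


def ols_fit(expr, design):
    """Gene-wise least-squares fit returning limma-style fit fields.

    Hypothetical helper. expr: (n_genes, n_samples) log-expression;
    design: (n_samples, n_coefs), assumed full rank. No weights or NaNs.
    """
    expr = np.asarray(expr, dtype=float)
    design = np.asarray(design, dtype=float)
    n_genes = expr.shape[0]
    n_samples, n_coefs = design.shape
    # Unscaled covariance (X'X)^{-1}: shared by every gene when there are no weights.
    cov_coefficients = np.linalg.inv(design.T @ design)
    # Solve all genes at once; lstsq returns (n_coefs, n_genes), so transpose.
    coef = np.linalg.lstsq(design, expr.T, rcond=None)[0].T
    resid = expr - coef @ design.T
    df_residual = np.full(n_genes, n_samples - n_coefs)
    sigma = np.sqrt(np.sum(resid ** 2, axis=1) / df_residual)
    stdev_unscaled = np.tile(np.sqrt(np.diag(cov_coefficients)), (n_genes, 1))
    return {
        "coefficients": coef,
        "stdev_unscaled": stdev_unscaled,
        "sigma": sigma,
        "df_residual": df_residual,
        "cov_coefficients": cov_coefficients,
        "Amean": expr.mean(axis=1),
        "design": design,
    }


# Two groups of two samples: intercept plus a group indicator column.
design = np.column_stack([np.ones(4), [0, 0, 1, 1]])
expr = np.array([[1.0, 1.2, 3.0, 3.2], [5.0, 5.0, 5.0, 5.0]])
fit = ols_fit(expr, design)
print(fit["coefficients"])
print(fit["sigma"])
```

For the first gene, the intercept is the mean of the first group (1.1) and the group coefficient is the difference in group means (2.0).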
