pylimma.duplicate_correlation

pylimma.duplicate_correlation(M, design=None, ndups=2, spacing=1, block=None, trim=0.15, weights=None, *, layer=None, weights_layer=None)

Estimate correlation between duplicate spots or blocked samples.

Estimates the intra-block correlation using a mixed linear model. The correlation is estimated separately for each gene, and the gene-wise estimates are then combined into a single consensus value by taking a trimmed mean on the Fisher z scale.
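The averaging step can be illustrated with a minimal sketch. It assumes the gene-wise correlations are already available and that the trim proportion is cut from each tail; the helper name consensus_from_genewise and the example values are hypothetical and are not part of pylimma.

```python
import numpy as np

def consensus_from_genewise(rho, trim=0.15):
    """Trimmed mean of gene-wise correlations on the Fisher z (atanh) scale.

    Hypothetical helper for illustration only; not a pylimma function.
    """
    z = np.arctanh(np.asarray(rho, dtype=float))   # Fisher z-transform
    z = np.sort(z[np.isfinite(z)])                 # drop NaN/inf values
    k = int(np.floor(trim * z.size))               # values cut from each tail (assumed)
    trimmed = z[k:z.size - k]
    return float(np.tanh(trimmed.mean()))          # back to the correlation scale

# Made-up gene-wise correlations with one outlier at each end.
rho = np.array([0.10, 0.25, 0.30, 0.35, 0.40, 0.45, 0.95])
consensus = consensus_from_genewise(rho)
print(consensus)
```

Averaging on the z scale rather than on the raw correlations keeps extreme values near 1 or -1 from dominating, and the trimming discards the most extreme genes before the mean is taken.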

Parameters:

M (ndarray) – Expression matrix, shape (n_genes, n_samples) or (n_spots, n_arrays).
design (ndarray, optional) – Design matrix. If None, uses intercept-only model.
ndups (int, default 2) – Number of within-array duplicate spots.
spacing (int or str, default 1) – Spacing between duplicates. Can be an integer or one of:
- “columns”: spacing of 1 (duplicates in adjacent rows)
- “topbottom”: spacing of n_spots/2 (duplicates in top/bottom halves)
block (array_like, optional) – Block indicator for correlated samples. If provided, ndups and spacing are ignored.
trim (float, default 0.15) – Trimmed mean proportion for consensus correlation.
weights (ndarray, optional) – Observation weights.
layer (str | None)
weights_layer (str | None)
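To make the ndups and spacing parameters concrete, the sketch below lists which rows of M would be treated as duplicates of one another under the layout described above. It is an illustration under that assumption only: the helper duplicate_row_groups is hypothetical, is not part of pylimma, and has not been checked against the library's internal row ordering.

```python
import numpy as np

def duplicate_row_groups(n_spots, ndups=2, spacing=1):
    """Return an (n_genes, ndups) array of row indices, one row per gene.

    Hypothetical helper for illustration only; not a pylimma function.
    """
    if spacing == "columns":
        spacing = 1                       # duplicates sit in adjacent rows
    elif spacing == "topbottom":
        spacing = n_spots // 2            # duplicates sit in top/bottom halves
    run = ndups * spacing                 # rows covered by one run of duplicates
    if n_spots % run != 0:
        raise ValueError("n_spots must be a multiple of ndups * spacing")
    # First row of each gene: the first `spacing` rows of every run.
    starts = np.arange(n_spots).reshape(-1, ndups, spacing)[:, 0, :].ravel()
    # Each further duplicate is `spacing` rows below the previous one.
    return starts[:, None] + spacing * np.arange(ndups)[None, :]

print(duplicate_row_groups(6, ndups=2, spacing=1))
print(duplicate_row_groups(6, ndups=2, spacing="topbottom"))
```

With six spots and two duplicates, a spacing of 1 pairs rows (0, 1), (2, 3), (4, 5), while "topbottom" pairs rows (0, 3), (1, 4), (2, 5).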

Returns:

consensus_correlation (float) – Consensus correlation (trimmed mean on Fisher z scale).
cor (float) – Same as consensus_correlation (for compatibility).
atanh_correlations (ndarray) – Gene-wise correlations on Fisher z scale.

Return type:

dict

Notes

This function uses REML estimation of variance components in a mixed linear model. It requires the MixedLM functionality from statsmodels.

The consensus correlation should be used as the correlation argument to gls_series().

References

Smyth, G. K., Michaud, J. and Scott, H. S. (2005). Use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics, 21, 2067-2075.