pylimma.duplicate_correlation
- pylimma.duplicate_correlation(M, design=None, ndups=2, spacing=1, block=None, trim=0.15, weights=None, *, layer=None, weights_layer=None)
Estimate correlation between duplicate spots or blocked samples.
Estimates the intra-block correlation using a mixed linear model, computed separately for each gene and then averaged using Fisher’s z-transformation.
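The averaging step can be illustrated with a minimal NumPy sketch. It assumes the consensus is obtained by mapping gene-wise correlations to the Fisher z scale with atanh, taking a trimmed mean that drops a fraction `trim` from each tail, and mapping back with tanh. The helper name `consensus_from_correlations` is hypothetical and is not part of pylimma; the library's own handling of edge cases may differ.

```python
import numpy as np

def consensus_from_correlations(rho, trim=0.15):
    """Illustrative only: trimmed mean of correlations on the Fisher z scale."""
    rho = np.asarray(rho, dtype=float)
    # atanh maps (-1, 1) to the real line; values of exactly +/-1 become infinite.
    with np.errstate(divide="ignore", invalid="ignore"):
        z = np.arctanh(rho)
    # Drop non-finite values (an assumption of this sketch), then sort for trimming.
    z = np.sort(z[np.isfinite(z)])
    # Remove floor(trim * n) values from each tail.
    cut = int(np.floor(trim * z.size))
    kept = z[cut:z.size - cut] if z.size > 2 * cut else z
    # Back-transform the mean to the correlation scale.
    return float(np.tanh(kept.mean()))

# Two extreme genes are trimmed away, so the consensus stays near 0.2.
rho = [0.2] * 8 + [0.99, -0.99]
consensus = consensus_from_correlations(rho, trim=0.15)
```

Averaging on the z scale rather than on the raw correlations keeps a few genes with correlations near ±1 from dominating the result.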
- Parameters:
M (ndarray) – Expression matrix, shape (n_genes, n_samples) or (n_spots, n_arrays).
design (ndarray, optional) – Design matrix. If None, uses intercept-only model.
ndups (int, default 2) – Number of within-array duplicate spots.
spacing (int or str, default 1) – Spacing between duplicates. Can be an integer or one of:
- “columns”: spacing of 1 (duplicates in adjacent rows)
- “topbottom”: spacing of n_spots/2 (duplicates in top/bottom halves)
block (array_like, optional) – Block indicator for correlated samples. If provided, ndups and spacing are ignored.
trim (float, default 0.15) – Trimmed mean proportion for consensus correlation.
weights (ndarray, optional) – Observation weights.
layer (str | None)
weights_layer (str | None)
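To make the ndups and spacing parameters concrete, the following sketch computes which rows of M would be treated as duplicates of the same gene. It assumes the row layout follows the description above (and the convention used by limma in R): duplicates of a gene sit `spacing` rows apart, and the pattern repeats every `ndups * spacing` rows. The helper `duplicate_row_indices` is hypothetical and not part of pylimma.

```python
import numpy as np

def duplicate_row_indices(n_spots, ndups=2, spacing=1):
    """Illustrative only: row indices of duplicate spots, shape (n_genes, ndups)."""
    # Translate the string shortcuts into integer spacings.
    if spacing == "columns":
        spacing = 1
    elif spacing == "topbottom":
        spacing = n_spots // 2
    group = ndups * spacing
    if n_spots % group != 0:
        raise ValueError("n_spots must be a multiple of ndups * spacing")
    n_groups = n_spots // group
    # Row index = k * (ndups * spacing) + j * spacing + i,
    # with k the group, j the duplicate number, i the position within the spacing.
    idx = np.arange(n_spots).reshape(n_groups, ndups, spacing)
    # Put the duplicate axis last so each output row lists one gene's duplicates.
    return idx.transpose(0, 2, 1).reshape(n_groups * spacing, ndups)

adjacent = duplicate_row_indices(6, ndups=2, spacing="columns")
# adjacent.tolist() == [[0, 1], [2, 3], [4, 5]]
halves = duplicate_row_indices(6, ndups=2, spacing="topbottom")
# halves.tolist() == [[0, 3], [1, 4], [2, 5]]
```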
- Returns:
- consensus_correlation (float)
Consensus correlation (trimmed mean on Fisher z scale).
- cor (float)
Same as consensus_correlation (for compatibility).
- atanh_correlations (ndarray)
Gene-wise correlations on Fisher z scale.
- Return type:
Notes
This function uses REML estimation of variance components in a mixed linear model. It requires the MixedLM functionality from statsmodels.
The consensus correlation should be used as the correlation argument to gls_series().
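For intuition about what a gene-wise intra-block correlation measures, the sketch below uses a one-way ANOVA moment estimator. This is a deliberately simpler stand-in, not the REML mixed-model fit described above, and it assumes an intercept-only design, equal block sizes, and no weights. The function name `anova_intrablock_correlation` is hypothetical and not part of pylimma.

```python
import numpy as np

def anova_intrablock_correlation(M, block):
    """Illustrative only: ANOVA estimate of intra-block correlation per gene."""
    M = np.asarray(M, dtype=float)
    block = np.asarray(block)
    labels, counts = np.unique(block, return_counts=True)
    if not np.all(counts == counts[0]) or counts[0] < 2:
        raise ValueError("this sketch needs balanced blocks of size >= 2")
    k, n = int(counts[0]), len(labels)
    # Rearrange to shape (n_genes, n_blocks, block_size).
    Y = np.stack([M[:, block == b] for b in labels], axis=1)
    block_means = Y.mean(axis=2)
    grand_mean = Y.mean(axis=(1, 2))
    # Between-block and within-block mean squares for each gene.
    msb = k * ((block_means - grand_mean[:, None]) ** 2).sum(axis=1) / (n - 1)
    msw = ((Y - block_means[:, :, None]) ** 2).sum(axis=(1, 2)) / (n * (k - 1))
    # Share of total variance attributable to the block effect.
    return (msb - msw) / (msb + (k - 1) * msw)

M = np.array([
    [1.0, 1.0, 5.0, 5.0, 9.0, 9.0],    # samples within a block agree exactly
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0], # samples within a block always disagree
])
rho = anova_intrablock_correlation(M, block=[0, 0, 1, 1, 2, 2])
# rho is 1.0 for the first gene and -1.0 for the second
```

Values near 1 mean samples in the same block move together. Single-gene estimates like these are noisy, which is why they are pooled into one consensus value.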
References
Smyth, G. K., Michaud, J. and Scott, H. S. (2005). Use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics, 21, 2067-2075.