pylimma.avereps
- pylimma.avereps(x, ID=None)
Average over irregular replicate probes.
Computes the mean across replicate probes identified by ID, mirroring R’s
limma::avereps. When x carries row labels (pandas DataFrame index or Series.name) and ID is not supplied, the row labels are used as probe IDs, the same default as R’s avereps.default(x, ID=rownames(x)).
- Parameters:
x (ndarray or DataFrame) – Expression matrix, shape (n_probes, n_arrays).
ID (array_like, optional) – Probe identifiers. Probes with the same ID are averaged. If None and
x is a DataFrame, x.index is used. If no source of IDs is available, raises ValueError, matching R’s "No probe IDs" error.
- Returns:
Matrix of averaged rows, one per unique ID in order of first appearance. Returned as a DataFrame (indexed by the unique IDs, columns preserved) when
x was a DataFrame, otherwise as an ndarray. To also recover the unique ID vector from an ndarray return value, use np.unique(ID, return_index=True) and order the unique values by the returned first-occurrence indices (np.unique alone returns them sorted), or pass x as a DataFrame.
- Return type:
ndarray or DataFrame
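The ID recovery mentioned under Returns can be sketched as follows. This is a minimal illustration using only NumPy, not part of pylimma; the ID values are made up for the example. Because np.unique sorts its output, the unique values are reordered by their first-occurrence indices to match the row order of the averaged matrix.

```python
import numpy as np

# Illustrative probe IDs (hypothetical data, not from pylimma)
ID = np.array(["B", "A", "B", "C", "A"])

# np.unique returns the unique values sorted, plus the index of
# each value's first occurrence in ID
uniq_sorted, first_idx = np.unique(ID, return_index=True)

# Reorder by first occurrence so the IDs line up with rows that are
# reported "in order of first appearance"
unique_ids = uniq_sorted[np.argsort(first_idx)]
print(unique_ids.tolist())  # ['B', 'A', 'C']
```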
Examples
R parity (matrix-return):
>>> import numpy as np
>>> from pylimma import avereps
>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> ID = ["A", "A", "B"]
>>> avereps(x, ID)
array([[2., 3.],
       [5., 6.]])
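For readers who want to see the computation spelled out, the following is a rough NumPy sketch of the averaging described above, assuming complete data with no missing values. The helper name average_by_id is hypothetical and this is not pylimma's implementation; it only reproduces the documented behaviour of one averaged row per unique ID, in order of first appearance.

```python
import numpy as np

def average_by_id(x, ids):
    """Hypothetical helper: average rows of x that share an ID."""
    x = np.asarray(x, dtype=float)
    # Python dicts keep insertion order, so groups are stored in
    # order of each ID's first appearance
    groups = {}
    for row, key in enumerate(ids):
        groups.setdefault(key, []).append(row)
    # One mean row per unique ID
    means = np.vstack([x[rows].mean(axis=0) for rows in groups.values()])
    return means, list(groups)

x = np.array([[1, 2], [3, 4], [5, 6]])
means, unique_ids = average_by_id(x, ["A", "A", "B"])
print(means)       # rows: [2., 3.] and [5., 6.]
print(unique_ids)  # ['A', 'B']
```

The two "A" rows are averaged element-wise and the single "B" row passes through unchanged, which agrees with the matrix example.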
DataFrame-in, DataFrame-out (ID defaults to index):
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], index=["A", "A", "B"])
>>> avereps(df)
For AnnData input, ID defaults to adata.var_names and a new AnnData is returned with the var axis collapsed to the unique IDs. As with aver_arrays(), averaging changes the shape of the data matrix (here the number of genes), so the result cannot be written back in place as a layer; AnnData input therefore returns a new object.