pylimma.avereps

pylimma.avereps(x, ID=None)

Average over irregular replicate probes.

Computes the mean of replicate probes that share an ID, mirroring R’s limma::avereps. When x carries row labels (a pandas DataFrame index or a Series .name) and ID is not supplied, the row labels are used as probe IDs, the same default as R’s avereps.default(x, ID=rownames(x)).

Parameters:
  • x (ndarray or DataFrame) – Expression matrix, shape (n_probes, n_arrays).

  • ID (array_like, optional) – Probe identifiers. Probes with the same ID are averaged. If None and x is a DataFrame, x.index is used. If no source of IDs is available, raises ValueError matching R’s "No probe IDs" error.

Returns:

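Matrix of averaged rows, one per unique ID in order of first appearance. Returned as a DataFrame (indexed by the unique IDs, columns preserved) when x was a DataFrame, otherwise as an ndarray. An ndarray return value carries no row labels. To recover the matching ID vector, either pass x as a DataFrame or derive it from np.unique(ID, return_index=True). Note that np.unique returns the unique values in sorted order, so they must be reordered by their first-occurrence indices to line up with the rows of the result.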
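
Because np.unique returns its values sorted, getting the IDs in order of first appearance takes one extra reordering step. A minimal NumPy-only sketch follows. The ID values are illustrative, and it does not call pylimma:

```python
import numpy as np

# Illustrative probe IDs; "B" appears before "A" on purpose.
ID = np.array(["B", "A", "B", "C", "A"])

# np.unique returns sorted unique values plus the index of each first occurrence.
uniq_sorted, first_idx = np.unique(ID, return_index=True)

# Reorder the sorted values by where each one first appeared.
unique_ids = uniq_sorted[np.argsort(first_idx)]
```

Here unique_ids should line up row for row with an ndarray result, assuming the first-appearance ordering described above.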

Return type:

ndarray or DataFrame

Examples

R parity (matrix-return):

>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> ID = ["A", "A", "B"]
>>> avereps(x, ID)
array([[2., 3.],
       [5., 6.]])

DataFrame-in, DataFrame-out (ID defaults to index):

>>> df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], index=["A", "A", "B"])
>>> avereps(df)
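
For comparison, the same grouping can be sketched with plain pandas, without calling pylimma. This assumes avereps behaves as documented here (one mean per ID, in order of first appearance). It is a cross-check, not the library's implementation:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], index=["A", "A", "B"])

# Group rows by index label; sort=False keeps the order of first appearance.
expected = df.groupby(level=0, sort=False).mean()
```

The frame expected has one row per unique label ("A", then "B") and keeps the original columns.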

For AnnData input, ID defaults to adata.var_names, and a new AnnData is returned with the var axis collapsed to the unique IDs. As with aver_arrays(), the result has a different shape from the input, so it cannot be written back in place as a layer. AnnData input therefore returns a new object.