pylimma.model_matrix
- pylimma.model_matrix(formula, data)
Create a design matrix from a formula and data.
This function creates design matrices from R-style formula strings, matching the behaviour of R’s model.matrix() with default contrasts (contr.treatment / dummy coding).
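To make "treatment (dummy) coding" concrete, here is a minimal pure-Python sketch for a single categorical variable. The helper `treatment_design` is hypothetical and is not part of pylimma; it assumes levels are ordered by sorting and that the first level is the reference.

```python
def treatment_design(values, intercept=True):
    """Hypothetical helper: dummy-code one categorical variable."""
    levels = sorted(set(values))  # sorted level order; first level is the reference
    if intercept:
        # Intercept column plus one 0/1 indicator per non-reference level.
        non_reference = levels[1:]
        return [[1.0] + [float(v == lev) for lev in non_reference] for v in values]
    # No intercept: one 0/1 indicator per level (cell-means coding).
    return [[float(v == lev) for lev in levels] for v in values]


group = ['A', 'A', 'B', 'B', 'C', 'C']
with_intercept = treatment_design(group)               # analogous to "~ group"
cell_means = treatment_design(group, intercept=False)  # analogous to "~ 0 + group"
```

With an intercept, the reference level A is absorbed into the first column and the remaining columns are indicators for B and C. Without an intercept, each level gets its own indicator column.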
- Parameters:
formula (str) – R-style formula string. Examples:
    - "~ group": intercept + dummy variables for group (reference coding)
    - "~ 0 + group" or "~ group - 1": no intercept (cell-means coding)
    - "~ group + batch": additive model with two factors
    - "~ group + age": factor plus numeric covariate
data (DataFrame) – Data containing the variables referenced in the formula. Columns should include all variables used in the formula.
- Returns:
Design matrix of shape (n_samples, n_coefficients). Use model_matrix_with_names() if column names are needed.
- Return type:
ndarray
Examples
>>> import pandas as pd
>>> from pylimma import model_matrix
>>> data = pd.DataFrame({
...     'group': ['A', 'A', 'B', 'B', 'C', 'C'],
...     'age': [25, 30, 35, 40, 45, 50]
... })
>>> model_matrix("~ group", data)
array([[1., 0., 0.],
       [1., 0., 0.],
       [1., 1., 0.],
       [1., 1., 0.],
       [1., 0., 1.],
       [1., 0., 1.]])

>>> model_matrix("~ 0 + group", data)
array([[1., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 1.]])
Notes
This function uses patsy for formula parsing with Treatment coding to match R’s default contr.treatment contrast scheme.
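For orientation, the following sketch calls patsy directly to build a comparable matrix for a factor plus a numeric covariate. It is an illustration of patsy's public `dmatrix` API under its default Treatment coding, not a copy of pylimma's internal implementation, which may differ in detail.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

data = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'age': [25, 30, 35, 40, 45, 50],
})

# patsy treats string columns as categorical and applies Treatment coding by default.
design = dmatrix("~ group + age", data)

matrix = np.asarray(design)               # plain ndarray, one row per sample
names = design.design_info.column_names   # column labels assigned by patsy
```

Here `names` lists the intercept first, then the indicators for the non-reference levels of `group`, then `age`, which shows how each column of the array maps back to a term in the formula.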
See also
make_contrasts – Create contrast matrices for hypothesis testing.