pylimma.model_matrix

pylimma.model_matrix(formula, data)

Create a design matrix from a formula and data.

This function creates design matrices from R-style formula strings, matching the behaviour of R’s model.matrix() with default contrasts (contr.treatment / dummy coding).

Parameters:
  • formula (str) – R-style formula string. Examples:
      - "~ group": intercept plus dummy variables for group (reference coding)
      - "~ 0 + group" or "~ group - 1": no intercept (cell-means coding)
      - "~ group + batch": additive model with two factors
      - "~ group + age": factor plus numeric covariate

  • data (DataFrame) – DataFrame whose columns include every variable referenced in the formula.

Returns:

Design matrix of shape (n_samples, n_coefficients). Use model_matrix_with_names() if column names are needed.

Return type:

ndarray

Examples

>>> import pandas as pd
>>> data = pd.DataFrame({
...     'group': ['A', 'A', 'B', 'B', 'C', 'C'],
...     'age': [25, 30, 35, 40, 45, 50]
... })
>>> model_matrix("~ group", data)
array([[1., 0., 0.],
       [1., 0., 0.],
       [1., 1., 0.],
       [1., 1., 0.],
       [1., 0., 1.],
       [1., 0., 1.]])
>>> model_matrix("~ 0 + group", data)
array([[1., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 1.]])
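
The two codings above span the same model but give the coefficients different meanings. The following is a minimal sketch of that difference, using plain NumPy least squares on the two matrices shown above. The response values in y are made up for illustration and are not part of pylimma.

```python
import numpy as np

# Hypothetical response values, chosen so the group means are A=2, B=5, C=10.
y = np.array([1.0, 3.0, 4.0, 6.0, 9.0, 11.0])

# Design matrix for "~ group" (reference coding), as printed above.
X_ref = np.array([[1, 0, 0], [1, 0, 0],
                  [1, 1, 0], [1, 1, 0],
                  [1, 0, 1], [1, 0, 1]], dtype=float)

# Design matrix for "~ 0 + group" (cell-means coding), as printed above.
X_cell = np.array([[1, 0, 0], [1, 0, 0],
                   [0, 1, 0], [0, 1, 0],
                   [0, 0, 1], [0, 0, 1]], dtype=float)

# Ordinary least-squares fit for each coding.
beta_ref, *_ = np.linalg.lstsq(X_ref, y, rcond=None)
beta_cell, *_ = np.linalg.lstsq(X_cell, y, rcond=None)

# Reference coding: [mean(A), mean(B) - mean(A), mean(C) - mean(A)],
# here approximately [2, 3, 8].
print(beta_ref)

# Cell-means coding: [mean(A), mean(B), mean(C)],
# here approximately [2, 5, 10].
print(beta_cell)
```

With reference coding, the non-intercept coefficients are already differences from group A. With cell-means coding, group differences have to be formed explicitly as contrasts.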

Notes

This function uses patsy for formula parsing with Treatment coding to match R’s default contr.treatment contrast scheme.
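
As a rough illustration of the Treatment coding mentioned above, the sketch below calls patsy directly rather than pylimma. It assumes patsy's default behaviour for categorical columns, and the exact options that model_matrix passes to patsy are not stated here, so treat this as an approximation of what happens internally.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

data = pd.DataFrame({
    "group": ["A", "A", "B", "B", "C", "C"],
    "age": [25, 30, 35, 40, 45, 50],
})

# patsy applies Treatment (dummy) coding to categorical columns by default,
# using the first level in sorted order ("A") as the reference level.
dm = dmatrix("~ group + age", data)

# Column names are stored on the design_info attribute.
column_names = dm.design_info.column_names
print(column_names)
# ['Intercept', 'group[T.B]', 'group[T.C]', 'age']

# Convert to a plain ndarray of shape (n_samples, n_coefficients).
design = np.asarray(dm)
print(design.shape)
# (6, 4)
```

The level "A" has no column of its own because it is absorbed into the intercept, which mirrors how R's contr.treatment handles the first factor level.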

See also

make_contrasts

Create contrast matrices for hypothesis testing