<- expand.grid(A = c(-1, 1), B = c(-1, 1)) %>%
(x_mat model.matrix( ~ A * B, data = .))
(Intercept) A B A:B
1 1 -1 -1 1
2 1 1 -1 -1
3 1 -1 1 -1
4 1 1 1 1
attr(,"assign")
[1] 0 1 2 3
Paolo Bosetti
2025-Jan-08
I want to disclose some nice feature of caracas
, a computer algebra system for R, which is actually a NumPy wrapper, although it provides an arguably nicer and more elegant interface.
Let me consider the case of a linear regression with two predictors and two levels each:
\[ \hat y_{ij} = b_1 + b_2 x_{1,i} + b_3x_{2,j} + b_4x_{1,i}x_{2,j} \]
With linear algebra, we can write:
\[ \begin{align} X b &= y \\ X^TXb &= X^T y \\ (X^TX)^{-1}X^TX b &= (X^TX)^{-1}X^T y \end{align} \]
i.e. the vector \(\hat b\) that minimizes \(||y - Xb||^2\) is:
\[
\hat b = (X^TX)^{-1}X^T y
\] The combination of predictor values, assuming that we are using coded units (i.e. each variable ranges from -1 to 1), can be obtained by expand.grid
and model.matrix
:
(Intercept) A B A:B
1 1 -1 -1 1
2 1 1 -1 -1
3 1 -1 1 -1
4 1 1 1 1
attr(,"assign")
[1] 0 1 2 3
In symbols, using caracas
:
\[ y=\left[\begin{matrix}y_{11}\\y_{21}\\y_{12}\\y_{22}\end{matrix}\right];~~~X=\left[\begin{matrix}1 & -1 & -1 & 1\\1 & 1 & -1 & -1\\1 & -1 & 1 & -1\\1 & 1 & 1 & 1\end{matrix}\right];~~~b=\left[\begin{matrix}b_{1}\\b_{2}\\b_{3}\\b_{4}\end{matrix}\right];~~~\mu = X b =\left[\begin{matrix}b_{1} - b_{2} - b_{3} + b_{4}\\b_{1} + b_{2} - b_{3} - b_{4}\\b_{1} - b_{2} + b_{3} - b_{4}\\b_{1} + b_{2} + b_{3} + b_{4}\end{matrix}\right] \]
Note that the above equations are automatically generated by calling, e.g., tex(X)
to get the \(\LaTeX\) expression of the symbolic variable X
.
To find \(\hat b\) we solve the equation \((X^TX) b = X^T y\):
to get:
\[ \hat b = \left[\begin{matrix}b_{1}\\b_{2}\\b_{3}\\b_{4}\end{matrix}\right] = \left[\begin{matrix}\frac{y_{11}}{4} + \frac{y_{12}}{4} + \frac{y_{21}}{4} + \frac{y_{22}}{4}\\- \frac{y_{11}}{4} - \frac{y_{12}}{4} + \frac{y_{21}}{4} + \frac{y_{22}}{4}\\- \frac{y_{11}}{4} + \frac{y_{12}}{4} - \frac{y_{21}}{4} + \frac{y_{22}}{4}\\\frac{y_{11}}{4} - \frac{y_{12}}{4} - \frac{y_{21}}{4} + \frac{y_{22}}{4}\end{matrix}\right] \]
One possible way to evaluate b_hat
at a given set of values \(y=\{1, 2, 3, 7\}\) is to convert it to a function and call it with the arguments 1, 2, 3, and 7:
This is just a stub, To be continued…