Sounds cool.
(1994, 1996)
\[ \mathbf{h} = g(\mathbf{x}' \mathbf{W}^{(1)}) \] \[ \mathbf{y} = f(\mathbf{h}' \mathbf{W}^{(2)}) \] where \( f \) and \( g \) are activation functions.
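A minimal NumPy sketch of this forward pass; the function name, the tanh hidden activation, and the identity output are illustrative assumptions, not choices from the slides:

```python
import numpy as np

# Forward pass of a single-hidden-layer network (sketch, assumed shapes):
# x has p features, W1 is p x n_h, W2 is n_h x k.
def forward(x, W1, W2, g=np.tanh, f=lambda z: z):
    h = g(x @ W1)   # hidden layer: h = g(x' W^(1))
    y = f(h @ W2)   # output layer: y = f(h' W^(2))
    return y

rng = np.random.default_rng(0)
x = rng.normal(size=3)          # p = 3 input features
W1 = rng.normal(size=(3, 5))    # n_h = 5 hidden units
W2 = rng.normal(size=(5, 1))    # single output
print(forward(x, W1, W2))
```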
Linear regression:
\[ h = g(\mathbf{x}' \mathbf{w}^{(1)}) \] \[ y = f(h) \] \( f(z)=g(z)=z \)
Logistic regression:
\[ h = g(\mathbf{x}' \mathbf{w}^{(1)}) \] \[ y = f(h) \] \( f(z) = \sigma(z) = (1 + \exp{(-z)})^{-1} \), \( g(z) = z \)
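A sketch of both special cases, using the identity hidden activation so the "network" collapses to a linear predictor; variable names are illustrative:

```python
import numpy as np

# With g(z) = z, the hidden value is just h = x' w; the output activation f
# then decides whether we get linear or logistic regression.
def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.normal(size=3)
w1 = rng.normal(size=3)

h = x @ w1              # g(z) = z in both special cases
y_linear = h            # f(z) = z        -> linear regression prediction
y_logistic = sigma(h)   # f(z) = sigma(z) -> logistic regression probability
print(y_linear, y_logistic)
```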
(Ripley 1996)
Any continuous function \( f^*(\mathbf{x}) \) can be approximated to arbitrary precision by a single-hidden-layer neural network, given enough nonlinear hidden units \( n_h \).
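Written out (a standard way of stating the result; the output weights \( v_j \) and biases \( b_j \) are not notation from the slides), the approximating network is

\[ \hat{f}(\mathbf{x}) = \sum_{j=1}^{n_h} v_j \, g(\mathbf{x}' \mathbf{w}_j + b_j) \approx f^*(\mathbf{x}) \]

for a suitable nonlinear \( g \) and a large enough \( n_h \).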
Keep stacking layers:
\[ \mathbf{z} = g^{(L)}(\cdots g^{(2)}(g^{(1)}(\mathbf{x})) \cdots) \] with \( L \) hidden layers, then \( \mathbf{y} \approx f(\mathbf{z}) \).
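A sketch of this stacked composition; each \( g^{(i)} \) is taken to mean an activation applied after that layer's weight matrix (the weights are implicit in the slide's notation), and tanh plus an identity output are again illustrative assumptions:

```python
import numpy as np

# Deep forward pass: apply g layer by layer, then f at the end.
def deep_forward(x, weights, g=np.tanh, f=lambda z: z):
    z = x
    for W in weights:       # z = g(... g(g(x W^(1)) W^(2)) ...)
        z = g(z @ W)
    return f(z)             # y ≈ f(z)

rng = np.random.default_rng(2)
weights = [rng.normal(size=(3, 8)),   # layer 1
           rng.normal(size=(8, 8)),   # layer 2
           rng.normal(size=(8, 1))]   # layer 3 (single output)
x = rng.normal(size=3)
print(deep_forward(x, weights))
```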
Difficult to interpret (?)
At UU: