Sounds cool.
(1994, 1996)
\[ \mathbf{h} = g(\mathbf{x}' \mathbf{W}^{(1)}) \] \[ \mathbf{y} = f(\mathbf{h}' \mathbf{W}^{(2)}) \] where \( f \) and \( g \) are activation functions.
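A minimal NumPy sketch of this forward pass; the function name, the tanh hidden activation, and the identity output are illustrative assumptions, not choices from the slides:

```python
import numpy as np

# Forward pass of a single-hidden-layer network (sketch, assumed shapes):
# x has p features, W1 is p x n_h, W2 is n_h x k.
def forward(x, W1, W2, g=np.tanh, f=lambda z: z):
    h = g(x @ W1)   # hidden layer: h = g(x' W^(1))
    y = f(h @ W2)   # output layer: y = f(h' W^(2))
    return y

rng = np.random.default_rng(0)
x = rng.normal(size=3)          # p = 3 input features
W1 = rng.normal(size=(3, 5))    # n_h = 5 hidden units
W2 = rng.normal(size=(5, 1))    # single output
print(forward(x, W1, W2))
```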
Linear regression:
\[ h = g(\mathbf{x}' \mathbf{w}^{(1)}) \] \[ y = f(h) \] \( f(z)=g(z)=z \)
Logistic regression:
\[ h = g(\mathbf{x}' \mathbf{w}^{(1)}) \] \[ y = f(h) \] \( f(z) = \sigma(z) = (1 + \exp{(-z)})^{-1} \), \( g(z) = z \)
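A sketch of both special cases, using the identity hidden activation so the "network" collapses to a linear predictor; variable names are illustrative:

```python
import numpy as np

# With g(z) = z, the hidden value is just h = x' w; the output activation f
# then decides whether we get linear or logistic regression.
def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.normal(size=3)
w1 = rng.normal(size=3)

h = x @ w1              # g(z) = z in both special cases
y_linear = h            # f(z) = z        -> linear regression prediction
y_logistic = sigma(h)   # f(z) = sigma(z) -> logistic regression probability
print(y_linear, y_logistic)
```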
(Ripley 1996)
Any continuous function \( f^*(\mathbf{x}) \) can be approximated to arbitrary precision by a single-hidden-layer neural network, given enough nonlinear hidden units \( n_h \).
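Written out (a standard way of stating the result; the output weights \( v_j \) and biases \( b_j \) are not notation from the slides), the approximating network is

\[ \hat{f}(\mathbf{x}) = \sum_{j=1}^{n_h} v_j \, g(\mathbf{x}' \mathbf{w}_j + b_j) \approx f^*(\mathbf{x}) \]

for a suitable nonlinear \( g \) and a large enough \( n_h \).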
Keep stacking layers:
\[ \mathbf{z} = g^{(L)}(\cdots g^{(2)}(g^{(1)}(\mathbf{x})) \cdots) \] with \( L \) hidden layers, then \( \mathbf{y} \approx f(\mathbf{z}) \).
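A sketch of this stacked composition; each \( g^{(i)} \) is taken to mean an activation applied after that layer's weight matrix (the weights are implicit in the slide's notation), and tanh plus an identity output are again illustrative assumptions:

```python
import numpy as np

# Deep forward pass: apply g layer by layer, then f at the end.
def deep_forward(x, weights, g=np.tanh, f=lambda z: z):
    z = x
    for W in weights:       # z = g(... g(g(x W^(1)) W^(2)) ...)
        z = g(z @ W)
    return f(z)             # y ≈ f(z)

rng = np.random.default_rng(2)
weights = [rng.normal(size=(3, 8)),   # layer 1
           rng.normal(size=(8, 8)),   # layer 2
           rng.normal(size=(8, 1))]   # layer 3 (single output)
x = rng.normal(size=3)
print(deep_forward(x, weights))
```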
Difficult to interpret (?)
At UU: