Deep learning

What are neural networks?

  • Compositional approach to curve-fitting;
  • “Biologically inspired” (but don't take that too seriously);
  • Sound cool.

Neural networks have a long history

(1994, 1996)

Neural networks look like this

\[ \mathbf{h} = g(\mathbf{x}' \mathbf{W}^{(1)}) \] \[ \mathbf{y} = f(\mathbf{h}' \mathbf{W}^{(2)}) \] where \( f \) and \( g \) are activation functions
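A minimal numpy sketch of this forward pass, assuming a tanh hidden activation, a sigmoid output, and made-up weight shapes (none of these choices are taken from the slide):

```python
import numpy as np

def g(z):                       # hidden activation (assumed: tanh)
    return np.tanh(z)

def f(z):                       # output activation (assumed: sigmoid)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)          # input vector
W1 = rng.normal(size=(3, 4))    # input-to-hidden weights W^(1)
W2 = rng.normal(size=(4, 1))    # hidden-to-output weights W^(2)

h = g(x @ W1)                   # h = g(x' W^(1))
y = f(h @ W2)                   # y = f(h' W^(2))
print(h, y)
```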

Common activation functions
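For reference, a few activation functions commonly used in practice; the selection here is an assumption, since the slide's own figure is not reproduced:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                 # squashes to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # zero for negative inputs, identity otherwise

print(sigmoid(0.0), tanh(0.0), relu(-2.0))
```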

Familiar friends

Linear regression:

\[ h = g(\mathbf{x}' \mathbf{w}^{(1)}) \] \[ y = f(h) \] \( f(z)=g(z)=z \)

Logistic regression:

\[ h = g(\mathbf{x}' \mathbf{w}^{(1)}) \] \[ y = f(h) \] \( f(z) = \sigma(z) = (1 + \exp{(-z)})^{-1} \)
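A small sketch of both special cases as one-unit networks (random weights, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=3)   # one observation
w = rng.normal(size=3)   # weights w^(1)

# Linear regression: f(z) = g(z) = z (identity activations)
y_linear = x @ w

# Logistic regression: g identity, f the sigmoid
y_logistic = 1.0 / (1.0 + np.exp(-(x @ w)))

print(y_linear, y_logistic)
```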

...But neural nets can learn "any" function

(Ripley 1996)

"Universal approximation theorem"

~ Any continuous true function \( f^*(x) \) can be approximated to arbitrary precision by a neural network with a single hidden layer of \( n_h \) nonlinear hidden units, provided \( n_h \) is large enough.

  • Great, but:
  • Unfortunately, the number of hidden units \( n_h \) actually needed (the width of the model) may be very large (illustrated in the sketch after this list):
  • it can be exponential in the complexity of the function –> overfitting
  • For this reason, neural nets were largely abandoned in the 1990s in favor of Support Vector Machines, which are equally general but more efficient –> better generalization beyond the training set
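A rough numerical illustration of the width issue: approximate a simple 1-D function with a single hidden layer of random tanh units plus a least-squares output layer, and watch how the error depends on the number of hidden units. This random-features setup is my own simplification, not the construction behind the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)[:, None]
y_true = np.sin(2 * x).ravel()                 # the "true" function f*(x)

for n_h in (3, 30, 300):                       # width of the single hidden layer
    W1 = rng.normal(size=(1, n_h))             # random input-to-hidden weights
    b1 = rng.normal(size=n_h)
    H = np.tanh(x @ W1 + b1)                   # hidden-unit outputs
    w2, *_ = np.linalg.lstsq(H, y_true, rcond=None)   # fit the output weights
    err = np.max(np.abs(H @ w2 - y_true))
    print(f"n_h = {n_h:3d}   max abs error = {err:.3f}")
```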

What is deep learning?

Keep composing:

\[ z = g^{(L)}\left( \cdots g^{(2)}\left(g^{(1)}(\mathbf{x})\right) \cdots \right) \] then \( y \approx f(z) \), where \( L \) is the number of hidden layers.

  • Output of each hidden layer is the input to the next one (see the sketch after this list)
  • Allow representation learning by building complex features out of simpler ones
  • Go deep: exponential advantages, less overfitting
  • Aggressive parameterization + aggressive regularization
  • Compositional: efficient parametrization
  • Learn relevant features: “End-to-end”
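A minimal sketch of the composition above, assuming tanh hidden activations, an identity output \( f \), and made-up layer sizes:

```python
import numpy as np

def deep_forward(x, weights, g=np.tanh, f=lambda z: z):
    """z = g(... g(g(x W1) W2) ...), then y ~ f(z)."""
    z = x
    for W in weights:              # output of each hidden layer feeds the next
        z = g(z @ W)
    return f(z)

rng = np.random.default_rng(0)
sizes = [3, 8, 8, 8, 1]            # input, three hidden layers, output
weights = [rng.normal(size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
y = deep_forward(rng.normal(size=3), weights)
print(y)
```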

Deep learning models

  • Not convex
  • Not identifiable
  • Not everywhere differentiable (e.g. ReLU)
  • Overparameterized
  • Local optima, saddle points
  • Computationally challenging
  • Difficult to interpret (?)

    STILL WORKS!

Demos

What do people do with deep learning?

  • Recognize objects in pictures
  • Speech recognition
  • Automatic translation (e.g. Google translate)
  • Recommender systems (e.g. Amazon, Spotify)
  • Computer art

At UU:

  • Physics: Detect states of matter
  • Chemistry: Predict protein interactions
  • Medicine: Predict survival of ALS patients from MRI scans
  • Music: Label chords in song

Word2Vec

AlphaGo Zero

  • AlphaGo defeated 18-time world champion Lee Sedol; AlphaGo Zero then beat that version of AlphaGo by 100 games to 0
  • AlphaGo also defeated the world's top-ranked player, Ke Jie

Common DL architectures

Deep feedforward network

Convolutional neural network
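To give a flavor of what a convolutional layer computes, a naive 1-D convolution in numpy; real convolutional networks add channels, padding, pooling, and 2-D filters (the filter here is made up):

```python
import numpy as np

def conv1d(x, w):
    """Slide filter w along signal x; each output is a local weighted sum."""
    k = len(w)
    return np.array([x[i:i + k] @ w for i in range(len(x) - k + 1)])

x = np.array([0., 1., 2., 3., 2., 1., 0.])
w = np.array([-1., 0., 1.])        # one small filter, reused at every position
print(conv1d(x, w))
```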

Recurrent neural network

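To give a flavor of the recurrence, a minimal numpy sketch in which the same weights are applied at every time step (the sizes and the tanh nonlinearity are illustrative assumptions):

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, h0):
    """h_t = tanh(W_x x_t + W_h h_{t-1}); the same weights are reused at every step."""
    h, states = h0, []
    for x_t in xs:                 # loop over the sequence
        h = np.tanh(W_x @ x_t + W_h @ h)
        states.append(h)
    return states

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))       # a length-5 sequence of 3-d inputs
W_x = rng.normal(size=(4, 3))      # input-to-hidden weights
W_h = rng.normal(size=(4, 4))      # hidden-to-hidden (recurrent) weights
states = rnn_forward(xs, W_x, W_h, np.zeros(4))
print(states[-1])
```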

Common DL problems & solutions: Solved

Solved:

  • Hard to compute the gradient of the loss/likelihood –> Backpropagation (sketched after this list)
  • Many cases (large training sets) –> Stochastic gradient descent
  • Many categories to predict –> Hierarchical softmax, Importance sampling, Noise-contrastive estimation
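A minimal sketch of the first two fixes for a one-hidden-layer network: backpropagation (the chain rule applied layer by layer) supplies the gradients, and stochastic gradient descent updates the weights on small random minibatches. Network size, learning rate, and the squared-error loss are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                    # many cases
y = np.sin(X @ np.array([1.0, -2.0, 0.5]))        # a made-up target to fit
W1 = rng.normal(size=(3, 16)) * 0.5               # one hidden layer of 16 tanh units
W2 = rng.normal(size=(16, 1)) * 0.5

lr, batch = 0.05, 32
for step in range(2000):
    idx = rng.integers(0, len(X), size=batch)     # stochastic: one random minibatch
    xb, yb = X[idx], y[idx, None]
    # forward pass
    h = np.tanh(xb @ W1)
    pred = h @ W2
    err = pred - yb                               # gradient of 0.5 * squared error
    # backward pass: backpropagation = the chain rule, applied layer by layer
    grad_W2 = h.T @ err / batch
    grad_h = (err @ W2.T) * (1.0 - h ** 2)        # back through the tanh
    grad_W1 = xb.T @ grad_h / batch
    # stochastic gradient descent update
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2
```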

Common DL problems & solutions: Solved?

Solved?

  • Overfitting –> Regularization
  • No global optimum –> Early stopping
  • “Exploding gradients” –> Gradient clipping (see the sketch after this list)
  • “Vanishing gradients” –> Skip-connections, Memory, Attention
  • Not identified –> Don't care, just find low risk
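Two of these fixes are easy to show in isolation: an L2 (weight-decay) penalty against overfitting, and gradient clipping against exploding gradients (the threshold and decay strength are made-up values):

```python
import numpy as np

def clip_by_norm(grad, max_norm=1.0):
    """Rescale the gradient if its norm exceeds max_norm (fix for exploding gradients)."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

def l2_regularized_grad(grad, weights, weight_decay=1e-4):
    """Add an L2 weight-decay term to the gradient (one common fix for overfitting)."""
    return grad + weight_decay * weights

g = np.array([30.0, -40.0])                       # an "exploded" gradient
print(clip_by_norm(g))                            # rescaled to norm 1.0
print(l2_regularized_grad(g, np.array([2.0, -1.0])))
```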

Should I use deep learning?

  • More a way of life than a specific technique
  • Statisticians: We can definitely learn a lot from DL research to improve our own research
  • Applied researchers: Currently hyped, but mostly “just” another machine learning technique. Don't forget to try out other things as well!
  • Particularly when you don't have at least a few million examples
  • Also brings something truly new: end-to-end learning is very impressive.
  • On balance:

    Probably, yes.

Some ridiculous ideas for inspiration:

  • Automatically design public policy based on open interviews with the public
  • Redefine neurological malfunction based on predicting behavior in daily life directly from brain patterns
  • Attach sensors to a baby and let the parents know what the baby needs

  • …Your own ideas here…