#### Lecture 6 | Training Neural Networks I

###### Sigmoid

- Problems of the sigmoid activation function
- Problem 1: Saturated neurons kill the gradients.
- Problem 2: Sigmoid outputs are not zero-centered.
- Suppose a feed-forward network whose hidden layers all use sigmoid activations.
- Then every layer except the first receives only positive inputs, because sigmoid outputs lie in (0, 1).
- If all inputs to a neuron are positive, the gradients on its weights are either all positive or all negative (they share the sign of the upstream gradient).
- With gradients restricted to one sign, the update direction is very constrained (weight updates zig-zag toward the optimum).

- Problem 3: exp() is somewhat expensive to compute. – (a minor problem)
- Modern numerical libraries handle this well.
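Problem 1 is easy to see numerically. A minimal sketch (using numpy; the function names here are just for illustration) of how the sigmoid's local gradient vanishes in the saturated regions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # local gradient: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# Near x = 0 the gradient is at its maximum (0.25); far from 0 the
# neuron saturates and the gradient it passes back is nearly zero.
print(sigmoid_grad(0.0))    # 0.25
print(sigmoid_grad(10.0))   # tiny: the gradient is effectively killed
print(sigmoid_grad(-10.0))  # same on the negative side
```

During backprop this tiny local gradient multiplies the upstream gradient, so almost nothing flows to earlier layers.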

###### tanh (tangent hyperbolic)

- Zero centered
- Problem 2 is solved.

- Problems 1 and 3 remain: tanh still saturates and still uses exp().
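A quick numpy sketch of the contrast: sigmoid outputs are all positive, tanh outputs are zero-centered, but tanh still saturates at large |x|:

```python
import numpy as np

x = np.linspace(-3, 3, 7)

s = 1.0 / (1.0 + np.exp(-x))   # sigmoid: outputs in (0, 1), all positive
t = np.tanh(x)                 # tanh: outputs in (-1, 1), symmetric around 0

print(s.min(), s.max())
print(t.min(), t.max())

# But tanh still saturates: its gradient 1 - tanh(x)^2 dies for large |x|.
print(1.0 - np.tanh(10.0) ** 2)
```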

###### ReLU (rectified linear unit)

- Problem 1 is solved in the positive region: ReLU does not saturate for x > 0.
- Actually more biologically plausible than sigmoid. (The details were not covered in this lecture.)
- AlexNet used ReLU.
- Problems
- Problem 1: Not zero-centered
- The gradient of each weight is zero or positive.
- The update direction is always a combination of zeros and positives.
- The update direction is restricted, which makes optimization inefficient.

- Problem 2: dead ReLU
- Roughly 20% of units can end up never active and never updated; these are called dead ReLUs.

- Mitigation for dead ReLUs: initialization
- People like to initialize ReLU neurons with slightly positive biases (e.g. 0.01) so the units are more likely to be active at the start.

- Leaky ReLU
- PReLU (Parametric Rectifier)
- ELU (Exponential Linear Unit)
- In between ReLU and leaky ReLU: linear in the positive region, but saturates in the negative region, which adds some robustness to noise.
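The three ReLU-family variants above can be sketched in a few lines of numpy (default slopes here, alpha = 0.01 for leaky ReLU and alpha = 1.0 for ELU, are the common choices, not something this lecture fixes):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small non-zero slope for x < 0, so the gradient never fully dies
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # linear for x > 0; smoothly saturates toward -alpha for very negative x
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # zeros out the negative region
print(leaky_relu(x))  # keeps a small negative signal
print(elu(x))         # negative side between leaky ReLU and ReLU
```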

###### Maxout

- Nonlinear
- A generalized form of ReLU and leaky ReLU
- Benefits
- Linear regimes
- Its output does not saturate.
- Its gradient does not die.

- Drawback
- Doubles the number of parameters per neuron.
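A sketch of a maxout unit with k = 2 pieces (the layer sizes here are arbitrary, just for illustration): each output takes the elementwise max of two linear maps, which is exactly why the parameter count doubles.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                      # input vector

# Two full sets of weights and biases per layer (k = 2).
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=3)
W2, b2 = rng.normal(size=(3, 4)), rng.normal(size=3)

# Maxout: elementwise max of the two linear maps.
out = np.maximum(W1 @ x + b1, W2 @ x + b2)

# ReLU is the special case W2 = 0, b2 = 0: max(w.x + b, 0).
relu_case = np.maximum(W1 @ x + b1, 0.0)
```

Because the output is the max of linear functions, it operates in a linear regime and never saturates, so its gradient does not die.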

###### In practice

- Use ReLU first.
- Try out Leaky ReLU, Maxout, and ELU.
- Try out tanh but don’t expect much.
- Don’t use sigmoid.

#### Lecture 11: Detection and segmentation

- segmentation, localization, detection
- semantic segmentation, instance segmentation
- downsampling, upsampling
- unpooling by nearest neighbor, unpooling by ‘bed of nails’
- max unpooling
- transpose convolution, upconvolution, fractionally strided convolution, backward convolution
- upsampling: unpooling, strided transpose convolution
- Treat localization as a regression problem!
- Use L2 loss on the predicted box coordinates for localization.
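The two simple unpooling schemes listed above can be sketched in numpy for a 2x2 upsampling factor:

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])

# Nearest-neighbor unpooling: copy each value into its 2x2 output block.
nn = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

# "Bed of nails" unpooling: put each value in the top-left corner of its
# 2x2 block and fill the rest with zeros.
nails = np.zeros((4, 4), dtype=x.dtype)
nails[::2, ::2] = x

print(nn)
print(nails)
```

Max unpooling works like the bed of nails, except each value is placed at the position the corresponding max-pooling layer originally took it from (the saved switch/argmax), rather than always at the corner.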

##### Object detection

###### Sliding window

- Apply a CNN to many different crops of the image. The CNN classifies each crop as object or background.
- Sliding window is very computationally expensive!

###### Region proposal

- Find “blobby” image regions that are likely to contain objects.
- Relatively fast to run; e.g. Selective Search gives ~2000 region proposals in a few seconds on CPU.
- R-CNN, Fast R-CNN, Faster R-CNN

###### R-CNN

- Ad hoc training objectives
- Training is slow and takes a lot of disk space.
- Inference (detection) is slow.

###### Fast R-CNN

- Run the whole image through the ConvNet once and crop features for each region proposal (RoI pooling), so all regions share computation.
- Problem: Runtime dominated by region proposal

###### Faster R-CNN

- Make a CNN do the proposals!
- Insert Region Proposal Network (RPN) to predict proposals from features.

###### Detection without Proposals

- YOLO / SSD
- Use grid cells
- Faster R-CNN is slower than these single-shot detectors but more accurate.
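As a concrete sketch of the grid-cell idea, using the YOLO v1 defaults (which this lecture does not spell out, so take the specific numbers as assumptions): for each of S×S grid cells, the network predicts B boxes, each with (x, y, w, h, confidence), plus C class scores.

```python
# YOLO v1 defaults (assumed): 7x7 grid, 2 boxes per cell, 20 classes
S, B, C = 7, 2, 20

# Each box predicts (x, y, w, h, confidence) -> 5 numbers per box.
out_per_cell = B * 5 + C
out_total = S * S * out_per_cell

print(out_per_cell)  # 30
print(out_total)     # 1470
```

So the detection head is a single regression from the image to one S×S×(5B + C) tensor, with no separate proposal stage.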

###### Dense captioning

- Dense Captioning = object detection + captioning

##### Instance segmentation

- Mask R-CNN
- Very good results!
- Can also do pose estimation (predicting joint keypoints)!