CS231n: Convolutional Neural Networks for Visual Recognition

Lecture 6 | Training Neural Networks I

Sigmoid
  • Problems of the sigmoid activation function
    • Problem 1: Saturated neurons kill the gradients.
    • Problem 2: Sigmoid outputs are not zero-centered.
      • Suppose a feed-forward neural network has hidden layers and every activation function is a sigmoid.
      • Then every layer except the first receives only positive inputs.
      • If $\forall i,\ x_i>0$, then the gradients on all weights $w_i$ share the same sign (all positive or all negative, depending on the sign of the upstream gradient).
        • $\frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial \sigma}\cdot\frac{\partial \sigma}{\partial\left(\sum_{i}x_i w_i+b\right)}\cdot x_i = (\pm)(+)(+)$
      • If the gradients all share one sign, the update direction is heavily constrained (zig-zag updates). Problems 1 and 2 are illustrated numerically in the sketch after this list.
    • Problem 3: exp() is somewhat expensive to compute. – (a minor problem)
      • Modern numerical libraries handle this well.
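A minimal NumPy sketch (my own illustration, not code from the lecture) of Problems 1 and 2: the local gradient $\sigma(x)(1-\sigma(x))$ is at most 0.25 and nearly vanishes for large $|x|$, and every output lies in $(0, 1)$, so the next layer only ever sees positive inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
s = sigmoid(x)
local_grad = s * (1.0 - s)   # d(sigma)/dx = sigma(x) * (1 - sigma(x))

print(s)           # all outputs in (0, 1): never zero-centered (Problem 2)
print(local_grad)  # ~4.5e-05 at |x| = 10: saturated neurons kill the gradient (Problem 1)
```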
tanh (tangent hyperbolic)
  • Zero-centered
    • Problem 2 is solved.
  • Problems 1 and 3 remain.
ReLU (rectified linear unit)
  • Problem 1 is solved in the positive region (no saturation for positive inputs).
  • Actually more biologically plausible than the sigmoid; the details were not covered in this lecture.
  • AlexNet used ReLU.
  • Problems
    • Problem 1: Not zero-centered
      • ReLU outputs are zero or positive, so (as with the sigmoid) the next layer sees only non-negative inputs.
      • The gradient on each weight is then zero or shares the same sign as the others.
      • The update direction is restricted, which makes optimization inefficient.
    • Problem 2: dead ReLU
      • Up to ~20% of units can end up never active and never updated; these are called dead ReLUs.
  • Initialization
    • People like to initialize ReLU neurons with slightly positive biases (e.g. 0.01).
  • Leaky ReLU
  • PReLU (Parametric Rectifier)
  • ELU (Exponential Linear Unit)
    • In between Leaky ReLU and ReLU (all three variants are sketched after this list).
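A minimal NumPy sketch (my own, not code from the lecture) of the three variants; the negative slope 0.01 for Leaky ReLU and $\alpha = 1$ for ELU are common defaults, not values given in the lecture.

```python
import numpy as np

def relu(x):
    # zero for all negative inputs, so those units can "die"
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # small negative slope keeps a nonzero gradient for x < 0
    return np.where(x > 0, x, slope * x)

def elu(x, alpha=1.0):
    # smooth exponential curve for x < 0, saturating at -alpha;
    # negative outputs push the mean activation closer to zero
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.linspace(-3.0, 3.0, 7)
print(relu(x))
print(leaky_relu(x))
print(elu(x))
```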
Maxout
  • Nonlinear
  • a generalized form of ReLU and leaky ReLU
  • Benefits
    • Linear regimes
    • Its output does not saturate.
    • Its gradient does not die.
  • Drawback
    • Doubles the number of weights per unit (visible in the sketch after this list).
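A minimal NumPy sketch (my own, not code from the lecture) of a maxout unit, $\max(w_1^\top x + b_1,\ w_2^\top x + b_2)$: it is piecewise linear, never saturates, and its gradient does not die, but each unit carries two sets of weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def maxout(x, W1, b1, W2, b2):
    # elementwise max of two affine functions of the input
    return np.maximum(W1 @ x + b1, W2 @ x + b2)

# toy sizes: 4 inputs -> 3 maxout units (note the doubled weight matrices)
x = rng.normal(size=4)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 4)), np.zeros(3)

print(maxout(x, W1, b1, W2, b2))
# ReLU is the special case W2 = 0, b2 = 0: max(W1 @ x + b1, 0)
```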
In practice
  • Use ReLU first.
  • Try out Leaky ReLU, Maxout, and ELU.
  • Try out tanh but don’t expect much.
  • Don’t use sigmoid.

Lecture 11 | Detection and Segmentation

  • segmentation, localization, detection
  • semantic segmentation, instance segmentation
  • downsampling, upsampling
  • unpooling by nearest neighbor, unpooling by ‘Bed of Nails’
  • max unpooling
  • transpose convolution, a.k.a. upconvolution, fractionally strided convolution, backward convolution
  • upsampling: unpooling, or strided transpose convolution (see the sketch after this list)
  • Treat localization as a regression problem!
  • Use L2 loss for localization.
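A minimal NumPy sketch (my own illustration, not code from the lecture; the helper names are mine) of the upsampling operations listed above: nearest-neighbor unpooling, ‘bed of nails’ unpooling, max unpooling with remembered positions, and a 1-D strided transpose convolution in which each input value scatters a scaled copy of the kernel into the output and overlaps are summed.

```python
import numpy as np

def nearest_neighbor_unpool(x, factor=2):
    # repeat each value into a factor x factor block
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def bed_of_nails_unpool(x, factor=2):
    # put each value in the top-left corner of its block, zeros elsewhere
    out = np.zeros((x.shape[0] * factor, x.shape[1] * factor), dtype=x.dtype)
    out[::factor, ::factor] = x
    return out

def max_unpool(x, max_indices, out_shape):
    # place each pooled value back at the (flat) position recorded during max pooling
    out = np.zeros(out_shape, dtype=x.dtype)
    out.flat[max_indices.ravel()] = x.ravel()
    return out

def transpose_conv1d(x, kernel, stride=2):
    # learnable upsampling: each input value scatters a scaled kernel; overlaps sum
    out = np.zeros(stride * (len(x) - 1) + len(kernel))
    for i, v in enumerate(x):
        out[i * stride : i * stride + len(kernel)] += v * kernel
    return out

x = np.array([[1., 2.],
              [3., 4.]])
print(nearest_neighbor_unpool(x))
print(bed_of_nails_unpool(x))
# suppose 2x2 max pooling on a 4x4 input kept the elements at flat indices 5, 2, 8, 15
print(max_unpool(x, np.array([[5, 2], [8, 15]]), (4, 4)))
print(transpose_conv1d(np.array([1., 2., 3.]), np.array([1., 1., 1.])))  # -> [1. 1. 3. 2. 5. 3. 3.]
```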
Object detection
Sliding window
  • Apply a CNN to many different crops of the image. The CNN classifies each crop as object or background.
  • Sliding window is very computationally expensive!
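A minimal sketch (my own; classify_crop is a hypothetical stand-in for a trained CNN) showing why the sliding-window approach is expensive: every crop position, for every crop size, needs its own forward pass.

```python
import numpy as np

def sliding_window_detect(image, classify_crop, crop_size=64, stride=32):
    # classify_crop(crop) -> "object vs. background" score for one crop
    H, W = image.shape[:2]
    detections = []
    for y in range(0, H - crop_size + 1, stride):
        for x in range(0, W - crop_size + 1, stride):
            crop = image[y:y + crop_size, x:x + crop_size]
            score = classify_crop(crop)          # one full CNN forward pass per crop
            detections.append((x, y, crop_size, score))
    return detections

# toy run with a dummy "classifier" (mean intensity), just to count forward passes
image = np.random.rand(256, 256)
dets = sliding_window_detect(image, classify_crop=lambda crop: crop.mean())
print(len(dets), "forward passes for a single crop size and a single stride")
```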
Region proposal
  • Find “blobby” image regions that are likely to contain objects.
  • Relatively fast to run; e.g. Selective Search gives 1000 region proposals.
  • R-CNN, Fast R-CNN, Faster R-CNN
R-CNN
  • Ad hoc training objectives
  • Training is slow and takes a lot of disk space.
  • Inference (detection) is slow.
Fast R-CNN
  • Run the whole image through a single ConvNet and process all region proposals on the shared feature map, instead of one ConvNet pass per region.
  • Problem: Runtime dominated by region proposal
Faster R-CNN
  • Make the CNN itself generate the proposals!
  • Insert Region Proposal Network (RPN) to predict proposals from features.
Detection without Proposals
  • YOLO / SSD
  • Use grid cells
  • Faster R-CNN is slower than these single-shot methods but more accurate.
Dense captioning
  • Dense Captioning = object detection + captioning
Instance segmentation
  • Mask R-CNN
    • Gives very good results!
    • Can also do pose estimation!
