Range
 8.1~8.7
 9.1~9.10
 10.1~10.14
 10.19~10.21
Chapter 8. Principal-Components Analysis
8.1. Introduction
Self-organized learning
 Self-organized learning is a type of unsupervised learning.
 It is characterized by the locality of learning.
8.2. Principles of Self-Organization
Principle 1: self-amplification
The following rule is based on Hebb’s postulate of learning.
 If the two neurons of a synapse are activated simultaneously, then the strength of that synapse is selectively increased.
 If the two neurons of a synapse are activated asynchronously, then the strength of that synapse is selectively weakened or eliminated.
Principle 2: competition
 The most vigorously growing synapses are selected.
Principle 3: cooperation
 Modifications in synaptic weights and neurons tend to cooperate with each other.
Principle 4: structural information
 The underlying structure in the input signal is acquired by a self-organizing system.
8.4. Principal-Components Analysis
Notations
 m: the dimension of the data space
 l: the number of principal components (the dimension of the projected subspace)
 X: the data matrix, in which each row is a data vector
 q: a unit vector onto which the data are to be projected
 A = x^T q: the projection of x onto q
 R = E[x x^T]: the correlation matrix of the input vector x
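The notation above can be made concrete with a small NumPy sketch: estimate the correlation matrix R from data, take its eigenvectors as the principal directions, and project onto the top l of them. The data matrix and its scaling are hypothetical illustrations, not from the text.

```python
import numpy as np

# Hypothetical data matrix: each row is a data vector (200 samples, m = 3).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])

# Correlation matrix R = E[x x^T], estimated from the data.
R = X.T @ X / len(X)

# Eigenvectors of R are the principal directions; eigenvalues are the
# variances of the projections along them.
eigvals, eigvecs = np.linalg.eigh(R)       # returned in ascending order
order = np.argsort(eigvals)[::-1]          # re-sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the data onto the first l = 2 principal components.
l = 2
Y = X @ eigvecs[:, :l]
```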
8.5. Hebbian-Based Maximum Eigenfilter
Oja’s rule for maximum eigenfiltering
 w_i(n): the synaptic weight from input unit i to the output unit
 m: the number of input units
 n: the update step
 η: the learning rate
 Output: y(n) = Σ_{i=1}^{m} w_i(n) x_i(n)
 Update: w_i(n+1) = w_i(n) + η y(n) (x_i(n) − y(n) w_i(n))
 w(n) → q_1 as n → ∞, where q_1 is the eigenvector of R associated with the largest eigenvalue. Then, the output y(n) is the first principal component.
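Oja’s rule above can be sketched directly in NumPy. The synthetic zero-mean data and the learning rate are illustrative assumptions; the first principal direction of the data is the first coordinate axis, so the weight vector should converge to (±1, 0).

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic zero-mean data whose largest-variance direction is axis 0.
X = rng.normal(size=(5000, 2)) * np.array([3.0, 0.5])

w = rng.normal(size=2)
w /= np.linalg.norm(w)
eta = 0.01
for x in X:
    y = w @ x                    # neuron output y = w^T x
    w += eta * y * (x - y * w)   # Oja's rule: Hebbian growth + normalizing decay

# w converges (up to sign) to the eigenvector of R with the largest eigenvalue,
# and its norm stays close to 1 without explicit normalization.
```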
The Generalized Hebbian Algorithm (GHA)
 w_ji(n): the synaptic weight from input unit i to output unit j
 m: the number of input units
 l: the number of output units
 n: the update step
 η: the learning rate
 Update: Δw_ji(n) = η ( y_j(n) x_i(n) − y_j(n) Σ_{k=1}^{j} w_ki(n) y_k(n) )
 W(n) → [q_1, …, q_l]^T as n → ∞. Then, the rows of W are the first l principal components.
 GHA is a neural algorithm for PCA.
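A minimal sketch of the GHA update, assuming synthetic axis-aligned data (the data, learning rate, and sizes are illustrative choices): each output deflates the input by the components already explained by itself and the outputs before it, then applies a Hebbian update.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical zero-mean data whose principal axes are the coordinate axes.
X = rng.normal(size=(5000, 3)) * np.array([3.0, 1.0, 0.2])

m, l = 3, 2                         # m input units, l output units
W = 0.1 * rng.normal(size=(l, m))   # synaptic weights w_ji, one row per output
eta = 0.005
for x in X:
    y = W @ x                       # outputs y_j = sum_i w_ji x_i
    for j in range(l):
        # Sanger's rule: subtract the part of x already explained by
        # outputs 1..j, then apply an Oja-style update to row j.
        x_res = x - W[:j + 1].T @ y[:j + 1]
        W[j] += eta * y[j] * x_res
```

After training, row 0 of W approximates the first principal direction and row 1 the second, up to sign.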
Chapter 9. Self-Organizing Maps
9.1 Introduction
Self-organizing maps (SOMs)
A self-organizing map is a topological map of the input patterns.
The principal goal of the self-organizing map
 To transform an input pattern into a one- or two-dimensional discrete map
 To perform this transformation adaptively in a topologically ordered fashion.
9.3 Self-Organizing Map
SOM algorithm
 [Initialization] Initialize the synaptic weight vectors w_j(0) with small random values.
 [Sampling] Sample an input x(n) from the input distribution.
 [Similarity matching] Find the best-matching (winning) neuron i(x) = arg min_j ||x(n) − w_j(n)||.
 [Updating] Update the weights, w_j(n+1) = w_j(n) + η(n) h_{j,i(x)}(n) (x(n) − w_j(n)), where h_{j,i(x)} is the neighborhood function; then repeat from the sampling step.
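The four steps above can be sketched in NumPy. This is a minimal 1-D lattice trained on synthetic uniform data; the lattice size, learning rate, and neighborhood width are illustrative choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(2000, 2))        # inputs from the unit square

# 1-D lattice of 10 neurons; each row of W is a synaptic weight vector.
n_neurons = 10
W = rng.uniform(0, 1, size=(n_neurons, 2))   # [Initialization]
positions = np.arange(n_neurons)

eta, sigma = 0.1, 2.0
for x in X:                                  # [Sampling]
    i = np.argmin(np.linalg.norm(W - x, axis=1))          # [Similarity matching]
    h = np.exp(-(positions - i) ** 2 / (2 * sigma ** 2))  # neighborhood function
    W += eta * h[:, None] * (x - W)          # [Updating]
```

In practice both η and σ are decayed over time; they are held fixed here to keep the sketch short.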
9.4 Properties of the Feature Map
The four properties of the feature map
 Approximation of the input space
 The feature map, represented by the set of synaptic weight vectors, provides a good approximation to the input space.
 Topological ordering
 The feature map is topologically ordered in the sense that the spatial location in the output lattice corresponds to a particular feature of input patterns.
 Density matching
 Regions of the input space from which samples are drawn with high probability are mapped onto larger regions of the output space, and vice versa.
 Feature selection
 The SOM is able to select a set of best features for approximating the underlying distribution.
9.10 Relationship Between Kernel SOM and KL Divergence
Minimization of the KL divergence is equivalent to maximization of the joint entropy.
11.2 Statistical Mechanics
p_i: the probability of occurrence of state i
E_i: the energy of the system when it is in state i
If the system is in thermal equilibrium with its surrounding environment:
 p_i = (1/Z) exp(−E_i / (k_B T)), where Z = Σ_i exp(−E_i / (k_B T))
This probability distribution is called a Gibbs distribution or Boltzmann distribution.
T: the absolute temperature in kelvins
 T controls thermal fluctuations, which represent the effect of “synaptic noise” in a neuron.
k_B: Boltzmann’s constant
Z: the partition function, a constant that is independent of all states
But in machine learning we use the following version without k_B:
 p_i = (1/Z) exp(−E_i / T), Z = Σ_i exp(−E_i / T)
F = −T log Z: the Helmholtz free energy
⟨E⟩ = Σ_i p_i E_i: the average energy of the system
⟨·⟩: the ensemble-average operation
F = ⟨E⟩ − T H, because the entropy H = −Σ_i p_i log p_i = ⟨E⟩/T + log Z
A: the system we are interested in
A′: the system that is in contact with A
The total entropy of A and A′ tends to increase: Δ(H + H′) ≥ 0.
The principle of minimal free energy
The free energy of the system tends to decrease and reach a minimum at thermal equilibrium.
The resulting probability distribution is defined by the Gibbs distribution.
Nature likes to find a physical system with minimum free energy.
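The quantities above can be checked numerically for a small system with hypothetical state energies: the Gibbs probabilities, the partition function, the free energy F = −T log Z, and the identity F = ⟨E⟩ − T H.

```python
import numpy as np

E = np.array([0.0, 1.0, 2.0, 5.0])   # hypothetical state energies
T = 1.5                              # temperature (k_B absorbed into T)

Z = np.sum(np.exp(-E / T))           # partition function
p = np.exp(-E / T) / Z               # Gibbs distribution over states

F = -T * np.log(Z)                   # Helmholtz free energy
E_avg = np.sum(p * E)                # ensemble-average energy <E>
H = -np.sum(p * np.log(p))           # entropy of the Gibbs distribution
# The identity F = <E> - T*H holds exactly for the Gibbs distribution.
```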
11.4 Metropolis Algorithm
 a stochastic simulation method
 a stochastic algorithm for simulating the evolution of a physical system to thermal equilibrium
 a modified Monte Carlo method
 The Metropolis algorithm is commonly referred to as a Markov chain Monte Carlo (MCMC) method.
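The Metropolis algorithm can be sketched for a scalar state with a quadratic energy (an illustrative choice): propose a random move and accept it with probability min(1, exp(−ΔE/T)). At T = 1 with E(x) = x²/2, the equilibrium Gibbs distribution is the standard Gaussian.

```python
import numpy as np

def metropolis(energy, x0, n_steps, T, step=1.0, seed=0):
    """Metropolis sampling at temperature T: propose a random move and
    accept it with probability min(1, exp(-dE / T))."""
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        x_new = x + rng.normal(scale=step)
        dE = energy(x_new) - energy(x)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            x = x_new                 # accept; otherwise keep the old state
        samples.append(x)
    return np.array(samples)

# Quadratic energy E(x) = x^2 / 2 at T = 1 gives a standard Gaussian.
s = metropolis(lambda x: 0.5 * x * x, 0.0, 20000, T=1.0)
```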
11.5. Simulated Annealing
 A stochastic optimization method
 annealing: a process of heating and then slowly cooling a material in order to toughen it and reduce brittleness.
11.6. Gibbs Sampling
 A stochastic simulation method
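Gibbs sampling can be sketched on a bivariate Gaussian (an illustrative target, not from the text): each variable is sampled in turn from its conditional distribution given the current value of the other.

```python
import numpy as np

# Gibbs sampling from a bivariate Gaussian with correlation rho:
# alternately sample each variable from its exact conditional.
rho = 0.8
rng = np.random.default_rng(4)
x, y = 0.0, 0.0
samples = []
for _ in range(20000):
    x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))  # x | y ~ N(rho*y, 1-rho^2)
    y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))  # y | x ~ N(rho*x, 1-rho^2)
    samples.append((x, y))
S = np.array(samples[1000:])   # discard burn-in
```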
11.7. Boltzmann Machine
The Boltzmann machine is a stochastic binary machine that consists of stochastic neurons. A stochastic neuron resides in one of two possible states: +1 for the “on” state and −1 for the “off” state.
The stochastic neurons are partitioned into two groups: visible ones and hidden ones.
The visible neurons provide an interface between the network and the environment. The states of the visible neurons are determined by the environment, which supplies the set of input states.
The hidden neurons always operate freely. The hidden neurons are used to explain the underlying constraints contained in the environmental input vectors. The hidden neurons capture high-order statistical correlations in the input states.
input pattern = input vector = a set of visible neurons = environment = environmental vectors
The primary goal of Boltzmann learning
The primary goal of Boltzmann learning is to produce a neural network that correctly models the distribution of the input vectors [patterns].
Pattern completion by the Boltzmann machine
The Boltzmann machine learns the correlations among the input vectors. Thus, when an input vector with some unknown states is given, the network is able to complete those unknown states.
Two assumptions in applying Boltzmann learning
 Each environmental input vector persists long enough to permit the network to reach thermal equilibrium.
 There is no structure in the sequential order in which the environmental input vectors are presented.
Boltzmann learning
To achieve a perfect model of the environment, the number of hidden units would have to be exponentially larger than the number of visible units.
Two phases to the operation of the Boltzmann machine
 Positive phase: The network operates with environmental inputs[training samples].
 Negative phase: The network operates freely, with no environmental inputs [training samples].
Boltzmann machines
x: the state vector of a Boltzmann machine
x_α: the state of the visible neurons; the realization of the random vector X_α
x_β: the state of the hidden neurons; the realization of the random vector X_β
w_ji: the synaptic weight from neuron i to neuron j, where w_ji = w_ij and w_ii = 0
P(X_α = x_α): the probability of the visible neurons being in state x_α
The objective of the Boltzmann learning rule
 To minimize the energy of the system given the training data
 Equivalently, to maximize the probability of occurrence of the visible state x_α
 Equivalently, to maximize the log probability of occurrence of the visible state x_α
Boltzmann learning rule
 The probability of the visible neurons being in state x_α: P(X_α = x_α) = Σ_{x_β} (1/Z) exp(−E(x)/T)
 The overall probability distribution: the product of P(X_α = x_α) over the training set
 The overall log probability distribution: L(w) = Σ_{x_α} log P(X_α = x_α)
 Differentiating L(w) with respect to w_ji yields the learning rule: Δw_ji = η (ρ_ji⁺ − ρ_ji⁻)
 where ρ_ji⁺ is the correlation between the states of neurons j and i in the positive (clamped) phase, ρ_ji⁻ is the corresponding correlation in the negative (free-running) phase, and both always lie between −1 and +1.
11.9. Deep Belief Nets
Training of a deep belief net
 Training proceeds on a layer-by-layer basis.
Chapter 12. Dynamic Programming
Key words
 Markov decision processes
 Bellman’s theory of dynamic programming
 policy iteration
 value iteration
 the direct learning-based approximation of dynamic programming
 temporal-difference learning
 Q-learning
 the indirect approximation of dynamic programming
 least squares policy evaluation
 approximate value iteration
Chapter 13. Neurodynamics
Chapter 14. Bayesian Filtering for State Estimation of Dynamic Systems
Key words
 State estimation (theory)
 Dynamic systems
 Statespace model
 State estimation of dynamic systems
 Kalman filters
 Bayesian filters
 Particle filters
14.1. Introduction
 Dynamic system
 Suppose there is a sequence of times.
 At each time, there are an unknown state and an observation of the state.
14.2. State-Space Models
 Two components of the state-space model
 State model: x_n = a_n(x_{n−1}, ω_n)
 Measurement [observation] model: y_n = b_n(x_n, ν_n)
 x_n: the state at time n
 y_n: the measurement at time n
 ω_n: the state noise at time n
 ν_n: the measurement noise at time n
 Type 1: Linear, Gaussian state-space models
 State model / prediction: x_n = A_n x_{n−1} + ω_n
 Measurement model / observation: y_n = B_n x_n + ν_n
 A_n, B_n: linear transformations (matrices).
 ω_n, ν_n: zero-mean Gaussian noise.
 Type 2: Nonlinear, Gaussian state-space models
 State model / prediction: x_n = a_n(x_{n−1}) + ω_n
 Measurement model / observation: y_n = b_n(x_n) + ν_n
 a_n, b_n: nonlinear transformations.
 ω_n, ν_n: zero-mean Gaussian noise.
 Type 3: Linear, non-Gaussian models
 State model / prediction: x_n = A_n x_{n−1} + ω_n
 Measurement model / observation: y_n = B_n x_n + ν_n
 A_n, B_n: linear transformations (matrices).
 ω_n, ν_n: non-Gaussian noise.
 Type 4: Nonlinear, non-Gaussian models
 State model / prediction: x_n = a_n(x_{n−1}) + ω_n
 Measurement model / observation: y_n = b_n(x_n) + ν_n
 a_n, b_n: nonlinear transformations.
 ω_n, ν_n: non-Gaussian noise.
14.3 Kalman Filters
 Kalman filters are the optimal state estimators for linear, Gaussian state-space models.
 The parameters of Kalman filters
 A_n: the transition matrix
 B_n: the measurement matrix
 ω_n: the Gaussian dynamic noise, which has zero mean
 ν_n: the Gaussian measurement noise, which has zero mean
14.6 The Bayesian Filters
 A Bayesian filter is a method for solving the state estimation of a dynamic system.
 The Bayesian filter does not assume linearity or particular noise distributions of a state-space model.
 The only assumption of the Bayesian filter is that the evolution of the state is Markovian.
 The current state x_n depends only on the previous state x_{n−1}:
 p(x_n | x_{0:n−1}) = p(x_n | x_{n−1})
Update formulas
1. Time update
 p(x_{n−1} | y_{1:n−1}) is given from the previous measurement update.
 p(x_n | y_{1:n−1}) = ∫ p(x_n | x_{n−1}) p(x_{n−1} | y_{1:n−1}) dx_{n−1}
2. Measurement update
 p(x_n | y_{1:n}) = p(y_n | x_n) p(x_n | y_{1:n−1}) / p(y_n | y_{1:n−1})
 p(x_n | y_{1:n−1}) is given by the preceding time-update step.
 p(y_n | y_{1:n−1}) is just a normalizing constant that ensures p(x_n | y_{1:n}) is a probability between 0 and 1.
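For a discrete state space, the two update steps reduce to matrix-vector operations. This is a minimal sketch with a hypothetical 2-state model (the transition and observation probabilities are illustrative): the time update multiplies by the transition kernel, and the measurement update weights by the likelihood and normalizes.

```python
import numpy as np

# Discrete-state Bayesian filter (illustrative 2-state example).
P_trans = np.array([[0.9, 0.1],    # p(x_n = col | x_{n-1} = row)
                    [0.2, 0.8]])
P_obs = np.array([[0.8, 0.2],      # p(y_n = col | x_n = row), y in {0, 1}
                  [0.3, 0.7]])

belief = np.array([0.5, 0.5])      # p(x_0), a uniform prior
for y in [0, 0, 1, 1, 1]:
    belief = P_trans.T @ belief    # time update (prediction)
    belief = P_obs[:, y] * belief  # weight by the likelihood p(y | x)
    belief = belief / belief.sum() # normalize (the p(y_n | y_{1:n-1}) constant)
```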
14.8 Particle Filters
 Particle filters perform not just linear filtering but nonlinear filtering.
 A particle filter provides an indirect, global approximation of the Bayesian filter.
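A bootstrap particle filter can be sketched for a hypothetical nonlinear 1-D model (the state model, noise levels, and observations are illustrative assumptions): propagate particles through the state model, weight them by the measurement likelihood, and resample.

```python
import numpy as np

rng = np.random.default_rng(6)
n_particles = 1000
particles = rng.normal(0.0, 1.0, n_particles)   # initial prior samples

def pf_step(particles, y):
    # 1. Propagate each particle through the (nonlinear) state model.
    particles = 0.5 * particles + np.sin(particles) \
        + rng.normal(0.0, 0.5, len(particles))
    # 2. Weight by the measurement likelihood p(y | x) (Gaussian here).
    w = np.exp(-0.5 * (y - particles) ** 2 / 0.5 ** 2)
    w /= w.sum()
    # 3. Resample to concentrate particles in high-probability regions.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

for y in [0.5, 1.0, 1.2, 1.1]:
    particles = pf_step(particles, y)
x_est = float(particles.mean())   # posterior-mean state estimate
```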