Module

A neural network is called a Module (or simply module in this documentation) in Torch. Module is an abstract class which defines four main methods:
- `forward(input)`, which computes the output of the module given the `input` Tensor.
- `backward(input, gradOutput)`, which computes the gradients of the module with respect to its own parameters and its own inputs.
- `zeroGradParameters()`, which zeroes the gradient with respect to the parameters of the module.
- `updateParameters(learningRate)`, which updates the parameters after one has computed the gradients with `backward()`.
It also declares two members:
- `output`, which is the output returned by `forward()`.
- `gradInput`, which contains the gradients with respect to the input of the module, computed in a `backward()`.
Two other perhaps less used but handy methods are also defined:
- `share(mlp,s1,s2,...,sn)`, which makes this module share the parameters `s1`,...,`sn` of the module `mlp`. This is useful if you want to have modules that share the same weights.
- `clone(...)`, which produces a deep copy of (i.e. not just a pointer to) this Module, including the current state of its parameters (if any).
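As a sketch of `share()` in use ('weight' and 'bias' are the usual parameter names for `nn.Linear`):

```lua
-- Two linear layers sharing the same weight and bias storages:
-- updating one module's parameters updates the other's as well.
mlp1 = nn.Linear(10, 5)
mlp2 = nn.Linear(10, 5)
mlp2:share(mlp1, 'weight', 'bias')
```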
Some important remarks:
- `output` contains only valid values after a `forward(input)`.
- `gradInput` contains only valid values after a `backward(input, gradOutput)`.
- `backward(input, gradOutput)` uses certain computations obtained during `forward(input)`. You must call `forward()` before calling `backward()`, on the same `input`, or your gradients will be incorrect!
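These remarks amount to the following call sequence (a minimal sketch with a hypothetical module and input, assuming the `torch` and `nn` packages are loaded):

```lua
mlp = nn.Linear(10, 1)
x = torch.rand(10)                 -- some input
out = mlp:forward(x)               -- mlp.output is now valid
gradOut = torch.Tensor(1):fill(1)  -- some gradient w.r.t. the output
gradIn = mlp:backward(x, gradOut)  -- must follow forward() on the same x
```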
Plug and play
Building a simple neural network can be achieved by constructing an available layer. A linear neural network (a perceptron!) can be built in a single line:

```lua
mlp = nn.Linear(10, 1) -- perceptron with 10 inputs
```
More complex neural networks are easily built using the container classes `Sequential` and `Concat`. `Sequential` plugs layers together in a feed-forward manner. `Concat` concatenates several modules in one layer: they take the same input, and their outputs are concatenated.
Creating a one hidden-layer multi-layer perceptron is thus just as easy as:
```lua
mlp = nn.Sequential()
mlp:add( nn.Linear(10, 25) ) -- 10 inputs, 25 hidden units
mlp:add( nn.Tanh() )         -- hyperbolic tangent transfer function
mlp:add( nn.Linear(25, 1) )  -- 1 output
```
Of course, `Sequential` and `Concat` can contain other `Sequential` or `Concat` containers, allowing you to try the craziest neural networks you ever dreamt of! See the complete list of available modules.
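As a sketch of such nesting, a `Concat` with two parallel branches can sit inside a `Sequential` (the layer sizes here are purely illustrative):

```lua
-- Two parallel 10 -> 5 branches; their outputs are concatenated
-- along dimension 1, giving 10 units for the final linear layer.
mlp = nn.Sequential()
branches = nn.Concat(1)
branches:add( nn.Linear(10, 5) )
branches:add( nn.Linear(10, 5) )
mlp:add( branches )
mlp:add( nn.Tanh() )
mlp:add( nn.Linear(10, 1) )
```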
Training a neural network
Once you have built your neural network, you have to choose a particular `Criterion` to train it. A criterion is a class which describes the cost to be minimized during training. You can then train the neural network by using the `StochasticGradient` class.

```lua
criterion = nn.MSECriterion(1) -- Mean Squared Error criterion
trainer = nn.StochasticGradient(mlp, criterion)
trainer:train(dataset)         -- train using some examples
```
`StochasticGradient` expects as a dataset an object which implements the operator `dataset[index]` and the method `dataset:size()`. The `size()` method returns the number of examples, and `dataset[i]` has to return the i-th example.
An example has to be an object which implements the operator `example[field]`, where `field` can take the value `1` (input features) or `2` (the corresponding label, which will be given to the criterion). The input is usually a Tensor (except if you use special kinds of modules, like table layers). The label type depends on the criterion. For example, the `MSECriterion` expects a Tensor, while the `ClassNLLCriterion` expects an integer number (the class).

Such a dataset is easily constructed using Lua tables, but it could be any C object, for example, as long as the required operators/methods are implemented.
See an example.
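A minimal sketch of such a dataset, built from Lua tables (the data itself is a toy XOR-like problem and purely illustrative):

```lua
-- Each dataset[i] is {input, label}; size() returns the example count.
dataset = {}
function dataset:size() return 100 end
for i = 1, dataset:size() do
  local input = torch.Tensor(2)
  input[1] = math.random(0, 1)
  input[2] = math.random(0, 1)
  local output = torch.Tensor(1)
  output[1] = (input[1] == input[2]) and -1 or 1
  dataset[i] = {input, output}
end
```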
`StochasticGradient` being written in Lua, it is extremely easy to cut-and-paste it and create a variant adapted to your needs (if the constraints of `StochasticGradient` do not satisfy you).
Low Level Training Of a Neural Network
If you want to program the `StochasticGradient` by hand, you essentially need to control the use of forwards and backwards through the network yourself. For example, here is the code fragment one would need to make a gradient step given an input `x`, a desired output `y`, a network `mlp`, a criterion `criterion` and a learning rate `learningRate`:
```lua
function gradUpdate(mlp, x, y, criterion, learningRate)
  local pred = mlp:forward(x)
  local err = criterion:forward(pred, y)
  local gradCriterion = criterion:backward(pred, y)
  mlp:zeroGradParameters()
  mlp:backward(x, gradCriterion)
  mlp:updateParameters(learningRate)
end
```

For example, if you wish to use your own criterion you can simply replace `gradCriterion` with the gradient vector of your criterion of choice.
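As a sketch of that substitution, here is a variant with a hand-computed squared-error gradient in place of the Criterion object (the factor of 2 comes from differentiating ||pred - y||^2; this is illustrative only):

```lua
function gradUpdateManual(mlp, x, y, learningRate)
  local pred = mlp:forward(x)
  -- gradient of ||pred - y||^2 with respect to pred: 2 * (pred - y)
  local gradCriterion = pred:clone():add(-1, y):mul(2)
  mlp:zeroGradParameters()
  mlp:backward(x, gradCriterion)
  mlp:updateParameters(learningRate)
end
```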