Module

A neural network is called a Module (or simply module in this documentation) in Torch. Module is an abstract class which defines four main methods:

It also declares two members:

Two other perhaps less used but handy methods are also defined:

Some important remarks:

Plug and play

Building a simple neural network can be achieved by constructing an available layer. A linear neural network (perceptron!) is built in only one line:

mlp = nn.Linear(10,1) -- perceptron with 10 inputs

More complex neural networks are easily built using the container classes Sequential and Concat. Sequential plugs layers together in a feed-forward, fully connected manner. Concat concatenates several modules in one layer: they all take the same input, and their outputs are concatenated.

Creating a one hidden-layer multi-layer perceptron is thus just as easy as:

mlp = nn.Sequential()
mlp:add( nn.Linear(10, 25) ) -- 10 input, 25 hidden units
mlp:add( nn.Tanh() ) -- some hyperbolic tangent transfer function
mlp:add( nn.Linear(25, 1) ) -- 1 output

Of course, Sequential and Concat can contain other Sequential or Concat modules, allowing you to try the craziest neural networks you ever dreamt of! See the complete list of available modules.
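As an illustration of nesting containers, here is a minimal sketch, assuming the Torch7 `nn` package is installed and loaded with `require 'nn'`. The names `branches` and `net` and the layer sizes are arbitrary choices made for this example, and `nn.Concat(1)` is assumed to join the branch outputs along the first dimension.

```lua
require 'nn'

-- Two parallel branches that both receive the same 10-dimensional input.
branches = nn.Concat(1)            -- join branch outputs along dimension 1
branches:add(nn.Linear(10, 3))
branches:add(nn.Linear(10, 2))

-- A Sequential that contains the Concat, followed by further layers.
net = nn.Sequential()
net:add(branches)                  -- 10 inputs -> 3 + 2 = 5 outputs
net:add(nn.Tanh())
net:add(nn.Linear(5, 1))           -- 5 inputs -> 1 output

output = net:forward(torch.randn(10))
print(output:size(1))              -- size of the final output
```

The Concat module behaves like any other layer once it is added to the Sequential, so the layer that follows it only needs to match the combined output size of the branches.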

Training a neural network

Once you have built your neural network, you have to choose a particular Criterion to train it. A criterion is a class which describes the cost to be minimized during training.
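To make the idea of a cost concrete, the following sketch evaluates a criterion on a single prediction. It assumes the Torch7 `nn` package and constructs `nn.MSECriterion()` without arguments; the values of `pred` and `target` are made up for the example.

```lua
require 'nn'

criterion = nn.MSECriterion()          -- mean squared error cost
pred = torch.Tensor({0.5})             -- hypothetical network output
target = torch.Tensor({1})             -- hypothetical desired output

-- forward(pred, target) returns the cost as a plain Lua number.
loss = criterion:forward(pred, target) -- (0.5 - 1)^2 = 0.25
print(loss)
```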

You can then train the neural network by using the StochasticGradient class.

 criterion = nn.MSECriterion(1) -- Mean Squared Error criterion
 trainer = nn.StochasticGradient(mlp, criterion)
 trainer:train(dataset) -- train using some examples

StochasticGradient expects as a dataset an object which implements the operator dataset[index] and the method dataset:size(). The size() method returns the number of examples, and dataset[i] has to return the i-th example.

An example has to be an object which implements the operator example[field], where field might take the value 1 (input features) or 2 (corresponding label which will be given to the criterion). The input is usually a Tensor (except if you use a special kind of gradient module, like table layers). The label type depends on the criterion. For example, the MSECriterion expects a Tensor, but the ClassNLLCriterion expects an integer number (the class).

Such a dataset is easily constructed by using Lua tables, but it could be any C object, for example, as long as the required operators/methods are implemented. See an example.
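A dataset built from Lua tables might look like the sketch below. It assumes the Torch7 `nn` package; the synthetic target (the sum of the inputs), the dataset size, and the `learningRate` and `maxIteration` settings are illustrative choices, not recommended values.

```lua
require 'nn'

-- A dataset is a plain Lua table: dataset[i] returns the i-th example and
-- dataset:size() returns the number of examples.
dataset = {}
function dataset:size() return 100 end

for i = 1, dataset:size() do
  local input = torch.randn(10)    -- field 1: input features
  local label = torch.Tensor(1)    -- field 2: target given to the criterion
  label[1] = input:sum()           -- synthetic target, for illustration only
  dataset[i] = {input, label}
end

mlp = nn.Sequential()
mlp:add(nn.Linear(10, 25))
mlp:add(nn.Tanh())
mlp:add(nn.Linear(25, 1))

criterion = nn.MSECriterion()
trainer = nn.StochasticGradient(mlp, criterion)
trainer.learningRate = 0.01
trainer.maxIteration = 5           -- keep the run short
trainer:train(dataset)
```

Each example is itself a two-element table, so example[1] is the input Tensor and example[2] is the label, which is a Tensor here because MSECriterion is used.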

Because StochasticGradient is written in Lua, it is extremely easy to cut-and-paste it and create a variant adapted to your needs (if the constraints of StochasticGradient do not satisfy you).

Low Level Training Of a Neural Network

If you want to program the StochasticGradient by hand, you essentially need to control the use of forwards and backwards through the network yourself. For example, here is the code fragment one would need to make a gradient step given an input x, a desired output y, a network mlp and a given criterion criterion and learning rate learningRate:

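function gradUpdate(mlp, x, y, criterion, learningRate)
  local pred = mlp:forward(x)                        -- forward pass through the network
  local err = criterion:forward(pred, y)             -- cost for this example
  local gradCriterion = criterion:backward(pred, y)  -- gradient of the cost w.r.t. pred
  mlp:zeroGradParameters()                           -- reset accumulated parameter gradients
  mlp:backward(x, gradCriterion)                     -- backpropagate through the network
  mlp:updateParameters(learningRate)                 -- take one gradient step
end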

For example, if you wish to use your own criterion, you can simply replace gradCriterion with the gradient vector of your criterion of choice.
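As a sketch of that substitution, the variant below computes the criterion gradient by hand instead of calling a Criterion object. It assumes the Torch7 `nn` package; the function name `customGradUpdate`, the cost 0.5 * sum((pred - y)^2), the seed, and the learning rate are all choices made for this illustration.

```lua
require 'nn'

-- Same steps as gradUpdate, but the criterion gradient is written by hand.
-- The cost is 0.5 * sum((pred - y)^2), so its gradient with respect to
-- pred is simply (pred - y).
function customGradUpdate(mlp, x, y, learningRate)
  local pred = mlp:forward(x)
  local gradCriterion = pred - y                 -- hand-written gradient
  local cost = 0.5 * (pred - y):pow(2):sum()     -- cost before the update
  mlp:zeroGradParameters()
  mlp:backward(x, gradCriterion)
  mlp:updateParameters(learningRate)
  return cost
end

torch.manualSeed(1)
mlp = nn.Sequential()
mlp:add(nn.Linear(10, 25))
mlp:add(nn.Tanh())
mlp:add(nn.Linear(25, 1))

x = torch.randn(10)
y = torch.Tensor({1})

firstCost = customGradUpdate(mlp, x, y, 0.01)
for i = 1, 50 do
  lastCost = customGradUpdate(mlp, x, y, 0.01)
end
print(firstCost, lastCost)                       -- cost on the first and last step
```

The only requirement on the hand-written gradient is that it has the same size as the network output, since it is passed straight to mlp:backward.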