This article chronicles the development of an artificial neural network designed to recognize handwritten digits. Although some neural network theory is covered here, you will get more out of the article if you are already familiar with basic concepts such as neurons, layers, weights, and backpropagation.
The neural network described here is not a general-purpose neural network, nor is it any kind of neural network workbench. Rather, we will focus on one very specific neural network (a five-layer convolutional neural network) built for one very specific purpose: to recognize handwritten digits. A rough sketch of what such a layout can look like follows below.
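To give a concrete picture of a five-layer convolutional layout, here is a minimal C++ sketch. The specific sizes are illustrative assumptions, loosely patterned on Dr. LeCun's published convolutional architectures; they are not the design this article settles on, which is described in later sections.

#include <cstdio>

// Illustrative five-layer convolutional layout for digit recognition.
// The sizes below are assumptions for the sake of the sketch, loosely
// patterned on Dr. LeCun's published architectures.
struct LayerSpec
{
    const char* name;
    const char* description;
};

static const LayerSpec kLayers[5] = {
    { "Layer 1 (input)",  "grayscale image of the digit, e.g., 29x29 pixels" },
    { "Layer 2 (conv)",   "small set of feature maps from 5x5 kernels"       },
    { "Layer 3 (conv)",   "larger set of smaller feature maps, 5x5 kernels"  },
    { "Layer 4 (full)",   "fully connected layer of roughly 100 neurons"     },
    { "Layer 5 (output)", "10 neurons, one per digit 0 through 9"            },
};

int main()
{
    for (const LayerSpec& layer : kLayers)
        std::printf("%-18s %s\n", layer.name, layer.description);
    return 0;
}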
The idea of using neural networks to recognize handwritten digits is not a new one. The inspiration for the architecture described here comes from articles written by two separate authors. The first is Dr. Yann LeCun, who was an independent discoverer of the basic backpropagation algorithm. Dr. LeCun hosts an excellent site on his research into neural networks. In particular, you should view his “Learning and Visual Perception” section, which uses animated GIFs to show the results of his research. He also developed the MNIST database, which provides the handwritten digits used in this project. I used two of his publications as primary source materials for much of my work, and I highly recommend reading his other publications too (they are posted at his site). Unlike many other publications on neural networks, Dr. LeCun’s publications are not inordinately theoretical and math-intensive; rather, they are extremely readable and provide practical insights and explanations.
The Activation Function (or, “Sigmoid” or “Squashing” Function)
Selection of a good activation function is an important part of the design of a neural network. Generally speaking, the activation function should be symmetric, meaning that f(-x) = -f(x), and the network should be trained to target values that lie below the asymptotic limits of the function; otherwise, training tries to drive the weights toward infinity in pursuit of outputs the function can never actually reach.
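As a concrete illustration, the sketch below implements one symmetric squashing function recommended in Dr. LeCun's publications, f(x) = 1.7159 · tanh((2/3)x); the constants come from his papers and are an assumption here rather than something stated in this section. Its limits are roughly ±1.7159, so training targets of ±1.0 stay safely inside the asymptotes.

#include <cmath>
#include <cstdio>

// A symmetric "squashing" function recommended in Dr. LeCun's publications:
// f(x) = 1.7159 * tanh((2/3) * x). It is odd-symmetric (f(-x) == -f(x))
// and saturates near +/-1.7159, so targets of +/-1.0 lie inside the limits.
double activation(double x)
{
    return 1.7159 * std::tanh((2.0 / 3.0) * x);
}

// Its derivative, which backpropagation needs.
double activationDerivative(double x)
{
    const double t = std::tanh((2.0 / 3.0) * x);
    return 1.7159 * (2.0 / 3.0) * (1.0 - t * t);
}

int main()
{
    // Symmetry check: f(-x) should equal -f(x).
    std::printf("f(+2) = %+f\n", activation(+2.0));
    std::printf("f(-2) = %+f\n", activation(-2.0));
    return 0;
}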
One function that should never be used as the activation function is the classical sigmoid (or “logistic”) function, defined as f(x) = 1/(1 + e^(-x)). It is unsuitable because it is not symmetric: its value approaches +1 for increasing x, but for decreasing x its value approaches zero rather than -1, as symmetry would require. The only reason the logistic function is mentioned here is that many articles on the web recommend its use in neural networks, for example, the Sigmoid function article on Wikipedia. In my view, this is a poor recommendation and should be avoided.
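To make the asymmetry concrete, this short sketch evaluates the logistic function at ±x: the output approaches +1 for large positive x but approaches 0 (not -1) for large negative x, which is exactly the failure of symmetry described above.

#include <cmath>
#include <cstdio>

// The classical logistic function: f(x) = 1 / (1 + e^(-x)).
double logistic(double x)
{
    return 1.0 / (1.0 + std::exp(-x));
}

int main()
{
    // A symmetric activation would satisfy f(-x) == -f(x).
    // The logistic function instead tends to 0 for large negative x.
    for (double x = 1.0; x <= 8.0; x *= 2.0)
    {
        std::printf("x = %4.1f   f(x) = %.6f   f(-x) = %.6f\n",
                    x, logistic(x), logistic(-x));
    }
    return 0;
}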