Many people think of artificial neural networks as blenders: magic tools that give you a solution without showing what is happening inside. You just feed them photos of dogs and cats and, after some training, they magically learn which one is the dog and which one is the cat. On the other hand, many people read about the theoretical side of ANNs and simply get confused, because there are no real-life examples of how an ANN actually works.
This article is written for the second group. It keeps math and code to a bare minimum, so the first group can also read it and see that neural networks are not that simple, but also not that scary.
Why do we use ANNs?
The good news about artificial neural networks: do not panic. An artificial neural network is not an extraordinary kind of machine learning; it performs the usual regression and classification tasks, but in a way that is more suitable for computers with multiple processors [1].
Second, you do not need to reinvent the wheel when working with artificial neural networks: most of their types already exist in well-known libraries for Python, MATLAB, the R language and even .NET. Of course, you still need to understand how they work in order to use those libraries. Building the network itself should not be your biggest concern; finding the data, training the network and achieving good accuracy on your task should be.
You have surely heard and read many times that an artificial neural network is called so because it is an attempt to simulate the human brain and how it works. But why? For two main reasons. First, the human brain divides memory and processing among millions of neurons, which seems similar to today's large server clusters and multiprocessor environments; the hardware advances in this field have preceded the software advances needed to exploit such environments. Second, the human brain does not deal with all the data at once. Instead, it deals with data gradually: you do not download a language, you learn new words every day.
One more thing to keep in mind when reading about artificial neural networks and their applications: you may have seen many amazing applications of face recognition or natural language processing said to be done by artificial neural networks, but the ANN has nothing to do with the image, sound or words themselves. It is a statistical method that deals only with numbers, and the numbers here are the numerical representations of these objects' features.
An example
The simplest way to understand the idea of an artificial neural network (and even other machine learning methods) is the decision-making matrix, something I was using before I understood artificial neural networks and machine learning. A friend of mine was very hesitant about which car he was going to buy. I asked him about the features he cares about, and I drew a table like this:
Car | Price | Shape | Safety | Engine size |
Toyota | ***** | *** | *** | *** |
Mercedes | ** | ***** | ***** | *** |
Jeep | *** | *** | ***** | **** |
Well, I cannot remember exactly which cars he was choosing between, but it was similar to this. To make it easier for him, we gave each feature a rating in stars. Now, how can we calculate which car is the best? We have to assess the importance that he gives to each of these attributes: the weights.
For him, the price was not so important, nor the engine size, while the safety and the shape were very important.
Car | Price | PriceW | Shape | ShapeW | Safety | SafetyW | Engine size | EngineW |
Toyota | 5 | 1 | 3 | 3 | 3 | 4 | 3 | 2 |
Mercedes | 2 | 1 | 5 | 3 | 5 | 4 | 3 | 2 |
Jeep | 3 | 1 | 3 | 3 | 5 | 4 | 4 | 2 |
Then we multiplied each value by its weight and summed them to get a total value for each car.
Car | Total value |
Toyota | 32 |
Mercedes | 43 |
Jeep | 40 |
That table helped him choose the Mercedes.
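The calculation above can be sketched in a few lines of Python; the star ratings and weights are taken directly from the tables:

```python
# Decision matrix: star ratings per car, taken from the tables above.
ratings = {
    "Toyota":   {"price": 5, "shape": 3, "safety": 3, "engine": 3},
    "Mercedes": {"price": 2, "shape": 5, "safety": 5, "engine": 3},
    "Jeep":     {"price": 3, "shape": 3, "safety": 5, "engine": 4},
}
# My friend's preferences: price and engine size matter little,
# shape and safety matter a lot.
weights = {"price": 1, "shape": 3, "safety": 4, "engine": 2}

# Multiply each rating by its weight and sum, giving a total per car.
totals = {car: sum(stars[a] * weights[a] for a in weights)
          for car, stars in ratings.items()}
best = max(totals, key=totals.get)
print(totals)  # {'Toyota': 32, 'Mercedes': 43, 'Jeep': 40}
print(best)    # Mercedes
```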
Now, the weights in the table reflect my friend's preferences. But what if I have more data about people's choices without knowing their preferences? How could the preferences of people who bought cars in the past be useful to me?
They would be useful if I could predict the preferences of someone who wants to buy a car, or who might be considering buying one. This kind of information is not needed by my friend, but by the advertising agency that wants to show him the right advertisement at the right time for a specific car.
Now let us assume that we have started to create an algorithm for an advertising agency contracted with a car seller. What weights are we going to start with?
The agency has given us a lot of its sales and customer information, but we still do not have the weights. This data is called the training data. We do the calculations in reverse: instead of multiplying known weights by attributes, we use the known choices to work out the weights, and then we can predict the preferences of the next customers from the information available about them. This is exactly how neural network problems are stated.
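To make the reverse calculation concrete, here is a toy sketch in which everything is invented for illustration: we generate five past purchases scored with known weights, pretend we do not know those weights, and recover them by gradually reducing the prediction error:

```python
# Toy "reverse calculation": recover the hidden weights from training
# data. The data is invented: five past purchases, scored with the true
# (unknown to us) weights [1, 3, 4, 2] from the tables above.
true_w = [1, 3, 4, 2]
cars = [[5, 3, 3, 3], [2, 5, 5, 3], [3, 3, 5, 4],
        [4, 2, 4, 5], [1, 5, 3, 2]]
scores = [sum(x * w for x, w in zip(car, true_w)) for car in cars]

w = [0.0, 0.0, 0.0, 0.0]   # start knowing nothing about the preferences
lr = 0.005                 # learning rate (chosen by trial for this toy)
for _ in range(5000):
    for car, target in zip(cars, scores):
        pred = sum(x * wi for x, wi in zip(car, w))   # current guess
        err = pred - target                           # how wrong it is
        # nudge every weight against the error
        w = [wi - lr * err * x for wi, x in zip(w, car)]

print([round(wi, 2) for wi in w])  # should approach [1, 3, 4, 2]
```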
Hidden layers
Now, let us assume that the calculation does not pass through just one set of weights to get the result. What if we needed to estimate some virtual intermediate values produced by the calculations described above? At the agency, we found that the car attributes were more complex than we thought; some of the attributes, like shape and price, have more effect than others.
Here we will create another layer of calculations, where multiplying the weights (preferences) by the inputs (attributes) produces intermediate estimates, and these estimates have their own weights that guide us to the car. The new values have no direct real-world meaning for us as individuals; they could be an imaginary measure of the relationship between price and shape, the kind of preference we usually do not talk about but just feel. This is called a hidden layer, and learning with such hidden layers (usually many of them) is called deep learning.
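A minimal sketch of what such a network computes, with entirely made-up weights: four car attributes feed two hidden units (the "imaginary" intermediate values), and the hidden values feed one output score:

```python
import math

# Forward pass through one hidden layer. All weights here are invented;
# the inputs are the Toyota's star ratings from the tables above.
def sigmoid(x):
    return 1 / (1 + math.exp(-x))

inputs = [5, 3, 3, 3]               # price, shape, safety, engine size
w_hidden = [[0.2, -0.1, 0.4, 0.3],  # weights into hidden unit 1
            [-0.3, 0.5, 0.1, 0.2]]  # weights into hidden unit 2
w_out = [0.7, 0.6]                  # weights from hidden units to output

# each hidden unit is a weighted sum squashed by the sigmoid
hidden = [sigmoid(sum(x * w for x, w in zip(inputs, row)))
          for row in w_hidden]
output = sum(h * w for h, w in zip(hidden, w_out))
print(round(output, 3))
```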
Recurrent Neural Network
Now let us assume something more complex: the agency has information about all the cars a customer has bought during his life, say 20 cars. This is a sequential relationship; the cars are related to each other, and a person who tried a cheaper car may try a car from a higher price range later. We have all these choices in our data. To predict the next car correctly, we definitely need to rely on the person's previous choices. In this case, the network's output (the car) is used as an input along with the other car attributes (price, shape, etc.). A network that uses its output as an input, that is, a network that depends on its previous outputs, is called a recurrent neural network.
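A toy sketch of that feedback loop, using the invented attribute weights from the earlier tables plus a made-up weight for the previous purchase:

```python
# One recurrent step per purchase: the score of the previous car is fed
# back in next to the new car's attributes. All weights are invented.
def step(attributes, prev_score, w_attr, w_prev):
    score = sum(a * w for a, w in zip(attributes, w_attr))
    return score + prev_score * w_prev   # previous output flows back in

w_attr, w_prev = [1, 3, 4, 2], 0.5
prev = 0                                 # no earlier car before the first
history = [[5, 3, 3, 3], [2, 5, 5, 3], [3, 3, 5, 4]]  # purchases in order
for attrs in history:
    prev = step(attrs, prev, w_attr, w_prev)
    print(prev)   # each score depends on the whole purchase history
```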
Convolutional Neural Networks
If you have heard about recurrent neural networks, then you have surely also heard about convolutional neural networks, with all the talk of filters and of filters moving across the data. What is a convolutional neural network? Unfortunately, it cannot be simplified with the car example. However, the idea of a convolutional neural network (CNN) can still be made less scary.
When you see the amazing applications of image processing with neural networks, do not think there is some magic inside. Think about the simplest part of a digital image: the pixel. How is a pixel represented in the computer? By three numbers for the red, green and blue colors. So we do have numbers, but a huge amount of them, nothing as simple as the car-buying example.
The second thing about images is that, in many cases, there are relationships between the pixels. They are not independent like the price of the car and the shape of the car. In an image of a woman, a high contrast between the color of her lips and the color of her face may lead us to the fact that she is wearing lipstick; we recognize colors and parts by their relationships.
So we cannot use the pixels in the same way as independent attributes. In addition, it is not practical to push a large image through all the layers to reach a decision; this would take too much time, so we need some kind of compression. What is the solution? Filters: small matrices (like 3×3) that are multiplied element by element with the patch of pixels under them to produce a smaller and more meaningful set of values.
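A small sketch of one filter pass, with an invented 5×5 "image" and a classic vertical-edge filter:

```python
# A 3x3 filter sliding over a tiny 5x5 "image". At each position the
# filter is multiplied element by element with the pixels under it and
# summed, producing a smaller 3x3 output. The numbers are invented.
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
]
kernel = [[-1, 0, 1],   # a classic vertical-edge filter: it responds
          [-1, 0, 1],   # where the left and right columns of the patch
          [-1, 0, 1]]   # under it differ

out = [[sum(image[i + a][j + b] * kernel[a][b]
            for a in range(3) for b in range(3))
        for j in range(3)]
       for i in range(3)]
for row in out:
    print(row)
```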
Categories and Boolean
Well, Boolean values are easy to represent in the neural network inputs as 0 or 1. It is like when my friend wants to answer yes or no about one of his car preferences. Does he want a CD player in the car? Then I record his answer as a 1 or a 0.
However, how do we choose between three or more categories? Take color, for example: let us assume that he can choose between 10 colors for the car he wants. We know that the ANN works with numbers, so can we say that blue is one and red is ten? In that case, the ten would have a higher value and would ruin the calculations of the other values.
Each category should have its own input: the red color should have its own binary input to the network. So in our table we will have price, shape, etc., and then a field for each color. My friend will give zero to all the colors he does not want and one to the color he likes. This way, every color is represented fairly by a single 0 or 1.
Overfitting and underfitting
When I was a child, I tried to touch fire and burned my finger. My neural network (the real one) then registered glowing, non-solid, wobbling things as dangerous hot things. Another day, I tried to touch a lamp and burned my finger too. My brain then removed the solidity and movement attributes from the danger criteria and kept glowing objects as dangerous. Later, when I saw an energy-saving lamp for the first time, my neural network considered it dangerous; after trying, I found that it was not. What happened here?
My neural network predicted that the energy-saving lamp was hot, but it was not: a false positive. This is called overfitting: the model started to classify examples according to criteria fitted too closely to the cases it had already seen. The first case, when I touched the lamp and got burned, was underfitting: my criteria were too simple, counting only fire as a hot object.
Very complicated models tend to overfit, in ANNs and in all machine learning methods, while under-trained simple models tend to underfit the examples.
Forward propagation and back propagation
Forward propagation is the usual operation: the inputs are multiplied by the weights throughout the layers to get the output. Back propagation happens during training, when we compare the output with the real given output (in training we already have the correct outputs) to get the error; the weights across the layers are then updated according to the error value (here you may meet the disliked bits of math, like the chain rule and derivative equations).
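A toy sketch of the two passes on a network with a single weight, which keeps the chain rule out of sight (all numbers are invented):

```python
# One weight, one training example: forward pass, error, weight update.
x, target = 3.0, 6.0   # input 3 should map to output 6 (so w should be 2)
w = 0.5                # initial guess
lr = 0.05              # learning rate

for _ in range(100):
    pred = x * w               # forward propagation
    error = pred - target      # compare with the known correct output
    grad = error * x           # derivative of the squared error w.r.t. w
    w -= lr * grad             # back propagation: update the weight

print(round(w, 4))   # close to 2.0
```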
Training data is data with the correct outputs, usually prepared by humans. In the car selection example, we could go to people who have already chosen their cars and ask them how they made their choice.