Why is a neural network non-linear?

Linear separability

The concept of linear separability is best demonstrated with a simple example, the well-known XOR problem. Consider a single-stage perceptron network with one output neuron in level 1 and two input neurons in level 0. The output $o$ of the neuron should be 0 if its binary inputs are the same ($o_1 = o_2$), otherwise it should be 1. In other words, the following must apply:

$$o = \begin{cases} 1 & \text{if } o_1 \neq o_2 \\ 0 & \text{if } o_1 = o_2 \end{cases}$$

For a binary threshold neuron with weights $w_1$, $w_2$ and threshold $\theta$, this is equivalent to requiring that the following inequality holds exactly for the inputs with $o_1 \neq o_2$:

$$w_1 o_1 + w_2 o_2 \geq \theta$$
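
A minimal sketch in Python may make this concrete (the helper `fires` and the sample weights are illustrative, not taken from the source). The weights below actually compute OR, which agrees with XOR on three corner points but fails on $(1,1)$:

```python
import itertools

def fires(o1, o2, w1, w2, theta):
    """Binary threshold neuron: output 1 iff w1*o1 + w2*o2 >= theta."""
    return int(w1 * o1 + w2 * o2 >= theta)

# Illustrative weights: this choice computes OR, not XOR.
w1, w2, theta = 1, 1, 0.5
for o1, o2 in itertools.product((0, 1), repeat=2):
    target = int(o1 != o2)              # XOR: fire iff the inputs differ
    output = fires(o1, o2, w1, w2, theta)
    print(f"({o1},{o2}): target={target} neuron={output}",
          "ok" if output == target else "WRONG")
```

The following paragraphs explain why no choice of $w_1$, $w_2$ and $\theta$ can remove the remaining error.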

For a constant threshold value $\theta$, the equation $w_1 o_1 + w_2 o_2 = \theta$ defines a straight line in the plane formed by $o_1$ and $o_2$ (see Fig. 1, left). With positive combinations of $w_1$ and $w_2$, all points above this straight line are points for which the neuron fires; if the combination is negative, all points below the straight line are points for which the neuron fires. Note that this derivation applies in general to real activations; in the case of binary activations, only the corner points of the unit square marked with $(0,0)$, $(0,1)$, $(1,0)$ and $(1,1)$ are possible.
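
Solving the boundary equation for $o_2$ makes the straight-line form explicit (assuming $w_2 \neq 0$):

```latex
w_1 o_1 + w_2 o_2 = \theta
\quad\Longleftrightarrow\quad
o_2 = \frac{\theta}{w_2} - \frac{w_1}{w_2}\, o_1
```

This is a line with slope $-w_1/w_2$ and intercept $\theta/w_2$: changing the weights rotates it, changing the threshold shifts it.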

A neural network that is to solve the XOR problem must assign the points $(0,0)$ and $(1,1)$ to one class and the points $(0,1)$ and $(1,0)$ to the other class. It is obvious that this is not possible by shifting and rotating a single straight line that linearly separates the input space. So the following applies:

The sets $\{(0,0),(1,1)\}$ and $\{(0,1),(1,0)\}$ of the XOR problem cannot be linearly separated, i.e. there is no combination of values of $w_1$, $w_2$ and $\theta$ for which $w_1 o_1 + w_2 o_2 \geq \theta$ holds for all points of one set and $w_1 o_1 + w_2 o_2 < \theta$ holds for all points of the other set at the same time.
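
A short derivation spells this out (a standard argument, added here for completeness). Writing out the firing condition for each of the four corner points gives:

```latex
\begin{aligned}
(0,0) \mapsto 0 &\;\Rightarrow\; 0 < \theta \\
(0,1) \mapsto 1 &\;\Rightarrow\; w_2 \geq \theta \\
(1,0) \mapsto 1 &\;\Rightarrow\; w_1 \geq \theta \\
(1,1) \mapsto 0 &\;\Rightarrow\; w_1 + w_2 < \theta
\end{aligned}
```

Adding the second and third conditions yields $w_1 + w_2 \geq 2\theta > \theta$ (since $\theta > 0$ by the first condition), which contradicts the fourth.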

For $n$ neurons that provide inputs to a neuron, the space of the inputs can be represented as an $n$-dimensional cube (if the input is restricted to $[0,1]$; otherwise it is the $n$-dimensional space $\mathbb{R}^n$). The neuron separates this input space by an $(n-1)$-dimensional hyperplane. For $n = 3$ this is shown in Fig. 1 on the right. In general:

A single-level perceptron (i.e. a perceptron with only one level of modifiable weights) can only classify linearly separable sets, i.e. sets that can be separated by a hyperplane.
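
For contrast, a sketch of a linearly separable case (the weight and threshold values are chosen for illustration): the AND of three binary inputs is computed by a single threshold neuron, because the plane $o_1 + o_2 + o_3 = 2.5$ cuts the corner $(1,1,1)$ off the rest of the 3-dimensional cube.

```python
import itertools

def fires(inputs, weights, theta):
    """Threshold neuron: output 1 iff the weighted input sum reaches theta."""
    return int(sum(w * o for w, o in zip(weights, inputs)) >= theta)

# AND of three inputs: separated by the plane o1 + o2 + o3 = 2.5.
weights, theta = (1, 1, 1), 2.5
for corner in itertools.product((0, 1), repeat=3):
    assert fires(corner, weights, theta) == int(all(corner))
print("AND of 3 inputs is realized by a single threshold neuron.")
```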

For practical applications, this raises the question of how often real problems are linearly separable. Since this depends on the problem and on the chosen coding, there is no general answer. However, we know or assume that many problems cannot be linearly separated. There is also a theoretical study by Widner, who examined the number of linearly separable functions among all possible binary functions of $n$ input neurons. He found that their percentage decreases very rapidly as $n$ increases.

Tab. 1
Number of binary functions of n inputs and number of linearly separable functions among them

n    Number of binary functions of n inputs    Number of functions that can be linearly separated from this
1    4                                          4
2    16                                         14
3    256                                        104
4    65,536                                     1,882
5    approx. 4.3 · 10^9                         94,572
6    approx. 1.8 · 10^19                        15,028,134
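
These counts can be reproduced for small $n$ by brute force. The sketch below is my own construction, not from Widner's study: it enumerates all $2^{2^n}$ binary functions of $n$ inputs and tests each for linear separability with a feasibility linear program (for finite point sets, strict separability is equivalent to achieving a margin of 1 on both classes, since $w$ and $\theta$ can be rescaled).

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def separable(corners, labels):
    """Is there (w, theta) with w.x - theta >= 1 on class 1
    and w.x - theta <= -1 on class 0?"""
    n = len(corners[0])
    A_ub, b_ub = [], []
    for x, y in zip(corners, labels):
        row = list(x) + [-1.0]          # coefficients of (w, theta) in w.x - theta
        A_ub.append([-v for v in row] if y == 1 else row)
        b_ub.append(-1.0)
    res = linprog(c=np.zeros(n + 1), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * (n + 1), method="highs")
    return res.status == 0              # status 0: feasible solution found

for n in (1, 2, 3):                     # n = 4 works too, but is much slower
    corners = list(itertools.product((0, 1), repeat=n))
    count = sum(separable(corners, labels)
                for labels in itertools.product((0, 1), repeat=2 ** n))
    print(f"n={n}: {2 ** (2 ** n)} binary functions, {count} linearly separable")
```

Running this prints 4/4 for $n = 1$, 14/16 for $n = 2$ (the two missing functions are exactly XOR and its negation) and 104/256 for $n = 3$, matching the table.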

Conclusion

The bottom line is that single-stage perceptrons are suitable only for very simple tasks with a small number of inputs per neuron.