VCHui xor_pytorch: A classical XOR neural network using PyTorch

Another very useful approach to solving the XOR problem is to engineer a third dimension. The first and second features remain the same; we only engineer a third feature. After setting the data type to np.float32, we apply the index shuffle variable to x1, x2 and y. We then add some noise to the x1 and x2 arrays.
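
A minimal sketch of this data preparation, assuming NumPy. The text does not say which third feature is engineered, so the product x1 * x2 used here is an assumption (it happens to make XOR linearly separable). The number of repeats, the noise scale and the seed are illustrative values as well.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is repeatable

# Repeat the four XOR input patterns to get a small dataset.
n_repeats = 50  # illustrative choice
x1 = np.tile(np.array([0, 0, 1, 1], dtype=np.float32), n_repeats)
x2 = np.tile(np.array([0, 1, 0, 1], dtype=np.float32), n_repeats)
y = np.logical_xor(x1, x2).astype(np.float32)

# Shuffle one index array and apply it to x1, x2 and y so rows stay aligned.
index_shuffle = np.arange(x1.shape[0])
np.random.shuffle(index_shuffle)
x1, x2, y = x1[index_shuffle], x2[index_shuffle], y[index_shuffle]

# Add a little Gaussian noise to the two original features.
x1 = x1 + np.random.normal(0.0, 0.05, size=x1.shape).astype(np.float32)
x2 = x2 + np.random.normal(0.0, 0.05, size=x2.shape).astype(np.float32)

# Assumed third feature: the product of the first two.
x3 = x1 * x2
X = np.stack([x1, x2, x3], axis=1)  # shape (200, 3)
```

With this extra feature, a single plane such as x1 + x2 - 2 * x3 = 0.5 is enough to split the two classes.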


In larger networks the error can jump around quite erratically, so smoothing (e.g. an exponentially weighted moving average, EWMA) is often used to see the decline. We should check the convergence of any neural network across its parameters. A neural network is essentially a series of hyperplanes (a plane in N dimensions) that group or separate regions in the target hyperplane. The XOR function is the simplest (afaik) non-linear function. It is impossible to separate True results from False results using a linear function.
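
As a rough illustration of EWMA smoothing, here is a small sketch in plain Python. The function name ewma, the smoothing factor alpha and the loss values are all made up for the example; they are not taken from any particular training run.

```python
def ewma(values, alpha=0.1):
    """Exponentially weighted moving average of a sequence of losses."""
    smoothed = []
    avg = None
    for v in values:
        # The first value seeds the average; later values are blended in.
        avg = v if avg is None else alpha * v + (1 - alpha) * avg
        smoothed.append(avg)
    return smoothed

# Hypothetical noisy loss curve, for illustration only.
noisy_losses = [1.0, 0.4, 0.9, 0.3, 0.8, 0.2, 0.7, 0.1]
smooth_losses = ewma(noisy_losses, alpha=0.3)
```

A smaller alpha gives a smoother curve but reacts more slowly to real changes in the loss.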


We can see that our model made fairly good predictions. They are not as accurate as before, but if we increase the number of iterations the result should get even better. After printing our result we can see that we get a value close to 0, and the original value is also 0. On the other hand, when we test the fifth sample in the dataset we get a value close to 1, and the original value is also 1.

Simple Logical Boolean Operator Problems

Then we create a criterion that calculates the loss using torch.nn.BCELoss() (binary cross-entropy loss). We also need to define an optimizer that uses stochastic gradient descent. As the network parameters we pass model_AND.parameters(), and we set the learning rate to 0.01. This completes a single forward pass, after which our predicted_output needs to be compared with the expected_output.
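
A minimal sketch of this setup in PyTorch. The names model_AND, predicted_output and expected_output follow the text; the model architecture (one linear layer plus a sigmoid), the epoch count and the seed are assumptions made for the example.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

# Truth table for the AND gate.
X = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
expected_output = torch.tensor([[0.0], [0.0], [0.0], [1.0]])

# Assumed model: one linear layer followed by a sigmoid.
model_AND = torch.nn.Sequential(torch.nn.Linear(2, 1), torch.nn.Sigmoid())

criterion = torch.nn.BCELoss()
optimizer = torch.optim.SGD(model_AND.parameters(), lr=0.01)

losses = []
for epoch in range(200):  # epoch count is illustrative
    predicted_output = model_AND(X)                       # forward pass
    loss = criterion(predicted_output, expected_output)   # compare with targets
    optimizer.zero_grad()                                 # clear old gradients
    loss.backward()                                       # backpropagate
    optimizer.step()                                      # update weights and bias
    losses.append(loss.item())
```

With a learning rate this small the loss falls slowly, so many more epochs would be needed before the outputs are close to 0 and 1.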

  1. Gradient descent consists of finding the gradient, or the direction of fastest descent along the surface of the function, and choosing the next solution point.
  2. If this were a real problem, we would save the weights and bias, as these define the model.
  3. Non-linearity allows for more complex decision boundaries.
  4. If the input patterns are plotted according to their outputs, it is seen that these points are not linearly separable.
  5. Backpropagation is a generalization of the gradient descent family of algorithms that is specifically used to train multi-layer feedforward networks.
  6. Next, we will use the function np.random.shuffle() on the variable index_shuffle.
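
The point about linear separability can be checked with a small brute-force search. The sketch below tries every line w1*x1 + w2*x2 + b = 0 on a coarse grid of weights and biases; the helper name separable and the grid itself are illustrative choices. A grid search is only evidence, not a proof, but it shows the contrast between AND and XOR.

```python
import itertools

# Each entry is ((x1, x2), label).
xor_points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
and_points = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def separable(data, grid):
    """Return True if some line on the grid classifies every point correctly."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 + b > 0) == bool(label)
               for (x1, x2), label in data):
            return True
    return False

# Weights and bias from -4 to 4 in steps of 0.5.
grid = [i / 2 for i in range(-8, 9)]

and_ok = separable(and_points, grid)  # a separating line exists for AND
xor_ok = separable(xor_points, grid)  # no line on the grid separates XOR
```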

Machine learning algorithms and concepts

Gradient descent methods allow finding the minimum of an error (or cost) function with a large number of weights and biases in a reasonable number of iterations. A drawback of the gradient descent method is the need to calculate partial derivatives with respect to each of the function's inputs (here, every weight and bias). Very often when training neural networks, we can end up in a local minimum of the function without finding a nearby minimum with better values. Also, gradient descent can be very slow, taking many iterations when we are close to a local minimum. We implemented our backpropagation algorithm using the Python programming language and devised a multi-layer, feedforward NeuralNetwork class.
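
To make the local-minimum problem concrete, here is a small sketch of gradient descent on a one-dimensional function with two minima. The function f, the learning rate and the starting points are arbitrary choices for illustration and have nothing to do with a real network's loss surface.

```python
def f(x):
    """Toy function with a deeper minimum on the left and a shallower one on the right."""
    return x**4 - 3 * x**2 + x

def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=2000):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

x_left = gradient_descent(-2.0)   # settles in the deeper minimum
x_right = gradient_descent(2.0)   # settles in the shallower, local minimum
```

Both runs stop where the gradient is close to zero, but only the run that started on the left finds the lower of the two values.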

Neural networks are now widespread and are used in practical tasks such as speech recognition, automatic text translation, image processing, analysis of complex processes and so on. Let’s bring everything together by creating an MLP class. The plot function is exactly the same as the one in the Perceptron class. The method of updating weights follows directly from the derivation and the chain rule.
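
The chain-rule step can be sketched for a tiny 2-2-1 network with sigmoid activations and a squared-error loss. This is not the MLP class from the post; the parameter layout, the weight values and the helper names are assumptions made for the example. The analytic gradient is compared against a finite-difference estimate as a sanity check.

```python
import copy
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(p, x):
    """Return the hidden activations and the output of a 2-2-1 network."""
    h = [sigmoid(p["W1"][j][0] * x[0] + p["W1"][j][1] * x[1] + p["b1"][j])
         for j in range(2)]
    out = sigmoid(p["W2"][0] * h[0] + p["W2"][1] * h[1] + p["b2"])
    return h, out

def loss(p, x, y):
    _, out = forward(p, x)
    return 0.5 * (out - y) ** 2

def backprop(p, x, y):
    """Weight gradients via the chain rule, using sigmoid'(z) = s * (1 - s)."""
    h, out = forward(p, x)
    delta_out = (out - y) * out * (1 - out)
    grad_W2 = [delta_out * h[j] for j in range(2)]
    # Push the output error back through W2 and the hidden sigmoid.
    delta_hidden = [delta_out * p["W2"][j] * h[j] * (1 - h[j]) for j in range(2)]
    grad_W1 = [[delta_hidden[j] * x[i] for i in range(2)] for j in range(2)]
    return grad_W1, grad_W2

def numeric_grad_W1(p, x, y, j, i, eps=1e-6):
    """Central finite-difference estimate of dLoss/dW1[j][i]."""
    hi, lo = copy.deepcopy(p), copy.deepcopy(p)
    hi["W1"][j][i] += eps
    lo["W1"][j][i] -= eps
    return (loss(hi, x, y) - loss(lo, x, y)) / (2 * eps)

# Arbitrary illustrative weights and one training sample.
params = {"W1": [[0.5, -0.3], [0.8, 0.2]], "b1": [0.1, -0.1],
          "W2": [0.7, -0.4], "b2": 0.05}
x, y = (1.0, 0.0), 1.0
grad_W1, grad_W2 = backprop(params, x, y)
```

A weight update then subtracts the learning rate times each gradient from the matching weight.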

If we use a sigmoid activation function, we can squash the output into the range 0 to 1, which can be interpreted directly as the probability of a datapoint belonging to a particular class. We want to find the minimum loss given a set of parameters (the weights and biases). Recalling some AS level maths, we can find the minima of a function by finding the points where its gradient is zero (every minimum has zero gradient).

Neural nets used in production or research are never this simple, but they almost always build on the basics outlined here. Hopefully, this post gave you some idea of how to build and train perceptrons and vanilla networks. In the forward pass, we apply the wX + b relation multiple times, applying a sigmoid function after each call.
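
The forward pass described above can be sketched in a few lines of plain Python. The weights below are hand-picked rather than trained, purely to show that two layers of wX + b followed by a sigmoid are enough to reproduce XOR; the helper names layer and xor_forward are made up for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """Apply w.x + b for each unit, then a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_w, inputs)) + b)
        for unit_w, b in zip(weights, biases)
    ]

# Hand-picked (not trained) weights: the first hidden unit behaves like OR,
# the second like NAND, and the output unit like AND of the two.
W_hidden, b_hidden = [[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0]
W_out, b_out = [[20.0, 20.0]], [-30.0]

def xor_forward(x1, x2):
    hidden = layer([x1, x2], W_hidden, b_hidden)
    return layer(hidden, W_out, b_out)[0]

predictions = {(a, b): xor_forward(a, b) for a in (0, 1) for b in (0, 1)}
```

Each prediction lands very close to 0 or 1, so it can be read as the probability that the XOR output is 1.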

That loss was used here to make it easier to understand how a perceptron works, but for classification tasks there are better alternatives, like binary cross-entropy loss. A clear non-linear decision boundary is created here with our generalized neural network, or MLP. The architecture of a network refers to its general structure: the number of hidden layers, the number of nodes in each layer and how these nodes are interconnected. The input data is the same for each kind of logic gate, since they all take two boolean variables as input. However, the weights are usually much more important than the particular function chosen.

