Over the last few weeks, a robust debate has been taking place online about the prospects that Deep Learning neural networks would lead to advances in the quest for Artificial General Intelligence.

All current AI is what is known as Artificial Narrow Intelligence. This means that the models work well (sometimes extremely well) on specific problems that are well defined. Unfortunately, they are also quite brittle and do not generalise to other problems, or even variants of the problem they are trained on. By contrast, a long-term goal of the field is to get to AIs that can generalise and extrapolate, amongst other things. This is called Artificial General Intelligence.

The debate started back in September when Geoffrey Hinton proposed that researchers should start looking at alternatives to the default back propagation algorithms that are currently quite successful. This was followed up by a more detailed critical review published by Gary Marcus earlier this month outlining many of the problems with neural networks and deep learning. There has been quite a bit of debate about the merits of Marcus’ points on social media, so much so that he published a defence on Medium, responding to the various criticisms raised.

One of the most serious points is that artificial neural networks don’t generalise and cannot extrapolate from what they have been trained on to new instances with different characteristics. This is quite simple to demonstrate using Keras and TensorFlow. In this example, we have a neural network that tries to learn to reverse the digits in a binary number:

import numpy as np from keras.models import Sequential from keras.layers import Dense, Activation from keras.optimizers import SGD x = np.array([[0,0,0,0], [0,0,0,1], [0,0,1,0], [0,1,0,1], [0,1,1,0], [0,1,1,1]]) y = np.array([[0,0,0,0], [1,0,0,0], [0,1,0,0], [1,0,1,0], [0,1,1,0], [1,1,1,0]]) model = Sequential() model.add(Dense(4, input_shape=(4,))) model.add(Activation('sigmoid')) model.add(Dense(4)) model.add(Activation('sigmoid')) model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.1)) model.fit(x,y, epochs=10000, batch_size=8, verbose=0)

The above model trains the network on examples where the first digit is always 0. If we test the trained network, it will perform as expected, reversing the order of the digits:

np.around(model.predict(x))

generates the following output:

array([[0., 0., 0., 0.], [1., 0., 0., 0.], [0., 1., 0., 0.], [1., 0., 1., 0.], [0., 1., 1., 0.], [1., 1., 1., 0.]], dtype=float32)

Which is exactly what we want. However, when we change the inputs to begin with ‘1’ then the network gets confused:

offdim = np.array([[1,0,0,0], [1,0,0,1], [1,0,1,0], [1,1,0,1], [1,1,1,0], [1,1,1,1]]) np.around(model.predict(offdim))

as we can see in the following output:

array([[0., 0., 0., 0.], [1., 0., 0., 0.], [0., 1., 0., 0.], [1., 0., 1., 0.], [0., 1., 1., 0.], [1., 1., 1., 0.]], dtype=float32)

Clearly the network isn’t learning anything about the concept of “reversing” even though it looks like it initially. It has completely ignored the new number.

This lack of generalising demonstrates clearly that neural networks, as currently architected, don’t operate on any level of abstraction. This is one of the fundamental problems with neural networks that must be solved if we have any hope of them advancing our efforts towards Artificial General Intelligence.

As Marcus points out in his defence article, there may be a need for a blend of techniques, including old-school symbolic AI to help deep learning networks move forward towards solving the generalisation problem.

Sources:

Deep Learning, A Critical Appraisal – Gary Marcus https://arxiv.org/abs/1801.00631

In defense of skepticism about deep learning – Gary Marcus https://medium.com/@GaryMarcus/in-defense-of-skepticism-about-deep-learning-6e8bfd5ae0f1

Sample code on Github: https://github.com/StephenOman/TensorFlowExamples/blob/master/Reverse%20Binary%20Number%20Simple%20NN.ipynb