Deep Learning Dead-End?

Deep Learning is at the core of much of modern Artificial Intelligence. It has had some spectacular recent successes, not least being a major part of the system that beat the world champion at Go.

Key to its success is the Back-Propagation algorithm, usually shortened to “Backprop”. I’ve written elsewhere about how this algorithm works, but essentially, it takes an error in the output of a neural network and propagates it backwards through the network, adjusting the network’s configuration as it goes to reduce that output error.
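As a minimal illustration (a sketch of my own, not from the article), here is a single weight-update step for one sigmoid neuron in Octave. Backprop repeats this kind of adjustment layer by layer, working backwards from the output error:

% Sketch: one gradient-descent step for a single sigmoid neuron.
x = [1; 0.5];                  % inputs (the first element acts as a bias)
y = 1;                         % target output
w = [0.1; -0.2];               % current weights
h = 1 / (1 + exp(-(w' * x))); % the neuron's output
err = h - y;                   % the output error
w = w - 0.1 * (err * x);       % adjust the weights to reduce that error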

This algorithm has been so successful that deep learning neural networks are finding applications in a broad range of industries. Much of the recent popularisation of artificial intelligence and machine learning is due to this very success.

Now one of the longest-standing proponents of this algorithm, Geoffrey Hinton of the University of Toronto, has suggested that if progress is to be made in AI, the focus must shift away from Backprop. His view is based on the observation that the brain doesn't learn this way: if intelligent machines are to be developed, new techniques will be required.

In particular, he suggests that Unsupervised Learning will prove to be a fertile area. This style of learning does not rely on fully labelled or classified datasets the way Supervised Learning and Backprop do. Instead, it looks for patterns within the data itself in order to make predictions and classifications.
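To make that concrete, here is a small sketch (mine, not Hinton's) of one classic unsupervised method, k-means clustering, which finds groups in unlabelled data:

% Sketch: k-means clustering on made-up, unlabelled 2-D data.
X = [randn(50,2); randn(50,2) + 5];       % two hidden clusters of points
k = 2;
centroids = X(randperm(rows(X), k), :);   % start from k random points
idx = zeros(rows(X), 1);
for iter = 1:10
    for i = 1:rows(X)                     % assign each point to its nearest centroid
        [d, idx(i)] = min(sum((centroids - X(i,:)).^2, 2));
    endfor
    for j = 1:k                           % move each centroid to the mean of its cluster
        centroids(j,:) = mean(X(idx == j, :), 1);
    endfor
endfor
disp(centroids);                          % should land near [0 0] and [5 5]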

Deep Learning does still have a lot to offer. However, given its requirements for large amounts of data and computing power, there is an increasing awareness that alternative machine learning techniques may prove to be equally fruitful.

Source: Artificial intelligence pioneer says we need to start over – Axios

In Praise Of Reinventing The Wheel

The original Ferris Wheel at the 1893 World's Columbian Exposition in Chicago (Source: Public Domain via Wikipedia)

 

There is a strong principle in software engineering of reusing code wherever possible. It's considered so important that it has its own TLA: DRY ("Don't Repeat Yourself"). In other words, don't reinvent the wheel.

There are obvious benefits to this. Nobody wants to type the same piece of code over and over. So write it once as a function and reuse it wherever you need to, freeing yourself up to build other stuff. Better still, reuse someone else’s functions.

Indeed, the ready availability of open-source libraries and frameworks for almost every popular requirement under the sun means that you almost never have to develop your own low-level software. Thank goodness. Software development would get really tedious if you had to start out writing basic I/O routines with each new project.

Even new developers can get sophisticated projects up and running fairly quickly. GitHub and StackOverflow have seen to that.

The principle is taken to an extreme by some web developers, who include the entire jQuery library on their webpage in order to implement some simple form validation. I’m all in favour of simplifying JavaScript development, second only to abolishing it altogether. But including the whole library is a tad overkill.

And this highlights a problem with treating DRY as dogma.

When we rely on existing libraries and frameworks, we don’t get a good idea of what’s really happening within our code base. Yes, we can read the library code to get the gist of what’s going on. But we never really do.

There’s a really good reason to try to implement your own routines, if only in your test projects. You’ll get an appreciation of the lower level algorithms that your applications depend on. You may even discover how difficult it is to get some of these algorithms to run quickly or to use a small amount of memory.

This will lead you to the core principles of algorithmic analysis that are fundamental to computer science. Along the way, you’ll pick up a really good sense of what works well and what doesn’t, what’s fast and what’s slow, what’s memory efficient and what’s a complete hog. Hopefully, you can reflect those lessons in your own production application code.

So the next time you have the chance, try implementing your own I/O routines or other low-level algorithms and see how your implementations compare to the published libraries.
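For instance, here's the kind of experiment I mean, sketched in Octave: time a hand-rolled routine against the built-in equivalent and see how far apart they are.

% Sketch: compare a hand-written sum against Octave's built-in sum.
x = rand(1, 1000000);

tic;
s = 0;
for i = 1:numel(x)
    s = s + x(i);              % naive element-by-element loop
endfor
t_loop = toc;

tic;
s2 = sum(x);                   % vectorised built-in
t_builtin = toc;

printf("loop: %f s, built-in: %f s\n", t_loop, t_builtin);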

Remember, DRY is a principle, not a dogma.

Sometimes there is merit in repeating yourself (or someone else).

 

2015 Underhanded C Contest Results Released

I’ve just found out that the results of the 2015 Underhanded C Contest have been published and (YIPPEE!!), I managed to bag a runner-up spot.

This year, the competition was sponsored by the Nuclear Threat Initiative. The goal was to fake the results of a nuclear warhead test, showing that a warhead contained fissile material when in fact it did not.

Congratulations to the winner, Linus Åkesson. You can read about his entry on his blog. It’s also worth reading about the other entrants’ ideas for how to hide malicious code inside normal looking code.

Here is the NTI’s article about the contest: http://www.nti.org/newsroom/news/underhanded-c-contest-highlights-challenges-nuclear-arms-control-verification-technologies/

Lastly, if you’re curious about my entry, I’ve posted the code and an explanation on GitHub: https://github.com/StephenOman/UnderhandedC2015

 

A Simple Neural Network in Octave – Part 3

This is the final post in a short series looking at implementing a small neural network to solve the XOR problem in Octave.

In the first post, we looked at how to set up the parameters of the network (the weights between the nodes), feed in an example and get the network to predict some output. The second post looked at implementing the back-propagation algorithm, which adjusts those parameters to improve the accuracy of the network's outputs.

Now it’s time to pull those pieces together so that we can let the network adjust the parameters to get as close to the right output as we desire.

First, we’ll set up a function to run all the training examples through the network. Presenting all the training examples once (one after the other) is called an epoch. The goal of this function will be to return the updated parameters after an epoch:

function [THETA1_new, THETA2_new] = xor_nn(XOR, THETA1, THETA2, init_w=0, learn=0, alpha=0.01)

Just a quick note on the notation and parameters in this function header. The first part after the word “function” specifies what this function returns. In our case, we want new versions of the parameters in the network (represented by THETA1 and THETA2).

The name of the function is “xor_nn” and the parameters to the function are contained within the brackets:

  • XOR This is the representation of the training set (one example per row).
  • THETA1 & THETA2 These are the current values for the parameters in the network.
  • init_w=0 This tells the function whether to initialise the weights in the network (default 0: don't).
  • learn=0 This tells the network whether to learn from the examples (default 0: don't; see below).
  • alpha=0.01 This is the learning rate (default value is 0.01).

Note that any parameter that has an “=” sign is optional and will get the default value shown if it is not provided explicitly.
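For example, all of these are valid calls (assuming the weight matrices are already in THETA1 and THETA2):

[T1, T2] = xor_nn(XOR, THETA1, THETA2);              % no learning: just report the hypotheses and cost
[T1, T2] = xor_nn(XOR, THETA1, THETA2, 1, 1);        % re-initialise the weights, then learn for one epoch
[T1, T2] = xor_nn(XOR, THETA1, THETA2, 0, 1, 0.05);  % learn for one epoch with a higher learning rate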

The first step in our function is to check if we need to initialise the weights.

if (init_w == 1)
    THETA1 = 2*rand(2,3) - 1;
    THETA2 = 2*rand(1,3) - 1;
endif

This is simply the same code as before, inside an “if” block.

Now we initialise the cost variable. This is done for every epoch:

J = 0.0;

In the previous post, we updated the weights after calculating the cost for each individual training example. This is a specific type of learning called "online learning". There is another type called "batch learning", which updates the weights only once per epoch, after all the training examples have been processed. Let's implement batch learning in our function instead.
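If the distinction seems abstract, here is a toy sketch (separate from our network) that fits a single weight w so that w*x approximates y; the only difference between the two loops is when the update happens:

% Sketch: online vs batch gradient descent on a one-weight problem.
x = [1 2 3 4];
y = 2 * x;
alpha = 0.01;

w = 0;                                   % online: update after every example
for epoch = 1:100
    for i = 1:numel(x)
        w = w - alpha * (w * x(i) - y(i)) * x(i);
    endfor
endfor
printf("online w = %f\n", w);            % should be close to 2

w = 0;                                   % batch: one averaged update per epoch
for epoch = 1:100
    grad = 0;
    for i = 1:numel(x)
        grad = grad + (w * x(i) - y(i)) * x(i);
    endfor
    w = w - alpha * (grad / numel(x));
endfor
printf("batch w = %f\n", w);             % should also be close to 2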

We need to record the number of training examples and the total delta across all the training examples (rather than just across one as before). So here are the extra variables required:

T1_DELTA = zeros(size(THETA1));
T2_DELTA = zeros(size(THETA2));
m = 0;

Remember that THETA1 and THETA2 are matrices, so we need to initialise every element, making our delta matrices the same sizes. We'll also use "m" to record the number of training examples presented to the network during the epoch.

Now lets set up a loop to present those training examples to the network one by one:

for i = 1:rows(XOR)

This simply says: repeat the following block once for each row in our XOR data set.

Now put in the code from Part 1 that processes an input:

A1 = [1; XOR(i,1:2)'];
Z2 = THETA1 * A1;
A2 = [1; sigmoid(Z2)];
Z3 = THETA2 * A2;
h = sigmoid(Z3);
J = J + ( XOR(i,3) * log(h) ) + ( (1 - XOR(i,3)) * log(1 - h) );
m = m + 1;

Note the slight change in moving the bias node from the layer input calculation (Z3) to the output from the previous layer (A2). This just makes the code slightly simpler.

Then we add the code from Part 2, inside a test to see if we are in learning mode. The code has been slightly modified as we are implementing batch learning rather than online learning. In batch mode, we want to accumulate the errors across all the examples:

if (learn == 1)
    delta3 = h - XOR(i,3);
    delta2 = ((THETA2' * delta3) .* (A2 .* (1 - A2)))(2:end);
    T2_DELTA = T2_DELTA + (delta3 * A2');
    T1_DELTA = T1_DELTA + (delta2 * A1');
else
    disp('Hypothesis for '), disp(XOR(i,1:2)), disp('is '), disp(h);
endif

If we're not learning from this example, then we simply display the network's hypothesis (its predicted output) for this particular example.
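For reference, here is what the learning branch computes in the usual mathematical notation (the (2:end) indexing drops the bias row from \delta^{(2)}):

\delta^{(3)} = h - y, \qquad
\delta^{(2)} = \left( \Theta_2^{\top} \delta^{(3)} \right) \odot a^{(2)} \odot \left( 1 - a^{(2)} \right)

\Delta^{(2)} \leftarrow \Delta^{(2)} + \delta^{(3)} \left( a^{(2)} \right)^{\top}, \qquad
\Delta^{(1)} \leftarrow \Delta^{(1)} + \delta^{(2)} \left( a^{(1)} \right)^{\top}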

That’s the end of the loop, so we close the block in Octave with:

endfor

Now we calculate the average cost across all the examples:

J = J / -m;

This gives us a useful guide to see if the cost is reducing each time we run the function (which it should!).
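Written out, this is the average cross-entropy cost over the epoch:

J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h^{(i)} + \left( 1 - y^{(i)} \right) \log \left( 1 - h^{(i)} \right) \right]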

Now we’ll update the weights in the network, remembering to divide by the number of training examples as we are in batch mode:

if (learn==1)
    THETA1 = THETA1 - (alpha * (T1_DELTA / m));
    THETA2 = THETA2 - (alpha * (T2_DELTA / m));
else
    disp('J: '), disp(J);
endif

Lastly, we’ll set the return values as our new updated weights:

THETA1_new = THETA1;
THETA2_new = THETA2;

And ending the function:

endfunction

So that’s all that’s required to run all the training examples through the network. If you run the function a few times, you should see the cost of the network reducing as we expect it to:

[Screenshot: Octave session showing the cost J decreasing over several runs of xor_nn]

If you run the function as shown, you’ll see that the cost (J) reduces each time, but not by a lot. It would be a bit boring to have to run the function each time, so let’s set up a short script to run it many times:

XOR = [0,0,0; 0,1,1; 1,0,1; 1,1,0];
THETA1 = 0;
THETA2 = 0;

[THETA1, THETA2] = xor_nn(XOR, THETA1, THETA2, 1, 1, 0.01);
for i = 1:100000
    [THETA1, THETA2] = xor_nn(XOR, THETA1, THETA2, 0, 1, 0.01);
    if (mod(i,1000) == 0)
        disp('Iteration : '), disp(i)
        [THETA1, THETA2] = xor_nn(XOR, THETA1, THETA2);
    endif
endfor

There’s not a lot here that’s new. The first call to the xor_nn function initialises the weights. The loop calls the function 100,000 times, printing out the cost every 1000 iterations. That may seem like a lot of function calls, but remember that the network weights are being adjusted by a small amount each time.

If you run that script, you should see something like this as the output:

[Screenshot: the training script's output, with the cost J printed every 1,000 iterations]

When the loop gets to the end, the output will be close to this (the exact numbers may be different):

[Screenshot: the final hypotheses for the four XOR examples after training]

As you can see, the network guesses values close to 0 for the first and last XOR examples and values close to 1 for the two middle examples. This is close to what we want the network to do, so we've successfully trained this particular network to recognise the XOR function.
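Once trained, you can query the network for any input pair using the same forward pass that appears inside the function. A small sketch (it assumes the sigmoid function from Part 1 is on your path and that THETA1 and THETA2 hold the learned weights):

x = [0; 1];                       % one XOR input pair
A1 = [1; x];                      % add the bias node
A2 = [1; sigmoid(THETA1 * A1)];   % hidden layer activations (plus bias)
h = sigmoid(THETA2 * A2);         % network output: should be close to 1
disp(h);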

If you wish to download the code directly, it's available here on GitHub:

https://github.com/StephenOman/Octave/tree/master/xor%20neural%20network

There are a lot of things that we should do to make sure that the algorithm is optimised, including checking that the default learning rate of 0.01 is actually the right rate. But that is a job for another day.
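As a starting point, one quick (and entirely hypothetical) experiment is to retrain from scratch with a few different rates and compare the resulting costs:

% Sketch: train afresh with several learning rates and compare the results.
XOR = [0,0,0; 0,1,1; 1,0,1; 1,1,0];
for alpha = [0.01, 0.1, 0.5]
    [T1, T2] = xor_nn(XOR, 0, 0, 1, 1, alpha);   % initialise the weights
    for i = 1:10000
        [T1, T2] = xor_nn(XOR, T1, T2, 0, 1, alpha);
    endfor
    printf("alpha = %.2f:\n", alpha);
    [T1, T2] = xor_nn(XOR, T1, T2);              % prints the hypotheses and cost J
endfor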

Now that you've seen a simple neural network recognising patterns and learning from examples, you can implement your own in Octave for other more interesting problems.

Using Deep Learning to Analyze Genetic Mutations

Tiny part of a strand of DNA

Here’s a great application of machine learning: looking for genetic mutations that could cause disease.

The Human Genome Project mapped the sequences of base pairs in DNA during the 1990s. It generated an enormous amount of data, and teams are now applying machine learning techniques to analyze the genome. The hope is to identify the mutations in the genome sequence that cause diseases, which could lead to early intervention and perhaps new treatments in the longer term.

It's nice to see big data and machine learning applied to areas other than improving advertising!

The interview below is with Brendan Frey, CEO of Deep Genomics in Canada, and gives a nice overview of what his team is doing and how they hope to leverage machine learning.

Sources

Using deep learning to analyze genetic mutations: an interview with Brendan Frey.

Deep Genomics Website

DNA Animation: brian0918™ (Own work) [Public domain], via Wikimedia Commons

Data Mining Algorithms Explained In Plain Language

Here’s a really great resource. Raymond Li from Microsoft has written an explanation of the top 10 data mining algorithms, but in plain language. These algorithms are used a lot in machine learning too.

So if you are confused about Naive Bayes or Support Vector Machines, then take a look at Ray's easy-to-understand explanations.

Top 10 data mining algorithms in plain English | rayli.net.

The Machine Vision Algorithm Beating Art Historians at Their Own Game

This is an interesting application of image processing. The machine learning algorithms are trained on a subset of paintings taken from a data set of more than 80,000. The resulting feature set has over 400 dimensions.

When presented with a painting it has not seen before, it correctly guessed the artist more than 60% of the time. It has also detected additional links between different styles and periods:

It links expressionism and fauvism, which might be expected given that the latter movement is often thought of as a type of expressionism. It links the mannerist and Renaissance styles, which clearly reflects the fact that mannerism is a form of early Renaissance painting.

However, it also apparently confuses certain styles:

… it often confuses examples of abstract expressionism and action paintings, in which artists drip or fling paint and step on the canvas. Saleh and Elgammal [the creators of the ML algorithms] … say that this kind of mix-up would be entirely understandable for a human viewer. "'Action painting' is a type or subgenre of 'abstract expressionism'," they point out.

Of course, this could also mean that the machine is correct and the distinction between different "genres" of abstract painting is completely arbitrary. But what it does highlight is that machine learning has a way to go before it can start offering subjective opinions.

via The Machine Vision Algorithm Beating Art Historians at Their Own Game | MIT Technology Review.

Turing Test Contest Details

Alan Turing (Image Public Domain)

The recent movie The Imitation Game, starring Benedict Cumberbatch, brought the work of Alan Turing to a much wider audience than was previously the case.

The movie has an interesting scene in which Turing discusses the concept of an intelligent machine with the policeman who arrested him. He asks whether you would consider a machine intelligent if a human couldn't tell the difference between it and another human when conversing with it.

He published this idea in a 1950 paper, calling the scenario The Imitation Game. It later became known as the Turing Test.

Every year, a competition is held to test real-world Artificial Intelligence algorithms to see if they can pass the Turing Test. This competition is called the Loebner Prize. Since the competition started in 1991, no algorithm has managed to pass the test.

This year, the competition will be held at Bletchley Park, where Alan Turing worked on breaking the German Enigma Code. The deadline for applications is 1st July 2015 and the final will be held on 19th September.

You can read Turing’s original paper proposing the test here:

Computing Machinery And Intelligence by A.M.Turing

The details of the Loebner Prize are here:

AISB – The Society for the Study of Artificial Intelligence and Simulation of Behaviour – Loebner Prize.