
Basic automated testing of machine learning algorithms in Python

Introduction

I will present a basic approach to automated testing of a machine learning algorithm. Many languages are used for machine learning, and Python is one of the most popular. It is not the fastest or the easiest language, but it is a general-purpose language that does a bit of everything.

I am going to use the machine learning code written by Michael Nielsen for his ebook at http://neuralnetworksanddeeplearning.com. It is written in Python. The ebook explains neural networks and deep learning with code examples, and it is a really good place to start learning machine learning.

The ebook explains machine learning algorithms (neural networks and deep learning). Michael Nielsen uses these algorithms to solve the problem of recognizing handwritten digits.

Solution

I will show the tools I use to write and test Python code. Then I will present some of my basic unit tests for Michael Nielsen's source code, along with a short test for stochastic gradient descent. Finally, I will show how I launch all the Python tests from Jenkins.

Develop and test Python code in Eclipse

There are many IDEs for Python. I chose Eclipse with the PyDev plugin because it is free and easy to use. I also use git for source control.

Once you have installed the PyDev plugin in Eclipse, you need to configure it before you can run unit tests inside Eclipse. Getting that to work was one of my problems.

[Figure: Neural networks project in Eclipse from the ebook]

As you can see from the picture, my unit tests are in a folder called “test”. The “src” folder contains the source code of the neural network algorithm. My initial problem was that my unit tests could not import the source code. For example, I could not import the class Network.

The solution was to configure the Eclipse PyDev plugin as described in this link: http://stackoverflow.com/questions/4631377/unresolved-import-issues-with-pydev-and-eclipse

Go to the “PyDev – PYTHONPATH” pane of the Python project and add your source code folder under external libraries.

[Figure: Configure PyDev to launch unit tests inside Eclipse]

Now I can launch the tests inside Eclipse.
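Outside Eclipse, the same import problem can be worked around from the test code itself. The following is only a minimal sketch, assuming the layout described above (a “test” folder next to a “src” folder); the helper name add_source_dir is hypothetical and is not part of PyDev or of the ebook's code.

```python
import os
import sys


def add_source_dir(test_file, source_dir_name="src"):
    """Put <project>/<source_dir_name> on sys.path.

    test_file is the path of a test module located in <project>/test.
    Returns the absolute path of the source folder.
    """
    test_dir = os.path.dirname(os.path.abspath(test_file))
    project_root = os.path.dirname(test_dir)
    source_dir = os.path.join(project_root, source_dir_name)
    # Insert only once so repeated calls do not grow sys.path.
    if source_dir not in sys.path:
        sys.path.insert(0, source_dir)
    return source_dir
```

A test module would call add_source_dir(__file__) before its "import network" line. Configuring the PYTHONPATH in the IDE, as I did, keeps the test files free of this kind of path handling.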

Create Basic Unit Tests with Python

Assuming you are using Python 3.7, you should read about the unittest package in the Python manual: https://docs.python.org/3.7/library/unittest.html.

This is the class I would like to test:

import numpy as np


class Network(object):

    def __init__(self, sizes):
        """The list ``sizes`` contains the number of neurons in the
        respective layers of the network.  For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron.  The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1.  Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers."""
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]
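To make the constructor easier to test, it helps to know which array shapes it should produce. The sketch below repeats the two list comprehensions from the constructor but returns only the expected shapes, without NumPy; the function name layer_shapes is my own and does not exist in the ebook's code.

```python
def layer_shapes(sizes):
    """Return (bias_shapes, weight_shapes) for the given layer sizes.

    The input layer gets no biases, so there is one bias vector and one
    weight matrix per layer after the first.
    """
    bias_shapes = [(y, 1) for y in sizes[1:]]
    weight_shapes = [(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
    return bias_shapes, weight_shapes


bias_shapes, weight_shapes = layer_shapes([2, 3, 1])
print(bias_shapes)    # [(3, 1), (1, 1)]
print(weight_shapes)  # [(3, 2), (1, 3)]
```

A unit test could compare these expected shapes with the shape attribute of each array in net.biases and net.weights.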

I am going to test only a few parts of the class Network and run a test case from chapter 1: http://neuralnetworksanddeeplearning.com/chap1.html.

This is my test class, which tests the Network class.

import unittest

import mnist_loader
import network


class test_network(unittest.TestCase):

    def testCaseRecognizeHandWrittenDigits(self):
        # Load the MNIST data.
        training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
        # Set up a network with 30 hidden neurons.
        net = network.Network([784, 30, 10])
        # Use stochastic gradient descent to learn from the MNIST training
        # data with a mini-batch size of 10 and a learning rate of eta = 3.0.
        # The book trains for 30 epochs; 3 keeps the test fast.
        epochs = 3
        net.SGD(training_data, epochs, 10, 3.0, test_data=test_data)

    def testnetwork(self):
        print("init network")
        size = [784, 30, 10]
        net = network.Network(size)

        # Verify that the number of layers matches the length of the size list.
        self.assertEqual(net.num_layers, 3)

The test “testCaseRecognizeHandWrittenDigits” just runs one test case from chapter 1. It does not verify anything. It checks that the code runs without raising an error, but we have no idea whether the code is doing something useful.

The test “testnetwork” is a unit test for the Network object. We verify that the number of layers is correct. When I launch the tests from Eclipse, the results are OK:

[Figure: Run as Python Unittest from Eclipse]

As you can see from the previous test case, I run SGD over only three epochs instead of 30. SGD is the method that implements stochastic gradient descent.

System Test of Stochastic Gradient Descent algorithm

Now I am going to test the SGD method of the Network class in network2, another module of the source code.

Stochastic gradient descent (SGD) is a commonly used and powerful technique for learning in neural networks, and it is the basis for most of the learning techniques developed in the ebook.

The test I have created tells us whether the algorithm recognizes handwritten digits with more than 90% accuracy.

import unittest

import mnist_loader
import network2


class test_network2(unittest.TestCase):

    def testCaseRecognizeHandWrittenDigits(self):
        # Load the MNIST data.
        training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
        # Set up a network with 30 hidden neurons.
        net = network2.Network([784, 30, 10], cost=network2.CrossEntropyCost)
        net.large_weight_initializer()
        # Train with a learning rate eta of 0.5 for 3 epochs (3 instead of 30, for speed).
        epochs = 3
        global_evaluation_data = net.SGD(
            training_data, epochs, 10, 0.5,
            evaluation_data=test_data,
            monitor_evaluation_cost=True,
            monitor_evaluation_accuracy=True,
            monitor_training_cost=True,
            monitor_training_accuracy=True)
        # SGD returns (evaluation_cost, evaluation_accuracy, training_cost,
        # training_accuracy), so index 1 holds, for each epoch, the number of
        # correctly classified images in test_data.
        total_accuracy_evaluation_data = global_evaluation_data[1]

        # Verify that the accuracy exceeds 90% in every epoch.
        for accuracy in total_accuracy_evaluation_data:
            print(accuracy)
            # test_data holds 10,000 images.
            percentage = accuracy / 10000.0
            print(percentage)
            self.assertGreater(percentage, 0.9, "accuracy must exceed 90 percent in every epoch")

As you can see from the code, the SGD method returns the costs and accuracies for both the evaluation data and the training data. In the code I am just verifying that, for every epoch, the accuracy of recognizing handwritten digits is above 90%. This check uses the accuracy at index 1 of the returned tuple, which is measured on the 10,000 images passed as evaluation_data.
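The conversion from per-epoch counts to fractions, and the comparison with the threshold, can be isolated in small helpers. This is only a sketch: the function names are my own, and the counts in the example are hypothetical values chosen for illustration, not results from a real run.

```python
def accuracy_fractions(correct_counts, n_examples):
    """Convert per-epoch counts of correct answers into fractions."""
    return [count / float(n_examples) for count in correct_counts]


def epochs_not_above(correct_counts, n_examples, threshold):
    """Return (epoch index, fraction) for each epoch at or below threshold."""
    fractions = accuracy_fractions(correct_counts, n_examples)
    return [(epoch, fraction)
            for epoch, fraction in enumerate(fractions)
            if fraction <= threshold]


# Hypothetical counts of correctly classified images out of 10,000.
counts = [8900, 9350, 9420]
print(epochs_not_above(counts, 10000, 0.9))  # [(0, 0.89)]
```

In a test, asserting that the returned list is empty gives a failure message that names every failing epoch, instead of stopping at the first one.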

This is just one example of a test for a machine learning algorithm. By running this test case every day with Jenkins, we verify that a modification to our algorithm does not reduce the recognition accuracy.

Here is a screenshot of the result when I run the unit test from Eclipse:

[Figure: The accuracy is above 90% for all epochs]

How to run Python tests from Jenkins?

To launch all the Python tests of the project every day, I use Jenkins and nose2.

On Ubuntu it is easy to install nose2. Follow the instructions at this link: https://nose2.readthedocs.io/en/latest/getting_started.html.

Once nose2 is installed, go to the top directory of your Python project and launch nose2 to run all the Python tests of the project. Here is the result with the neural network unit tests I created previously:

[Figure: All four tests I have created ran successfully]

Finally, create a new job in Jenkins. Configure the job to get the code from a git repository (for instance) and then launch all the tests with nose2. See this link for more information about nose2 and Jenkins integration: https://jenkins.io/solutions/python/

Conclusion

We can imagine many more tests for neural network algorithms. We could test whether the algorithm learns fast or slowly. We could also check for problems such as overfitting and underfitting.

This article about TDD for machine learning can give us more ideas about what to verify in our machine learning algorithms:
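As one possible starting point for an overfitting check, a test could compare the accuracy on the training data with the accuracy on the evaluation data and fail when the gap becomes too large. This is only a sketch: the function names are my own, and the default limit of 0.05 is an arbitrary assumption that would need tuning for a real project.

```python
def generalization_gaps(train_correct, n_train, eval_correct, n_eval):
    """Per-epoch training accuracy minus evaluation accuracy."""
    return [t / float(n_train) - e / float(n_eval)
            for t, e in zip(train_correct, eval_correct)]


def looks_overfit(train_correct, n_train, eval_correct, n_eval, max_gap=0.05):
    """Return True if any epoch has a gap larger than max_gap."""
    gaps = generalization_gaps(train_correct, n_train, eval_correct, n_eval)
    return any(gap > max_gap for gap in gaps)
```

The inputs are per-epoch counts of correct answers, the same kind of values that SGD in network2 returns when accuracy monitoring is turned on.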

https://www.safaribooksonline.com/library/view/thoughtful-machine-learning/9781449374075/ch01.html

I may write another post later with more unit tests for these machine learning algorithms.
