I will present a basic approach to automated testing of machine learning algorithms. Many languages are used for machine learning, and Python is one of the most popular. It is neither the fastest nor the easiest language, but it is a general-purpose language that does a bit of everything.
I am going to use the machine learning code written by Michael A. Nielsen for his ebook http://neuralnetworksanddeeplearning.com. It is written in Python. The ebook explains neural networks and deep learning with code examples, and it is a really good place to start learning machine learning.
In the ebook, Michael A. Nielsen uses these algorithms (neural networks and deep learning) to solve the problem of recognizing handwritten digits.
I will show the tools I use to write and test Python code. Then I will present some basic unit tests for Michael A. Nielsen's source code, followed by a short test for stochastic gradient descent. Finally, I will show how I launch all the Python tests from Jenkins.
Develop and test Python code in Eclipse
There are many IDEs for Python. I chose Eclipse with the PyDev plugin because it is free and easy to use. I also use git for source control.
Once you have installed the PyDev plugin in Eclipse, you need to configure it if you want to run unit tests inside the IDE. Running unit tests inside Eclipse was one of my first problems.
As you can see from the picture, my unit tests are in a folder called “test”. The “src” folder contains the source code of the neural network algorithm. My initial problem was that my unit tests could not import the source code. For example, I could not import the class Network.
The solution was to configure the Eclipse PyDev plugin as described in this link: http://stackoverflow.com/questions/4631377/unresolved-import-issues-with-pydev-and-eclipse
Go to the “PyDev – PYTHONPATH” pane of the Python project and add your source code folder to the external libraries.
Now I can launch the tests inside Eclipse.
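If you also want to run the tests outside Eclipse, the same idea can be sketched in plain Python: put the “src” folder on the import path before importing the code under test. This is only a sketch that assumes my folder layout (“src” next to “test”); the helper name add_src_to_path is hypothetical and is not part of PyDev.

```python
import os
import sys


def add_src_to_path(test_file, src_dirname="src"):
    """Put <project>/<src_dirname> on sys.path, given a file inside <project>/test.

    Hypothetical helper: it assumes the test file lives one level below
    the project root, next to the source folder.
    """
    test_dir = os.path.dirname(os.path.abspath(test_file))
    project_root = os.path.dirname(test_dir)
    src_dir = os.path.join(project_root, src_dirname)
    # Move the folder to the front so it wins over other entries.
    if src_dir in sys.path:
        sys.path.remove(src_dir)
    sys.path.insert(0, src_dir)
    return src_dir
```

A test module could call add_src_to_path(__file__) at the top, before its "import network" line.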
Create Basic Unit Tests with Python
Assuming you are using Python 3.7, you should read about the unittest package in the Python manual: https://docs.python.org/3.7/library/unittest.html
This is the class I would like to test:
    import numpy as np  # needed for np.random.randn below


    class Network(object):

        def __init__(self, sizes):
            """The list ``sizes`` contains the number of neurons in the
            respective layers of the network.  For example, if the list
            was [2, 3, 1] then it would be a three-layer network, with the
            first layer containing 2 neurons, the second layer 3 neurons,
            and the third layer 1 neuron.  The biases and weights for the
            network are initialized randomly, using a Gaussian
            distribution with mean 0, and variance 1.  Note that the first
            layer is assumed to be an input layer, and by convention we
            won't set any biases for those neurons, since biases are only
            ever used in computing the outputs from later layers."""
            self.num_layers = len(sizes)
            self.sizes = sizes
            self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
            self.weights = [np.random.randn(y, x)
                            for x, y in zip(sizes[:-1], sizes[1:])]
I am going to test only a few parts of the Network class and run a test case from chapter 1: http://neuralnetworksanddeeplearning.com/chap1.html
This is my test class, which tests the Network class.
    import unittest

    import mnist_loader
    import network


    class test_network(unittest.TestCase):

        def testCaseRecognizeHandWrittenDigits(self):
            # Load the MNIST data.
            training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
            # Set up a network with 30 hidden neurons.
            net = network.Network([784, 30, 10])
            # Use stochastic gradient descent to learn from the MNIST
            # training_data with a mini-batch size of 10 and a learning
            # rate of eta=3.0. Chapter 1 uses 30 epochs; 3 keeps the test fast.
            epochs = 3
            net.SGD(training_data, epochs, 10, 3.0, test_data=test_data)

        def testnetwork(self):
            # print() with parentheses works in both Python 2 and Python 3.
            print("init network")
            size = [784, 30, 10]
            net = network.Network(size)
            # Verify that the number of layers matches the length of size.
            self.assertEqual(net.num_layers, 3)
The test “testCaseRecognizeHandWrittenDigits” just runs one example from chapter 1. It does not verify anything: it checks that the code runs without raising an error, but it tells us nothing about whether the code does something useful.
The test “testnetwork” is a unit test for the Network object. It verifies that the number of layers is correct. When I launch the tests from Eclipse, the results are OK:
As you can see in the first test case, I run SGD for only three epochs instead of 30. SGD is the method that implements stochastic gradient descent.
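The constructor of Network also fixes the shapes of the bias and weight arrays, so a unit test could check more than num_layers. Here is a minimal sketch of that idea. The helper expected_shapes is hypothetical (it is not part of the ebook's code); it only mirrors the two list comprehensions of the constructor, so the sketch needs neither NumPy nor the MNIST data.

```python
import unittest


def expected_shapes(sizes):
    """Return (bias_shapes, weight_shapes) for a network with these layer sizes.

    Hypothetical helper that mirrors how the Network constructor builds
    its biases (y, 1) and weights (y, x) from consecutive layer sizes.
    """
    bias_shapes = [(y, 1) for y in sizes[1:]]
    weight_shapes = [(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
    return bias_shapes, weight_shapes


class TestExpectedShapes(unittest.TestCase):

    def test_three_layer_network(self):
        bias_shapes, weight_shapes = expected_shapes([784, 30, 10])
        # The input layer has no biases, so only two bias arrays exist.
        self.assertEqual(bias_shapes, [(30, 1), (10, 1)])
        # Each weight array connects one layer to the next one.
        self.assertEqual(weight_shapes, [(30, 784), (10, 30)])
```

In a real test against the Network class, one could compare these lists with [b.shape for b in net.biases] and [w.shape for w in net.weights].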
System Test of Stochastic Gradient Descent algorithm
Now I am going to test the SGD method of the Network class in the network2 module of the source code.
As the ebook puts it, stochastic gradient descent (SGD) “is a commonly used and powerful technique for learning in neural networks, and it’s the basis for most of the learning techniques we’ll develop in this book.”
The test I have created tells us whether the algorithm recognizes handwritten digits with more than 90% accuracy.
    import unittest

    import mnist_loader
    import network2


    class test_network2(unittest.TestCase):

        def testCaseRecognizeHandWrittenDigits(self):
            # Load the MNIST data.
            training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
            n_training = len(training_data)
            # Set up a network with 30 hidden neurons.
            net = network2.Network([784, 30, 10], cost=network2.CrossEntropyCost)
            net.large_weight_initializer()
            # Train with learning rate eta=0.5 for 3 epochs (3 instead of 30, for speed).
            epochs = 3
            # Assumes the unmodified network2.SGD, which returns four per-epoch
            # lists in this order; the accuracy lists hold counts of correct results.
            evaluation_cost, evaluation_accuracy, training_cost, training_accuracy = net.SGD(
                training_data, epochs, 10, 0.5,
                evaluation_data=test_data,
                monitor_evaluation_cost=True,
                monitor_evaluation_accuracy=True,
                monitor_training_cost=True,
                monitor_training_accuracy=True)
            # Verify that the training accuracy is above 90% for every epoch.
            for accuracy in training_accuracy:
                percentage = accuracy / float(n_training)
                print(percentage)
                self.assertGreater(percentage, 0.9,
                                   "accuracy must be above 90 percent for every epoch")
As you can see from the code, the SGD method returns the accuracy for both the training data and the evaluation data. In the code I only verify that, for every epoch, the accuracy of handwritten digit recognition is above 90%, and this check is done only for the training data.
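The per-epoch check described above can also be isolated in a small helper that is testable without training a network. This is only a sketch: the function names are hypothetical, and it assumes the accuracy values are counts of correctly classified samples that must be divided by the size of the data set they were measured on.

```python
def fractions_correct(correct_counts, n_samples):
    """Convert per-epoch counts of correct results into fractions in [0, 1]."""
    return [count / float(n_samples) for count in correct_counts]


def epochs_below_threshold(correct_counts, n_samples, threshold=0.9):
    """Return (epoch_index, fraction) for every epoch at or below the threshold."""
    fractions = fractions_correct(correct_counts, n_samples)
    return [(epoch, fraction)
            for epoch, fraction in enumerate(fractions)
            if fraction <= threshold]


# Made-up counts for illustration only, measured on 10000 samples.
failing = epochs_below_threshold([9100, 8900, 9500], 10000)
print(failing)
```

A test could then assert that the returned list is empty, and the failure message would show which epochs fell short.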
This is just an example of a unit test for a machine learning algorithm. By running this test case every day with Jenkins, we verify that a modification to our algorithm does not reduce the recognition accuracy.
Here is a screenshot of the result when I run the unit test from Eclipse:
How to run Python tests from Jenkins?
To launch all the Python tests of the project every day, I use Jenkins and nose2.
On Ubuntu, nose2 is easy to install. Follow the instructions at this link: https://nose2.readthedocs.io/en/latest/getting_started.html
Once nose2 is installed, go to the top directory of your Python project and launch nose2 to run all the Python tests of the project. Here is an example of the result with the neural network unit tests I created previously:
Finally, create a new job in Jenkins. Configure the job to get the code from a git repository (for instance) and then launch all the tests with nose2. See this link for more information about nose2 and Jenkins integration: https://jenkins.io/solutions/python/
We can imagine many more tests for neural network algorithms. We could test whether the algorithm learns fast or slowly. We could check for problems such as overfitting, underfitting, etc.
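As one possible starting point for an overfitting check, a test could compare the accuracy on the training data with the accuracy on the evaluation data and fail when the gap is too large. This is a rough sketch, not an established criterion: the function names are hypothetical and the 0.05 tolerance is an arbitrary value I picked for illustration.

```python
def accuracy_gap(training_correct, n_training, evaluation_correct, n_evaluation):
    """Return training accuracy minus evaluation accuracy, both as fractions."""
    training_fraction = training_correct / float(n_training)
    evaluation_fraction = evaluation_correct / float(n_evaluation)
    return training_fraction - evaluation_fraction


def looks_overfitted(training_correct, n_training,
                     evaluation_correct, n_evaluation, max_gap=0.05):
    """Flag a run whose training accuracy exceeds evaluation accuracy by more than max_gap."""
    gap = accuracy_gap(training_correct, n_training,
                       evaluation_correct, n_evaluation)
    return gap > max_gap
```

The counts passed in would be the last entries of the training and evaluation accuracy lists, each with the size of its own data set.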
This article about TDD for machine learning can give us more ideas about what to verify in our machine learning algorithms:
I may write another post later with more unit tests for these machine learning algorithms.