Tag Archives: Jenkins

Basic automatic testing of machine learning algorithm in Python

Introduction

I will present a basic solution for automatically testing a machine learning algorithm. Many languages are used for machine learning, and Python is one of the most popular. It is neither the fastest nor the easiest language, but it is a general-purpose language that does a bit of everything.

I am going to use the machine learning code written by Michael E Nielsen for his ebook at http://neuralnetworksanddeeplearning.com. The code is written in Python, and the ebook explains neural networks and deep learning with code examples. It is a really good place to start learning machine learning.

The ebook explains machine learning algorithms (neural networks and deep learning). Michael E Nielsen uses these algorithms to solve the problem of recognizing handwritten digits.

Solution

I will show the tools I use to code and test in Python. Then I will present some basic unit tests for Michael E Nielsen's source code, followed by a short test for stochastic gradient descent. Finally, I will show how I launch all the Python tests from Jenkins.

Develop and test Python code in Eclipse

There are many IDEs for coding in Python. I chose Eclipse with the PyDev plugin because it is free and easy to use. I also use git for source control.

Once you have installed the PyDev plugin in Eclipse, you need to configure Eclipse before you can run unit tests inside it. This was one of the problems I ran into.

project-neural
Neural networks project in Eclipse from the ebook

As you can see from the picture, my unit tests are in a folder called “test”. The “src” folder contains the source code of the neural network algorithm. My initial problem was that my unit tests could not import the source code. For example, I could not import the class Network.

The solution was to configure the PyDev plugin as described in this link: http://stackoverflow.com/questions/4631377/unresolved-import-issues-with-pydev-and-eclipse

Go to the “PyDev – PYTHONPATH” pane of the Python project and add your source folder under external libraries.

configure_eclipse_pydev
Configure Pydev to launch unit tests inside Eclipse

Now I can launch the tests inside Eclipse.
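An alternative that does not depend on the IDE setting is to extend sys.path from the test code itself. This is only a minimal sketch, assuming the layout described above with sibling “src” and “test” folders; the helper name add_src_to_path is hypothetical and not part of the original project.

```python
import os
import sys


def add_src_to_path(project_root, src_dir="src"):
    """Put <project_root>/<src_dir> at the front of sys.path (only once).

    Hypothetical helper: the folder name "src" mirrors the project layout
    described above and is an assumption about your own project.
    """
    src_path = os.path.abspath(os.path.join(project_root, src_dir))
    if src_path not in sys.path:
        sys.path.insert(0, src_path)
    return src_path
```

A test module could call it before importing the code under test, for example with the parent folder of the “test” directory as project_root.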

Create Basic Unit Tests with Python

Assuming you are using Python 3.7, read about the unittest package in the Python manual: https://docs.python.org/3.7/library/unittest.html

This is the class I would like to test:

import numpy as np


class Network(object):

    def __init__(self, sizes):
        """The list ``sizes`` contains the number of neurons in the
        respective layers of the network.  For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron.  The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1.  Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers."""
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]
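To make the two list comprehensions at the end of the constructor concrete, the sketch below computes the array shapes they produce, without NumPy. The function name expected_shapes is hypothetical; it only mirrors the indexing used in the class above.

```python
def expected_shapes(sizes):
    """Return the (rows, cols) shapes of the bias and weight arrays that
    Network.__init__ creates for the given layer sizes."""
    # One bias column vector per non-input layer
    bias_shapes = [(y, 1) for y in sizes[1:]]
    # One weight matrix per pair of consecutive layers: rows = next layer
    weight_shapes = [(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
    return bias_shapes, weight_shapes


bias_shapes, weight_shapes = expected_shapes([784, 30, 10])
print(bias_shapes)    # [(30, 1), (10, 1)]
print(weight_shapes)  # [(30, 784), (10, 30)]
```

Shapes like these are a cheap thing to assert in a unit test, because they do not depend on the random initial values.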

I am going to test only a few parts of the Network class and run one test case from chapter 1: http://neuralnetworksanddeeplearning.com/chap1.html

This is my test class, which tests the Network class.

import unittest

import network
import mnist_loader


class test_network(unittest.TestCase):

    def testCaseRecognizeHandWrittenDigits(self):
        # Load the MNIST data
        training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
        # Set up a network with 30 hidden neurons
        net = network.Network([784, 30, 10])
        # Use stochastic gradient descent to learn from the MNIST
        # training_data, with a mini-batch size of 10 and a learning
        # rate of eta=3.0. The book uses 30 epochs; 3 keeps the test fast.
        epochs = 3
        net.SGD(training_data, epochs, 10, 3.0, test_data=test_data)

    def testnetwork(self):
        print("init network")
        size = [784, 30, 10]
        net = network.Network(size)

        # Verify the number of layers stored by the network
        self.assertEqual(net.num_layers, 3)

The test “testCaseRecognizeHandWrittenDigits” just launches one test case from chapter 1. It does not verify anything: it only checks that the code runs without raising an error, so we have no idea whether the code is doing something useful.

The test “testnetwork” is a unit test for the Network object. It verifies that the number of layers is correct. When I launch the tests from Eclipse, the results are OK:

run_test_network
Run as Python Unittest from Eclipse

As you can see from the previous test case, I run SGD over only three epochs instead of 30. SGD is the method that implements stochastic gradient descent.

System Test of Stochastic Gradient Descent algorithm

Now I am going to test the SGD method of the Network class in the network2 module of the source code.

As the ebook explains, stochastic gradient descent (SGD) is a commonly used and powerful technique for learning in neural networks, and it is the basis for most of the learning techniques the book develops.

The test I have created tells us whether the algorithm recognizes handwritten digits with more than 90% accuracy.
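For readers who have not met the technique yet, here is a minimal sketch of mini-batch stochastic gradient descent. It is not the book's implementation: it fits a two-parameter line instead of a neural network, and the function name sgd_linear_fit and the sample data are assumptions made for illustration. The loop structure (shuffle, split into mini-batches, step against the averaged gradient) is the part that carries over.

```python
import random


def sgd_linear_fit(points, epochs, mini_batch_size, eta, seed=0):
    """Fit y = w * x + b with mini-batch stochastic gradient descent.

    Illustrative only: a two-parameter model instead of a neural network.
    """
    rng = random.Random(seed)
    data = list(points)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # new random mini-batches every epoch
        for start in range(0, len(data), mini_batch_size):
            batch = data[start:start + mini_batch_size]
            # Average gradient of the cost 0.5 * (w * x + b - y) ** 2
            grad_w = sum((w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum((w * x + b - y) for x, y in batch) / len(batch)
            w -= eta * grad_w
            b -= eta * grad_b
    return w, b


# Noise-free samples of y = 2x + 1 on [0, 1)
samples = [(i / 100.0, 2.0 * (i / 100.0) + 1.0) for i in range(100)]
w, b = sgd_linear_fit(samples, epochs=200, mini_batch_size=10, eta=0.5)
print(round(w, 3), round(b, 3))
```

The arguments epochs, mini_batch_size and eta play the same role as the positional arguments passed to SGD in the tests of this post.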

import unittest

import network2
import mnist_loader


class test_network2(unittest.TestCase):

    def testCaseRecognizeHandWrittenDigits(self):
        # Load the MNIST data
        training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
        # Set up a network with 30 hidden neurons
        net = network2.Network([784, 30, 10], cost=network2.CrossEntropyCost)
        net.large_weight_initializer()
        # Train with a learning rate of eta=0.5; for speed I chose
        # 3 epochs instead of 30
        epochs = 3
        results = net.SGD(
            training_data, epochs, 10, 0.5,
            evaluation_data=test_data,
            monitor_evaluation_cost=True,
            monitor_evaluation_accuracy=True,
            monitor_training_cost=True,
            monitor_training_accuracy=True)
        # SGD returns (evaluation_cost, evaluation_accuracy,
        # training_cost, training_accuracy); index 1 is the number of
        # correctly classified evaluation images for each epoch
        evaluation_accuracy = results[1]

        # Verify that the accuracy is above 90% for every epoch
        for accuracy in evaluation_accuracy:
            print(accuracy)
            percentage = accuracy / 10000.0  # test_data holds 10,000 images
            print(percentage)
            self.assertGreater(percentage, 0.9, "accuracy must be above 90 percent for every epoch")

As you can see from the code, the SGD method returns the cost and the accuracy for both the evaluation data and the training data. The test only verifies that, for every epoch, the accuracy of handwritten digit recognition on the evaluation data (here the MNIST test set) is above 90%. The training accuracy is not checked.

This is just one example of a test for a machine learning algorithm. By running this test case every day with Jenkins, we verify that a modification of the algorithm does not reduce the recognition accuracy.

Here is a screenshot of the result when I run the unit test from Eclipse:
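The threshold check can be sketched on its own, without training a network. In the sketch below the per-epoch counts are made-up stand-ins for the values returned by SGD, and the names accuracy_ratios, AccuracyThresholdTest and run_accuracy_check are hypothetical.

```python
import unittest


def accuracy_ratios(correct_counts, n_samples):
    """Convert per-epoch counts of correct predictions into ratios in [0, 1]."""
    return [count / float(n_samples) for count in correct_counts]


class AccuracyThresholdTest(unittest.TestCase):
    # Hypothetical per-epoch counts, standing in for the values returned by SGD
    correct_counts = [9213, 9340, 9401]
    n_samples = 10000

    def test_every_epoch_is_above_threshold(self):
        ratios = accuracy_ratios(self.correct_counts, self.n_samples)
        for epoch, ratio in enumerate(ratios):
            self.assertGreater(
                ratio, 0.9,
                "epoch %d: accuracy %.4f is not above 0.9" % (epoch, ratio))


def run_accuracy_check():
    """Run the test case above and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(AccuracyThresholdTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Keeping the conversion in a small function makes the threshold logic testable in milliseconds, while the slow training run stays in a separate test.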

eclipse_unit_test_results
The accuracy for all epochs is above 90%

How to run Python tests from Jenkins?

To launch all the Python tests of the project every day, I use Jenkins and nose2.

On Ubuntu it is easy to install nose2. Follow the instructions at https://nose2.readthedocs.io/en/latest/getting_started.html

Once nose2 is installed, go to the top directory of your Python project and launch nose2 to run all the Python tests of the project. Here is the result for the neural network unit tests I created previously:

nose2_results
All four tests I created ran successfully

Finally, create a new job in Jenkins. Configure the job to get the code from a git repository (for instance) and then launch all the tests with nose2. See this link for more information about nose2 and Jenkins integration: https://jenkins.io/solutions/python/

Conclusion

We can imagine many more tests for neural network algorithms. We could test whether the algorithm learns quickly or slowly. We could check for problems such as overfitting, underfitting, etc.

This chapter about TDD for machine learning gives more ideas about what to verify in machine learning algorithms:

https://www.safaribooksonline.com/library/view/thoughtful-machine-learning/9781449374075/ch01.html

I may write another post later with more unit tests for these machine learning algorithms.

 

 

 


Automate Unit testing of Javascripts with Karma Runner

The problem :

Working with a TDD/BDD approach is trickier in JavaScript than in Java. In the Java world you have JUnit to write unit tests and Maven to execute all of them and easily create reports in Jenkins. In the JavaScript world the equivalents of JUnit are QUnit, Jasmine and Mocha. There are more tools because there is more complexity, and these tools do not work perfectly. Not only are there different styles of unit testing, there are also different test runners, whereas in Java you just use the default one.

Initially I was using QUnit with a Maven plugin to run the tests, but that meant I could not use Jasmine 2 or Mocha. QUnit has limitations too, which is why I moved to Jasmine 2.0.

Now I use the Karma test runner to run the Jasmine unit tests, but I can also run QUnit tests if I want to. Karma is really flexible and can also run Mocha. It can include or exclude specific files from a test run. The other complexity of JavaScript testing is the browser: JavaScript can run inside Chrome, Firefox or Internet Explorer. For CI testing I used a headless browser called PhantomJS.

Karma also has a plugin to analyse JavaScript code coverage.
It is the best generic test runner I have found so far for JavaScript unit tests.

The solution :

I will present the solution I implemented to launch JavaScript tests from Jenkins. Karma is launched from Jenkins and is configured to find the Jasmine tests and to generate a JUnit-style report and a code coverage report.

Write unit tests with Jasmine 2.0

You need to write a specification (spec) file describing the test. Example for CommonUtilSpec.js:

describe("CommonUtil", function() {

    it("trim testing ", function() {
        var trimmed = trimFunc('tri');

        expect(trimmed).toBe("tri");
        expect(trimmed).not.toBe(null);
    });

});

The documentation for Jasmine 2: http://jasmine.github.io/2.0/introduction.html

Install tools on the Jenkins machine

Install Node.js as described in this link:
http://karma-runner.github.io/1.0/intro/installation.html

Also get Yeoman and Bower to fetch JavaScript packages:
https://inviqa.com/blog/testing-javascript-get-started-jasmine-0

NOTE: Bower is an equivalent of Maven for JavaScript: http://stackoverflow.com/questions/12334346/dependency-management-and-build-tool-for-javascript

Configure Karma for Jenkins

I used the following link to configure Karma for Jenkins: https://karma-runner.github.io/1.0/plus/jenkins.html. The Karma configuration shown on that page has errors and does not run as is.

I have installed EnvInject in order to specify some environment variables for Jenkins.

For the JUnit reporter I used the information at https://www.npmjs.com/package/karma-junit-reporter . It produces JUnit-style unit test reports, which is very practical. I also used the coverage plugin in order to know which parts of the code have been tested: https://www.npmjs.com/package/karma-coverage .

The content of my configuration file:

module.exports = function(config) {
    config.set({

        // base path that will be used to resolve all patterns (eg. files, exclude)
        basePath: '',

        // frameworks to use
        // available frameworks: https://npmjs.org/browse/keyword/karma-adapter
        frameworks: ['jasmine'],

        // list of files / patterns to load in the browser
        files: [
            '/myproject/src/test/**/*.js',
            '/myproject/src/main/Javascripts/commonUtil.js'
        ],

        // list of files to exclude
        exclude: [
        ],

        // preprocess matching files before serving them to the browser
        // available preprocessors: https://npmjs.org/browse/keyword/karma-preprocessor
        preprocessors: {
            '/myproject/src/main/**/*.js' : ['coverage']
        },

        // test results reporter to use
        // possible values: 'dots', 'progress'
        // available reporters: https://npmjs.org/browse/keyword/karma-reporter
        reporters: ['progress', 'junit', 'coverage'],

        // the default configuration
        junitReporter: {
            outputDir: '', // results will be saved as $outputDir/$browserName.xml
            outputFile: 'test_jasmine_js.xml', // if included, results will be saved as $outputDir/$browserName/$outputFile
            suite: '', // suite will become the package name attribute in xml testsuite element
            useBrowserName: true, // add browser name to report and classes names
            nameFormatter: undefined, // function (browser, result) to customize the name attribute in xml testcase element
            classNameFormatter: undefined, // function (browser, result) to customize the classname attribute in xml testcase element
            properties: {} // key value pair of properties to add to the section of the report
        },

        // web server port
        port: 9876,

        // enable / disable colors in the output (reporters and logs)
        colors: true,

        // level of logging
        // possible values: config.LOG_DISABLE || config.LOG_ERROR || config.LOG_WARN || config.LOG_INFO || config.LOG_DEBUG
        logLevel: config.LOG_INFO,

        // enable / disable watching file and executing tests whenever any file changes
        autoWatch: true,

        // start these browsers
        // available browser launchers: https://npmjs.org/browse/keyword/karma-launcher
        browsers: ['PhantomJS'],

        plugins : [
            'karma-phantomjs-launcher',
            'karma-jasmine',
            'karma-html-reporter',
            'karma-junit-reporter',
            'karma-coverage'
        ],

        // Continuous Integration mode
        // if true, Karma captures browsers, runs the tests and exits
        singleRun: true,

        // Concurrency level
        // how many browser should be started simultaneous
        concurrency: Infinity
    })
}

Configure and launch Jenkins

I configured Jenkins to launch Karma with a shell script build step. This solution is not optimized because the npm packages are installed at each launch; they should be installed only once.

npm install karma-jasmine --save-dev
npm install jasmine-core --save-dev
npm install karma-phantomjs-launcher --save-dev
npm install karma-junit-reporter --save-dev
npm install karma-jasmine-html-reporter --save-dev
npm install karma-html-reporter --save-dev
npm install karma-coverage --save-dev
cp /home/myuser/myapp/my.jenkins.conf.js .
karma start my.jenkins.conf.js
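One way to avoid reinstalling everything at each launch is to check which packages are already present before calling npm. The sketch below is an assumption, not part of the original job: it treats a folder under node_modules as proof that a package is installed (it does not check versions), and the function names are hypothetical.

```python
import os


def missing_packages(project_dir, packages):
    """Return the packages that have no folder under <project_dir>/node_modules."""
    node_modules = os.path.join(project_dir, "node_modules")
    return [name for name in packages
            if not os.path.isdir(os.path.join(node_modules, name))]


def install_command(project_dir, packages):
    """Build the npm command for the packages still missing, or None."""
    missing = missing_packages(project_dir, packages)
    if not missing:
        return None  # everything is already installed
    return ["npm", "install", "--save-dev"] + missing
```

The returned list can be passed to subprocess.check_call with cwd set to the project directory; when it is None, the install step can be skipped entirely.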

Here is a picture of the configuration in Jenkins :

config_jenkins_karma

Once you have finished the configuration, launch the tests. The report should look like this:

result_phantomsjs
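Jenkins reads the JUnit-style XML file written by the reporter. To inspect such a file outside Jenkins, a few lines of Python are enough. This is a sketch under assumptions: the sample report content is hypothetical, and the element names (testsuite, testcase, failure, error) follow the common JUnit XML layout rather than anything specific to this project.

```python
import xml.etree.ElementTree as ET

# Hypothetical report content, only here to make the sketch self-contained
SAMPLE_REPORT = """<?xml version="1.0"?>
<testsuite name="PhantomJS" tests="2" failures="1" errors="0">
  <testcase name="trim testing" classname="CommonUtil" time="0.004"/>
  <testcase name="other case" classname="CommonUtil" time="0.002">
    <failure type="">Expected 'a' to be 'b'.</failure>
  </testcase>
</testsuite>
"""


def summarize_junit(xml_text):
    """Count test cases and list failed test cases in a JUnit-style XML string."""
    root = ET.fromstring(xml_text)
    cases = list(root.iter("testcase"))
    failed = [case.get("name") for case in cases
              if case.find("failure") is not None
              or case.find("error") is not None]
    return {"tests": len(cases), "failed": failed}


print(summarize_junit(SAMPLE_REPORT))
```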

The coverage plugin is really neat and shows in detail which code was not covered by the tests:

javascript_coverage

 

TROUBLESHOOTING

Error 1: No provider for “framework:jasmine”! (Resolving: framework:jasmine)
Correction: npm install karma-jasmine --save-dev

Error 2: Error: Cannot find module ‘jasmine-core’
Correction: npm install jasmine-core --save-dev

Error 3: Cannot load browser “PhantomJS”: it is not registered! Perhaps you are missing some plugin?
Correction: npm install karma-phantomjs-launcher --save-dev

Error 4: Can not load reporter “junit”
Correction: add the reporter to the plugins list of the Karma configuration:

plugins : [
        'karma-phantomjs-launcher',
        'karma-jasmine',
        'karma-junit-reporter'
    ],

Error 5: Cannot find plugin “karma-junit-reporter”
Correction: npm install karma-junit-reporter --save-dev

Error 6: reporter.junit Cannot write JUnit xml
The problem was specific to where my conf file was located. The fix was to copy the conf file into the working directory of the slave.

Clean up Jenkins Workspaces

Problem :

When you use Jenkins for continuous integration, you can quickly run into disk usage problems on slave nodes, and sometimes on the master node too. They happen when you have many jobs. For example, one of our jobs takes up to 2 GB. Therefore, wherever Jenkins runs, the disk can fill up.
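To notice the problem before builds start failing, the free space of a disk can be checked with the Python standard library. This is only a sketch: the function names and the 10% threshold are assumptions chosen for illustration.

```python
import shutil


def free_space_percent(path="."):
    """Return the percentage of free space on the disk that holds path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.free / usage.total


def is_disk_low(path=".", min_free_percent=10.0):
    """True when free space drops below the (assumed) threshold."""
    return free_space_percent(path) < min_free_percent
```

Calling is_disk_low with the workspace directory of a node as path would tell a maintenance script whether a cleanup is due.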

Delete Workspace when build is done

For my projects I often use this option to delete the workspace after the build: https://wiki.jenkins-ci.org/display/JENKINS/Workspace+Cleanup+Plugin

Disadvantage :

After a build, it is sometimes useful to keep the workspace in order to understand failures. You can always disable this option temporarily if there is a problem.

Clean up Slaves Workspaces

Slave workspaces are not deleted by this method. Therefore I also use a script to delete slave workspaces:

https://gist.github.com/rb2k/8372402

To execute the script on the master node, go to “Manage Jenkins” -> “Script Console”.

Comment out the line “workspacePath.deleteRecursive()” if you first want to verify which folders are going to be deleted.

Write a clean up job for Jenkins :

It is good practice to clean up Jenkins workspaces: https://wiki.jenkins-ci.org/display/JENKINS/Jenkins+Best+Practices

Write jobs for your maintenance tasks, such as cleanup operations to avoid full disk problems.

upload_plugin_jenkins
Go to “Manage Jenkins” -> “Manage Plugins”. After Jenkins restarts, the plugin can be used.
  • Secondly, set up and configure Groovy. Go to Manage Jenkins -> Configure System, where you can configure Groovy. Then follow the steps indicated on the wiki of the Groovy plugin (previous link).

A few tips on how I installed Groovy when following the steps of the wiki:

Use repo URL by extracting it from zip file :

http://stackoverflow.com/questions/29931091/upgrade-groovy-installation-on-slave-node-to-a-recent-version

Choose a label for the Groovy installation. For example it can be Groovy.

Download Url for binary archive :
https://bintray.com/artifact/download/groovy/maven/groovy-binary-2.3.11.zip

  • Then you can create your Jenkins job to clean up workspaces daily. In the configuration of the job, choose “Execute System Groovy script”.
groovy_jobv2y
Put the previous command from gist.github here
  • This Jenkins job saved me a lot of time! Since then I have not had any disk problems, and I no longer need to clean up the disk manually.

Clean up Master Workspace

From time to time the disk of the master node is full, and the workspace has to be cleaned up before jobs can be launched. To do so, first go to the console of the master node.

Then choose the option ‘Script Console’ on the master node. I then simply look for the jobs that take up the most space.

To do so, first find the JENKINS_HOME directory, for example with this command:

println "env".execute().text

Find out which jobs take up the most space with:

println "du -mh [JENKINS_HOME]/jobs".execute().text
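The raw output of du can be long. As an illustration of the same idea, the sketch below ranks the job folders by size in Python. It is not what the Script Console runs: the function names are hypothetical, and it sums file sizes rather than disk blocks, so the numbers can differ slightly from du.

```python
import os


def directory_size(path):
    """Total size in bytes of the regular files below path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):  # skip symbolic links
                total += os.path.getsize(file_path)
    return total


def largest_jobs(jobs_dir, top=5):
    """Return up to `top` (job_name, size_in_bytes) pairs, largest first."""
    sizes = [(name, directory_size(os.path.join(jobs_dir, name)))
             for name in os.listdir(jobs_dir)
             if os.path.isdir(os.path.join(jobs_dir, name))]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

Here jobs_dir would be the jobs folder under JENKINS_HOME found in the previous step.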

Delete the directory taking the most space (you need admin rights).

Example:

println "rm -rf   [JENKINS_HOME]/jobs/[my_job_to_clean_up]/builds".execute().text
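Because rm -rf cannot be undone, a dry run is a useful safety net. The following sketch shows the idea in Python; the function name delete_builds and the dry_run flag are hypothetical, and the folder layout (jobs/<job>/builds under JENKINS_HOME) is taken from the command above.

```python
import os
import shutil


def delete_builds(jenkins_home, job_name, dry_run=True):
    """Delete <jenkins_home>/jobs/<job_name>/builds.

    With dry_run=True nothing is removed; the path that would be deleted
    is only returned so that it can be reviewed first.
    """
    builds_dir = os.path.join(jenkins_home, "jobs", job_name, "builds")
    if not os.path.isdir(builds_dir):
        return None  # nothing to delete
    if not dry_run:
        shutil.rmtree(builds_dir)
    return builds_dir
```

Run it once with the default dry_run=True, check the returned path, then call it again with dry_run=False.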

Link to the book “The Pragmatic Programmer” on Amazon

The Pragmatic Programmer: From Journeyman to Master

I recommend this book because it helped me to understand the big picture of computing: https://julienprog.wordpress.com/2015/03/14/review-book-pragmatic-programmer/

Other useful commands

Check where the slave workspace directories are located:

http://stackoverflow.com/questions/11387762/how-to-trigger-manual-clean-of-hudson-workspaces

def hi = hudson.model.Hudson.instance
hi.getItems(hudson.model.Job).each { job ->
    println(job.displayName)
    println(job.isDisabled())
    println(job.workspace)
}

Delete unused jobs :

https://gist.github.com/ceilfors/1400fd590632db1f51ca

How to create a Groovy script that cleans up workspaces :

http://stackoverflow.com/questions/11387762/how-to-trigger-manual-clean-of-hudson-workspaces

List of groovy scripts for Jenkins

https://gist.github.com/dnozay/e7afcf7a7dd8f73a4e05

Wipe out workspaces for a specific job :

https://wiki.jenkins-ci.org/display/JENKINS/Wipe+workspaces+for+a+set+of+jobs+on+all+nodes