Releases: jeff1evesque/machine-learning

Standardize build, testing, sphinx doc, minor frontend facelift

12 Jun 02:49
5ba749e

This release involved some major projects, some requiring more effort than others. First, the vagrant development environment was replaced with rancher orchestration. During this process, we created a single bash script called install_rancher. This script attempts to install a rancher server, then spins up docker containers contained in a rancher stack. However, it was difficult to generalize this script across multiple operating systems (e.g. windows 7/10, osx, linux).

Due to limited resources, install_rancher was primarily developed on windows 7, and briefly tested on windows 10. In the upcoming milestone, we will likely modify this script to work on a linux distribution. This way we can launch rancher on an internet host, with a webhook to the master branch. For the time being, users can opt to use the provided docker-compose.yml. If bugs are found with this method, please help us by reporting them. We are pushing towards getting rancher working, but the docker-compose method should be a stable alternative.

The next biggest accomplishment mostly facilitates our development in upcoming milestones. More specifically, we have optimized our unit testing. This includes splitting the linting, pytest, and frontend unit tests into separate scripts, so each segment can be run manually in the local development environment. Most importantly, we have improved the runtime on travis ci by running each script as a concurrent job. Previously, the entire travis build would take 21 minutes or more. The same build, with additional package installation, now takes roughly 9 minutes. This also includes the several jest + enzyme frontend unit tests that have been integrated as npm scripts, intended to be run within the browserify docker container.

Our next accomplishment ties in with the first. While dockerizing our vagrant build, we decided to provision our containers with puppet. Some arguments can be made here, but ultimately we like the idea of being able to enforce our environment state, especially if a container could run for an unknown amount of time. Therefore, our puppet modules were completely refactored, using class variables as well as hiera configurations. In many cases, the two provide the same configuration options. These were put to good use in the corresponding dockerfiles. On the same note, our pytests have been configured to let users choose whether to build an environment from local dockerfiles, or from the equivalent dockerhub containers.

Lastly, we did some minor frontend facelifting, and updated the scikit-learn library to 0.19.1. The frontend improvements include a solid top navigation bar. When a user logs into the application, a solid black bar appears at the top of the screen, with a series of links associated with the user's account. Furthermore, we integrated bootstrap to ensure the menu bar, as well as a couple of our other pages, is responsive. We also added range-sliders to our existing model_generate page, allowing users to slide to a value for the corresponding penalty or gamma when generating an svm or svr model. Then, we added a frontpage animation, at least until we have a better design. The animation is built with D3JS, and converting its syntax to be reactjs compliant was a tedious process.

We have focused largely on standardizing the environment, and on choosing a set of technologies for the overall project. So, it's about time to bridge the various algorithms with either a web-interface, or a rest api endpoint. In the upcoming milestones, we'll attempt to interface a variety of additional algorithms. However, this will likely involve refactoring our database(s), so users have the proper permissions and abilities to perform the actions associated with the new algorithms.

Additionally, we have improved our sphinx documentation, and launched it on github pages.

Separated web and rest programmatic-api, build + web-interface bug fixes

20 Nov 02:20
13ae291

This release encompasses issues pertaining to milestone 0.6.1.

This short milestone has been motivated by the following bugs:

  • web-interface returned 500 http errors upon generating a model from uploaded json dataset(s)
  • web-interface not properly logging into designated flask.log
  • libssl-dev package broke during build, since its minor version gets updated frequently
  • existing bagr datasets incorrectly use classification style values

Along the way of squashing the above, the following enhancements were made:

  • views.py has been split, to allow different flask blueprint implementations
  • two separate nginx reverse proxies regulate the gunicorn webservers
    • web-interface
    • rest programmatic-api
  • rest programmatic-api now requires all routes (except /login) to submit a valid token
    • existing unit tests have been updated accordingly
  • corresponding README.md, as well as existing documentation have been updated

Mongodb, improved security, consistent package installation, better reactjs syntax, reusable unit tests

05 Nov 00:49
dd1b778

This release encompasses issues pertaining to milestone 0.6.

This release has taken a significant amount of time, due to several important factors. First, the single mariadb database has been split, allowing ML related datasets to be stored in mongodb. This was done to improve performance, and reduce the complexity of the code. Now, users can supply datasets (e.g. a json file) without them needing to be parsed into several dedicated sql database tables. Additionally, anonymous users are limited to a maximum of 50 mongodb collections, while authenticated users are granted 150. Furthermore, across the sum of all collections, anonymous users are allowed 10 documents, and authenticated users 30. These values can be configured through the provided application.yaml; changing them requires the corresponding webserver(s) to be restarted.

Also, our flask app is now wired up such that a mariadb and a mongodb connection are always open, and ready for transactions. This is a better solution, since each client accessing the ML application doesn't need to open a new connection each time they perform an operation. This was more of a problem when the database was restricted to a single mariadb, since the corresponding sql transactions used to be very granular. If questions come up regarding the benefits of connection pools (versus a single open connection), we can briefly argue for spinning up a dedicated machine containing another flask instance. However, this application is not yet production grade.

Additionally, major changes have occurred to improve many security aspects of the application. For example, the vagrant up build now includes https://, and redis is used in place of the default, traditional cookie implementation. Users can /login through the browser, and have their user information stored in redis, while a randomized value corresponding to their redis key is returned to them as a cookie. This is better than sending an entire cookie containing all of the user information. Similarly, users can now authenticate through the programmatic-api. Upon a successful post login, flask returns a token, which can be used on successive rest calls to validate their session.
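
As a rough illustration of the session approach described above, here is a minimal sketch in plain python. It is not the application's code: a dictionary stands in for redis, and the names SESSION_STORE, login, and current_user are hypothetical. The point is only that the client receives an opaque random key, while the user information stays on the server.

```python
import secrets

# Stand-in for redis (assumption): maps a random session key to user data.
SESSION_STORE = {}


def login(username):
    """Store user info server side and return only an opaque random key."""
    session_key = secrets.token_urlsafe(32)
    SESSION_STORE[session_key] = {"username": username}
    return session_key


def current_user(session_key):
    """Return the username for a cookie or token value, or None if unknown."""
    record = SESSION_STORE.get(session_key)
    return record["username"] if record else None
```

The same lookup could serve both the browser cookie and the programmatic-api token, since in both cases the client only ever presents the random key.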

Also, our build process for enforcing the installation of particular packages (across multiple package managers) has been streamlined, and is now driven dynamically by the definitions in packages.yaml:

    ## iterate 'packages' hash
    $packages.each |String $provider, $providers| {
        if ($provider in ['apt', 'npm', 'pip']) {
            $providers['general'].each|String $package, String $version| {
                package { $package:
                    ensure   => $version,
                    provider => $provider,
                    require  => [
                        Class['apt'],
                        Class['python'],
                        Class['package::nodejs'],
                        Class['package::python_dev']
                    ],
                }
            }
        }
    }

We've also completed many enhancements to the frontend, which are difficult to formally list. In short, we've begun (though not entirely) to heavily use redux between various reactjs components. Also, two new minimal reactjs pages have been created: one allows users to save a generated prediction result through a minimal webform on /session/current-result, and another lists all previously saved /session/results. Lastly, we have to give thanks to @Vitao18 for converting every jsx file's createClass, and corresponding constructor, to the native javascript syntax.

Unit testing has dramatically improved in terms of functionality and reusability. We now have a single bash script, unit-tests, which contains all the necessary logic to build a sufficient testing environment before tests are run against it. This allows the script to be used by our travis ci, and also to be run locally, even in our vagrant up build.

You may wonder what the heck the bgc, and bgr datasets are doing in this milestone. To answer that, you'll have to wait until milestone-0.9 is finally merged to the master branch. Many thanks also go out to @protojas for helping expedite our future milestone-0.9 with the ensemble learning models.

Gunicorn, reverse proxy, sql index, enhanced frontend, unit tests with coverage

20 Jan 06:41

This release encompasses issues pertaining to milestone 0.5.

The flask application has integrated gunicorn processes, with nginx serving as a reverse proxy server. This new feature has significantly enhanced performance. For example, identical unit tests now run about 2x faster than under the previous default flask microframework (i.e. without uwsgi). This can be seen by comparing the unit test benchmark, located on our new pytest.rst page, with the 0.4 release statement.

To tie together enhanced performance, various additional pytests have been added (or configured), along with the integration of coveralls. This particular tool is useful, since it indicates the degree, or percentage of lines of python code actually unit tested, within the entire application. A small visual representation has been added, to the main README.md, in the form of a badge, labelled as coverage.

Additionally, necessary database tables were given indexes, to help improve query performance. Also, the previous tbl_feature_value was split into two database tables, to better organize the storing of the supplied dataset(s):

  • tbl_svm_data
  • tbl_svr_data

On a similar topic of databases, necessary backend constructs were created in conjunction with the frontend react-redux, to store the userid of logged-in users via the browser's internal sessionStorage. This gives the application the capability to validate a login attempt and, upon success, store the userid on the frontend for the duration of a browser session. However, the login feature introduced scrypt (on the backend), a resource intensive algorithm used to generate and validate password hashes. Because the implementation is resource expensive, we ensured our Vagrantfile allocated more than enough memory in the virtual machine.
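
To give a feel for why scrypt is resource intensive, the following is a minimal sketch using python's standard hashlib.scrypt. It is not the project's implementation: the cost parameters and the function names hash_password and verify_password are assumptions chosen for illustration. With these values, a single hash needs roughly 128 * n * r bytes, or about 16 MiB of memory.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (assumptions, not the project's settings).
# Memory use is roughly 128 * n * r bytes, about 16 MiB here.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 64}


def hash_password(password, salt=None):
    """Return (salt, digest) for a password, generating a salt if needed."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected)
```

Storing the salt next to the digest lets a later login attempt recompute the same hash, while the memory cost makes large scale guessing expensive.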

Note: the login feature lays the foundation of many issues assigned to milestone 0.6.

With the integration of the login feature, the frontend required some adjustments, along with minor cosmetic touches. This involved the implementation of react-router, which generally enhances the user experience, by ensuring fixed urls, are associated with particular reactjs components.

Of course, we also attempted to enhance the general build and security of our overall application. So, a new custom ubuntu 14.04 vagrant box was created on the atlas repository. By creating our own vagrant base box, we were able to generate a corresponding MD5 checksum, which is validated on each vagrant build. If the vagrant box changes in the slightest, the checksum changes, and the build fails due to the MD5 mismatch.
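
The checksum validation itself is handled by vagrant, but the idea can be sketched in a few lines of python. The helper names md5_of_file and box_matches below are hypothetical, and the sketch only shows how any change to a file's bytes produces a different MD5 digest.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def box_matches(path, expected_md5):
    """Return True only when the file's digest equals the expected checksum."""
    return md5_of_file(path) == expected_md5.lower()
```

Note that MD5 is adequate for detecting accidental changes, but it is not considered collision resistant, so it should not be relied on against a deliberate attacker.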

Lastly, we decoupled some background information from the main README.md into its own dedicated project documentation/. In the future, this documentation/ will be generated into its own dedicated website (possibly via sphinx), and serve as a primary hub for visitors requiring particular how-to's.

SVR, kernels, confidence levels, more (automated) unit testing

15 Sep 12:20

This release encompasses issues pertaining to milestone 0.4.

Now users can perform support vector regression (SVR) analysis (with returned r^2), while having the flexibility to choose which kernel to employ, on either the webform or the programmatic api:

  • linear
  • rbf
  • polynomial
  • sigmoid

This flexibility is also made available for support vector machine (SVM) analysis, which now returns confidence level and decision function measures. Additionally, users can submit a url reference as their dataset, via the webform, or the programmatic api.

To correspond to the above changes, we've had to refactor our flask implementation to include app factory notation. This allows the travis ci to leverage necessary components to perform automated unit testing, when code is committed within the github repository, as noted within the official docker wiki page, under How to incorporate python unit testing.... Specifically, both the manual, and automated unit testing now covers the additional SVR case, which can be executed manually:

$ cd /path/to/machine-learning/
$ vagrant up
$ vagrant ssh
vagrant@vagrant-ubuntu-trusty-64:~$ (cd /vagrant/test && pytest manual)
================================================= test session starts ==================================================
platform linux2 -- Python 2.7.6, pytest-3.0.2, py-1.4.31, pluggy-0.3.1
rootdir: /vagrant/test/manual, inifile: pytest.ini
plugins: flask-0.10.0
collected 16 items

manual/programmatic_interface/dataset_url/pytest_svm_dataset_url.py ....
manual/programmatic_interface/dataset_url/pytest_svr_dataset_url.py ....
manual/programmatic_interface/file_upload/pytest_svm_file_upload.py ....
manual/programmatic_interface/file_upload/pytest_svr_file_upload.py ....

============================================== 16 passed in 58.60 seconds ==============================================

Also, some other changes have been implemented. For example, configurations stored within settings.py have been ported to several standardized yaml files (puppet also requires an intermediate hiera.yaml). This flexibility allows both the application, and provisioner (i.e. puppet) to utilize consistent constant application settings. Of course, we added yaml linting in the .travis.yml.

Additionally, we increased flexibility of the Vagrantfile, such that vagrant destroy removes all cached files (including from pytest), and added a python Logger class, allowing exceptions to be logged into desired custom log files. Specifically, this added feature is intended to make debugging easier, since the flask application currently runs as a background service, which means typical error messages will print in an unseen background. Finally, the travis ci button at the top of the README.md is premised only on the master branch.
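
As an illustration of logging exceptions to a custom file from a background service, here is a minimal sketch built on python's standard logging module. It is not the project's Logger class: the helpers get_logger and run_logged, and the log format, are assumptions for the example.

```python
import logging


def get_logger(name, log_path, level=logging.ERROR):
    """Return a logger that appends records to the given log file."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def run_logged(logger, func, *args):
    """Call func, writing any exception (with traceback) to the log file."""
    try:
        return func(*args)
    except Exception:
        logger.exception("unhandled error in %s", func.__name__)
        return None
```

Because the traceback is written to the file rather than to an unseen terminal, the error can be inspected later, for example with tail on the chosen log path.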

Reactjs, upstart scripts, puppet modules

10 Apr 13:22

This release encompasses issues pertaining to milestone 0.3.

All jquery code (including ajax) has been refactored into a combination of reactjs, fetch, and pure javascript. Correspondingly, eslint has been implemented, with necessary plugins to lint jsx templates.

Also, existing upstart scripts were tightened, so only corresponding source file types are compiled. This prevents the compiler from producing an error, if an incorrect file type is placed within a corresponding directory. Finally, the upstart script responsible for compiling jsx templates into js (i.e. browserify), adds an entry of the compiled js filename, within .gitignore, if the corresponding entry did not already exist.

Lastly, two major changes occurred with the puppet implementation. First, all logic has been streamlined into modules, rather than a slew of manifests. Second, the previous shell script puppet_updater.sh, responsible for updating puppet, now implements the vagrant-puppet-install plugin.

Programmatic interface, linting, unit tests

24 Nov 14:06

This release encompasses issues pertaining to milestone 0.2.

Now, a programmatic-interface is provided along with the previous web-interface. On a build level, this release includes linting on all scripts via .travis.yml, with the exception of puppet (erb) templates, and a handful of open source libraries. Bash scripts used for the webcompilers were enhanced with syntax adjustments. These improvements guarantee source files are properly compiled to corresponding asset directories, both during the initial build, and on successive source modifications made within the vagrant virtual machine.

Also, high level unit tests can be run:

$ cd /vagrant/test
$ sudo pip install pytest
$ py.test
============================= test session starts ==============================

platform linux2 -- Python 2.7.6, pytest-2.8.3, py-1.4.30, pluggy-0.3.1
rootdir: /vagrant/test, inifile: pytest.ini
collected 4 items

programmatic_interface/pytest_session.py ....

=========================== 4 passed in 0.43 seconds ===========================

Lastly, among various markdown enhancements, contribute.md has been created, so it is integrated when issues, along with corresponding pull requests, are created.

Note: unit test(s) will be incorporated into the travis-docker container build, on a future release.

Note: the remaining OS related problem associated with milestone 0.1 has been resolved.

Initial web interface

09 Sep 12:52

This release encompasses issues pertaining to milestone 0.1.

The web interface is currently limited by the client's browser, such that the client's OS sometimes cannot define csv, or json mime types for file upload(s). This means only xml file upload(s) can be guaranteed at the moment. However, the next release, corresponding to milestone 0.2, will address this issue.