
Comparison of Gradient Descent and Block Coordinate Gradient Descent methods on a semi-supervised learning problem.


Gradient Descent and Block Coordinate Gradient Descent for Semi-Supervised Learning

To read in detail about this project, please refer to the written report.


Project description

In this study, several optimization algorithms, namely standard Gradient Descent and Block Coordinate Gradient Descent (BCGD) with the Randomized rule and the Gauss-Southwell rule, are used to address a semi-supervised learning problem. To accomplish this, a set of random points and labels is first generated, and most of the labels are then discarded. For each algorithm, we compute the classification accuracy on the data points and the value of the loss function, and we analyze the loss, the number of iterations, and the CPU time. Finally, the effectiveness of the algorithms is evaluated on a real-world dataset.
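The exact objective is defined in the written report; a common formulation for this kind of label-propagation problem, which the sketch below assumes, minimizes a weighted quadratic loss over the labels of the unlabeled points. The function names, the Gaussian similarity weights, and the variable names here are illustrative, not taken from the repository:

```python
import numpy as np

def similarity(X, Y):
    """Gaussian (RBF) similarity between two sets of points (assumed weight choice)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2)

def loss_and_grad(y_u, X_l, y_l, X_u):
    """Quadratic semi-supervised loss and its gradient w.r.t. the unlabeled labels y_u:

        f(y_u) = sum_{i,j} W[i,j] (y_u[i] - y_l[j])^2
                 + 1/2 * sum_{i,j} Wb[i,j] (y_u[i] - y_u[j])^2
    """
    W = similarity(X_u, X_l)    # unlabeled-labeled weights
    Wb = similarity(X_u, X_u)   # unlabeled-unlabeled weights (symmetric)
    diff_l = y_u[:, None] - y_l[None, :]
    diff_u = y_u[:, None] - y_u[None, :]
    loss = (W * diff_l ** 2).sum() + 0.5 * (Wb * diff_u ** 2).sum()
    # Gradient: the 1/2 on the symmetric term cancels against the two
    # appearances of each y_u[i] in that double sum.
    grad = 2.0 * (W * diff_l).sum(axis=1) + 2.0 * (Wb * diff_u).sum(axis=1)
    return loss, grad
```

Labeling an unlabeled point then amounts to taking the sign of the corresponding entry of the minimizer.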

[Figure: generated clusters of points with labeled and unlabeled points]

For each algorithm, several learning-rate strategies are employed: a constant learning rate, 1/L (the inverse of the Lipschitz constant of the gradient), 1/Li (the inverse of the blockwise Lipschitz constant), and the Armijo rule.
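As an illustration of the last strategy, a minimal backtracking implementation of the Armijo rule might look as follows (the function name and the default parameters `alpha0`, `gamma`, `delta` are assumptions, not the repository's actual values):

```python
import numpy as np

def armijo_step(f, x, g, alpha0=1.0, gamma=0.5, delta=1e-4):
    """Backtracking line search along the negative gradient direction -g.

    Shrink alpha by a factor gamma until the Armijo sufficient-decrease
    condition holds:  f(x - alpha*g) <= f(x) - delta * alpha * ||g||^2.
    """
    fx = f(x)
    alpha = alpha0
    while f(x - alpha * g) > fx - delta * alpha * (g @ g):
        alpha *= gamma
    return alpha
```

For the fixed strategies, 1/L guarantees descent globally, while 1/Li allows larger steps along individual coordinates; Armijo adapts the step at every iteration at the cost of extra function evaluations.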

Results

We tested the three chosen algorithms on a credit card defaulter dataset obtained from Kaggle, using the features balance and income. Because of resource limitations, we ran the tests on a randomly chosen subset of points: the original dataset has 5776 rows, of which we used 1000, as in training. During training we labeled 10% of the points, but here we show that an even smaller fraction of labeled points also works well: in this case, we labeled only 3% of the points. Comparing the three models, we drew the following conclusions:

  • Both Gradient Descent and BCGD with the Gauss-Southwell rule labeled all of the points correctly (100% accuracy).
  • Randomized BCGD yielded 82% accuracy.
  • BCGD with the Gauss-Southwell rule needed the most time to complete, which was expected. However, Randomized BCGD took more time than Gradient Descent, which was unexpected. One possible explanation is that calculating the Hessian (to obtain the Li constants) is more costly than computing the full gradient.
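Since the objective here is quadratic in the unlabeled labels, its Hessian Q is a constant matrix: the global Lipschitz constant L of the gradient is the largest eigenvalue of Q, and the coordinatewise constants Li are its diagonal entries. A minimal sketch of how they might be obtained, assuming a quadratic objective (the function name is illustrative):

```python
import numpy as np

def lipschitz_constants(Q):
    """For a quadratic objective f(y) = 0.5 * y^T Q y - b^T y (Q symmetric PSD),
    the gradient Q y - b is L-Lipschitz with L = lambda_max(Q), and each
    coordinate i has blockwise constant L_i = Q[i, i]."""
    L = np.linalg.eigvalsh(Q).max()   # largest eigenvalue of the Hessian
    Li = np.diag(Q).copy()            # per-coordinate Lipschitz constants
    return L, Li
```

Even though each Li is just a diagonal entry, forming the Hessian in the first place can dominate the cost, which is consistent with the timing observation above.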

[Figures: results for the credit card defaulter dataset]

Code structure

  • optimisation.py contains the classes and methods for each of the three algorithms.
  • utils.py contains helper functions for generating the artificial dataset, loading the real dataset, and plotting.
  • test.py is used for testing different algorithms and learning rates.

