napsu/sparsePKL

sparsePKL - Sparse Pairwise Kernel Learning Software

sparsePKL is a pairwise kernel learning algorithm based on nonsmooth DC (difference of two convex functions) optimization. It learns sparse models for prediction on pairwise data (e.g. drug-target interactions) by using double regularization with both the L1-norm and the L0-pseudonorm. The nonsmooth DC optimization problem is solved with the limited memory bundle DC algorithm (LMB-DCA). In addition, sparsePKL uses pairwise Kronecker product kernels, computed via the generalized vec trick, to model interactions between drug and target features. The loss functions included for the pairwise kernel problem are:

  • squared loss,
  • squared epsilon-insensitive loss,
  • epsilon-insensitive squared loss,
  • epsilon-insensitive absolute loss,
  • absolute loss.
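To illustrate why the vec trick matters for Kronecker product kernels, here is a minimal NumPy sketch. It is not taken from the sparsePKL or RLScore sources: the function name `kron_matvec` and the toy linear kernels are assumptions made for illustration. It covers only the complete-data case, where every drug-target pair is present, and relies on the standard identity that multiplying by the Kronecker product of two matrices equals two small matrix products on the reshaped vector. The generalized vec trick extends the same idea to an arbitrary subset of pairs.

```python
import numpy as np


def kron_matvec(K_d, K_t, a):
    """Compute (K_d kron K_t) @ a without forming the Kronecker product.

    Hypothetical helper: `a` holds one coefficient per (drug, target) pair,
    ordered so that pair (i, j) sits at index i * n_t + j (row-major).
    """
    n_d, n_t = K_d.shape[0], K_t.shape[0]
    A = a.reshape(n_d, n_t)           # one row per drug, one column per target
    return (K_d @ A @ K_t.T).ravel()  # same ordering as the input vector


# Toy data: 4 drugs with 3 features, 5 targets with 2 features (made up).
rng = np.random.default_rng(0)
X_d = rng.standard_normal((4, 3))
X_t = rng.standard_normal((5, 2))
K_d = X_d @ X_d.T                     # linear drug kernel, 4 x 4
K_t = X_t @ X_t.T                     # linear target kernel, 5 x 5
a = rng.standard_normal(4 * 5)

fast = kron_matvec(K_d, K_t, a)       # never builds the 20 x 20 matrix
slow = np.kron(K_d, K_t) @ a          # explicit pairwise kernel, for checking
assert np.allclose(fast, slow)
```

The explicit pairwise kernel has (n_d * n_t)^2 entries, so avoiding it is what keeps the computation feasible when the number of drugs and targets grows.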

Files included

  • sparsepkl.py

    • Main python file. Includes RLScore calls.
  • pkl_utility.py

    • Python utility programs.
  • sparsepkl.f95

    • Main Fortran file for sparsePKL software.
  • lmbdca.f95

    • LMB-DCA - the limited memory bundle DC algorithm.
  • solvedca.f95

    • Limited memory bundle method for solving convex DCA-type of problems.
  • objfun.f95

    • Computation of the function and subgradient values with different loss functions. The loss function is selected in sparsepkl.py.
  • initpkl.f95

    • Initialization of parameters and variables in sparsePKL and LMB-DCA. Includes modules:
      • initpkl - Initialization of parameters for pairwise learning.
      • initlmbdca - Initialization of LMB-DCA.
  • parameters.f95

    • Parameters for Fortran. Includes modules:
      • r_precision - Precision for reals,
      • param - Parameters,
      • exe_time - Execution time.
  • subpro.f95

    • Subprograms for LMB-DCA and LMBM.
  • data.py

    • Contains functions to load the example data sets. Data files are assumed to be in a folder "data" that is not part of the current folder.
    • Contains functions to create train-test-validation splits. Splits are created for every experimental setting S1-S4 (see the reference below).
  • Makefile

    • Builds a shared library that allows sparsepkl (Fortran 95 code) to be called from Python. Uses f2py and Python 3.7, and requires a Fortran compiler (gfortran) to be installed.

Installation and usage

The source uses f2py and Python 3.7, and requires a Fortran compiler (gfortran by default) and RLScore to be installed.

To use the code:

  1. Select the data, the loss function, and the desired sparsity level in the sparsepkl.py file.
  2. Run Makefile (by typing "make") to build a shared library that allows sparsepkl (Fortran95 code) to be called from Python.
  3. Finally, just type "python3.7 sparsepkl.py".

The algorithm returns a CSV file with performance measures (C-index and MSE) computed on the test set under the experimental settings S1-S4. The best results are selected using a separate validation set, with C-index as the selection criterion. In addition, separate CSV files with the predictions under each of the settings S1-S4 are returned.
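For readers unfamiliar with the two performance measures, the following pure-Python sketch shows how C-index and MSE are commonly defined. It is an illustration only and not the code used by sparsePKL or RLScore. The function names `c_index` and `mse` and the toy values are assumptions, and the tie handling (a tied prediction counts as half-concordant) follows the usual convention rather than anything stated in this README.

```python
def c_index(y_true, y_pred):
    """Concordance index: share of comparable pairs ranked in the right order.

    Pairs with equal true labels are skipped; tied predictions count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # no ordering to get right for this pair
            comparable += 1
            true_diff = y_true[i] - y_true[j]
            pred_diff = y_pred[i] - y_pred[j]
            if pred_diff == 0:
                concordant += 0.5
            elif (true_diff > 0) == (pred_diff > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("C-index is undefined when all true labels are equal")
    return concordant / comparable


def mse(y_true, y_pred):
    """Mean squared error between true labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels and predictions (made up for illustration).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [0.9, 2.5, 2.4, 4.2]
print("C-index:", c_index(y_true, y_pred))
print("MSE:", mse(y_true, y_pred))
```

C-index depends only on the ranking of the predictions, while MSE also penalizes errors in their scale, so the two measures can disagree about which model is better.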

References:

Acknowledgements

The work was financially supported by the Research Council of Finland projects (Project No. #345804 and #345805) led by Antti Airola and Tapio Pahikkala.
