kelp-additional-algorithms

KeLP is the Kernel-based Learning Platform (Filice '15) developed in the Semantic Analytics Group of the University of Roma Tor Vergata.

This project contains several learning algorithms that extend the set provided in the kelp-core project (which includes, e.g., the C-Support Vector Machine and ν-Support Vector Machine learning algorithms). In particular, advanced learning algorithms for classification and regression can be found in this package. The algorithms are grouped in:

  • Batch Learning algorithms;
  • Online Learning algorithms.

In Batch Learning, the complete training dataset is assumed to be entirely available during the learning phase. In Online Learning, individual examples are exploited one at a time to incrementally acquire the model.
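The online protocol can be sketched with a toy learner: examples are consumed one at a time and the model is updated incrementally. The class below is a self-contained illustration (a plain perceptron over dense arrays), not KeLP's actual API.

```java
// Minimal sketch of the online-learning protocol: examples arrive one at a
// time and the model (here, a toy perceptron) is updated incrementally.
// Illustrative code only; it does not use KeLP's classes.
public class OnlinePerceptronSketch {
    private final double[] w;            // linear model weights

    public OnlinePerceptronSketch(int dim) { this.w = new double[dim]; }

    // Process a single labeled example (y in {-1, +1}); update only on mistakes.
    public void learn(double[] x, int y) {
        double score = 0;
        for (int i = 0; i < x.length; i++) score += w[i] * x[i];
        if (y * score <= 0) {            // mistake: move w towards y * x
            for (int i = 0; i < x.length; i++) w[i] += y * x[i];
        }
    }

    public int predict(double[] x) {
        double score = 0;
        for (int i = 0; i < x.length; i++) score += w[i] * x[i];
        return score >= 0 ? +1 : -1;
    }

    public static void main(String[] args) {
        OnlinePerceptronSketch p = new OnlinePerceptronSketch(2);
        double[][] xs = {{1, 1}, {2, 1}, {-1, -1}, {-2, -1}};
        int[] ys = {+1, +1, -1, -1};
        for (int epoch = 0; epoch < 5; epoch++)       // stream the data
            for (int i = 0; i < xs.length; i++) p.learn(xs[i], ys[i]);
        System.out.println(p.predict(new double[]{3, 2}));   // prints 1
    }
}
```

A batch algorithm would instead receive the whole `xs`/`ys` arrays up front and optimize a global objective over them.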

Moreover, this project defines LinearizationFunctions, which can be used to approximate kernel spaces by projecting (potentially structured) examples into linear representations, i.e., vectors.

Finally, this project implements learning methods for the estimation of classification functions operating over sequences of examples: in this sequential labeling paradigm, the classification of an example in a sequence depends on the classes assigned to the previous one(s).

For examples of how to use the following algorithms, please refer to the kelp-full project.

Batch Learning algorithms

The following batch learning algorithms are implemented:

CLASSIFICATION ALGORITHMS:

  • LibLinearLearningAlgorithm: it is the KeLP implementation of LibLinear (Fan '08), a linear learning algorithm for binary classification.
  • PegasosLearningAlgorithm: the KeLP implementation of Primal Estimated sub-GrAdient SOlver (PEGASOS) for SVM (Singer '07). It is a linear learning algorithm for binary classification.
  • DCDLearningAlgorithm: the KeLP implementation of Dual Coordinate Descent (DCD) training algorithm for a Linear L1 or L2 Support Vector Machine for binary classification (Hsieh '08).
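The core stochastic sub-gradient update used by PEGASOS can be sketched in a few lines, following (Singer '07). The code below is a self-contained illustration over plain arrays; the hyper-parameters and data are made up for the demo, and this is not KeLP's PegasosLearningAlgorithm.

```java
import java.util.Random;

// Sketch of the PEGASOS stochastic sub-gradient step for a linear SVM
// (Singer '07): sample one example, shrink w, and apply the hinge-loss
// sub-gradient on margin violations. Illustrative code only.
public class PegasosSketch {
    static double[] train(double[][] xs, int[] ys, double lambda, int iterations, long seed) {
        double[] w = new double[xs[0].length];
        Random rnd = new Random(seed);
        for (int t = 1; t <= iterations; t++) {
            int i = rnd.nextInt(xs.length);          // sample one example
            double eta = 1.0 / (lambda * t);         // decaying learning rate
            double margin = ys[i] * dot(w, xs[i]);
            for (int d = 0; d < w.length; d++) w[d] *= (1 - eta * lambda);  // regularization shrink
            if (margin < 1)                          // hinge-loss sub-gradient
                for (int d = 0; d < w.length; d++) w[d] += eta * ys[i] * xs[i][d];
        }
        return w;
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    public static void main(String[] args) {
        double[][] xs = {{2, 1}, {1, 2}, {-2, -1}, {-1, -2}};
        int[] ys = {+1, +1, -1, -1};
        double[] w = train(xs, ys, 0.01, 1000, 42);
        System.out.println(dot(w, new double[]{3, 3}) > 0);   // prints true
    }
}
```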

REGRESSION ALGORITHMS:

  • LibLinearRegression: it implements the linear regression learning algorithm described in (Fan '08).

Online Learning Algorithms

The following online learning algorithms are implemented:

CLASSIFICATION ALGORITHMS:

  • LinearPassiveAggressiveClassification: linear version of the Passive Aggressive learning algorithm for classification (Crammer '06)
  • KernelizedPassiveAggressiveClassification: kernel-based version of the Passive Aggressive learning algorithm for classification (Crammer '06)
  • LinearPerceptron: linear version of the Perceptron learning algorithm for classification (Rosenblatt '57)
  • KernelizedPerceptron: kernel-based version of the Perceptron learning algorithm for classification (Rosenblatt '57)
  • RandomizedBudgetPerceptron: an extension of the Randomized Budget Perceptron proposed in (Cavallanti '06)
  • BudgetedPassiveAggressiveClassification: budgeted learning algorithm proposed in (Wang '10)
  • SoftConfidenceWeightedClassification: an online linear learning algorithm proposed in (Wang '12)

REGRESSION ALGORITHMS:

  • LinearPassiveAggressiveRegression: linear version of the Passive Aggressive learning algorithm for regression (Crammer '06)
  • KernelizedPassiveAggressiveRegression: kernel-based version of the Passive Aggressive learning algorithm for regression (Crammer '06)
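The Passive Aggressive regression update of (Crammer '06) is simple enough to sketch directly: if the prediction error exceeds the ε-insensitive tolerance, the weight vector moves just enough to bring the example back inside the ε-tube. The class below is an illustrative 1-pass sketch (PA variant without an aggressiveness bound C), not KeLP's API.

```java
// Sketch of the linear Passive-Aggressive regression update (Crammer '06):
// on each example, if the epsilon-insensitive loss is positive, w is moved
// by the smallest step that zeroes the loss. Illustrative code only.
public class PARegressionSketch {
    final double[] w;
    final double epsilon;

    PARegressionSketch(int dim, double epsilon) {
        this.w = new double[dim];
        this.epsilon = epsilon;
    }

    double predict(double[] x) {
        double s = 0;
        for (int i = 0; i < x.length; i++) s += w[i] * x[i];
        return s;
    }

    void learn(double[] x, double y) {
        double error = y - predict(x);
        double loss = Math.max(0, Math.abs(error) - epsilon);   // eps-insensitive loss
        if (loss > 0) {
            double norm2 = 0;
            for (double v : x) norm2 += v * v;
            double tau = loss / norm2;                          // PA step size (no C bound)
            for (int i = 0; i < x.length; i++) w[i] += Math.signum(error) * tau * x[i];
        }
    }

    public static void main(String[] args) {
        // Learn y = 2*x from a stream of (x, y) pairs.
        PARegressionSketch r = new PARegressionSketch(1, 0.01);
        for (int t = 0; t < 50; t++) {
            double x = 1 + (t % 3);       // x in {1, 2, 3}
            r.learn(new double[]{x}, 2 * x);
        }
        System.out.println(Math.abs(r.predict(new double[]{4}) - 8) < 0.1);  // prints true
    }
}
```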

META-LEARNING ALGORITHMS:

  • MultiEpochLearning: a meta-algorithm for performing multiple iterations over a training dataset
  • Stoptron: an extension of the Stoptron algorithm proposed in (Orabona '08)
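The multi-epoch meta-scheme amounts to replaying the whole training set through an online learner a fixed number of times. The sketch below shows that idea with a hypothetical `OnlineLearner` interface; it is an illustration, not KeLP's MultiEpochLearning class.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the multi-epoch meta-algorithm: wrap any online learner and
// replay the training data for a fixed number of epochs. The interface
// below is hypothetical, for illustration only.
public class MultiEpochSketch {
    interface OnlineLearner<E> { void learn(E example); }

    static <E> void multiEpoch(OnlineLearner<E> learner, List<E> data, int epochs) {
        for (int epoch = 0; epoch < epochs; epoch++)   // replay the dataset
            for (E example : data) learner.learn(example);
    }

    public static void main(String[] args) {
        AtomicInteger updates = new AtomicInteger();   // count how often learn() runs
        multiEpoch(e -> updates.incrementAndGet(), List.of("a", "b", "c"), 4);
        System.out.println(updates.get());   // prints 12
    }
}
```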

Linearization Functions

The following linearization functions have been implemented:

  • NystromMethod: a linearization function based on the Nyström method (Williams '01), which relies on a low-rank approximation of the Gram matrix underlying a kernel function. Its application to large-scale kernel-based language learning is described in (Croce '16).
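The projection behind the Nyström method can be sketched as follows: given m landmark examples, factor their Gram matrix W = L·Lᵀ and map each example x to φ(x) = L⁻¹·[k(x, z₁), …, k(x, zₘ)], so that φ(x)·φ(y) equals the Nyström approximation c(x)ᵀW⁻¹c(y) of k(x, y). The plain-array sketch below uses a linear kernel and a Cholesky factorization (one possible realization; it assumes the landmark Gram matrix is positive definite and does not use KeLP's NystromMethod class).

```java
// Sketch of the Nystrom projection (Williams '01) over plain arrays,
// using a linear kernel and a Cholesky factor of the landmark Gram matrix.
public class NystromSketch {
    final double[][] landmarks;
    final double[][] chol;          // lower-triangular L with W = L * L^T

    NystromSketch(double[][] landmarks) {
        this.landmarks = landmarks;
        int m = landmarks.length;
        double[][] w = new double[m][m];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < m; j++) w[i][j] = kernel(landmarks[i], landmarks[j]);
        this.chol = cholesky(w);    // requires W to be positive definite
    }

    static double kernel(double[] a, double[] b) {      // linear kernel for the demo
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Project x into the m-dimensional approximated space: solve L * phi = c(x).
    double[] project(double[] x) {
        int m = landmarks.length;
        double[] c = new double[m];
        for (int i = 0; i < m; i++) c[i] = kernel(x, landmarks[i]);
        double[] phi = new double[m];
        for (int i = 0; i < m; i++) {                   // forward substitution
            double s = c[i];
            for (int j = 0; j < i; j++) s -= chol[i][j] * phi[j];
            phi[i] = s / chol[i][i];
        }
        return phi;
    }

    static double[][] cholesky(double[][] a) {
        int n = a.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j <= i; j++) {
                double s = a[i][j];
                for (int k = 0; k < j; k++) s -= l[i][k] * l[j][k];
                l[i][j] = (i == j) ? Math.sqrt(s) : s / l[j][j];
            }
        return l;
    }

    public static void main(String[] args) {
        // With landmarks spanning R^2, the linear kernel is reproduced exactly.
        NystromSketch n = new NystromSketch(new double[][]{{1, 0}, {1, 1}});
        double[] px = n.project(new double[]{3, 2});
        double[] py = n.project(new double[]{1, 4});
        double approx = px[0] * py[0] + px[1] * py[1];
        double exact = kernel(new double[]{3, 2}, new double[]{1, 4});
        System.out.println(Math.abs(approx - exact) < 1e-9);   // prints true
    }
}
```

With fewer landmarks than examples (the practical case), the same projection yields a low-rank approximation of the kernel rather than an exact reproduction.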

Learning Algorithm over Sequences

KeLP also implements a sequential labeling paradigm. Given sequences of items (each implemented as an Example and associated with one Label), these classes allow a generic learning algorithm to exploit the "history" of each item in the sequence in order to improve the classification quality. In other words, the classification of each example does not depend only on its representation, but also on its "history", i.e., the classes assigned to the preceding examples.

In particular:

  • SequenceClassificationLinearLearningAlgorithm: This class should be used when a linear learning algorithm is used, thus directly operating in the representation space.

  • SequenceClassificationKernelBasedLearningAlgorithm: This class should be used when a kernel-based learning algorithm is used, thus directly operating in the implicit space underlying a kernel function.
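The history mechanism can be sketched independently of the underlying algorithm: decode the sequence left to right, feeding each predicted label back as context for the next item. The classifier interface and the toy rule below are hypothetical, for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of sequential labeling: each item is classified using its own
// features plus the label predicted for the previous item (greedy decoding).
public class SequenceLabelingSketch {
    interface ItemClassifier {
        String classify(String itemFeature, String previousLabel);
    }

    // Greedy left-to-right decoding: feed each prediction back as "history".
    static List<String> labelSequence(List<String> items, ItemClassifier c) {
        List<String> labels = new ArrayList<>();
        String prev = "BOS";                      // start-of-sequence marker
        for (String item : items) {
            String label = c.classify(item, prev);
            labels.add(label);
            prev = label;                         // the history for the next item
        }
        return labels;
    }

    public static void main(String[] args) {
        // Toy classifier: predict "N" after a "D", otherwise decide from the item.
        ItemClassifier c = (item, prev) ->
                prev.equals("D") ? "N" : (item.startsWith("the") ? "D" : "V");
        System.out.println(labelSequence(List.of("the", "dog", "barks"), c));
        // prints [D, N, V]
    }
}
```

In KeLP the per-item decision is taken by the wrapped learning algorithm (linear or kernel-based) rather than a hand-written rule, but the history feedback loop is the same.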

=============

## Including KeLP in your project

If you want to use this set of learning algorithms, you can easily include it in your Maven project by adding the following repositories to your pom file:

<repositories>
	<repository>
		<id>kelp_repo_snap</id>
		<name>KeLP Snapshots repository</name>
		<releases>
			<enabled>false</enabled>
			<updatePolicy>always</updatePolicy>
			<checksumPolicy>warn</checksumPolicy>
		</releases>
		<snapshots>
			<enabled>true</enabled>
			<updatePolicy>always</updatePolicy>
			<checksumPolicy>fail</checksumPolicy>
		</snapshots>
		<url>http://sag.art.uniroma2.it:8081/artifactory/kelp-snapshot/</url>
	</repository>
	<repository>
		<id>kelp_repo_release</id>
		<name>KeLP Stable repository</name>
		<releases>
			<enabled>true</enabled>
			<updatePolicy>always</updatePolicy>
			<checksumPolicy>warn</checksumPolicy>
		</releases>
		<snapshots>
			<enabled>false</enabled>
			<updatePolicy>always</updatePolicy>
			<checksumPolicy>fail</checksumPolicy>
		</snapshots>
		<url>http://sag.art.uniroma2.it:8081/artifactory/kelp-release/</url>
	</repository>
</repositories>

Then, add the Maven dependency for this module:

<dependency>
    <groupId>it.uniroma2.sag.kelp</groupId>
    <artifactId>kelp-additional-algorithms</artifactId>
    <version>2.1.0</version>
</dependency>

Alternatively, thanks to the modularity of KeLP, you can include one of the following modules:

  • kelp-core: it contains the core interfaces and classes for algorithms, kernels and representations. It contains also the base set of classifiers, regressors and clustering algorithms. It serves as the main module to develop new kernel functions or new algorithms;

  • kelp-additional-kernels: it contains additional kernel functions, such as the Tree Kernels or the Graph Kernels;

  • kelp-full: it is a complete package of KeLP that contains the entire set of existing modules, i.e. additional kernel functions and algorithms.

=============

How to cite KeLP

If you find KeLP useful in your research, please cite the following paper:

@InProceedings{filice-EtAl:2015:ACL-IJCNLP-2015-System-Demonstrations,
	author = {Filice, Simone and Castellucci, Giuseppe and Croce, Danilo and Basili, Roberto},
	title = {KeLP: a Kernel-based Learning Platform for Natural Language Processing},
	booktitle = {Proceedings of ACL-IJCNLP 2015 System Demonstrations},
	month = {July},
	year = {2015},
	address = {Beijing, China},
	publisher = {Association for Computational Linguistics and The Asian Federation of Natural Language Processing},
	pages = {19--24},
	url = {http://www.aclweb.org/anthology/P15-4004}
}

=============

References

(Rosenblatt '57) F. Rosenblatt. The Perceptron - a perceiving and recognizing automaton. Report 85-460-1, Cornell Aeronautical Laboratory (1957)

(Williams '01) Williams, C.K.I., Seeger, M.: Using the Nyström method to speed up kernel machines. In: Proceedings of NIPS 2000 (2001)

(Crammer '06) K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz and Y. Singer. Online Passive-Aggressive Algorithms. Journal of Machine Learning Research (2006)

(Cavallanti '06) G. Cavallanti, N. Cesa-Bianchi, C. Gentile. Tracking the best hyperplane with a simple budget Perceptron. In Proceedings of the 19th Annual Conference on Computational Learning Theory (2006)

(Singer '07) S. Shalev-Shwartz, Y. Singer and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In Proceedings of ICML 2007

(Fan '08) Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang and Chih-Jen Lin. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research 9 (2008), 1871-1874. The original code is available at LibLinear; the original Java porting is available at LibLinear-java

(Orabona '08) Francesco Orabona, Joseph Keshet, and Barbara Caputo. The projectron: a bounded kernel-based perceptron. In Int. Conf. on Machine Learning (2008)

(Hsieh '08) Hsieh, C.-J., Chang, K.-W., Lin, C.-J., Keerthi, S. S.; Sundararajan, S. A Dual Coordinate Descent Method for Large-scale Linear SVM. In Proceedings of the 25th international conference on Machine learning - ICML '08 (pp. 408-415). New York, New York, USA: ACM Press.

(Wang '10) Zhuang Wang and Slobodan Vucetic. Online Passive-Aggressive Algorithms on a Budget. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS) (2010)

(Filice '15) Simone Filice, Giuseppe Castellucci, Danilo Croce, Roberto Basili: Kelp: a kernel-based learning platform for natural language processing. In: Proceedings of ACL: System Demonstrations. Beijing, China (July 2015)

(Croce '16) Danilo Croce and Roberto Basili. Large-scale Kernel-based Language Learning through the Ensemble Nystrom methods. In Proceedings of ECIR 2016. Padova, Italy, 2016

(Wang '12) Wang, J., Zhao, P., Hoi, S.C.: Exact soft confidence-weighted learning. In: Proceedings of ICML 2012. ACM, New York, NY, USA (2012)

============

Useful Links

KeLP site: http://www.kelp-ml.org

SAG site: http://sag.art.uniroma2.it

Source code hosted at GitHub: https://github.com/SAG-KeLP