Distiller-CORE library

Distiller is a framework for extracting and inferring knowledge from texts. It has its roots in DIKpE [1] and its later evolutions [4], and extends them with multilingual support [5], entity linking based on [2], and concept inference. At present, Distiller supports keyphrase extraction only in Italian and English; we plan to add keyphrase extraction support for other languages.
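
To make the keyphrase extraction task concrete, here is a toy Java sketch. It is not DIKpE's or Distiller's algorithm. It assumes a tiny hard-coded stopword list, generates candidate n-grams that neither start nor end with a stopword, and ranks them by raw frequency. The class and method names (`ToyKeyphraseExtractor`, `extract`) are hypothetical and do not belong to the Distiller codebase.

```java
import java.util.*;

/**
 * Toy keyphrase extractor (hypothetical; not Distiller's or DIKpE's algorithm).
 * Candidates are 1..maxN-word n-grams that neither start nor end with a stopword;
 * they are ranked by frequency, then by length (longer first), then alphabetically.
 */
public class ToyKeyphraseExtractor {

    // Tiny illustrative stopword list (an assumption; real systems use full lists).
    private static final Set<String> STOPWORDS = Set.of(
            "a", "an", "and", "the", "of", "to", "in", "is", "for", "with", "on");

    public static List<String> extract(String text, int maxN, int topK) {
        // Lowercase, then split on runs of characters that are not letters or digits.
        String[] tokens = text.toLowerCase().split("[^\\p{L}\\p{N}]+");
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            for (int n = 1; n <= maxN && i + n <= tokens.length; n++) {
                String first = tokens[i];
                String last = tokens[i + n - 1];
                // Skip the empty leading token (text starting with punctuation)
                // and candidates that start or end with a stopword.
                if (first.isEmpty() || STOPWORDS.contains(first) || STOPWORDS.contains(last)) {
                    continue;
                }
                String candidate = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
                counts.merge(candidate, 1, Integer::sum);
            }
        }
        // Rank: higher frequency first, then longer phrases, then alphabetical order.
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            if (byCount != 0) return byCount;
            int byLength = Integer.compare(wordCount(b.getKey()), wordCount(a.getKey()));
            if (byLength != 0) return byLength;
            return a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(topK, ranked.size()); i++) {
            result.add(ranked.get(i).getKey());
        }
        return result;
    }

    private static int wordCount(String phrase) {
        return phrase.split(" ").length;
    }

    public static void main(String[] args) {
        String text = "Keyphrase extraction finds keyphrases. Keyphrase extraction is useful "
                + "for digital libraries, and digital libraries need keyphrase extraction.";
        System.out.println(extract(text, 2, 4));
    }
}
```

Running `main` prints `[keyphrase extraction, extraction, keyphrase, digital libraries]`. Raw frequency alone also ranks sub-phrases such as "keyphrase" next to the full phrase, which is one reason real extractors combine several linguistic and statistical signals.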

The default Distiller pipeline performs keyphrase extraction; however, since the framework is built with extensibility in mind, it can be extended with pipelines for any high-level NLP task. As an example, we include a simple sentiment analysis module based on Matthew L. Jockers' Syuzhet library [3].
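
The extensibility idea can be illustrated with a minimal Java sketch. Everything in it is hypothetical and is not Distiller's actual API: an `AnalysisModule` interface, a `Pipeline` that runs modules in order over a shared `Document`, a naive sentence splitter, and a toy lexicon-based sentiment module. The six-word lexicon is made up for the example and is not Syuzhet's dictionary.

```java
import java.util.*;

public class ToyPipelineDemo {

    /** Hypothetical shared document: raw text plus named annotations added by modules. */
    static final class Document {
        final String text;
        final Map<String, Object> annotations = new LinkedHashMap<>();
        Document(String text) { this.text = text; }
    }

    /** Hypothetical extension point: each module reads the document and adds annotations. */
    interface AnalysisModule {
        void process(Document doc);
    }

    /** Runs modules in order; later modules can read what earlier ones produced. */
    static final class Pipeline {
        private final List<AnalysisModule> modules;
        Pipeline(List<AnalysisModule> modules) { this.modules = new ArrayList<>(modules); }
        Document run(String text) {
            Document doc = new Document(text);
            for (AnalysisModule m : modules) {
                m.process(doc);
            }
            return doc;
        }
    }

    /** Naive splitter: breaks after '.', '!' or '?' followed by whitespace. */
    static final class SentenceSplitter implements AnalysisModule {
        public void process(Document doc) {
            doc.annotations.put("sentences", Arrays.asList(doc.text.trim().split("(?<=[.!?])\\s+")));
        }
    }

    /** Toy lexicon sentiment: sums word valences per sentence. Needs SentenceSplitter first. */
    static final class LexiconSentiment implements AnalysisModule {
        // Made-up lexicon for illustration only.
        private static final Map<String, Integer> LEXICON =
                Map.of("good", 1, "happy", 1, "love", 1, "bad", -1, "sad", -1, "hate", -1);

        @SuppressWarnings("unchecked")
        public void process(Document doc) {
            List<String> sentences = (List<String>) doc.annotations.get("sentences");
            List<Integer> scores = new ArrayList<>();
            for (String sentence : sentences) {
                int score = 0;
                for (String word : sentence.toLowerCase().split("[^\\p{L}]+")) {
                    score += LEXICON.getOrDefault(word, 0);
                }
                scores.add(score);
            }
            doc.annotations.put("sentiment", scores);
        }
    }

    public static void main(String[] args) {
        Pipeline pipeline = new Pipeline(List.of(new SentenceSplitter(), new LexiconSentiment()));
        Document doc = pipeline.run("I love this good book. The ending was sad. It is long.");
        System.out.println(doc.annotations.get("sentiment"));
    }
}
```

Running `main` prints `[2, -1, 0]`, one score per sentence. Passing data between modules through a shared annotation map lets a new task reuse earlier stages (here, sentence splitting) without modifying them.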

Architecture

The architecture of the framework and its usage are described in "Introducing Distiller: a unifying framework for Knowledge Extraction", 1st AI*IA Workshop on Intelligent Techniques At Libraries and Archives, 2015.

How to build and use the Distiller

The project Wiki contains guides on downloading, building and using Distiller.

At present, Distiller is distributed in source code form only. Since it is a plain Maven project, you can open it in your favourite IDE or build it yourself. Once the codebase is stable enough, we plan to publish Distiller to Maven Central or another Maven repository, to make it easier to use in your projects.

Please note that some features of Distiller require R to be installed.

Acknowledgements

The "dirty work" in the library is handled mainly by three libraries:

The Italian-language implementation of Distiller is made possible by:

Andrea Ciapetti's OpenNLP models for sentence splitting, tokenization and PoS tagging;
Morph-it! for lemmatization.

Citing

If you use Distiller, please cite this paper:

@inproceedings{distillerintroducing,
  title={Introducing Distiller: a unifying framework for Knowledge Extraction},
  author={Basaldella, Marco and De Nart, Dario and Tasso, Carlo},
  year={2015},
  booktitle={Proceedings of 1st AI*IA Workshop on Intelligent Techniques At Libraries and Archives co-located with XIV Conference of the Italian Association for Artificial Intelligence (AI*IA 2015)},
  organization={Associazione Italiana per l'Intelligenza Artificiale}
}

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

References

[1] Nirmala Pudota et al. "Automatic keyphrase extraction and ontology mining for content-based tag recommendation." International Journal of Intelligent Systems 25(12): 1158-1186 (2010).

[2] Paolo Ferragina, Ugo Scaiella. "Fast and Accurate Annotation of Short Texts with Wikipedia Pages." IEEE Software 29(1): 70-75 (2012).

[3] Matthew L. Jockers. Syuzhet (R package). https://github.com/mjockers/syuzhet

[4] Dario De Nart, Carlo Tasso. "A domain independent double layered approach to keyphrase generation." WEBIST 2014: Proceedings of the 10th International Conference on Web Information Systems and Technologies (2014).

[5] Dante Degl'Innocenti, Dario De Nart, Carlo Tasso. "A New Multi-lingual Knowledge-base Approach to Keyphrase Extraction for the Italian Language." KDIR 2014: 78-85 (2014).
