simplified-ECON

simplified-ECON is a simplified version of the ECON pipeline introduced in the paper "Concept Mining via Embedding" (Li et al., 2018). The main simplification is in the candidate generation stage: simplified-ECON uses only AutoPhrase, whereas ECON combines multiple candidate generation techniques. This repository also compares the results of AutoPhrase, ECON, and PRDR Phrase Detection on the same input corpus of 10,000 arXiv computer science paper abstracts.
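
Because candidate generation relies only on AutoPhrase, this stage essentially turns AutoPhrase's ranked phrase list into a candidate set. The sketch below assumes that the ranked output is a tab-separated file with one `score<TAB>phrase` pair per line, which is the layout of AutoPhrase's AutoPhrase.txt. The function name `load_candidates` and the 0.5 quality threshold are hypothetical and are not taken from the notebooks.

```python
from typing import Iterable, List, Tuple


def load_candidates(lines: Iterable[str], threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Parse AutoPhrase-style "score<TAB>phrase" lines into (phrase, score) pairs.

    Assumption: each line holds a quality score and a phrase separated by a tab.
    The threshold is a hypothetical cutoff, not a value used by simplified-ECON.
    """
    candidates = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        score_str, phrase = line.split("\t", 1)
        score = float(score_str)
        if score >= threshold:
            candidates.append((phrase.strip().lower(), score))
    # Highest-quality candidates first
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates


if __name__ == "__main__":
    sample = [
        "0.9731\tneural network\n",
        "0.8120\tsupport vector machine\n",
        "0.3104\tin this paper\n",
    ]
    for phrase, score in load_candidates(sample):
        print(f"{score:.3f}  {phrase}")
```

With a real AutoPhrase run, `lines` would come from `open(path, encoding="utf-8")` on the ranked output file.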

Pipeline

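1. Candidate Generation (candidate_generation.ipynb)
2. Superspan Sequence Generation (superspan_sequence_generation.ipynb)
3. Embedding Construction (embedding_construction.ipynb)
4. Feature Generation (feature_generation.ipynb)
5. Classifier Training (classifier_training.ipynb)
6. Concept Recognition (concept_recognition.ipynb)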
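
This README does not define a "superspan." One plausible reading is a maximal run of overlapping candidate-phrase occurrences within a sentence, so that competing segmentations can be resolved in later stages. This reading is an assumption and may not match ECON's definition. The sketch below finds candidate occurrences in a tokenized sentence and merges overlapping ones. The names `find_occurrences` and `merge_into_superspans` and the `max_len` limit are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # [start, end) token offsets


def find_occurrences(tokens: List[str], candidates: Set[Tuple[str, ...]], max_len: int = 6) -> List[Span]:
    """Return every [start, end) token span whose tokens form a candidate phrase."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            if tuple(tokens[start:end]) in candidates:
                spans.append((start, end))
    return spans


def merge_into_superspans(spans: List[Span]) -> List[Tuple[Span, List[Span]]]:
    """Group overlapping spans (hypothetical superspans) with their covering range."""
    groups: List[Tuple[Span, List[Span]]] = []
    for start, end in sorted(spans):
        if groups and start < groups[-1][0][1]:
            # Overlaps the current group: extend its covering range
            (g_start, g_end), members = groups[-1]
            members.append((start, end))
            groups[-1] = ((g_start, max(g_end, end)), members)
        else:
            groups.append(((start, end), [(start, end)]))
    return groups


if __name__ == "__main__":
    tokens = "we train a deep neural network model".split()
    candidates = {("deep", "neural", "network"), ("neural", "network"), ("network", "model")}
    spans = find_occurrences(tokens, candidates)
    print(merge_into_superspans(spans))  # [((3, 7), [(3, 6), (4, 6), (5, 7)])]
```

Adjacent spans that only touch, such as (0, 2) and (2, 4), stay in separate groups because they share no token.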

The code for each pipeline stage builds on, or is copied from, the original ECON implementation.

The autophrase_comparison.ipynb notebook is new to this repository and compares the concepts extracted by the AutoPhrase and ECON pipelines.
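
The following is one possible form of such a comparison. It assumes that each pipeline's output has been reduced to a set of lowercase concept strings. The metrics the notebook actually reports may differ, and `compare_concepts` is a hypothetical name.

```python
from typing import Dict, Set


def compare_concepts(autophrase: Set[str], econ: Set[str]) -> Dict[str, object]:
    """Summarise how two sets of extracted concepts overlap."""
    shared = autophrase & econ
    union = autophrase | econ
    return {
        "shared": sorted(shared),
        "only_autophrase": sorted(autophrase - econ),
        "only_econ": sorted(econ - autophrase),
        # Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    autophrase = {"neural network", "in this paper", "support vector machine"}
    econ = {"neural network", "support vector machine", "graph neural network"}
    for key, value in compare_concepts(autophrase, econ).items():
        print(key, value)
```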

The method_evaluation.ipynb notebook is also new to this repository and evaluates the performance of the AutoPhrase, PRDR Phrase Detection, and ECON pipelines.
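
The following is a common way to score extracted phrases. It assumes that each method's output is compared against a hand-labelled set of gold concepts. This README does not describe the gold labels or the exact metrics used in method_evaluation.ipynb, so the example is illustrative only.

```python
from typing import Set, Tuple


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 of predicted concepts against gold concepts."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {"neural network", "support vector machine", "graph neural network"}
    predicted = {"neural network", "in this paper", "support vector machine", "deep learning"}
    p, r, f = precision_recall_f1(predicted, gold)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Running the same function over each method's output set gives directly comparable scores, provided all methods are normalized the same way (for example, lowercased).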

Usage

Environment

Navigate to the simplified-ECON directory and set up a new conda environment with the following commands.

conda create -n se python=3.8.5 -y
conda activate se
conda install ipykernel -y
ipython kernel install --user --name=se

Clone the AutoPhrase repository. In the candidate_generation.ipynb and feature_generation.ipynb notebooks, set AUTOPHRASE_PATH to the path of the cloned AutoPhrase repository.
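
Before running the notebooks, you can check that AUTOPHRASE_PATH points at a real clone. This sketch assumes AutoPhrase's default layout, with auto_phrase.sh at the repository root and the ranked phrase list written to models/DBLP/AutoPhrase.txt. The model directory can differ if AutoPhrase's MODEL setting is changed. The helper names and the example clone location are hypothetical.

```python
from pathlib import Path
from typing import Dict, Union

# Hypothetical location of the AutoPhrase clone; point this at your own copy.
AUTOPHRASE_PATH = Path("~/AutoPhrase").expanduser()


def autophrase_paths(root: Union[str, Path], model: str = "DBLP") -> Dict[str, Path]:
    """Return the files this sketch expects inside an AutoPhrase clone.

    Assumes the default layout: auto_phrase.sh at the root and the ranked
    phrase list at models/<model>/AutoPhrase.txt.
    """
    root = Path(root)
    return {
        "script": root / "auto_phrase.sh",
        "ranked_phrases": root / "models" / model / "AutoPhrase.txt",
    }


def check_autophrase(root: Union[str, Path]) -> Path:
    """Fail early if the directory does not look like an AutoPhrase checkout."""
    script = autophrase_paths(root)["script"]
    if not script.is_file():
        raise FileNotFoundError(
            f"{script} not found; is AUTOPHRASE_PATH set to the AutoPhrase clone?"
        )
    return script
```

Calling `check_autophrase(AUTOPHRASE_PATH)` at the top of candidate_generation.ipynb would catch a wrong path before any long-running step starts.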

Dependencies

Install the dependencies using the following command.

pip install -r requirements.txt

Execution

To run the pipeline, open the notebooks in JupyterLab (jupyter lab), select the se kernel, and run the cells of each notebook in the order of the pipeline stages listed above.
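
As an alternative to clicking through JupyterLab, the same notebooks could be executed headlessly in pipeline order. This sketch uses nbformat and nbclient, which are assumed to be installed because this README does not list them. The notebook file names come from the repository, and the kernel name se comes from the Environment step. `run_pipeline` is a hypothetical helper, and it overwrites each notebook with its executed outputs.

```python
from pathlib import Path
from typing import List

# Pipeline order as listed in this README
PIPELINE_NOTEBOOKS: List[str] = [
    "candidate_generation.ipynb",
    "superspan_sequence_generation.ipynb",
    "embedding_construction.ipynb",
    "feature_generation.ipynb",
    "classifier_training.ipynb",
    "concept_recognition.ipynb",
]


def run_pipeline(repo_dir: str = ".", kernel_name: str = "se", timeout: int = 3600) -> None:
    """Execute each pipeline notebook in order, stopping at the first failing cell."""
    # Imported lazily so the notebook list can be inspected without Jupyter installed.
    import nbformat
    from nbclient import NotebookClient

    repo = Path(repo_dir)
    for name in PIPELINE_NOTEBOOKS:
        path = repo / name
        nb = nbformat.read(path, as_version=4)
        client = NotebookClient(
            nb,
            kernel_name=kernel_name,
            timeout=timeout,  # per-cell timeout in seconds
            # Run each notebook with the repository as its working directory
            resources={"metadata": {"path": str(repo)}},
        )
        client.execute()  # raises CellExecutionError if a cell fails
        nbformat.write(nb, path)  # save outputs back into the notebook
        print(f"finished {name}")
```

For example, `run_pipeline("path/to/simplified-ECON")` runs the six stages in sequence. Because a failing cell raises an exception, a broken stage stops the run before later stages consume its missing outputs.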

To compare the results of the AutoPhrase and ECON pipelines, run the cells of the autophrase_comparison.ipynb notebook.

To evaluate the performance of the AutoPhrase, PRDR Phrase Detection, and ECON pipelines, run the cells of the method_evaluation.ipynb notebook.

Authors

Rishi Masand

References

Keqian Li, Hanwen Zha, Yu Su, and Xifeng Yan, "Concept Mining via Embedding", 2018.

Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R. Voss, and Jiawei Han, "Automated Phrase Mining from Massive Text Corpora", 2017.
