Text2Class

Build multi-class text classifiers using state-of-the-art pre-trained contextualized language models, e.g. BERT. Only a few hundred samples per class are necessary to get started.

Background

This project is based on our study: Transfer Learning Robustness in Multi-Class Categorization by Fine-Tuning Pre-Trained Contextualized Language Models.

Citation

To cite this work, use the following BibTeX citation.

@article{transfer2019multiclass,
  title={Transfer Learning Robustness in Multi-Class Categorization by Fine-Tuning Pre-Trained Contextualized Language Models},
  author={Liu, Xinyi and Wangperawong, Artit},
  journal={arXiv preprint arXiv:1909.03564},
  year={2019}
}

Installation

pip install text2class

Example usage

Create a dataframe with two columns, such as 'text' and 'label'. No text pre-processing is necessary.

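import pandas as pd
from text2class.text_classifier import TextClassifier

# Load a dataframe containing a text column and a label column
df = pd.read_csv("data.csv")

# Hold out 10% of the rows for testing; random_state makes the split reproducible
train = df.sample(frac=0.9, random_state=200)
test = df.drop(train.index)

# num_labels must match the number of distinct classes in the label column
cls = TextClassifier(
    num_labels=3,
    data_column="text",
    label_column="label",
    max_seq_length=128
)

# Fine-tune the pre-trained model on the training split
cls.fit(train)

# Predict labels for the held-out texts
predictions = cls.predict(test["text"])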
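
Before fitting, it can help to check the dataframe itself. The sketch below uses only pandas and a small toy dataframe (a hypothetical stand-in for data.csv) to show one way to derive the value for num_labels, count the samples per class, and confirm that the train/test split is disjoint. It does not call Text2Class.

```python
import pandas as pd

# Toy data standing in for data.csv (hypothetical contents).
df = pd.DataFrame({
    "text": [f"sample sentence {i}" for i in range(30)],
    "label": [i % 3 for i in range(30)],
})

# Number of distinct classes; this is the value to pass as num_labels.
num_labels = df["label"].nunique()

# Number of samples available for each class.
per_class = df["label"].value_counts()

# Same 90/10 split as in the example above.
train = df.sample(frac=0.9, random_state=200)
test = df.drop(train.index)

# The two splits should be disjoint and together cover the full dataframe.
assert train.index.intersection(test.index).empty
assert len(train) + len(test) == len(df)
```

If per_class shows a class with far fewer samples than the others, collecting more examples for that class before fitting is likely worthwhile.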

Advanced usage

Model type

The default model is an uncased BERT (Bidirectional Encoder Representations from Transformers) with 12 transformer layers, 12 self-attention heads per layer, and a hidden size of 768. The models listed below are currently supported; select one by passing its URL as hub_module_handle. We expect that more will be added in the future. For more information, see BERT's GitHub repository.

https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1
https://tfhub.dev/google/bert_uncased_L-24_H-1024_A-16/1
https://tfhub.dev/google/bert_cased_L-12_H-768_A-12/1
https://tfhub.dev/google/bert_cased_L-24_H-1024_A-16/1
https://tfhub.dev/google/bert_chinese_L-12_H-768_A-12/1
https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1

cls = TextClassifier(
    num_labels=3,
    data_column="text",
    label_column="label",
    max_seq_length=128,
    hub_module_handle="https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"
)
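
The handles above encode each model's architecture in their names: L is the number of transformer layers, H the hidden size, and A the number of self-attention heads. As a minimal sketch, the helper below (describe_handle, a hypothetical name that is not part of Text2Class) parses a handle into those values using only the standard library, which can be handy when comparing models.

```python
import re

# Pattern for handles of the form listed above, e.g.
# https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1
HANDLE_RE = re.compile(
    r"bert_(?P<variant>[a-z_]+)_L-(?P<layers>\d+)"
    r"_H-(?P<hidden>\d+)_A-(?P<heads>\d+)/(?P<version>\d+)$"
)


def describe_handle(handle):
    """Return the architecture details encoded in a BERT hub handle.

    Hypothetical helper for illustration; not part of Text2Class.
    """
    match = HANDLE_RE.search(handle)
    if match is None:
        raise ValueError(f"Unrecognized handle: {handle}")
    info = match.groupdict()
    return {
        "variant": info["variant"],
        "layers": int(info["layers"]),
        "hidden_size": int(info["hidden"]),
        "attention_heads": int(info["heads"]),
        "version": int(info["version"]),
    }


default_info = describe_handle(
    "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"
)
```

For the default handle, default_info reports 12 layers, a hidden size of 768, and 12 attention heads, matching the description above. The 24-layer models are larger and can be expected to need more memory and time to fine-tune.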

Contributing

Text2Class is an open-source project founded and maintained to better serve the machine learning and data science community. Please feel free to submit pull requests to contribute to the project. By participating, you are expected to adhere to Text2Class's code of conduct.

Questions?

For questions or help using Text2Class, please submit a GitHub issue.
