Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

Figure 1. The architecture of TextSSL.

About data

We use the same benchmark datasets as Yao, Mao, and Luo 2019. For the MR, Ohsumed, and 20NG datasets, we follow the same train/test splits and data preprocessing as Kim 2014 and Yao, Mao, and Luo 2019. We thank the authors for their work.

The R8 and R52 datasets are only available in a preprocessed version that lacks punctuation and has no explicit sample names. Because we construct graphs from documents with sentence segmentation information, we re-extract the data from the original Reuters-21578 dataset.
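To illustrate why punctuation matters for sentence segmentation, here is a minimal sketch. It assumes a simple regex splitter; `split_sentences` and both sample strings are hypothetical and are not taken from the repository, whose extraction script may segment sentences differently.

```python
import re


def split_sentences(text):
    """Split raw text into sentences on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


# Hypothetical sample text, with and without punctuation.
raw = "Oil prices rose. Traders were surprised! Will they fall?"
stripped = "oil prices rose traders were surprised will they fall"

print(split_sentences(raw))       # sentence boundaries can be recovered
print(split_sentences(stripped))  # no boundaries left to split on
```

With punctuation removed, the splitter returns the whole document as a single chunk, so sentence-level structure cannot be recovered from such a preprocessed version.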

You can download the dataset here:

Re-extract the R8 and R52 datasets:

```
python re-extract_data/mk_R8_R52.py --name R8
```

Remove words:
```
python remove_words.py --name R8
```

About path

To run the code, change Your_path=/data/project/yinhuapark/ssl/ to your own path.

Make graph dataset

Create co-occurrence pairs for each document:

```
python ssl_make_graphs/create_cooc_document.py --name R8
```
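As a rough illustration of what co-occurrence pairs are, the sketch below counts word pairs that fall inside the same sliding window over one token sequence (for instance, one sentence of a document). The function name `cooc_pairs`, the window size of 3, and the counting rule (one count per window per pair) are assumptions made for this example; the repository's script may differ.

```python
from collections import Counter
from itertools import combinations


def cooc_pairs(tokens, window_size=3):
    """Count unordered word pairs that appear together inside a sliding window."""
    counts = Counter()
    # A sequence shorter than the window is treated as one single window.
    n_windows = max(1, len(tokens) - window_size + 1)
    for start in range(n_windows):
        window = tokens[start:start + window_size]
        # Unique words in this window, sorted so each pair has one canonical order.
        for pair in combinations(sorted(set(window)), 2):
            counts[pair] += 1
    return counts


# Hypothetical four-token sentence.
counts = cooc_pairs(["oil", "prices", "rose", "sharply"], window_size=3)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Words that never share a window, such as "oil" and "sharply" here, get no pair, which is what keeps the resulting word graph local.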

Construct a graph for each document and store the graphs in an InMemoryDataset:

```
python ssl_make_graphs/PygDocsGraphDataset.py --name R8
```
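The following sketch shows one way word pairs could be turned into the integer edge list that a graph dataset stores. It is written in plain Python so it runs without extra dependencies; `pairs_to_edge_index` is a hypothetical helper and is not taken from the repository.

```python
def pairs_to_edge_index(pairs):
    """Map word pairs to integer node ids and a [2, num_edges] COO edge list."""
    vocab = {}
    sources, targets = [], []
    for u, v in pairs:
        for word in (u, v):
            if word not in vocab:
                vocab[word] = len(vocab)  # assign the next free node id
        # Store both directions so the graph is undirected.
        sources += [vocab[u], vocab[v]]
        targets += [vocab[v], vocab[u]]
    return vocab, [sources, targets]


# Hypothetical pairs for one document.
vocab, edge_index = pairs_to_edge_index([("oil", "prices"), ("prices", "rose")])
print(vocab)
print(edge_index)
```

In PyTorch Geometric, a nested list of this shape can be converted with `torch.tensor(edge_index, dtype=torch.long)` and passed as the `edge_index` of a `Data` object; how the repository builds its node features and edges is not shown here.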

Train

```
python ssl_graphmodels/pyg_models/train_docs.py --name R8
```
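The steps above can be collected into a small driver. The sketch below only builds and prints the commands in order; `pipeline_commands` is a hypothetical helper, and it assumes that the re-extraction step applies only to R8 and R52 and that the other scripts accept the remaining dataset names through `--name`.

```python
import shlex

# Script paths as they appear in this README.
REEXTRACT = "re-extract_data/mk_R8_R52.py"
COMMON_STEPS = [
    "remove_words.py",
    "ssl_make_graphs/create_cooc_document.py",
    "ssl_make_graphs/PygDocsGraphDataset.py",
    "ssl_graphmodels/pyg_models/train_docs.py",
]


def pipeline_commands(name):
    """Return the ordered list of commands for one dataset name."""
    scripts = list(COMMON_STEPS)
    if name in ("R8", "R52"):
        # Assumption: only the Reuters datasets need re-extraction.
        scripts.insert(0, REEXTRACT)
    return [["python", script, "--name", name] for script in scripts]


for cmd in pipeline_commands("R8"):
    print(shlex.join(cmd))
```

To actually execute the steps, each command list could be passed to `subprocess.run(cmd, check=True)`, which stops the pipeline at the first failing step.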

Reference

If you find our paper and repo useful, please cite our paper:

@inproceedings{piao2022sparse,
  title={Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification},
  author={Piao, Yinhua and Lee, Sangseon and Lee, Dohoon and Kim, Sun},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  number={10},
  pages={11165--11173},
  year={2022}
}

The readme is inspired by GSAT.
