Genre-Fiction-Classification

A PyTorch project experimenting with text classification

What is the idea?

Genre fiction has become increasingly popular. The most popular genres are Mystery, Thriller, Science Fiction, Romance, and Fantasy, and each has many sub-genres. https://writerswrite.co.za/the-17-most-popular-genres-in-fiction-and-why-they-matter/

The project tries to classify short fiction into 5 major genres with character-level RNN models. https://arxiv.org/pdf/1509.01626.pdf If it works, I will try to classify fiction into 17 sub-genres. Hopefully I can also find a way to classify Chinese fiction. The challenge is that there are more than 5,000 commonly used Chinese characters, so I would need to find a proper encoding method. https://arxiv.org/pdf/1708.02657.pdf

  • "Although this is a dataset in Chinese, we used pypinyin package combined with jieba Chinese segmentation system to produce Pinyin – a phonetic romanization of Chinese. The models for English can then be applied to this dataset without change. The fields used are title and content." https://arxiv.org/pdf/1509.01626.pdf
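To make the encoding question concrete, here is a minimal sketch of character-level quantization in plain Python. It is not the project's actual code: the alphabet, the `encode` helper, the reserved index 0, and the `max_len` value are all assumptions chosen for illustration. It shows why romanizing Chinese to Pinyin, as the quote above describes, lets a small fixed alphabet cover the text, while raw Chinese characters fall outside it.

```python
import string

# Assumed alphabet: lowercase letters, digits, punctuation, and space.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " "
PAD = 0  # index 0 is reserved for padding and for characters outside the alphabet
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode(text, max_len=16):
    """Map text to a fixed-length list of character indices (hypothetical helper)."""
    indices = [CHAR_TO_INDEX.get(ch, PAD) for ch in text.lower()[:max_len]]
    # Right-pad short inputs so every sequence has the same length.
    return indices + [PAD] * (max_len - len(indices))


english = encode("Red Chamber")
pinyin = encode("hong lou meng")  # Pinyin of 红楼梦 (Dream of the Red Chamber)
raw_chinese = encode("红楼梦")     # every character maps to index 0
```

The resulting index lists could be fed to an embedding layer or turned into one-hot vectors before the RNN. With this alphabet the English and Pinyin inputs share one vocabulary of 69 symbols, whereas the raw Chinese title collapses to all zeros, which is the problem a proper encoding method has to solve.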

I am also thinking of experimenting with a deep CNN to tell whether two texts were written by the same author. There is a classical Chinese novel, Dream of the Red Chamber. It is commonly agreed that the first 80 chapters were written by the author, Cao Xueqin, whereas the last 40 chapters were written by another author, Gao E. I was inspired by that.