Nov05/Genre-Fiction-Classification

Genre-Fiction-Classification

A PyTorch project experimenting with text classification

What is the idea?

Genre fiction is increasingly popular. The most popular genres include Mystery, Thriller, Science Fiction, Romance, and Fantasy, and each has many sub-genres. https://writerswrite.co.za/the-17-most-popular-genres-in-fiction-and-why-they-matter/

The project tries to classify short fiction into the 5 major genres with character-level RNN models. https://arxiv.org/pdf/1509.01626.pdf If that works, I will try to classify fiction into 17 sub-genres. Hopefully I can also find a way to classify Chinese fiction. The challenge there is that Chinese has more than 5000 commonly used characters, so I would need to find a suitable encoding method. https://arxiv.org/pdf/1708.02657.pdf
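As a rough illustration of the character-level idea, here is a minimal sketch of a classifier over the 5 genres. The alphabet, the `encode` helper, the `CharRNNClassifier` name, the choice of a GRU, and all layer sizes are assumptions made for this example; they are not taken from the project's code.

```python
import string

import torch
import torch.nn as nn

GENRES = ["Mystery", "Thriller", "Science Fiction", "Romance", "Fantasy"]

# Assumed alphabet: lowercase letters, digits, punctuation, space, newline.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " \n"
PAD, UNK = 0, 1  # ids reserved for padding and out-of-alphabet characters
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode(text: str, max_len: int = 256) -> torch.Tensor:
    """Map a string to a fixed-length tensor of character ids."""
    ids = [CHAR_TO_ID.get(ch, UNK) for ch in text.lower()[:max_len]]
    ids += [PAD] * (max_len - len(ids))  # right-pad short texts
    return torch.tensor(ids, dtype=torch.long)


class CharRNNClassifier(nn.Module):
    """Embed characters, run a GRU, classify from the last hidden state."""

    def __init__(self, vocab_size: int = len(ALPHABET) + 2,
                 embed_dim: int = 32, hidden_dim: int = 128,
                 num_classes: int = len(GENRES)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=PAD)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.embed(char_ids)   # (batch, seq_len, embed_dim)
        _, hidden = self.rnn(embedded)    # hidden: (1, batch, hidden_dim)
        return self.fc(hidden[-1])        # (batch, num_classes) logits
```

Stacking a few encoded texts with `torch.stack` and passing them to the model yields one row of genre logits per text. Because short texts are right-padded, the last hidden state also sees the padding positions; a fuller version would probably pack the sequences or mask the padding.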

  • "Although this is a dataset in Chinese, we used pypinyin package combined with jieba Chinese segmentation system to produce Pinyin – a phonetic romanization of Chinese. The models for English can then be applied to this dataset without change. The fields used are title and content." https://arxiv.org/pdf/1509.01626.pdf
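The romanization step described in the quote could be sketched roughly as follows. The function names `romanize` and `romanize_chinese` are hypothetical, and joining each word's syllables and separating words with spaces is an assumption; the paper does not spell out its exact output format. The sketch relies on `jieba.lcut` for segmentation and `pypinyin.lazy_pinyin` for toneless Pinyin.

```python
from typing import Callable, List


def romanize(text: str,
             segment: Callable[[str], List[str]],
             syllables: Callable[[str], List[str]]) -> str:
    """Split text into words, then spell each word as joined syllables."""
    words = segment(text)
    # Skip whitespace-only tokens; separate the remaining words with spaces.
    return " ".join("".join(syllables(w)) for w in words if w.strip())


def romanize_chinese(text: str) -> str:
    """The same pipeline wired to jieba and pypinyin.

    Both packages are third-party and are imported lazily, so the
    generic helper above stays usable without them.
    """
    import jieba                      # Chinese word segmentation
    from pypinyin import lazy_pinyin  # Pinyin syllables without tone marks
    return romanize(text, jieba.lcut, lazy_pinyin)
```

The romanized text uses only Latin characters, so a character-level English model can in principle be applied to it without changing its alphabet.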

I am also thinking of experimenting with a deep CNN to tell whether parts of a work of fiction were written by the same author. The idea was inspired by the Chinese classical novel Dream of the Red Chamber: it is commonly believed that its first 80 chapters were written by Cao Xueqin, while the last 40 chapters were written by another author, Gao E.
