
ahmednabil950/TV-Series-Classifier


TV-Series-Classifier

Arabic TV Series Detection App trained on TV web pages.



Fig.1 - Web demo

Abstract

This repository contains an experiment from a project I was working on: TV series classification from any given web page.
The web pages the model can recognize are those related to the TV series Ra7eem.


Features Engineering

In my previous experience building models for topic modeling problems, I prefer TF-IDF feature vectorization to CountVectorizer (plain bag of words), because it gives more weight to the important features that are relevant to the context of the web pages used to build the model, so better performance is obtained.
Many features are not very important, as shown in the captions below, and they can be treated like stopwords to obtain a better feature representation.
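To make the difference from plain counts concrete, here is a minimal, dependency-free sketch of TF-IDF weighting with a stopword list. It is not the code used in this project: the `tokenize` and `tfidf` helpers, the sample page texts, and the stopword set are all hypothetical. It uses a smoothed inverse document frequency and skips length normalization, so a library vectorizer will generally produce different numbers. Real Arabic pages would also need a proper tokenizer.

```python
import math
from collections import Counter


def tokenize(text, stop_words=frozenset()):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in stop_words]


def tfidf(docs, stop_words=frozenset()):
    """Return one {term: weight} dict per document.

    weight = raw term count * idf, with the smoothed
    idf = ln((1 + n_docs) / (1 + doc_freq)) + 1 and no length normalization.
    """
    tokenized = [tokenize(doc, stop_words) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    idf = {
        term: math.log((1 + n_docs) / (1 + count)) + 1
        for term, count in doc_freq.items()
    }
    return [
        {term: count * idf[term] for term, count in Counter(toks).items()}
        for toks in tokenized
    ]


# Hypothetical page texts, invented for illustration only.
pages = [
    "watch ra7eem episode online",
    "watch football match online",
    "watch news online",
]
# Hypothetical stopword: a term judged unimportant for the task.
stop_words = frozenset({"online"})

weights = tfidf(pages, stop_words)
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

In this toy corpus "watch" appears on every page, so it gets the lowest weight, while "ra7eem" appears on only one page and is weighted higher. With raw counts both terms would score 1.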


Classifier

Fig.2 - Training performance
Fig.3 - Test performance



Examples you can test with that were not in the training set:
https://mzarita.tv/watch.php?vid=dfa569e72
https://moviz4u.tv/
https://www.elcinema.com/work/2048748
http://www.masrawy.com/ramadan/Tag/797735/%D8%B1%D8%AD%D9%8A%D9%85
http://www.masrawy.com/ramadan/drama-news/details/2018/6/16/1376870/%D8%A8%D8%A7%D9%84%D9%81%D9%8A%D8%AF%D9%8A%D9%88-%D9%83%D9%88%D8%A7%D9%84%D9%8A%D8%B3-%D9%82%D8%AA%D9%84-%D8%B1%D8%AD%D9%8A%D9%85-%D9%81%D9%8A-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-%D8%A7%D9%84%D8%A3%D8%AE%D9%8A%D8%B1%D8%A9-#keyword



Data

Data fetching

The data is obtained from any web page using a scraper library. The scraper I preferred in my experiment is BeautifulSoup4.
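As a rough illustration of this step, the sketch below extracts the visible text of a page with BeautifulSoup4. It is an assumption about how the scraping could be done, not the project's actual code: the `page_text` helper and the sample HTML are hypothetical. Fetching the HTML over the network is left out so the snippet runs offline.

```python
from bs4 import BeautifulSoup


def page_text(html):
    """Return the visible text of an HTML document as a single string."""
    soup = BeautifulSoup(html, "html.parser")
    # Script and style contents are code, not page text, so remove them.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Join the text nodes and collapse runs of whitespace into single spaces.
    return " ".join(soup.get_text(separator=" ").split())


# Hypothetical page, invented for illustration only.
sample_html = """
<html><head><title>Ra7eem</title><style>p {color: red}</style></head>
<body><h1>Episode 1</h1><p>Watch <b>online</b></p>
<script>var x = 1;</script></body></html>
"""

text = page_text(sample_html)
print(text)
```

The resulting string is what a vectorizer would receive as one document.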

Missing data

There are many approaches to handling missing data. Good practice recommends not simply ignoring rows with missing values, since dropping them changes the data used in the optimization and therefore affects the learned parameters. In my case the missing data was 1.12% of the whole dataset, so for the sake of simplicity I dropped those rows. In other cases the missing entries could be filled in instead, for example with random values that represent the missing features.
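The drop-the-rows approach can be sketched without any dependencies as follows. The `drop_missing` helper, the field names, and the sample rows are hypothetical and only illustrate the idea. With pandas, the same step would typically be done with `DataFrame.dropna()`.

```python
def drop_missing(rows):
    """Return (complete_rows, fraction_dropped) for a list of dict rows.

    A row counts as missing if any of its values is None or an empty string.
    """
    kept = [
        row for row in rows
        if all(value is not None and value != "" for value in row.values())
    ]
    fraction_dropped = 1 - len(kept) / len(rows) if rows else 0.0
    return kept, fraction_dropped


# Hypothetical scraped rows, invented for illustration only.
rows = [
    {"url": "page-1", "text": "ra7eem episode", "label": 1},
    {"url": "page-2", "text": "", "label": 0},  # scrape returned no text
    {"url": "page-3", "text": "football match", "label": 0},
    {"url": "page-4", "text": "news today", "label": 0},
]

clean_rows, fraction = drop_missing(rows)
print(f"dropped {fraction:.2%} of rows, kept {len(clean_rows)}")
```

Reporting the dropped fraction alongside the cleaned data makes it easy to check whether dropping is still a reasonable shortcut for a given dataset.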


USAGE

  • Navigate to the live app to test it.