Bachelor Thesis: Classification of Advertisements by means of Supervised Learning Methods

paschok/Diploma

Diploma

My bachelor's thesis at Hochschule Merseburg, written in Python, using Natural Language Processing (NLP) and machine learning.

My bachelor's thesis topic is: ***Classification of advertisements by means of supervised learning methods***

Work process:

  • Learn about NLP
  • Scrape data
  • Try NLTK / spaCy on datasets
  • Learn more about hierarchical clustering algorithms / neural networks / other NLP methods such as topic modelling, Word2Vec and so on
  • Code the Diploma
  • Write the diploma itself (the thesis)

My bachelor's project has two major branches:

  1. Data
    • Scraping data from the web using Scrapy, a Google user agent, or proxies. I used to scrape Amazon through proxies, but because they lagged and kept switching off, I decided to use a user agent and time.sleep() instead
  2. ML
    • Code implementation
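
The user agent plus time.sleep() approach from the Data branch can be sketched roughly as follows. This is a minimal illustration using only the standard library (urllib) rather than the project's actual spiders. The User-Agent string, the delay bounds, and the function names `build_request`, `fetch`, and `crawl` are all assumptions made for the example.

```python
import random
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple

# Assumption: an example browser-like User-Agent string, not the one used in the thesis.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Attach a User-Agent header instead of relying on urllib's default one."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download one page and return the raw response body."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()


def crawl(
    urls: Iterable[str],
    fetch_page: Callable[[str], bytes] = fetch,
    min_delay: float = 1.0,  # assumed lower bound for the pause, in seconds
    max_delay: float = 3.0,  # assumed upper bound for the pause, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (url, body) pairs, pausing a random interval between requests."""
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        yield url, fetch_page(url)
```

Passing `fetch_page` and `sleep` as parameters keeps the loop testable without network access; in real use the defaults apply, for example `for url, body in crawl(list_of_urls): ...`.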

Commits

Commit messages have the form: one of the two branches above: subproject: message. Commits to README.md are the exception.

Example:

Data: amazon: added new spider

README.md: update
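
The commit convention above could be checked with a small parser like the one below. This is only a sketch: the branch names "Data" and "ML" come from the list of branches, while the `CommitMessage` type and the `parse_commit` function are hypothetical helpers that are not part of the repository.

```python
from typing import NamedTuple, Optional

# The two major branches named in this README.
BRANCHES = ("Data", "ML")


class CommitMessage(NamedTuple):
    scope: str                 # "Data", "ML", or "README.md"
    subproject: Optional[str]  # None for README.md commits
    message: str


def parse_commit(line: str) -> CommitMessage:
    """Split 'branch: subproject: message' or 'README.md: message' into parts."""
    scope, sep, rest = line.partition(":")
    scope = scope.strip()
    if not sep:
        raise ValueError(f"missing ':' in commit message: {line!r}")
    if scope == "README.md":
        # README commits carry no subproject.
        return CommitMessage(scope, None, rest.strip())
    if scope not in BRANCHES:
        raise ValueError(f"unknown branch {scope!r}, expected one of {BRANCHES}")
    subproject, sep, message = rest.partition(":")
    if not sep:
        raise ValueError(f"missing subproject in commit message: {line!r}")
    return CommitMessage(scope, subproject.strip(), message.strip())
```

For instance, `parse_commit("Data: amazon: added new spider")` yields the scope `Data`, the subproject `amazon`, and the message `added new spider`.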

Data comes from these websites:

  • obszone
    • had problems downloading American products for sale, so I had to use a little trick with the URL
  • geebo
  • adlandpro
  • pennysaverusa
  • hoobly
  • oodle
  • gumtree
  • letgo
  • salespider
  • ebay
  • amazon

Amazon data issues:

When entering a department on Amazon, you can either scrape 400 pages of that department's general products, or go into its Feature Categories and scrape more specific products.
For instance: 400 pages of the Automotive department, OR Car care, Car electronics and so on.
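
The two options can be sketched as a small URL planner. Everything here is illustrative: `BASE_URL` and the `node` and `page` query parameters are placeholders, not Amazon's real URL scheme, and applying the same page cap to each feature category is an assumption, since only the 400-page department limit is stated above.

```python
from typing import Iterable, Iterator
from urllib.parse import urlencode

BASE_URL = "https://example.com/search"  # hypothetical placeholder, not a real Amazon URL
MAX_PAGES = 400  # page limit per department mentioned above


def listing_urls(node: str, max_pages: int = MAX_PAGES) -> Iterator[str]:
    """Yield one listing URL per result page for a department or category."""
    for page in range(1, max_pages + 1):
        yield f"{BASE_URL}?{urlencode({'node': node, 'page': page})}"


def plan_crawl(
    department: str,
    feature_categories: Iterable[str] = (),
    max_pages: int = MAX_PAGES,  # assumption: same cap reused for each category
) -> Iterator[str]:
    """Crawl the feature categories if any are given, else the whole department."""
    targets = list(feature_categories) or [department]
    for target in targets:
        yield from listing_urls(target, max_pages)
```

Calling `plan_crawl("automotive")` walks the department's general listing, while `plan_crawl("automotive", ["car care", "car electronics"])` walks each feature category instead.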