
School project for the PSZ (Pronalaženje Skrivenog Znanja, en. Data Mining and Semantic Web) course at the School of Electrical Engineering, University of Belgrade.


nikolapeja6/psz_proj


PSZ proj

This school project was created for the PSZ (Pronalaženje Skrivenog Znanja, en. Data Mining and Semantic Web) course, which is part of the Master's studies at the School of Electrical Engineering, University of Belgrade.

The project consisted of crawling the Discogs website to gather data on albums, artists and songs. The raw data was then pre-processed and stored in a SQLite database (the psz_database.db file in the data folder); this was the first task of the project. The remaining four tasks centered on processing the data, visualizing it and running unsupervised learning algorithms (in my case, only clustering algorithms).
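The pre-process-and-store step can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the `album` table, its columns and the `clean`/`store_albums` helpers are hypothetical, and the actual schema of psz_database.db may differ.

```python
import sqlite3
from typing import Iterable, Mapping, Optional

# Hypothetical schema; the real psz_database.db tables may look different.
SCHEMA = """
CREATE TABLE IF NOT EXISTS album (
    id     INTEGER PRIMARY KEY,
    title  TEXT NOT NULL,
    artist TEXT NOT NULL,
    year   INTEGER,
    UNIQUE (title, artist)
);
"""


def clean(value: object) -> Optional[str]:
    """Collapse runs of whitespace in a scraped value; map empty strings to None."""
    if value is None:
        return None
    text = " ".join(str(value).split())
    return text or None


def store_albums(conn: sqlite3.Connection, records: Iterable[Mapping]) -> int:
    """Clean scraped album records, insert them, and return the album row count."""
    conn.executescript(SCHEMA)
    rows = []
    for record in records:
        title, artist = clean(record.get("title")), clean(record.get("artist"))
        if not title or not artist:
            continue  # skip records missing a required field
        year = clean(record.get("year"))
        rows.append((title, artist, int(year) if year and year.isdigit() else None))
    with conn:  # commits the transaction if no exception is raised
        # INSERT OR IGNORE + UNIQUE(title, artist) drops albums scraped twice.
        conn.executemany(
            "INSERT OR IGNORE INTO album (title, artist, year) VALUES (?, ?, ?)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM album").fetchone()[0]
```

Letting the UNIQUE constraint handle duplicates keeps the crawler simple: it can revisit a page without checking the database first.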

The whole project statement (in Serbian) is located in the docs folder.

Requirements

In order to run the code, you need to have Python 3.x installed.

You will also need the following Python packages:

  • requests
  • beautifulsoup4
  • fuzzywuzzy
  • python-Levenshtein
  • regex
  • matplotlib
  • numpy
  • cyrtranslit
  • scikit-learn
  • bokeh
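A small helper like the one below can check which of these packages are missing before running the code. It is a sketch, not part of the repository. The mapping from PyPI package names to import names (for example, beautifulsoup4 is imported as `bs4` and scikit-learn as `sklearn`) is the only project-specific data it contains.

```python
import importlib.util
from typing import Dict, List

# PyPI package name -> module name used in `import` statements.
REQUIRED: Dict[str, str] = {
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "fuzzywuzzy": "fuzzywuzzy",
    "python-Levenshtein": "Levenshtein",
    "regex": "regex",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "cyrtranslit": "cyrtranslit",
    "scikit-learn": "sklearn",
    "bokeh": "bokeh",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the PyPI names of packages whose top-level module cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install the missing packages with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```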

Scraping

Discogs pages do not all structure their data the same way, which made them harder to scrape. Below are some examples of pages with different structures.
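One way to cope with differing layouts is to locate a field by its visible label (such as "Genre:") instead of by a fixed position or CSS class. The sketch below is an illustration, not the project's scraper: it uses the standard library's `html.parser` rather than beautifulsoup4 so it has no dependencies, and the two HTML snippets in the usage example are hypothetical layouts, not copied from Discogs.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _TextChunks(HTMLParser):
    """Collect the non-empty text nodes of a page in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        text = " ".join(data.split())  # collapse whitespace and newlines
        if text:
            self.chunks.append(text)


def extract_field(html: str, label: str) -> Optional[str]:
    """Return the text node that follows the node matching `label`, or None.

    Works whether the label/value pair is laid out as <div>s, a table row,
    or any other markup, as long as the value comes right after the label.
    """
    parser = _TextChunks()
    parser.feed(html)
    parser.close()
    chunks = parser.chunks
    for i, chunk in enumerate(chunks[:-1]):
        if chunk.rstrip(":").strip().lower() == label.lower():
            return chunks[i + 1]
    return None


# Two hypothetical layouts for the same information.
DIV_LAYOUT = '<div class="head">Genre:</div><div class="content"><a href="#">Rock</a></div>'
TABLE_LAYOUT = "<table><tr><th>Genre:</th><td><a>Rock</a></td></tr></table>"
```

Both `extract_field(DIV_LAYOUT, "Genre")` and `extract_field(TABLE_LAYOUT, "Genre")` return `"Rock"`. Fields with several values (for example, multiple genres) would need the function to collect every chunk up to the next label instead of only the first one.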
