niklasben/BAThesis

BAThesis

Files for my bachelor's thesis, submitted to graduate as a Bachelor of Arts in Information Management at the University of Applied Sciences and Arts in Hanover.

Crawler

I used Scrapy to crawl the websites for this project.

Scripts

This folder contains the German stopword list used in the project (as a TXT file), the STW Thesaurus for Economics (as an RDF file), and two Python scripts.
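The following is a minimal sketch, not one of the repository's two scripts, of how the stopword list and the STW thesaurus could be loaded using only Python's standard library. It assumes a UTF-8 stopword file with one word per line, and an RDF/XML thesaurus in which each concept is serialized as a `skos:Concept` element with language-tagged `skos:prefLabel` and `skos:altLabel` children. The function names `load_stopwords` and `load_alt_to_pref` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by SKOS thesauri serialized as RDF/XML.
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def load_stopwords(path):
    """Read a UTF-8 stopword list (assumed: one word per line) into a lowercase set."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def load_alt_to_pref(rdf_source, lang="de"):
    """Map each lowercase skos:altLabel to its concept's skos:prefLabel in `lang`.

    Assumes every concept is serialized as a <skos:Concept> element.
    `rdf_source` may be a file path or a binary file object.
    """
    mapping = {}
    root = ET.parse(rdf_source).getroot()
    for concept in root.iter("{" + SKOS + "}Concept"):
        # Pick the concept's preferred label in the requested language, if any.
        pref = next(
            (el.text.strip() for el in concept.findall("{" + SKOS + "}prefLabel")
             if el.get(XML_LANG) == lang and el.text),
            None,
        )
        if pref is None:
            continue
        # Every alternative label in the same language points to that prefLabel.
        for alt in concept.findall("{" + SKOS + "}altLabel"):
            if alt.get(XML_LANG) == lang and alt.text:
                mapping[alt.text.strip().lower()] = pref
    return mapping
```

Both functions take paths, so they would be called with the actual locations of the two files in Scripts, whose names are not given here. If the thesaurus file describes concepts as `rdf:Description` elements with an `rdf:type` instead, the concept lookup would need to be adapted.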

Files_Crawled

Storage folder for the crawled website files, lightly restructured to make them easier for humans to read.

Files_Machine_Learning

Folder where the crawled websites are stored after processing with processing_crawled_to_pref.py. It contains two folders: one for the full text, with German stopwords removed and every skos:altLabel replaced by the corresponding skos:prefLabel, and one that stores only the skos:prefLabel terms.
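A minimal sketch of the kind of processing this folder holds, not the actual code of processing_crawled_to_pref.py. It assumes a set of lowercase stopwords, a dictionary mapping lowercase altLabels to prefLabels, and single-word labels only (multi-word thesaurus labels are ignored). The function name `process_document` is hypothetical.

```python
import re

# \w matches Unicode word characters, so German umlauts and ß stay inside tokens.
TOKEN_RE = re.compile(r"\w+")


def process_document(text, stopwords, alt_to_pref, pref_labels):
    """Return (full_text, pref_labels_found) for one crawled document.

    stopwords:   set of lowercase stopwords to drop
    alt_to_pref: dict from lowercase altLabel to prefLabel
    pref_labels: iterable of prefLabels, matched case-insensitively
    Only single-token labels are matched in this sketch.
    """
    pref_lookup = {p.lower(): p for p in pref_labels}
    full_text, found = [], []
    for token in TOKEN_RE.findall(text):
        key = token.lower()
        if key in stopwords:
            continue  # drop German stopwords
        label = alt_to_pref.get(key) or pref_lookup.get(key)
        if label:
            # altLabels are normalized to their prefLabel; prefLabels keep their thesaurus spelling.
            full_text.append(label)
            found.append(label)
        else:
            full_text.append(token)
    return " ".join(full_text), found
```

The first return value corresponds to the full-text folder and the second to the prefLabel-only folder. Writing the results to disk is left out because the file layout of the two folders is not described here.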