Files for my graduation project for the Bachelor of Arts in Information Management at the University of Applied Sciences and Arts in Hanover.
I used Scrapy to crawl the websites for this project.
This folder contains the German stopword list as a .txt file, the STW Thesaurus for Economics as an RDF file, and two Python scripts.
Storage folder for the crawled website files, lightly restructured to make them easier for humans to read.
Folder where the crawled websites are stored after processing with processing_crawled_to_pref.py. It contains two folders: one for the whole text with German stopwords removed and every skos:altLabel replaced by its skos:prefLabel, and one for only the skos:prefLabel terms.
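
The processing step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual content of processing_crawled_to_pref.py: the function name `replace_labels`, the `label_to_pref` dictionary, and the sample labels are hypothetical, and the sketch only handles single-word labels, whereas a real thesaurus also contains multi-word labels.

```python
import re


def replace_labels(text, stopwords, label_to_pref):
    """Return (cleaned_text, pref_labels) for one crawled page.

    stopwords     -- set of lowercase stopwords (hypothetical input)
    label_to_pref -- dict mapping lowercase labels (altLabel or prefLabel)
                     to their preferred label (hypothetical input)
    """
    # Split the text into lowercase word tokens; \w also matches umlauts.
    tokens = re.findall(r"\w+", text.lower())

    kept = []         # whole text without stopwords, labels normalized
    pref_labels = []  # only the preferred labels that were found

    for token in tokens:
        if token in stopwords:
            continue  # drop German stopwords
        pref = label_to_pref.get(token)
        if pref is not None:
            # Known thesaurus label: store its preferred form.
            kept.append(pref)
            pref_labels.append(pref)
        else:
            kept.append(token)

    return " ".join(kept), pref_labels


# Made-up sample data for illustration only.
sample_stopwords = {"der", "die", "das", "und"}
sample_labels = {
    "erwerbslosigkeit": "Arbeitslosigkeit",  # alternative label
    "arbeitslosigkeit": "Arbeitslosigkeit",  # preferred label maps to itself
    "kfz": "Kraftfahrzeug",
}

cleaned, found = replace_labels(
    "Die Erwerbslosigkeit und der Kfz Markt", sample_stopwords, sample_labels
)
```

The first return value corresponds to the folder with the whole text, the second to the folder that only stores the skos:prefLabel terms.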