C++ program file_scraper with classes response_getter, parser and downloader for getting HTML5 from a URL, parsing it, and downloading all files referenced in that HTML, calculating a hash for each.

salda/file_scraper

file_scraper.out is the executable. 
My notes are in notes.txt. 

There are 2 classes whose interfaces are usable from other programs:
response_getter and parser. There should actually be 3, the third being downloader,
but I am submitting it as it is, because I will not have time to finish it in the next 30 hours.

My research on where in HTML to look for references to files is in places_with_references_in _html.txt.
The HTTP response header contains the following Links:
Link: <https://www.meetangee.com/wp-json/>; rel="https://api.w.org/"
Link: <https://www.meetangee.com/>; rel=shortlink
Despite receiving them, I do not download them: the assignment clearly asks to download only "files the page references",
and Wikipedia says: "The web page usually means what is visible, but the term may also refer to a computer file, usually written in HTML or a comparable markup language."
Information from the protocol is therefore not part of the page and does not qualify.

I really want to make the program better, but I think that would make me too late; this is surely my biggest homework.
