sa-artea/VVG-Dataset-Creator

Vincent van Gogh Dataset Creator

This project prepares the data used to train a machine learning model based on the Vincent van Gogh collection. The script takes the data scraped from the web page and creates an alternative representation from the JSON data.

The data is divided into different tables according to different aspects of the gallery (i.e., description, related work, search tags, image, among others). Missing data is filled in with default values, and the script then integrates each new alternative table (the alternative representation for the ML model) into a star model. This creates a single data record for the later stages to process.
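The flow described above can be sketched roughly as follows. This is only a loose illustration under stated assumptions, not the repository's actual code: the function names `split_into_tables` and `build_star`, the empty-object default, and the sample records are all hypothetical, and only the column names come from this README.

```python
import json

# Assumption: an empty JSON object stands in for missing data.
DEFAULT_JSON = "{}"

# JSON columns named in the Data Structure section of this README.
JSON_COLUMNS = ("DESCRIPTION", "SEARCH_TAGS", "OBJ_DATA", "RELATED_WORKS")


def split_into_tables(records):
    """Build one table per JSON column, each keyed by the element ID."""
    tables = {column: {} for column in JSON_COLUMNS}
    for record in records:
        for column in JSON_COLUMNS:
            # Fill missing or empty values with the default.
            raw = record.get(column) or DEFAULT_JSON
            tables[column][record["ID"]] = json.loads(raw)
    return tables


def build_star(records, tables):
    """Join every per-aspect table back to the central record by ID."""
    star = []
    for record in records:
        row = {"ID": record["ID"], "TITLE": record.get("TITLE", "")}
        for column, table in tables.items():
            row[column] = table[record["ID"]]
        star.append(row)
    return star


# Made-up records, for illustration only.
records = [
    {"ID": "s0001", "TITLE": "Example A", "SEARCH_TAGS": '{"tags": ["oil"]}'},
    {"ID": "s0002", "TITLE": "Example B", "DESCRIPTION": ""},
]
star = build_star(records, split_into_tables(records))
```

Each entry in `star` carries the central fields plus one value per aspect table, so later stages can read a single record per element.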

Originally developed as the final project for the Digital Humanities MSc degree between 2019 and 2021.

The code was refactored and commented for the official and final presentation for the 2020/2021 project of the Uniandes Digital Humanities graduate program.


Project Structure

LICENSE: MIT Project license description.

README: Project general description.

PROJECT STRUCTURE:

  • *\Abstraction #TODO.

    • StarDataStruct-VVG-Large: #TODO.
    • VStarDataStruct-VVG-Large: #TODO.
  • *\Data is the folder with the CSV files that hold the gallery's scraped data.

    • *Source: #TODO
      • VanGoghGallery_large.csv Gallery's large file with 964 records of Vincent van Gogh's work.
      • VanGoghGallery_small.csv Gallery's small file with 61 records of Vincent van Gogh's work. Useful for functional tests.
    • *Star: #TODO
      • *Small: #TODO
      • *Large: #TODO

Data Structure

First, the columns of the CSV files inside the *\Data\Source folder are as follows:

  • ID: element ID in the gallery and local folder name.
  • TITLE: title of the element in the gallery.
  • COLLECTION_URL: URL of the recovered element (painting).
  • DOWNLOAD_URL: direct URL/link to the image in the gallery.
  • HAS_PICTURE: boolean indicating whether there is a picture file in the local folder.
  • DESCRIPTION: JSON with the description of the element.
  • SEARCH_TAGS: JSON with the collection tags of the element.
  • OBJ_DATA: JSON with the museum object data of the element.
  • RELATED_WORKS: JSON with the related work text and URLs of the element.
  • IMG_DATA: numpy RGB matrix created from original image.
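As a minimal sketch of how a file with these columns could be read, the snippet below uses only the Python standard library. Several things here are assumptions rather than facts about the repository: the helper name `load_rows`, the made-up sample row, and the idea that HAS_PICTURE is stored as the text `True`/`False`.

```python
import csv
import io
import json

# JSON columns taken from the list above.
JSON_COLUMNS = ("DESCRIPTION", "SEARCH_TAGS", "OBJ_DATA", "RELATED_WORKS")

# Made-up sample with a subset of the columns, for illustration only.
SAMPLE_CSV = (
    "ID,TITLE,HAS_PICTURE,SEARCH_TAGS\n"
    's0001,Example title,True,"{""tags"": [""oil""]}"\n'
)


def load_rows(handle):
    """Read CSV rows, decoding JSON columns and the HAS_PICTURE flag."""
    rows = []
    for row in csv.DictReader(handle):
        for column in JSON_COLUMNS:
            # Only decode columns that are present and non-empty.
            if row.get(column):
                row[column] = json.loads(row[column])
        # Assumption: the flag is stored as the text "True" or "False".
        row["HAS_PICTURE"] = row.get("HAS_PICTURE", "").strip().lower() == "true"
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
```

To try the same helper on a real file, pass it a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points to one of the CSV files in the Source folder.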

Second, the CSV files inside the *\Data\Source folder are as follows:

  • *data-object-alt.csv: #TODO
  • *description-alt.csv: #TODO
  • *related-work-alt.csv: #TODO
  • *search-tags-alt.csv: #TODO
  • *img-data: #TODO
  • *VVG-Star-Gallery-Collection.csv: #TODO

Important Notes


About

Vincent van Gogh's museum gallery preprocessing code to create an alternative representation for the scraped dataset
