
mespana/datasets_ES


datasets_ES

I. 112 ART REVIEWS FROM THE MUSEO THYSSEN-BORNEMISZA

A collection of nine sets of raw textual data in Spanish (eight "Recorridos Temáticos", i.e. thematic tours, plus one combined file), intended for research and educational purposes, especially for training text-mining and text-analytics skills: NLP, PCA, corpus construction, and preprocessing of unstructured data (importing, encoding, and other tasks common to raw textual data, such as cleaning, stopword removal, stemming, and data visualization).
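
As a rough illustration of the preprocessing steps listed above, the following Python sketch normalizes and lowercases a review, tokenizes it into alphabetic words (keeping accented letters such as á and ñ), drops stopwords, and counts term frequencies across a small corpus. The tiny `STOPWORDS_ES` set is a hypothetical placeholder rather than a curated list, and stemming is not shown. The datasets themselves were built in R, so this is only one possible way to start working with them.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical, deliberately tiny Spanish stopword list (illustration only);
# a real analysis would use a complete, curated list.
STOPWORDS_ES = {
    "el", "la", "los", "las", "de", "del", "y", "en", "un", "una",
    "que", "con", "por", "se", "su", "al", "a", "o",
}

def preprocess(text: str) -> list[str]:
    """Normalize, lowercase, tokenize into alphabetic words, drop stopwords."""
    # NFC keeps accented letters such as 'ó' or 'ñ' as single code points
    text = unicodedata.normalize("NFC", text).lower()
    # [^\W\d_] matches Unicode letters only (no digits, no underscores)
    tokens = re.findall(r"[^\W\d_]+", text)
    return [tok for tok in tokens if tok not in STOPWORDS_ES]

def term_frequencies(docs: list[str]) -> Counter:
    """Count preprocessed terms across a small corpus of reviews."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(preprocess(doc))
    return counts

print(preprocess("Bodegón con uvas y una copa de vino."))
print(term_frequencies(["El vino y el pan.", "Vino tinto en la mesa."]).most_common(1))
```

Matching letters with `[^\W\d_]` rather than `\w` keeps years and other numbers out of the vocabulary; whether that is desirable depends on the analysis.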

CSV files, by theme:

  • 1: Wine,
  • 2: Love,
  • 3: Gastronomy,
  • 4: Sustainability,
  • 5: Fashion (Moda),
  • 6: Jewellery,
  • 7: Leisure time,
  • 8: Flowers.

Additionally, we bound all eight sets together into a single combined dataset with an extra column (variable), 'Tema' (theme):

  • fichas_8-recorridosTematicos_MThyssen_raw.csv
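
A minimal Python sketch of this binding step, assuming each theme file is a CSV with a header row and that all eight files share the same columns. The column names `titulo` and `texto` and the theme labels in the example are hypothetical; the actual combined file was produced in R, and its exact columns are not documented here. With real files, each value in `tables` would be a handle from `open(path, newline="", encoding="utf-8")`, where UTF-8 is itself an assumption about the files' encoding.

```python
import csv
import io
from typing import TextIO

def bind_with_tema(tables: dict[str, TextIO]) -> list[dict[str, str]]:
    """Row-bind several CSV sources, tagging each row with its theme in 'Tema'."""
    combined: list[dict[str, str]] = []
    for tema, source in tables.items():
        for row in csv.DictReader(source):
            row["Tema"] = tema  # the extra column that identifies the theme
            combined.append(row)
    return combined

def write_combined(rows: list[dict[str, str]], out: TextIO) -> None:
    """Write the combined rows as one CSV; the header comes from the first row.

    Assumes every source had the same columns; DictWriter raises ValueError
    if a row contains a key that is not in the header.
    """
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

# Usage with in-memory data (hypothetical columns and themes):
tables = {
    "Vino": io.StringIO("titulo,texto\nObra A,Reseña uno\n"),
    "Amor": io.StringIO("titulo,texto\nObra B,Reseña dos\n"),
}
rows = bind_with_tema(tables)
buffer = io.StringIO()
write_combined(rows, buffer)
print(buffer.getvalue())
```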

The data were collected from the Museo Thyssen-Bornemisza website using R and the libraries rvest and tidyverse.
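
The scraping itself was done in R with rvest. For readers working in Python, a rough stand-in for the "select nodes, then extract their text" step can be written with the standard-library `html.parser`. This sketch simply collects the whitespace-normalized text of every `<p>` element. Which elements actually hold the reviews on the museum's pages is not documented here, so the `p` selector, the `ParagraphExtractor` class, and the sample HTML are all hypothetical, and fetching the pages is left out.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element (hypothetical selector)."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes entities like &eacute;
        self._in_p = False
        self._buffer: list[str] = []
        self.paragraphs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p = False
            # Collapse runs of whitespace, similar to a text-cleaning step
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)

def extract_paragraphs(html: str) -> list[str]:
    """Return the non-empty paragraph texts found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs

sample = "<div><h2>Título</h2><p>Una  <em>naturaleza</em> muerta.</p><p></p></div>"
print(extract_paragraphs(sample))
```

Text inside nested inline tags such as `<em>` is kept, while headings outside `<p>` and empty paragraphs are skipped.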

Data source: Museo Thyssen-Bornemisza.

https://www.museothyssen.org/visita/recorridos-tematicos [Retrieved: 2020-12-01]

https://www.rdocumentation.org/packages/rvest

https://purrr.tidyverse.org