13caroline/imdb-datasets

Managing large data sets

Project 1 (Technical report)

This practical work covered the implementation and experimental evaluation of data storage and processing tasks. Methods and classes were developed that extend the MapReduce and Avro + Parquet interfaces of the Apache Hadoop framework in order to answer the questions raised. The public IMDb dataset was used to answer these questions.
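As a rough illustration of the MapReduce pattern mentioned above, the sketch below counts titles per genre in plain Python, without Hadoop. It is not the project's code: the question (titles per genre), the function names, and the sample rows are all hypothetical. Only the column layout follows IMDb's public `title.basics.tsv` file, where `\N` marks a missing value.

```python
from collections import defaultdict

# Hypothetical sample rows in the layout of IMDb's title.basics.tsv (tab-separated).
SAMPLE_TSV = (
    "tconst\ttitleType\tprimaryTitle\toriginalTitle\tisAdult\t"
    "startYear\tendYear\truntimeMinutes\tgenres\n"
    "tt0000001\tshort\tA\tA\t0\t1894\t\\N\t1\tDocumentary,Short\n"
    "tt0000002\tshort\tB\tB\t0\t1892\t\\N\t5\tAnimation,Short\n"
    "tt0000003\tmovie\tC\tC\t0\t1900\t\\N\t\\N\t\\N\n"
)


def map_genres(line):
    """Map phase: emit one (genre, 1) pair per genre listed on a title row."""
    fields = line.rstrip("\n").split("\t")
    genres = fields[8]
    if genres == "\\N":  # missing value: emit nothing for this row
        return
    for genre in genres.split(","):
        yield genre, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce phase: sum the counts collected for one genre."""
    return key, sum(values)


def count_titles_per_genre(tsv_text):
    """Run map, shuffle, and reduce in sequence over TSV text with a header row."""
    lines = tsv_text.splitlines()[1:]  # skip the header row
    pairs = (pair for line in lines for pair in map_genres(line))
    return dict(reduce_counts(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(count_titles_per_genre(SAMPLE_TSV))
```

In Hadoop, the map and reduce steps would live in `Mapper` and `Reducer` subclasses and the framework would perform the shuffle. Here all three phases run in a single process purely to show the data flow.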

Project 2 (Technical report)

This practical work covered the implementation and experimental evaluation of data storage and processing tasks using the Spark library and the Hive Metastore service. The public IMDb dataset was again used to answer the questions raised.

Collaborators

Name
Bruno Veloso
Carolina Cunha
Diogo Tavares
Hugo Nogueira
Luís Abreu

University of Minho, Software Engineering (4th Year).