Managing large data sets

Project 1 (Technical report)

This practical work consisted of implementing and experimentally evaluating data storage and processing tasks. Methods and classes were developed that extend the MapReduce and Avro + Parquet interfaces of the Apache Hadoop framework in order to answer the questions raised. The public IMDb dataset was used as the data source.
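As an illustration of the map, shuffle, and reduce pattern behind this kind of task, here is a minimal pure-Python sketch. It is not the project's Hadoop code. The question it answers (number of titles per genre), the function names, and the sample rows are all hypothetical. Only the column layout follows IMDb's `title.basics.tsv`, where `\N` marks a missing value.

```python
from collections import defaultdict

# Column layout of IMDb's title.basics.tsv; "\N" marks a missing value.
COLUMNS = ["tconst", "titleType", "primaryTitle", "originalTitle",
           "isAdult", "startYear", "endYear", "runtimeMinutes", "genres"]


def map_title(line):
    """Map phase: emit (genre, 1) for every genre on one TSV record."""
    values = line.rstrip("\n").split("\t")
    if values[0] == "tconst":  # skip the header row
        return
    fields = dict(zip(COLUMNS, values))
    genres = fields.get("genres")
    if genres is None or genres == "\\N":  # record has no genre information
        return
    for genre in genres.split(","):
        yield genre, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum the counts collected for one genre."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in-process over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_title(line))
    return dict(reduce_counts(k, vs) for k, vs in shuffle(pairs).items())


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE = [
    "tconst\ttitleType\tprimaryTitle\toriginalTitle\tisAdult\tstartYear\tendYear\truntimeMinutes\tgenres",
    "tt9999901\tmovie\tExample A\tExample A\t0\t2001\t\\N\t90\tDrama,Comedy",
    "tt9999902\tmovie\tExample B\tExample B\t0\t2002\t\\N\t100\tDrama",
    "tt9999903\tshort\tExample C\tExample C\t0\t2003\t\\N\t10\t\\N",
]

if __name__ == "__main__":
    print(run_job(SAMPLE))
```

In Hadoop, the map and reduce steps would live in Mapper and Reducer classes and the framework would do the grouping across the cluster. The sketch keeps everything in one process so the data flow is easy to follow.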

Project 2 (Technical report)

This practical work consisted of implementing and experimentally evaluating data storage and processing tasks using the Spark library and the Hive Metastore service. The public IMDb dataset was used to answer the questions raised.

Collaborators

Name
Bruno Veloso
Carolina Cunha
Diogo Tavares
Hugo Nogueira
Luís Abreu

University of Minho, Software Engineering (4th Year).
