Using Scala for basic big data computation tasks
Updated Nov 16, 2018 - Scala
Standard Hadoop MapReduce Tasks using Java
Simple CSV parser for huge volumes of data. It uses the Python library pandas to extract specific columns from a CSV file and quickly write them to one or more files (one file per column, or all columns in a single output).
Solutions to tasks from the master's degree courses in the specialty "Algorithms and Systems for Big Data Processing".
Data modeling with Cassandra, building a data warehouse with Redshift, and creating a data lake with Spark and Airflow.
Implementation of algorithms for big data using Python, NumPy, and pandas.
Collection of homework (mostly Spark-based) from the course "Big Data Computing" - University of Padua.
A Docker Compose Template to deploy Airflow with sync from a remote repository
Introduction to Spark Batch processing.
GitHub repository for a versatile usable Big Data infrastructure (AVUBDI)
CURE clustering algorithm implementation in Scala with Spark
Project using Python, Hive, and MapReduce to compare techniques for finding the top K words in a very large file, i.e. different ways to process big data.
A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics using the following technologies: Twitter API, Kafka, MongoDB, and Tableau.
This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.
datasets-toolbox is a set of scripts for generating, transforming, and validating dataset files that are too large to open in an editor. It also provides a ping script.
GCP_Data_Enginner
A list of awesome big data testing frameworks, resources and other awesomeness.
Analysis, organization and querying of large genomic datasets using C++, Monsoon and various data structures.