big-data-processing

Here are 65 public repositories matching this topic...

Bennyhwanggggg / Basic-Scala-Computations

Using Scala for big data computations for basic tasks

scala big-data spark scala-graph graphx big-data-analytics spark-scala big-data-processing

Updated Nov 16, 2018
Scala

Bennyhwanggggg / Basic-Hadoop-MapReduce

Standard Hadoop MapReduce Tasks using Java

java big-data hadoop hadoop-mapreduce big-data-analytics big-data-processing

Updated Nov 16, 2018
Java

levindoneto / pandas-simple-csv-parser

Simple CSV parser for huge volumes of data, built on the Pandas library for Python. It extracts specific columns from a CSV file and quickly writes them to one or more files (each column in a separate file, or all of them in the same output).

parser csv data-manipulation pandas-dataframes conda-environment pandas-datareader big-data-processing

Updated Jan 7, 2019
Python

mikhail-kukuyev / Masters-Degree-Courses

Solved tasks from the master's degree courses in the specialty "Algorithms and Systems for Big Data Processing".

machine-learning information-retrieval highload mpi neural-networks external-memory university-course python-course randomized-algorithms cache-optimization page-rank big-data-processing

Updated Jan 13, 2019
Python

bdnf / BigData-Engineering-Projects

Data modeling with Cassandra, building a data warehouse with Redshift, and creating a data lake with Spark and Airflow

airflow spark cassandra data-warehouse data-lake redshift big-data-analytics big-data-processing

Updated Feb 28, 2020
Jupyter Notebook

kochlisGit / Big-Data-Algorithms

Implementation of algorithms for big data using Python, NumPy, and pandas.

python bloom-filter lsh streams frequent-itemset-mining pcy frequent-itemsets stream-mining shingling big-data-processing lsh-algorithm min-hasing similar-items a-priori multistage-pcy multihash-pcy

Updated Apr 27, 2020
Python

lucamoroz / BigDataComputing-UniPD

Collection of homework (mostly Spark-based) from the course "Big Data Computing" - University of Padua.

java spark big-data-processing

Updated Jun 25, 2020
Java

siddharths067 / Easy-Airflow-Deployment

A Docker Compose Template to deploy Airflow with sync from a remote repository

data-science big-data etl apache-airflow big-data-analytics big-data-processing

Updated Aug 30, 2020
Shell

mtumilowicz / big-data-scala-spark-batch-workshop

Introduction to Spark Batch processing.

big-data workshop spark workshop-materials batch-processing spark-sql big-data-processing

Updated Feb 14, 2021
Scala

SCCH-KVS / AVUBDI

GitHub repository for a versatile usable Big Data infrastructure (AVUBDI)

kafka spark docker-swarm template-project big-data-platform process-monitoring big-data-processing process-industry

Updated Feb 16, 2021
Shell

software-competence-center-hagenberg / AVUBDI

GitHub repository for a versatile usable Big Data infrastructure (AVUBDI)

docker kafka spark docker-compose docker-swarm template-project big-data-platform big-data-processing

Updated Feb 23, 2021
Shell

Enkhai / CURE-spark

CURE clustering algorithm implementation in Scala with Spark

scala spark clustering cure big-data-processing

Updated Jun 15, 2021
Scala

ridakn / Big-Data-Top-K-Words

Project using Python, Hive, and MapReduce to compare techniques for finding the top K words in a very large file, i.e., different techniques for processing big data.

big-data hive mapreduce mapreduce-python top-k-query big-data-processing

Updated Jun 23, 2021
Python
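
For orientation, the top-K-words task described in this entry can be sketched in plain Python as a single-process baseline. This is not the repository's code and uses neither Hive nor MapReduce. The function name `top_k_words` and the word pattern are assumptions made for illustration only.

```python
import heapq
import re
from collections import Counter

# Assumed tokenization rule: lowercase runs of letters and apostrophes.
WORD = re.compile(r"[a-z']+")

def top_k_words(lines, k):
    """Return the k most frequent words as (word, count) pairs.

    `lines` is any iterable of strings (for example an open file), so the
    input is streamed line by line rather than loaded whole.
    """
    counts = Counter()
    for line in lines:
        counts.update(WORD.findall(line.lower()))
    # Higher count first; ties are broken alphabetically.
    return heapq.nsmallest(k, counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Passing an open file handle as `lines` keeps only one line of text in memory at a time, but the counter still holds every distinct word. That limit is one reason a distributed approach such as MapReduce becomes attractive for very large vocabularies.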

chandnii7 / Big-Data-Processing-Pipeline

A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics using the following technologies: Twitter API, Kafka, MongoDB, and Tableau.

kafka big-data mongodb twitter-api data-visualization zookeeper data-analytics kafka-consumer kafka-producer tableau nosql-database kafka-streaming big-data-processing data-processing-pipelines

Updated Aug 2, 2021
Python

felipefrizzo / terraform-aws-kinesis-firehose

This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.

big-data analytics terraform kinesis-firehose cloudwatch-logs parquet terraform-provider etl-job terraform-aws big-data-processing

Updated Aug 4, 2021
HCL

franck-mahieu / datasets-toolbox

datasets-toolbox is a set of scripts useful for generating, transforming, and validating dataset files that are too large to open in an editor. datasets-toolbox also provides a ping script.

json json-data ping transform-data toolbox json-parsing jsonlines jsonl big-data-processing ping-launch