learn-apache-spark

Learn Apache Spark with Java

This repo demonstrates some features of Apache Spark, such as RDD, SQL, and Streaming.

Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.

Apache Spark features

Batch/streaming data: Unify the processing of your data in batches and real-time streaming, using your preferred language: Python, SQL, Scala, Java or R.
SQL analytics: Execute fast, distributed ANSI SQL queries for dashboarding and ad-hoc reporting. Runs faster than most data warehouses. Apache Spark™ is built on an advanced distributed SQL engine for large-scale data.
Machine learning: Train machine learning algorithms on a laptop and use the same code to scale to fault-tolerant clusters of thousands of machines.
Data science at scale: Perform Exploratory Data Analysis (EDA) on petabyte-scale data without having to resort to downsampling.
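
To make the SQL analytics feature above more concrete, here is a minimal sketch in Java that runs a SQL query over a small in-memory dataset on a single machine. It is not taken from the demo modules in this repo: the class name `SparkSqlSketch`, the helper `countWords`, and the temporary view name `words` are hypothetical names chosen for illustration, and the sketch assumes the `spark-sql` library is on the classpath.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical illustration class; not part of the repo's demo modules.
public class SparkSqlSketch {

    // Registers the words as a temporary view and counts occurrences with SQL.
    // Rows come back ordered by count (descending), then by word.
    static List<Row> countWords(SparkSession spark, List<String> words) {
        Dataset<String> ds = spark.createDataset(words, Encoders.STRING());
        ds.toDF("word").createOrReplaceTempView("words");
        return spark
                .sql("SELECT word, COUNT(*) AS cnt FROM words "
                        + "GROUP BY word ORDER BY cnt DESC, word")
                .collectAsList();
    }

    public static void main(String[] args) {
        // local[*] runs Spark inside this JVM using all available cores;
        // on a cluster the master would normally be supplied externally.
        SparkSession spark = SparkSession.builder()
                .appName("spark-sql-sketch")
                .master("local[*]")
                .getOrCreate();
        try {
            List<Row> rows = countWords(spark, Arrays.asList("spark", "java", "spark"));
            for (Row row : rows) {
                // COUNT(*) is a bigint column, so it is read as a long.
                System.out.println(row.getString(0) + " -> " + row.getLong(1));
            }
        } finally {
            spark.stop();
        }
    }
}
```

The query logic lives in `countWords`, separate from session setup, so the same method could in principle be pointed at a cluster-backed `SparkSession` without changes, which is the "same code, different scale" idea from the feature list.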

Demo

apache-spark-rdd-demo: RDD demo
apache-spark-sql-demo: Spark SQL demo
apache-spark-streaming-demo: Streaming demo
