Ophelia, a PySpark analytics wrapper. (Python; updated May 14, 2024.)
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics (2nd Edition).
Solving big-data problems using the Spark framework in Java, running on HDFS clusters (BigData@Polito) to obtain the results.
This package contains the code for calculating external clustering validity indices in Spark. The package includes Chi Index among others.
A machine learning tutorial covering machine learning with NumPy, scikit-learn, and TensorFlow, as well as using Spark and Flink to speed up model training, aiming to give readers a fairly complete introduction to machine learning.
Spark library for generalized K-Means clustering. Supports general Bregman divergences. Suitable for clustering probabilistic data, time series data, high dimensional data, and very large data.
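The library above generalizes k-means to arbitrary Bregman divergences. As a hedged illustration of the core idea (not the library's API), here is a minimal NumPy sketch of Lloyd's algorithm with the KL divergence as the distortion measure; the names `kl_div` and `bregman_kmeans` are invented for this example. The key property exploited is that for any Bregman divergence the cost-minimizing centroid is still the arithmetic mean, so only the assignment step changes:

```python
import numpy as np

def kl_div(p, q):
    # Row-wise KL divergence KL(p || q) between probability vectors.
    eps = 1e-12
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def bregman_kmeans(X, k, n_iters=50):
    # Deterministic farthest-point initialization.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min(np.stack([kl_div(X, c) for c in centers], axis=1), axis=1)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iters):
        # Assignment step: nearest center under the Bregman divergence.
        d = np.stack([kl_div(X, c) for c in centers], axis=1)
        labels = d.argmin(axis=1)
        # Update step: the arithmetic mean minimizes any Bregman divergence.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```

This is suited to probabilistic data (rows are probability vectors), one of the use cases the library lists; swapping `kl_div` for another Bregman divergence leaves the rest of the algorithm unchanged.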
This repository contains all the code developed during the Big Data processing and analytics laboratories. Data are processed and analyzed using Hadoop and Spark.
Abandoned in favor of FastAPI and a new repo.
Application that trains a classifier and predicts flight arrival delays based on past information. Uses the libraries pyspark.ml and pyspark.sql, performs feature engineering, cross-validation and tests various ML algorithms.
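The cross-validation workflow described above can be sketched generically. The following is a minimal scikit-learn example of the same pattern (pipeline, parameter grid, k-fold cross-validation), standing in here for `pyspark.ml`'s `Pipeline` and `CrossValidator`; the synthetic dataset and parameter values are illustrative only, not the project's actual features or models:

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for engineered flight features and a delay label.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Pipeline: feature scaling followed by a classifier.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# 5-fold cross-validation over a small regularization grid.
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
```

In `pyspark.ml` the same structure appears as a `Pipeline` of transformer/estimator stages wrapped in a `CrossValidator` with a `ParamGridBuilder` grid.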
🐍💥Python and Spark for Big Data
Big Data projects for beginners
Learning Apache Spark: the PySpark library.
Maven project covering Scala (sparkml, spark_streaming, spark_dataframe, ...) and Java (threadpool, kafka, jpa, timer, request api).
Development of an AutoML System to Predict the Compressive Strength of Concrete
This project implemented a lambda architecture for analyzing domestic flight data in the US from 2009 to 2020. It used Apache Spark for batch processing, Spark Streaming for real-time analysis, and SVM models to predict flight cancellations and delays, with Docker for cluster management and Grafana for real-time visualization.
Developed a streaming Spark ML pipeline to identify potential customers who may purchase top-up services in the future.
This repository includes a web application connected to a product recommendation system built on the comprehensive Amazon Review Data (2018) dataset (nearly 233.1 million records, roughly 128 GB of storage), using MongoDB, PySpark, and Apache Kafka.
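Recommendation systems like the one above are commonly built on matrix factorization; Spark MLlib ships an alternating least squares (ALS) implementation for exactly this. As a hedged, self-contained sketch of the idea (not the project's code, and not MLlib's API), here is a tiny ALS factorizer in NumPy; the function `als` and its parameters are invented for illustration:

```python
import numpy as np

def als(R, mask, k=2, n_iters=30, reg=1e-3, seed=0):
    # Factor a ratings matrix R (users x items) as U @ V.T,
    # fitting only the observed entries marked True in `mask`.
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    I = np.eye(k)
    for _ in range(n_iters):
        # Fix V, solve a ridge regression per user.
        for u in range(n_users):
            idx = mask[u].nonzero()[0]
            if idx.size:
                Vi = V[idx]
                U[u] = np.linalg.solve(Vi.T @ Vi + reg * I, Vi.T @ R[u, idx])
        # Fix U, solve a ridge regression per item.
        for i in range(n_items):
            idx = mask[:, i].nonzero()[0]
            if idx.size:
                Ui = U[idx]
                V[i] = np.linalg.solve(Ui.T @ Ui + reg * I, Ui.T @ R[idx, i])
    return U, V
```

At the scale described above (hundreds of millions of reviews), the per-user and per-item solves are what Spark's `pyspark.ml.recommendation.ALS` distributes across the cluster.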