You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Analyzed 157 US Energy stocks (Jan-Dec '23), identified Bullish/Bearish trends and risk categories. Used KMeans, Hierarchical, Spectral Clustering, revealing balanced returns and low volatility. Integrated data with Kafka for seamless subscriptions.
The 2022 Big Data Bowl data contains Next Gen Stats player tracking, play, game, player, and PFF scouting data for all 2018-2020 Special Teams play. Here, you'll find a summary of each data set in the 2022 Data Bowl, a list of key variables to join on, and a description of each variable.
Pyspark serves as a Python interface to Apache Spark, enabling the execution of Python and SQL-like instructions for the manipulation and analysis of data within a distributed processing framework.
We manage employer-employee administrative data and elections data to estimate the causal impact of electing a STEM candidate on epidemiological outcomes
Evaluates the execution time differences between RDD (Resilient Distributed Datasets) and DataFrame data structures in Apache Spark. Also takes into account the file format being used, such as CSV or Parquet.
Solved various big data problems using pySpark . Variety of Tranformations and Actions are applied on RDDs and Data-Frames to extract different insights from various Data-Sets which are very huge in file ranging in GBs.