[FEATURE REQUEST]: Benchmark Spark.NET versus PySpark and SparkR #1147

GeorgeS2019 · 2023-03-25T16:38:19Z

Is it possible to replace the existing Spark.NET with one that takes the Spark Scala/Java code or JARs and compiles them to .NET using IKVM?

I know this is outside your scope; I am CC'ing you because the community here could start investigating IKVM.

ChatGPT reply

PySpark is the Python API for Apache Spark, a data processing framework. The Spark core is implemented in Scala and Java, but it also provides wrappers for other languages, including Python (PySpark), R (SparkR), and SQL (Spark SQL). You can install Spark separately (which includes all of the wrappers), or install only the Python version using pip or conda 1.

SparkR is an R package that provides a light-weight frontend to use Apache Spark from R. It is similar to PySpark but for R users 1.

SparkR versus sparklyr

sparklyr is an R package developed by RStudio that provides a complete dplyr backend to Spark, using the same dplyr syntax. This means that switching between environments does not require changing function names. In contrast to SparkR, here we operate on tables/tibbles, which are mapped to Spark DataFrames 1.

GeorgeS2019 added the enhancement New feature or request label Mar 25, 2023
