.NET for Apache Spark v2.1.2 in .NET7 ? #1139

Open
6 of 7 tasks
GeorgeS2019 opened this issue Feb 14, 2023 · 20 comments
Labels
bug Something isn't working

Comments

@GeorgeS2019

GeorgeS2019 commented Feb 14, 2023

Update 28thFeb2023

Thanks to @dragorosson, whose advice I followed.

On Windows, it is possible to set up openjdk-8-jdk, mvn, and spark-3.2.3-bin-hadoop3.2 and build:

version = 2.1.1
microsoft-spark-3-2/target/microsoft-spark-3-2-3_2.12-[2.1.1].jar

Compiling Microsoft.Spark.Worker with .NET7 instead of .NET6.0 ensures that the BinaryFormatter compile error is addressed consistently. UDFs now work. Delta works.

Next open issue:

  • The key remaining issue is support for microsoft-spark-3-3.jar

Previous discussions

Update 23rdFeb2023

Please share your feedback on the observation provided here, but please do so within this issue for tracking purposes.

For Spark, this project is key for .NET developers to stay within .NET when dealing with big data analytics. It is unclear WHY the commitment shown here is so sporadic. If this effort fails OR is delayed further, it could have a ripple effect on the ENTIRE machine learning and deep learning .NET effort.

The THREE-PRONGED .NET effort to KEEP big data analytics within .NET could then be in question:

  • Machine Learning (ML.NET)
  • PolyGlot
  • .NET for Spark

Update 20thFeb2023

  • Test the WIP .NET6 version
  • Check the WIP .NET6 with microsoft-spark-3-2.jar
    • Check the WIP .NET6 with microsoft-spark-3-2_2.12-2.1.1.jar

      So far, I have only been able to get the merged WIP .NET6 version to work with microsoft-spark-3-1_2.12-2.1.1.jar

Update 15thFeb2023

It seems the Azure Synapse team has officially deleted ALL C# .NET for Spark samples for Synapse Jan 2023. Sad!


@GeorgeS2019 GeorgeS2019 added the bug Something isn't working label Feb 14, 2023
@GeorgeS2019
Author

It seems the support for .NET for Spark is back :-) We will be getting the .NET6 version soon.

@mfidemraizer

Anyway, what has changed your mind, @GeorgeS2019? There are recent PRs and commits, but how does that demonstrate that Microsoft still supports .NET for Spark? I'm really worried about this, because we adopted .NET for Spark a few months ago.

@GeorgeS2019
Author

@mfidemraizer

The more feedback and participation from .NET for Spark users, the more commitment we could expect from the dotnet team.

Reading through the issues, PRs, and discussions here, it seems to me that the dotnet team needs MORE participation and feedback to justify MORE commitment to this project.

Nothing to be alarmed about!

We need more active participation from users HERE! => Adding an emoji here and there and raising some feedback WILL HELP to keep the dotnet team supporting .NET for Spark.

@GeorgeS2019
Author

Questions for ALL!

How many of you are able to get UDFs to work? If so, with which version of the Microsoft Spark jar (e.g. microsoft-spark-3-2.jar) and which version of Microsoft.Spark.dll?

@dragorosson
Contributor

I am able to use UDFs. I'm compiling Microsoft.Spark.Worker with .Net 7. I would be using the jar from the NuGet package, except that I needed to run with Spark 3.2.2, not 3.2.1, so I had to build the jars myself. I use the latest NuGet package and copy the custom jar into the output directory whenever I build. The main issue I ran into was dependency resolution, but compiling Microsoft.Spark.Worker and my Spark program with `dotnet publish --self-contained` has solved that.

@GeorgeS2019
Author

GeorgeS2019 commented Feb 27, 2023

@dragorosson
I am still learning.

I needed to run with Spark 3.2.2 not 3.2.1 so I had to build the jars myself.

microsoft-spark-3-2_2.12-2.1.1.jar is built from the latest commit when running mvn clean package

microsoft-spark-3-2_2.12-2.1.1.jar is distributed with nuget.

Sorry for my ignorance:

  • Why does the microsoft-spark-3-2_2.12-2.1.1.jar provided from NuGet not allow you to run Spark 3.2.2? The current source reads:

    object DotnetRunner extends Logging {
      private val DEBUG_PORT = 5567
      private val supportedSparkMajorMinorVersionPrefix = "3.2"
      private val supportedSparkVersions = Set[String]("3.2.0", "3.2.1", "3.2.2", "3.2.3")

  • Does the newly built microsoft-spark-3-2_2.12-2.1.1.jar support 3.2.3?

  • What needs to be done to create a microsoft-spark-3-3 folder in the spark\src\scala folder?

What additional modifications are needed to make the existing spark-3.3 compatible with microsoft-spark-3-3?

Ref:

Support for 3.2.2
Support for 3.2.3

@GeorgeS2019
Author

@dragorosson

What changes have you made to overcome

UDF use deprecated BinaryFormatter ??

@dragorosson
Contributor

* why the provided `microsoft-spark-3-2_2.12-2.1.1.jar` from nuget does not allow you to run Spark 3.2.2

The most recent nuget package is from 9 months ago https://www.nuget.org/packages/Microsoft.Spark/2.1.1
And the change allowing 3.2.2 was merged 1 month ago #1122
The .Net for Spark team will need to do a new release.

* does the newly build `microsoft-spark-3-2_2.12-2.1.1.jar` support 3.2.3 ?

If you just built it locally, yes, it should. If you try to spark-submit with an unsupported version, it will fail quickly and tell you what versions it does support.
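To make the fail-fast behaviour concrete, here is a minimal sketch of a version check built on the `supportedSparkVersions` set quoted earlier in this thread. It is an illustration only, not the project's actual implementation: the object name `SparkVersionCheck`, the `validate` method, and the wording of the error message are all assumptions.

```scala
// Hypothetical sketch: SparkVersionCheck and validate are illustrative names,
// not part of the .NET for Apache Spark code base.
object SparkVersionCheck {
  // Values copied from the DotnetRunner excerpt quoted in this thread.
  val supportedSparkVersions: Set[String] =
    Set("3.2.0", "3.2.1", "3.2.2", "3.2.3")

  /** Returns Right(version) when the version is in the supported set,
    * otherwise Left(message) listing the supported versions.
    */
  def validate(sparkVersion: String): Either[String, String] =
    if (supportedSparkVersions.contains(sparkVersion)) {
      Right(sparkVersion)
    } else {
      // Sort so the message lists versions in a stable, readable order.
      val supported = supportedSparkVersions.toSeq.sorted.mkString(", ")
      Left(s"Unsupported Spark version '$sparkVersion'. Supported versions: $supported")
    }
}
```

Under these assumptions, `SparkVersionCheck.validate("3.2.3")` yields a `Right`, while `SparkVersionCheck.validate("3.3.0")` yields a `Left` whose message names the four supported 3.2.x versions. A jar built before 3.2.2 was added to the set would reject 3.2.2 in the same way.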

* What need to be done to create `microsoft-spark-3-3` folder in the `spark\src\scala` folder

I don't know. Spark 3.3 is not supported yet so I don't think it'd produce that jar.

What are the additional modifications needed to bring the existing spark-3.3 to be compatible with the microsoft-spark-3-3

I have no idea. I don't have time to dig around the internals or look through previous PRs that added support for previous versions, etc.!

What changes have you made to overcome

UDF use deprecated BinaryFormatter ??

`#pragma warning disable SYSLIB0011`, just like what got merged in with the .Net 6 PR 🤷

@GeorgeS2019
Author

GeorgeS2019 commented Feb 28, 2023

FYI: UDF

UDF now works. Here is the key

@GeorgeS2019 GeorgeS2019 changed the title from "[BUG]: Will .NET for Spark still supported in 2023?" to ".NET for Apache Spark v2.1.2 in .NET7 ?" on Mar 1, 2023
@lloydjatkinson

As someone who has never used Apache Spark, seeing this big warning on the first page of the docs and then seeing this issue, I do not feel confident in using it if it's always going to be so far behind that it needs a special warning...


.NET for Apache Spark targets an out of support version of .NET (.NET Core 3.1). For more details see the .NET Support Policy.

@dbeavon

dbeavon commented Mar 3, 2023

@lloydjatkinson
FYI, the "Synapse Analytics" PG has started putting that warning all over the place. It even appears in some docs that have absolutely nothing to do with Synapse.

Technically it is not even true, since the .Net 6 PR is in place (see #1112). At this point it appears that the communication is willfully incorrect.

What happened is that I had run into a bug on the Synapse-Spark platform, which did NOT occur on any other Spark platform. In the final analysis, the root cause of the bug was not even related to .Net. Instead of investigating and fixing the bug, the Synapse PG decided they didn't want to support .Net anymore. So they killed .Net in Synapse 3.3, and they started down the scorched-earth path to make sure nobody ever tries to use .Net-for-Spark on any of the competing cloud platforms either (e.g. Databricks or HDI).

Given that they are now making people fearful about this project, they are certainly accomplishing their mission!

I've used .Net for Spark for many years and it is a beautiful thing. The PR for .Net 6 is going to allow us to look forward to many more years. This project basically allows .Net to piggyback on the Python bindings, and saves a .Net programmer from having to resort to another (inferior) language in order to do our MPP data transformations and ETLs.

@lloydjatkinson

OK, this goes over my head a bit and sounds a bit political? Is Microsoft or the .NET team aware of this? What is the plan going forward?

@Vislesha

Vislesha commented Mar 25, 2023

Hi Team: Is there any timeline for the above PR and a possible new release of this library? We are heavily invested in this library (because of our existing .Net dependency) and have been waiting a long time to see full compatibility with .Net 6.0 & Spark 3.2 (or above)! I raised a couple of feature requests (#958, #983, #1100) a while back, but there has not been much traction on those. Could you please let us know whether this library is going to be supported going forward, and a possible timeframe for an updated version?

@lloydjatkinson

cc @imback82

@Vislesha

Vislesha commented Apr 13, 2023

Hi @imback82 , @Niharikadutta , @GeorgeS2019 , @dbeavon,
Could you please let me know if we can depend on this library and if there's any chance we see a new version of this library at all?
Thanks

@dragorosson
Contributor

Hi @imback82 , @Niharikadutta , @GeorgeS2019 , @dbeavon, Could you please let me know if we can depend on this library and if there's any chance we see a new version of this library at all? Thanks

^ + @suhsteve @AFFogarty

@GeorgeS2019
Author

@dragorosson
I hope everyone here is clear. I am independent, with nothing to do with Dotnet or Microsoft.

@imback82
Contributor

I am no longer maintaining this repo (no write access), but I think @AFFogarty / @suhsteve should be able to answer your question.

@GeorgeS2019
Author

GeorgeS2019 commented Apr 27, 2023

@imback82

Just curious, hoping for a small comment

Do you see a possibility of using IKVM to eventually make Spark work in .NET?

#IKVM now supports .NET6

@tonyqus

tonyqus commented Mar 14, 2024

@imback82 Since you are not under a Microsoft NDA, I think it's easier from your side. From what I found, the Spark.NET team sits inside the Microsoft Fabric team, which is different from the Azure Synapse team (that one belongs to the Azure SQL team as far as I know; correct me if I'm wrong).

Short Question: Is Microsoft Fabric dying?
