Merge pull request #2 from pauldeschacht/spark3.3
bump Spark from 3.1.1 to 3.3.0
pauldeschacht committed Feb 19, 2024
2 parents bbd9065 + b26d6ef commit efa4944
Showing 2 changed files with 26 additions and 9 deletions.
27 changes: 21 additions & 6 deletions README.md
@@ -15,25 +15,40 @@ This project creates the Spark UDF function which runs on **JDK 8**

* `sbt compile` to compile the jar
* `sbt assembly` to create the assembly jar to distribute on the Spark cluster (see the plugin sketch after this list)
* `build.sbt` defines:
  * Scala version is set to 2.12.15
  * Spark version is set to 3.1.1
* If you want to run the local integration test (local Hadoop and Spark), pay attention to the Hadoop cluster version: Hadoop is sensitive to the versions of its components.
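
The `assembly` task comes from the sbt-assembly plugin. A minimal `project/plugins.sbt` sketch, assuming a recent plugin version (the repository's actual plugin file is not shown in this diff):

```scala
// project/plugins.sbt -- provides the `assembly` task used above.
// The plugin version is an assumption; use the one pinned in the repository.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")
```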

Current [requirements for Spark 3.1.1](https://spark.apache.org/docs/3.1.1/)
## Version 0.2

Scala version is set to 2.12.15
Spark version is set to 3.1.1

[Requirements for Spark 3.1.1](https://spark.apache.org/docs/3.1.1/)

* Spark runs on Java 8/11, Scala 2.12.
* For the Scala API, Spark 3.1.1 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).
* spark-core has a transitive dependency on hadoop-common 3.2

## Version 0.3

Scala version is set to 2.12.15
Spark version is set to 3.3.0

[Requirements for Spark 3.3.0](https://spark.apache.org/docs/3.3.0/)

* Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.7+ and R 3.5+.
* Support for Java 8 prior to version 8u201 is deprecated as of Spark 3.2.0.
* For the Scala API, Spark 3.3.0 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).


* If you want to run the local integration test (local Hadoop and Spark), pay attention to the Hadoop cluster version: Hadoop is sensitive to the versions of its components.


## Test Environment

Note: a JDK 8 environment is needed to run the integration test.


* `sbt test` to run the local test (no Spark involved)
* `sbt IntegrationTest / testOnly` sets up a local Hadoop/Spark cluster (no docker)
* `sbt IntegrationTest/testOnly` sets up a local Hadoop/Spark cluster (no docker)

The integration test starts a local Hadoop/Spark node. The initialization of the `mini cluster` loads the sample data files in `src/it/resources/ips` as parquet files and registers the [free GeoLite2 database from MaxMind](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data#accessing-geolite2-free-geolocation-data) with the Spark environment.
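
For orientation, a minimal sketch of how such a MaxMind-backed UDF could be registered and applied to the sample parquet data. The UDF name `ipCountry`, the `.mmdb` path, and the `ip` column are illustrative assumptions, not the project's actual identifiers:

```scala
import java.io.File
import java.net.InetAddress

import com.maxmind.geoip2.DatabaseReader
import org.apache.spark.sql.SparkSession

// Opens the GeoLite2 reader once per JVM; the database path is an assumption.
object GeoLookup {
  private lazy val reader =
    new DatabaseReader.Builder(new File("/path/to/GeoLite2-City.mmdb")).build()

  def country(ip: String): String =
    reader.city(InetAddress.getByName(ip)).getCountry.getIsoCode
}

object IpGeoSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // mirrors the local, docker-free test setup
      .appName("SparkIpGeoSketch")
      .getOrCreate()

    // Register the lookup as a Spark SQL UDF (name is illustrative).
    spark.udf.register("ipCountry", (ip: String) => GeoLookup.country(ip))

    // Load the sample data and resolve each IP to an ISO country code.
    val ips = spark.read.parquet("src/it/resources/ips")
    ips.selectExpr("ip", "ipCountry(ip) AS country").show()

    spark.stop()
  }
}
```

In the actual integration test the registration happens during the mini-cluster initialization; this standalone `main` is only meant to show the moving parts.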

8 changes: 5 additions & 3 deletions build.sbt
@@ -1,9 +1,9 @@
name := "SparkIpGeo"
version := "0.2"
version := "0.3"

scalaVersion := "2.12.15"
val sparkVersion = "3.1.1"
val hadoopVersion = "3.1.0" // this must be aligned with the %HADOOP_HOME%
val sparkVersion = "3.3.0"
val hadoopVersion = "3.3.4" // this must be aligned with the %HADOOP_HOME%

// https://github.com/s911415/apache-hadoop-3.1.0-winutils/tree/master/bin

@@ -44,8 +44,10 @@ lazy val global = project
"org.apache.spark" %% "spark-sql" % sparkVersion % "it",
"org.scalatest" %% "scalatest" % "3.1.0" % "test,it",
"org.apache.hadoop" % "hadoop-common" % hadoopVersion % "it",
"org.apache.hadoop" % "hadoop-auth" % hadoopVersion % "it",
"org.apache.hadoop" % "hadoop-hdfs" % hadoopVersion % "it",
"org.apache.hadoop" % "hadoop-minicluster" % hadoopVersion % "it",
"org.mockito" % "mockito-core" % "2.28.2" % "it",
"com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.13.4" % "it",
),
assemblyJarName := s"${name.value}_${scalaBinaryVersion.value}-${sparkVersion}_${version.value}.jar",
