Merge pull request #2 from pauldeschacht/spark3.3
bump Spark from 3.1.1 to 3.3.0
pauldeschacht committed Feb 19, 2024
2 parents bbd9065 + b26d6ef commit efa4944
Showing 2 changed files with 26 additions and 9 deletions.
27 changes: 21 additions & 6 deletions README.md
@@ -15,25 +15,40 @@ This project creates the Spark UDF function which runs on **JDK 8**

* `sbt compile` to compile the jar
* `sbt assembly` to create the assembly jar to distribute on the Spark cluster (see the plugin sketch after this list)
* `build.sbt` defines:
  * Scala version is set to 2.12.15
  * Spark version is set to 3.1.1
* If you want to run the local integration test (local Hadoop and Spark), pay attention to the Hadoop cluster version: Hadoop is sensitive to the versions of its components.
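
The `assembly` task comes from the sbt-assembly plugin. A minimal `project/plugins.sbt` sketch, assuming a recent plugin version (the repository's actual plugin file is not shown in this diff):

```scala
// project/plugins.sbt -- provides the `assembly` task used above.
// The plugin version is an assumption; use the one pinned in the repository.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")
```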

Current [requirements for Spark 3.1.1](https://spark.apache.org/docs/3.1.1/)
## Version 0.2

Scala version is set to 2.12.15
Spark version is set to 3.1.1

[Requirements for Spark 3.1.1](https://spark.apache.org/docs/3.1.1/)

* Spark runs on Java 8/11, Scala 2.12.
* For the Scala API, Spark 3.1.1 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).
* spark-core has a transitive dependency on hadoop-common 3.2

## Version 0.3

Scala version is set to 2.12.15
Spark version is set to 3.3.0

[Requirements for Spark 3.3.0](https://spark.apache.org/docs/3.3.0/)

* Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.7+ and R 3.5+.
* Support for Java 8 prior to version 8u201 is deprecated as of Spark 3.2.0.
* For the Scala API, Spark 3.3.0 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).


* If you want to run the local integration test (local Hadoop and Spark), pay attention to the Hadoop cluster version: Hadoop is sensitive to the versions of its components.


## Test Environment

Note: a JDK 8 environment is needed to run the integration test.


* `sbt test` to run the local test (no Spark involved)
* `sbt IntegrationTest / testOnly` sets up a local Hadoop/Spark cluster (no docker)
* `sbt IntegrationTest/testOnly` sets up a local Hadoop/Spark cluster (no docker)

The integration test starts a local Hadoop/Spark node. The initialization of the `mini cluster` loads the sample data files in `src/it/resources/ips` as parquet files and registers the [free GeoLite2 database from MaxMind](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data#accessing-geolite2-free-geolocation-data) with the Spark environment.
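
For orientation, a minimal sketch of how such a MaxMind-backed UDF could be registered and applied to the sample parquet data. The UDF name `ipCountry`, the `.mmdb` path, and the `ip` column are illustrative assumptions, not the project's actual identifiers:

```scala
import java.io.File
import java.net.InetAddress

import com.maxmind.geoip2.DatabaseReader
import org.apache.spark.sql.SparkSession

// Opens the GeoLite2 reader once per JVM; the database path is an assumption.
object GeoLookup {
  private lazy val reader =
    new DatabaseReader.Builder(new File("/path/to/GeoLite2-City.mmdb")).build()

  def country(ip: String): String =
    reader.city(InetAddress.getByName(ip)).getCountry.getIsoCode
}

object IpGeoSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // mirrors the local, docker-free test setup
      .appName("SparkIpGeoSketch")
      .getOrCreate()

    // Register the lookup as a Spark SQL UDF (name is illustrative).
    spark.udf.register("ipCountry", (ip: String) => GeoLookup.country(ip))

    // Load the sample data and resolve each IP to an ISO country code.
    val ips = spark.read.parquet("src/it/resources/ips")
    ips.selectExpr("ip", "ipCountry(ip) AS country").show()

    spark.stop()
  }
}
```

In the actual integration test the registration happens during the mini-cluster initialization; this standalone `main` is only meant to show the moving parts.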

8 changes: 5 additions & 3 deletions build.sbt
@@ -1,9 +1,9 @@
name := "SparkIpGeo"
version := "0.2"
version := "0.3"

scalaVersion := "2.12.15"
val sparkVersion = "3.1.1"
val hadoopVersion = "3.1.0" // this must be aligned with the %HADOOP_HOME%
val sparkVersion = "3.3.0"
val hadoopVersion = "3.3.4" // this must be aligned with the %HADOOP_HOME%

// https://github.com/s911415/apache-hadoop-3.1.0-winutils/tree/master/bin

@@ -44,8 +44,10 @@ lazy val global = project
"org.apache.spark" %% "spark-sql" % sparkVersion % "it",
"org.scalatest" %% "scalatest" % "3.1.0" % "test,it",
"org.apache.hadoop" % "hadoop-common" % hadoopVersion % "it",
"org.apache.hadoop" % "hadoop-auth" % hadoopVersion % "it",
"org.apache.hadoop" % "hadoop-hdfs" % hadoopVersion % "it",
"org.apache.hadoop" % "hadoop-minicluster" % hadoopVersion % "it",
"org.mockito" % "mockito-core" % "2.28.2" % "it",
"com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.13.4" % "it",
),
assemblyJarName := s"${name.value}_${scalaBinaryVersion.value}-${sparkVersion}_${version.value}.jar",
