Ganymede

Ganymede: Jupyter Notebook Java Kernel

The Ganymede Kernel is a Jupyter Notebook Java kernel. Java code is compiled and interpreted with the Java Shell tool, JShell. The kernel's additional features are described in Features and Usage below.

Installation

The Ganymede Kernel is distributed as a single JAR (download here).

⚠️ Only Jupyter Notebook versions before 7 (<7) are fully supported at this time. See the Pipfile in ganymede-notebooks for a minimal Python configuration.

Java 11 or later is required. In addition to Java, Jupyter Notebook must be installed first, and the jupyter and python commands must be on the ${PATH}. The typical (and minimal) installation command line is:

$ java -jar ganymede-2.1.2.20230910.jar -i

The kernel will be configured to use the same Java installation as the one invoked in the install command above. The following additional command-line options are supported:

| Option | Action | Default |
| --- | --- | --- |
| --id-prefix=&lt;prefix&gt; | Adds prefix to kernel ID | &lt;none&gt; |
| --id=&lt;id&gt; | Specifies kernel ID | ganymede-${version}-java-${java.specification.version} |
| --id-suffix=&lt;suffix&gt; | Adds suffix to kernel ID | &lt;none&gt; |
| --display-name-prefix=&lt;prefix&gt; | Adds prefix to kernel display name | &lt;none&gt; |
| --display-name=&lt;name&gt; | Specifies kernel display name | Ganymede ${version} (Java ${java.specification.version}) |
| --display-name-suffix=&lt;suffix&gt; | Adds suffix to kernel display name | &lt;none&gt; |
| --env | Specify NAME=VALUE pair(s) to add to kernel environment | |
| --copy-jar=&lt;boolean&gt; | Copies the Ganymede Kernel JAR to the kernelspec directory | true |
| --sys-prefix or --user | Install in the system prefix or user path (see the jupyter kernelspec install command) | --user |
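
For example, a minimal sketch of an install that adds a second, distinctly named kernelspec (the suffix values here are purely illustrative):

$ java -jar ganymede-2.1.2.20230910.jar -i \
      --id-suffix=dev --display-name-suffix="(dev)" --user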

The following Java system properties may be configured.

| System Property | Action | Default(s) |
| --- | --- | --- |
| maven.repo.local | Configures the local Maven repository | --sys-prefix: ${jupyter.data}/repository/; --user: ${user.home}/.m2/ |

The following OS environment variables may be configured:

| Environment Variable | Option | Action |
| --- | --- | --- |
| SPARK_HOME | --spark-home=&lt;path&gt; | If configured, the kernel adds the Apache Spark JARs to the kernel's classpath. |
| HIVE_HOME | --hive-home=&lt;path&gt; | If configured, the kernel adds the Apache Hive JARs to the kernel's classpath. |

For example, a sophisticated configuration to test a snapshot out of a user's local Maven repository:

$ export JAVA_HOME=$(/usr/libexec/java_home -v 11)
$ ${JAVA_HOME}/bin/java \
      -jar ${HOME}/.m2/repository/ganymede/ganymede/2.2.0-SNAPSHOT/ganymede-2.2.0-SNAPSHOT.jar \
      -i --sys-prefix --copy-jar=false \
      --id-suffix=spark-3.3.3 --display-name-suffix="with Spark 3.3.3" \
      --spark-home=/path/to/spark-home --hive-home=/path/to/hive-home
$ jupyter kernelspec list
Available kernels:
...
  ganymede-2.2.0-java-11-spark-3.3.3             /.../share/jupyter/kernels/ganymede-2.2.0-java-11-spark-3.3.3
...

would result in the configured ${jupyter.data}/kernels/ganymede-2.2.0-java-11-spark-3.3.3/kernel.json kernelspec:

{
  "argv": [
    "/Library/Java/JavaVirtualMachines/graalvm-ce-java11-22.3.0/Contents/Home/bin/java",
    "--add-opens",
    "java.base/jdk.internal.misc=ALL-UNNAMED",
    "--illegal-access=permit",
    "-Djava.awt.headless=true",
    "-Djdk.disableLastUsageTracking=true",
    "-Dmaven.repo.local=/Users/ball/Notebooks/.venv/share/jupyter/repository",
    "-jar",
    "/Users/ball/.m2/repository/dev/hcf/ganymede/ganymede/2.2.0-SNAPSHOT/ganymede-2.2.0-SNAPSHOT.jar",
    "-f",
    "{connection_file}"
  ],
  "display_name": "Ganymede 2.2.0 (Java 11) with Spark 3.3.3",
  "env": {
    "JUPYTER_CONFIG_DIR": "/Users/ball/.jupyter",
    "JUPYTER_CONFIG_PATH": "/Users/ball/.jupyter:/Users/ball/Notebooks/.venv/etc/jupyter:/usr/local/etc/jupyter:/etc/jupyter",
    "JUPYTER_DATA_DIR": "/Users/ball/Library/Jupyter",
    "JUPYTER_RUNTIME_DIR": "/Users/ball/Library/Jupyter/runtime",
    "SPARK_HOME": "/path/to/spark-home"
  },
  "interrupt_mode": "message",
  "language": "java"
}

The kernel makes extensive use of templates and POM fragments. While not strictly required, the authors suggest enabling the Hide Input extension so notebook authors can hide the input templates and POMs in any finished product. This may be set from the command line with:

$ jupyter nbextension enable hide_input/main --sys-prefix

(or --user as appropriate).

Features and Usage

The following subsections outline many of the features of the kernel.

Java

The Java REPL is JShell and provides all the Java features of the installed JVM. Java 11 is the minimum required version; later versions are also supported.

The JShell environment includes built-in functions, implemented as wrappers around the public methods of the NotebookContext class that are annotated with @NotebookFunction. These functions include:

| Method | Description |
| --- | --- |
| print(Object) | Render the Object to a Notebook format |
| display(Object) | Render the Object to a Notebook format |
| asJson(Object) | Convert argument to JsonNode |
| asYaml(Object) | Convert argument to YAML (String) |

The built-in functions are mostly concerned with "printing" or displaying (rendering) Objects to multimedia formats. For example, print(byte[]) will render the byte array as an image. Integrated renderers are also provided for chart and plot objects such as XChart charts.
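
For example, a minimal sketch of a Java cell exercising these built-in functions (the Map contents are arbitrary):

%%java
import java.util.Map;

var map = Map.of("greeting", "hello", "answer", 42);
display(map);           // render the Map to a Notebook format
print(asYaml(map));     // convert to YAML text and print it
var node = asJson(map); // a Jackson JsonNode for further processing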

The trig.ipynb notebook demonstrates rendering of an XChart.

As discussed in the next section, the magic identifier for java is %%java. A cell identified with %%java with no code will provide a table of variable bindings in the context with types and values. The types are links to the corresponding javadoc (if known).

| Name | Type | Value |
| --- | --- | --- |
| $$ | ganymede.notebook.NotebookContext | NotebookContext(super=ganymede.notebook.NotebookContext@af7e376) |
| by_state | org.apache.spark.sql.Dataset&lt;Row&gt; | [Country/Region: string, Province/State: string ... 1 more field] |
| chart | org.knowm.xchart.PieChart | org.knowm.xchart.PieChart@767f4a69 |
| countries_aggregated | org.apache.spark.sql.Dataset&lt;Row&gt; | [Date: date, Country: string ... 3 more fields] |
| dates | org.apache.spark.sql.Dataset&lt;Row&gt; | [Date: date] |
| interval | org.apache.spark.sql.Row | [2020-01-22,2022-04-16] |
| key_countries_pivoted | org.apache.spark.sql.Dataset&lt;Row&gt; | [Date: date, China: int ... 7 more fields] |
| reader | org.apache.spark.sql.DataFrameReader | org.apache.spark.sql.DataFrameReader@5a88849 |
| reference | org.apache.spark.sql.Dataset&lt;Row&gt; | [UID: int, iso2: string ... 10 more fields] |
| session | org.apache.spark.sql.SparkSession | org.apache.spark.sql.SparkSession@1b6683c4 |
| snapshot | org.apache.spark.sql.Dataset&lt;Row&gt; | [Country/Region: string, Deaths: int] |
| time_series_19_covid_combined | org.apache.spark.sql.Dataset&lt;Row&gt; | [Date: date, Country/Region: string ... 4 more fields] |
| us_confirmed | org.apache.spark.sql.Dataset&lt;Row&gt; | [Admin2: string, Date: date ... 3 more fields] |
| us_deaths | org.apache.spark.sql.Dataset&lt;Row&gt; | [Admin2: string, Date: date ... 3 more fields] |
| us_simplified | org.apache.spark.sql.Dataset&lt;Row&gt; | [Date: date, Admin2: string ... 4 more fields] |
| worldwide_aggregate | org.apache.spark.sql.Dataset&lt;Row&gt; | [Date: date, Confirmed: int ... 3 more fields] |

Magics

Cell magic commands are identified by %% starting the first line of a code cell. The list of available magic commands is shown below. The default cell magic is java.

| Name(s) | Description |
| --- | --- |
| !, script | Execute script with the argument command |
| bash | Execute script with 'bash' command |
| classpath | Add to or print JShell classpath |
| env | Add/Update or print the environment |
| freemarker | FreeMarker template evaluator |
| groovy | Execute code in groovy REPL |
| html | HTML template evaluator |
| java | Execute code in Java REPL |
| javascript, js | Execute code in javascript REPL |
| kotlin | Execute code in kotlin REPL |
| magics | Lists available cell magics |
| markdown | Markdown template evaluator |
| mustache, handlebars | Mustache template evaluator |
| perl | Execute script with 'perl' command |
| pom | Define the Notebook's Project Object Model |
| ruby | Execute script with 'ruby' command |
| scala | Execute code in scala REPL |
| sh | Execute script with 'sh' command |
| spark-session | Configure and start a Spark session |
| sql | Execute code in SQL REPL |
| thymeleaf | Thymeleaf template evaluator |
| velocity | Velocity template evaluator |

script, bash, perl, etc. are executed by creating a Process instance. groovy, javascript, kotlin, etc. are provided through their respective JSR 223 interfaces [3]. Dependency and classpath management are provided with the classpath and pom magics and are described in detail in a subsequent subsection. thymeleaf and html provide Thymeleaf template evaluation.

The kernel does not implement any "line" magics.

Dependency and Classpath Management

The classpath magic adds JAR and directory paths to the JShell classpath. The pom magic resolves and downloads Maven artifacts [1] and then adds those artifacts to the classpath.
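
For example, a minimal classpath cell (the JAR path below is purely illustrative):

%%classpath
/path/to/local/library.jar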

The trig.ipynb notebook demonstrates the use of the pom magic to resolve the org.knowm.xchart:xchart:LATEST artifact and its transitive dependencies.

%%pom
dependencies:
- org.knowm.xchart:xchart:LATEST

The POM is expressed in YAML and may specify repositories and dependencies. The Notebook's POM may be split across multiple cells: each repository and dependency is added or merged, and dependency resolution is attempted whenever a pom cell is executed. The default/initial Notebook POM is:

repositories:
  - id: central
    layout: default
    url: https://repo1.maven.org/maven2
    snapshots:
      enabled: false

Dependencies may either be expressed in "expanded" YAML or in groupId:artifactId[:extension]:version format:

dependencies:
  - groupId: groupA
    artifactId: groupAartifact1
    version: 1.0
  - groupB:groupB-artifact2:2.0

The specific attributes for repositories and dependencies are defined by the Apache Maven Artifact Resolver classes RemoteRepository (with RepositoryPolicy) and Dependency. (Note that these classes are slightly different than their Maven settings counterparts.)
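
For example, a hypothetical pom cell adding a snapshot-enabled repository together with a dependency (the repository URL and artifact coordinates are illustrative only):

%%pom
repositories:
  - id: example-snapshots
    url: https://repo.example.com/snapshots
    snapshots:
      enabled: true
dependencies:
  - com.example:example-artifact:1.0-SNAPSHOT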

Whenever a JAR is added to the classpath, it is analyzed to determine its Maven coordinates and, if they can be determined, the JAR is added as an artifact to the resolver. The following checks are made before adding the JAR to the JShell classpath:

  1. It is a new, unique path

  2. No previously resolved artifact with the same groupId:artifactId is already on the classpath

  3. Special heuristics for logging configuration:

    a. Ignore commons-logging:commons-logging:jar

    b. Allow only one of org.slf4j:jcl-over-slf4j:jar or org.springframework:spring-jcl:jar to be configured

    c. Allow only one of org.slf4j:slf4j-log4j12:jar and ch.qos.logback:logback-classic:jar to be configured

Artifacts that fail any of the above checks will be (mostly silently) ignored. Because only the first version of a resolved artifact is ever added to the classpath, the kernel must be restarted if a different version of the same artifact is specified for the change to take effect.

Finally, the kernel provides special processing to add artifacts from Apache Spark binary distributions. The kernel bundles, as resources, the Spark SQL dependencies and corresponding Scala compiler artifacts for currently available Spark binary distributions. It searches ${SPARK_HOME} for JARs for which it has the corresponding dependencies and then resolves those dependencies from the ${SPARK_HOME} hierarchy with the heuristics described above.

SQL

The SQL Magic provides the client interface to database servers through JDBC and jOOQ. Its usage is as follows:

    Usage: sql [--[no-]print] [<url>] [<username>] [<password>]
          [<url>]        JDBC Connection URL
          [<username>]   JDBC Connection Username
          [<password>]   JDBC Connection Password
          --[no-]print   Print query results.  true by default

For example:

%%sql jdbc:mysql://127.0.0.1:33061/epg?serverTimezone=UTC
SELECT * FROM schedules LIMIT 3;
| airDateTime | stationID | json | duration | md5 | programID |
| --- | --- | --- | --- | --- | --- |
| 1533945600 | 10139 | { "programID" : "EP009370080215", "airDateTime" : "2018-08-11T00:00:00Z", "duration" : 3600, "md5" : "S1UDH1R60Eagc1E3V5Qslw", "audioProperties" : [ "cc" ], "ratings" : [ { "body" : "USA Parental Rating", "code" : "TVPG" } ] } | 3600 | S1UDH1R60Eagc1E3V5Qslw | EP009370080215 |
| 1533945600 | 10142 | { "programID" : "EP006062993248", "airDateTime" : "2018-08-11T00:00:00Z", "duration" : 3600, "md5" : "2FQ8y5PsXl1vtxcmUBeppg", "new" : true, "audioProperties" : [ "cc" ], "ratings" : [ { "body" : "USA Parental Rating", "code" : "TVPG" } ] } | 3600 | 2FQ8y5PsXl1vtxcmUBeppg | EP006062993248 |
| 1533945600 | 10145 | { "programID" : "EP022439260394", "airDateTime" : "2018-08-11T00:00:00Z", "duration" : 1800, "md5" : "mUewfiqM8+dh24WQg2WfpQ", "audioProperties" : [ "cc" ] } | 1800 | mUewfiqM8+dh24WQg2WfpQ | EP022439260394 |

The SQL Magic accepts the --print/--no-print options to print or suppress query results. If no JDBC URL is specified, the most recently used connection is used. The List of the most recent jOOQ Queries is stored in $$.sql.queries, with $$.sql.results containing the corresponding Results. For example:

%%sql --no-print
SELECT COUNT(*) FROM programs;
%%java
print($$.sql.results.get(0));
| count(*) |
| --- |
| 1024495 |

MySQL and PostgreSQL JDBC drivers are provided in the Ganymede runtime.

Spark

The spark-session magic is provided to initialize Apache Spark sessions.

    Usage: spark-session [--[no-]enable-hive-if-available] [<master>] [<appName>]
          [<master>]    Spark master
          [<appName>]   Spark appName
          --[no-]enable-hive-if-available
                        Enable Hive if available.  true by default

Its typical usage:

%%spark-session local[*] covid-19
# Optional name/value pairs parsed as Properties

is roughly equivalent to:

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

var config = new SparkConf();
/*
 * Properties copied to SparkConf instance.
 */
var session =
    SparkSession.builder()
    .config(config)
    .master("local[*]").appName("covid-19")
    .getOrCreate();

The SparkSession can then be accessed in Java and other JVM code with the SparkSession.active() static method.
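
For example, a minimal sketch of a follow-on Java cell (assuming a session has already been started with the spark-session magic):

%%java
import org.apache.spark.sql.SparkSession;

var session = SparkSession.active(); // the session created by %%spark-session
var df = session.range(5).toDF("n"); // a trivial example Dataset
df.show();                           // prints an ASCII table to the cell output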

Other Languages (JSR 223)

The kernel leverages the java.scripting API to provide groovy, javascript [2], kotlin, and scala [4].
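
For example, a minimal javascript cell; with the Nashorn engine noted in endnote [2], a global print function is available:

%%javascript
var squares = [1, 2, 3].map(function (x) { return x * x; });
print(squares);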

Shells

The script magic (with the alias !) may be used to run an operating system command with the remaining code in the cell fed to the Process's standard input. bash, perl, ruby, and sh are provided as aliases for %%!bash, %%!perl, etc., respectively.
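
For example, a minimal sketch in which the cell body is piped to the wc command's standard input:

%%!wc -w
the quick brown fox jumps over the lazy dog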

Templates

A number of templating languages are supported as magics; see the Magics table above.

The following subsections provide examples of the markdown and thymeleaf magics; the other template magics are similar. Please refer to the installation instructions for discussion of enabling the Hide Input extension so only the template output is displayed in the notebook.

Markdown and JMustache

The template magic markdown provides Markdown processing with JMustache preprocessing:

%%java
import java.util.stream.Stream;

import static java.util.stream.Collectors.toList;

var fib =
    Stream.iterate(new int[] { 0, 1 }, t -> new int[] { t[1], t[0] + t[1] })
    .mapToInt(t -> t[0])
    .limit(10)
    .boxed()
    .collect(toList());
%%markdown
| Index | Value |
| --- | --- |
{{#fib}}| {{-index}} | {{this}} |
{{/fib}}
| Index | Value |
| --- | --- |
| 0 | 0 |
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 3 |
| 5 | 5 |
| 6 | 8 |
| 7 | 13 |
| 8 | 21 |
| 9 | 34 |

Thymeleaf

The template magics thymeleaf and html offer templating with Thymeleaf. All defined Java variables are bound into the Thymeleaf context before evaluation. For example (Java implementation detail removed):

%%java
...
var map = new TreeMap<Ranking,List<Card>>(Ranking.COMPARATOR.reversed());
...
var rankings = Arrays.asList(Ranking.values());
...
%%html
<table>
  <tr th:each="ranking : ${rankings}">
    <th:block th:if="${map.containsKey(ranking)}">
      <th th:text="${ranking}"/><td th:each="card : ${map.get(ranking)}" th:text="${card}"/>
    </th:block>
  </tr>
  <tr><th>Remaining</th><td th:each="card : ${deck}" th:text="${card}"/></tr>
</table>

Would generate:

| Ranking | Cards |
| --- | --- |
| RoyalFlush | A-♤ K-♤ Q-♤ J-♤ 10-♤ |
| StraightFlush | K-♡ Q-♡ J-♡ 10-♡ 9-♡ |
| FourOfAKind | 8-♤ 8-♡ 8-♢ 8-♧ 2-♧ |
| FullHouse | A-♡ A-♢ A-♧ K-♢ K-♧ |
| Flush | Q-♢ J-♢ 10-♢ 9-♢ 7-♢ |
| Straight | 7-♤ 6-♤ 5-♤ 4-♤ 3-♤ |
| ThreeOfAKind | 6-♡ 6-♢ 6-♧ 3-♧ 4-♧ |
| TwoPair | 9-♤ 9-♧ 7-♡ 7-♧ 5-♧ |
| Pair | 5-♡ 5-♢ 10-♧ J-♧ 2-♢ |
| HighCard | Q-♧ 3-♢ 4-♢ 2-♡ 3-♡ |
| Remaining | 4-♡ 2-♤ |

Documentation

Javadoc is published at https://allen-ball.github.io/ganymede.

License

Ganymede Kernel is released under the Apache License, Version 2.0, January 2004.

Endnotes

[1] Implemented with Apache Maven Artifact Resolver.

[2] With the built-in Oracle Nashorn engine.

[3] scala is special cased: It requires additional dependencies be specified at runtime and is optimized to be used with Apache Spark.

[4] Ibid.