Releases: spark-notebook/spark-notebook
Spark 2.x - better UX & stability - notebook versioning - sbt project generation - etc
Note: for Spark < 2.0, see v0.7.0-pre2
This is (likely) the last release which supports Scala 2.10.
Various fixes and improvements, among others:
- redesigned UI to be more user-friendly (minimalistic UX, cell context menu, improved sidebar)
- better Scala 2.11 support (code autocompletion; fixed kernel failures; improved `customDeps` fetching)
- use `coursier` for faster dependency resolution during the build
- code cleanup and other stability fixes
- easier usage of Mesos in Spark 2.1: it now includes the `spark-mesos` lib by default (added `-Dwith.mesos=true` build option)
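Enabling the bundled Mesos support happens at build time. As a rough sketch (assuming the project's usual sbt packaging; the exact task name may differ between versions):

```sh
# hypothetical invocation: bundle the spark-mesos lib into the distribution
sbt -Dwith.mesos=true clean dist
```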
New features:
- SBT project generation from a notebook (experimental)
- Notebook edit versioning, and storage on Git (experimental)
- Viewer-only mode - a build option which makes notebooks not editable
Removed:
- removed the `:dp`, `:cp`, `:local-repo`, `:remote-repo` commands (use `Edit -> Notebook metadata` instead)
- removed old plotting libs: Rickshawts, TauChart, LinePlot, Bokeh (all superseded by Plotly)
Following v0.7.0 for Spark pre 2
THIS IS FOR SPARK PRE 2
Based on v0.7.0, it includes its fixes, optimizations, and most of its new features, except those specific to Spark 2 (`SparkSession`, for instance).
Spark 2.0 - Fixes - Better Viz
Note: for Spark < 2.0, see v0.7.0-pre2
- Added Spark 2 support
- Many fixes for better stability (more lenient with user input, avoiding kernel crashes)
- Lots of optimization for the viz; also replaced most Dimple charts with C3
- Introducing Plotly.js wrappers
- Better Debian support
- Improved "download as Markdown" as a zip, with charts rendered as PNGs referenced from an images folder
- Better doc available at all times in the `doc` folder
- Cell dirtiness detection based on the variables dependency graph
- New default port 9001 to avoid conflicts with HDFS
- Removed Wisp and Highcharts (in favor of plotly.js)
- Code cleanup
Last stable release supporting Spark 1.6 and earlier
For Spark <= 1.6, use this release or the `stale/spark-1.6-and-older` branch
Early availability of awesomeness before next minor
Aside from the stabilization and all the bugs fixed, the new features are:
- improvement of the PivotChart
- improvement of completion with type args and more
- better sampling for automatic/default plots
- added tests and travis
- spark jobs are tracked by cells; cells now have ids
- hardened the observables init
- improved scala 2.11 support
- improved Flow widget; added a Custom box taking Scala code directly as logic
- job for a cell can be cancelled
- read_only mode
- notebooks are now synced with respect to cell output (including reactive output), but not cell additions/deletions or cell content changes
- panels have landed:
- general spark monitor
- defined variables and types
- chat room
- cleaner docker build
- added taucharts viz lib support
- added `-Dguava.version` to support integration tools like the Cassandra connector from 1.5+
Again, we'd like to thank the community for their work and their support!
YOU'RE ALL AWESOME!
Better Plotting, Hardening and Spark versions support include nightly builds
- build information in the UI
- better https support for web socket connections
- use the presentation compiler for completion
- fix restart kernel
- server/Spark logs forwarded to the browser's console
- charts plot 25 entries by default (extendable using `maxPoints`), but this cap is changeable using a reactive HTML input
- spark jobs' monitor/progress bar is now always live (still in progress, needs some UI hardening and enhancements)
- graph plots are reactive
- table chart using dynatable
- HTTP proxy support for dependency managements
- generic spark version support in a best effort way for any new spark versions (including nightly builds)
- nightly build repos can be detected and injected with the `spark.resolver.search` JVM property set to `true`
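For illustration (the launcher path is an assumption about a typical install), the JVM property could be passed when starting the server:

```sh
# hypothetical: let the notebook search for and inject nightly-build repos
./bin/spark-notebook -Dspark.resolver.search=true
```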
- presentation mode added, including UI tuning via
- variables environment support in metadata: local repo, VM arguments, and Spark configuration
- Better `DataFrame` viz support
- `PivotChart` tuning, including viz and state management
- support `%%` in the deps definition to take care of the used Scala version
- support the current Spark version in the deps definition using `_`, like `"org.apache.spark %% spark-streaming-kafka % _"`
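A sketch of how the `_` placeholder might look in a notebook's metadata (the `customDeps` key appears elsewhere in these notes; the surrounding JSON structure is assumed):

```json
"customDeps": [
  "org.apache.spark %% spark-streaming-kafka % _"
]
```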
- added `user_custom.css` for users' extensions or redefinitions of the CSS
- report the Spark UI link on the left-hand side of the notebook
- URL query parameter `action=recompute_now` to automatically recompute everything at loading time
- default logging less verbose
- added CSV downloader from DataFrame capability (directly in HDFS using spark-csv)!
- new C3 based widgets
- new GeoChart widget -- support for JTS geometries, GeoJSON and String
- new Flow for visual flow management using boxes and arrows (needs hardening and improvements)
- UI cleaning (menubars, ...)
- kernel auto start can be disabled (useful for view-only mode, like presentations): `autostartOnNotebookOpen` in conf
- UI shows when the kernel isn't ready
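A sketch of the `autostartOnNotebookOpen` setting mentioned above (the exact key path inside the configuration file is an assumption; check `application.conf` for where it actually lives):

```
# keep kernels from starting automatically, e.g. for presentation/view-only mode
autostartOnNotebookOpen = false
```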
- dead kernels are now reported throughout the UI too
- added `manager.notebooks.override` to override and merge default values with the provided metadata before starting a notebook
- new example notebooks:
- Machine Learning
- C3
- Geospatial
- Flow
- more documentation (not enough...)
Special thanks to @vidma for his amazing work on many new and killer features! 👏 👏 👏
Spark 1.5.0 and PivotChart, and more
- ADD_JARS support (add jars to context)
- notebook metadata saved on OK
- fix 2.11 :dp and :cp
- hide tachyon ui
- YARN_CONF_DIR support
- customArgs in metadata (application.conf, ...) → adding JVM arguments to spawned process for a notebook
- spark 1.5.0 support
- tachyon 0.7.1 integration for spark 1.5.0
- added reactive slider + example in `misc`
- old X and Y renaming of tuples' field names discarded; back to `_1`, `_2`
- example of cassandra connector (@maasg)
- reactive `widgets.PivotChart` support for simpler analysis of Scala data
- fixes, fixes, fixes
Hardening, Extended viz (DataFrame, Geo, Graph), Printing, Doc and Fixes
- a loooooot of fixes \o/
- a loooooot of documentation including on how to install and run the spark notebook on distros and clusters (yarn, mapr, EMR, ...)
- support for `HADOOP_CONF_DIR` and `EXTRA_CLASSPATH` to include Spark-cluster-specific classpath entries, like the hadoop conf dir, but also the lzo jar and so on. This updates the classpath of both the notebook server and the notebook processes.
- the custom repos specified in the metadata or application.conf have a higher priority
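The `HADOOP_CONF_DIR` and `EXTRA_CLASSPATH` support above can be used like this (example values only; point these at your cluster's actual paths, and the launcher name may vary by install):

```shell
# make the hadoop conf dir and extra jars visible to the server and notebooks
export HADOOP_CONF_DIR=/etc/hadoop/conf
export EXTRA_CLASSPATH=/usr/lib/hadoop/lib/hadoop-lzo.jar
# then start the notebook server, e.g.:
# ./bin/spark-notebook
```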
- support for spark 1.4.1
- mesos is added to the docker distro
- code is now run asynchronously, allowing the introduction of the flame button, which can cancel all running Spark jobs
- added many new notebooks, including @Data-Fellas ML and ADAM examples, and anomaly detection by @radek1st
- LOGO :-D
- added :markdown, :javascript, :jpeg, :png, :latex, :svg, :pdf, :html, :plain contexts that support interpolation (using Scala variables)
- clusters can be deleted from the ui
- spark packages repo is available by default
- spark package format is now supported: `groupId:artifactId:version`
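For example, a spark-packages style coordinate in the dependencies could look like (illustrative coordinates only):

```
com.databricks:spark-csv_2.10:1.2.0
```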
- added `with.parquet` modifier to include parquet deps
- `spark.app.name` uses the name of the notebook by default (easier to track in clusters)
- dynamic table renderer for `DataFrame`
- added a users section in the README
- Tachyon can be disabled by setting `manager.tachyon.enabled` to `false`
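A sketch of the Tachyon toggle above (placement inside `application.conf` is assumed from the key name):

```
manager.tachyon.enabled = false
```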
- support for printing from the browser (CTRL+P)
- added `:ldp` for local dependency definitions (so not added to the Spark context)
- graphs (nodes-edges) can be plotted easily using the `Node` and `Edge` types → see `viz/Graph Plots.snb`
- Geo data viz added using lat/lon data → see `viz/Geo Data (Map).snb`
- Enhanced the twitter stream example to show tweets in a map
- Enhanced the WISP examples, including Histogram and BoxPlot. Wisp plots can now be built using the lower-level Highcharts API
- Added the commons lib to the Spark context to enable extended viz using Spark jobs
Bringing shareable sessions and Tachyon support
Main
Besides quite some fixes, this version brings two major features:
Session
When opening a notebook, a session is created allowing anybody to join it.
But mostly, the user can now close the tab and get back to the analysis later by reopening the notebook.
Very helpful, for instance, when long-running processes are launched.
Tachyon
The tachyon support has been integrated with several functionalities:
- connecting to a provided (configuration) tachyon cluster
- starting a tachyon local embed cluster if none is available in the config
- a small UI on the right-hand side of the notebook panel that allows the user to browse the content: the persisted computations, or even simply files
In the first and second cases, all notebooks will be automatically configured (read: the `SparkContext`) to use the available Tachyon cluster without requiring any action from the user.
Others
There are other things worth mentioning:
- the notebooks directory is no longer under the root folder by default
- the parquet deps, which could have been a pain with previous releases, have been discarded
- the logs are now per session/notebook, so it's now even easier to track the job
- the background logger (the yellow box on the left) has been removed since it didn't bring much info, but was interacting badly in some cases with the closure serializer...
- support for `https`
- more information is provided when errors occur, especially when the code is incomplete (missing parentheses)
- execution time is now included in each result block
- Scala 2.10 now uses SBT to download deps; Scala 2.11 still uses Aether at the moment
- `HADOOP_CONF_DIR` has to be used to pass the hadoop conf dir when using Yarn