Apache Superset step by step hands-on and review using open data related to Accidents involving kids in France in 2019

terman37/FrenchAccidents_Viz_With_Apache_SuperSet

Apache Superset: analysing road accidents involving kids in France

Apache Superset is a modern data exploration and visualization platform for building interactive dashboards.

I wanted to test its capabilities by analysing an open dataset on road accidents in France, especially those involving kids.

Installation

  • Version: 0.999.0dev

Run with docker compose

Clone Superset's repo in your terminal with the following command:

git clone https://github.com/apache/superset.git

Once that command completes successfully, you should see a new superset folder in your current directory.

cd superset
docker-compose up

Connect and create user

Navigate to http://127.0.0.1:8088 and log in as admin / admin.

Creating a user is straightforward: in the top-right corner, open Settings / Users.

Used Dataset

Description

The dataset is available on data.gouv.fr.

A complete field description is available, in French only, sorry :-(

Database

Launch

Start the database using the docker-compose file in my repo:

docker-compose up

Create MySQL user

Connect to the container:

docker ps
docker exec -it <CONTAINER ID> /bin/bash
mysql -u root -p

From the MySQL shell:

CREATE USER 'user'@'%' IDENTIFIED WITH mysql_native_password BY 'password';
GRANT ALL PRIVILEGES ON *.* TO 'user'@'%';
FLUSH PRIVILEGES;

Data preprocessing

I wrote a script to import the data into MySQL:

python src/import.py
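
The script itself is not reproduced here. As a rough sketch of what such an import step can look like, the following parses semicolon-delimited CSV text and bulk-inserts the rows with a parameterized query. Everything specific is an assumption made for illustration: the `usagers` table, the `Num_Acc` and `an_nais` columns, the delimiter, the sample rows, and the use of in-memory SQLite as a stand-in for MySQL so that the example runs without a server. It is not the code in `src/import.py`.

```python
import csv
import io
import sqlite3


def load_rows(csv_text, delimiter=";"):
    """Parse CSV text into a list of dicts, stripping whitespace from keys and values."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return [{k.strip(): v.strip() for k, v in row.items()} for row in reader]


def import_rows(conn, rows):
    """Create the (hypothetical) usagers table, insert the rows, return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usagers (num_acc TEXT, birth_year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO usagers (num_acc, birth_year) VALUES (?, ?)",
        # Empty birth years become NULL instead of failing the int() conversion.
        [(r["Num_Acc"], int(r["an_nais"]) if r["an_nais"] else None) for r in rows],
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM usagers").fetchone()[0]


# Made-up sample data, only to exercise the functions.
SAMPLE = "Num_Acc;an_nais\n201900000001;2010\n201900000002;\n"

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
inserted = import_rows(conn, load_rows(SAMPLE))
```

With a real MySQL driver the structure stays the same, but the placeholder style is usually `%s` rather than `?`.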


Give Superset access to the data

Connection driver

You may need to install a database driver to connect to a particular database. Follow the Superset instructions.

Add Database connection

In the menu, go to Data / Databases, click + DATABASE and fill in the necessary information.
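
Superset connects through SQLAlchemy, so the connection is typically described by a URI. Below is a minimal sketch of building one for the MySQL user created earlier. The `mysql_uri` helper is hypothetical, and the host `db` and database name `accidents` are placeholders, not values from this repo. The password is URL-encoded because characters such as `@` or `/` would otherwise break the URI.

```python
from urllib.parse import quote_plus


def mysql_uri(user, password, host, database, port=3306, driver=None):
    """Build a SQLAlchemy-style MySQL URI, URL-encoding user and password."""
    scheme = f"mysql+{driver}" if driver else "mysql"
    return (
        f"{scheme}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical values: host "db" and database "accidents" are placeholders.
uri = mysql_uri("user", "p@ss/word", "db", "accidents")
```

Keep in mind that when Superset runs inside Docker, `127.0.0.1` refers to the Superset container itself, so the host usually has to be the machine's address or a container name on a shared Docker network.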

Add Table as dataset

In the menu, go to Data / Datasets, click + DATASET and fill in the necessary information.

Add calculated field

In the datasets view, click the edit button at the end of the dataset row.

Add Virtual Dataset

If you need data from more than one table, write the SQL query by hand in the SQL Lab editor.

Then click Explore to save the query as a virtual dataset and use it to create reports.
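
A virtual dataset is essentially a saved SQL query. As an illustration, here is the kind of join that could back one. The table and column names (`accidents`, `usagers`, `num_acc`, `department`, `birth_year`) and the sample rows are made up and do not reflect this repo's schema. The query is run against in-memory SQLite only to check its shape; in Superset it would be pasted into SQL Lab instead.

```python
import sqlite3

# Hypothetical query: table and column names are placeholders, not the repo's schema.
VIRTUAL_DATASET_SQL = """
SELECT a.num_acc, a.department, u.birth_year
FROM accidents AS a
JOIN usagers AS u ON u.num_acc = a.num_acc
WHERE u.birth_year >= 2005
"""

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accidents (num_acc TEXT, department TEXT);
    CREATE TABLE usagers (num_acc TEXT, birth_year INTEGER);
    INSERT INTO accidents VALUES ('A1', '75'), ('A2', '13');
    INSERT INTO usagers VALUES ('A1', 2010), ('A2', 1980);
    """
)

# Only accidents involving someone born in 2005 or later are kept.
rows = conn.execute(VIRTUAL_DATASET_SQL).fetchall()
```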

Dashboards & Charts

Available Visualizations

Many nice visualizations are available; let's check out some of them.

FilterBox

BigNumber

Table

Map ScatterPlot using MapBox

Before using any Mapbox-based visualization, you need to provide a token for the Mapbox API.

Create an account on Mapbox.com and create a token.

Copy the token and add it to your Superset .env file:

cd superset
# use >> to append; a single > would overwrite the existing .env file
echo "MAPBOX_API_KEY=<your token>" >> docker/.env
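
The same update can be scripted so that it is safe to run more than once. This is a minimal sketch, assuming a simple `KEY=value` file with no quoting or multi-line values; `set_env_var` is a hypothetical helper and not part of Superset. It replaces an existing entry for the key, or appends one, and leaves all other lines in place.

```python
from pathlib import Path


def set_env_var(env_path, key, value):
    """Set key=value in an env file, replacing an existing entry or appending."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    # Drop any previous definition of the key, keep every other line.
    kept = [ln for ln in lines if not ln.startswith(f"{key}=")]
    kept.append(f"{key}={value}")
    path.write_text("\n".join(kept) + "\n")


# Example call (not executed here, since it writes to disk):
# set_env_var("docker/.env", "MAPBOX_API_KEY", "<your token>")
```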

Area Chart

Bar Chart

HeatMap

Dashboards

The dashboard UI is quite simple:

  • Create your layout with components: rows, columns, tabs ...

  • Place charts on it

  • Resize elements

Comments on Superset

Apache Superset is a really nice and easy tool for data visualization.

It's super easy to set up and features some advanced capabilities:

  • Support for many databases using SQLAlchemy: MySQL, Postgres, Oracle, MS SQL Server, MariaDB, Redshift ...

  • Fine-grained users, roles and permissions

  • Can use OpenID, OAuth, LDAP authentication

  • Interactive SQL editor giving full control over the exposed data

  • Can perform some time series predictive analysis using fbprophet

  • Custom visualizations can be developed and added (not tested)

  • Can run on kubernetes and scale with needs.

The least positive point for me is that the chart creation UI is not always consistent. Depending on the chosen visualization, the visual settings, for example, will be in either the DATA or the CUSTOMIZE tab. You may also find that some grouping options are not available on certain chart types where they would have made perfect sense.

Even if it's not yet at Tableau's level, I would say it can be an alternative in some use cases, especially since Superset is open source.
