Review of existing data formats

Rick Lupton edited this page Mar 26, 2018 · 3 revisions

We need to define a data format for floWeaver that can represent flow data of many kinds. It should not be specific to a particular domain: it should be as well suited to describing flows of steel through a steelworks as to describing the relationship between agricultural land grades and their output. This page reviews the following existing formats:

  • e!Sankey
  • SankeyMatic
  • Circular Sankey
  • d3-sankey format
  • d3-sankey-diagram format
  • Unpublished "Captain Sankey" format
  • ...?

Existing Sankey diagram formats: without layout

These tools calculate the positions of nodes automatically, so their input data includes only the logical connections.

SankeyMatic

SankeyMatic is an online tool for drawing Sankey diagrams based on the d3 Sankey library. It uses a simple text-based format that can be easily generated in Excel by combining columns:

A [8] C
B [4] C
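
The same "combine the columns" step can be done outside Excel. The following is a minimal sketch, assuming the flows are already available as (source, value, target) tuples; the function name `to_sankeymatic` is made up for this illustration and is not part of SankeyMatic or floWeaver.

```python
def to_sankeymatic(flows):
    """Format (source, value, target) tuples as SankeyMatic flow lines.

    Hypothetical helper: it only reproduces the "Source [value] Target"
    pattern shown in the example above.
    """
    lines = []
    for source, value, target in flows:
        lines.append(f"{source} [{value}] {target}")
    return "\n".join(lines)


flows = [("A", 8, "C"), ("B", 4, "C")]
print(to_sankeymatic(flows))
```

The printed text can be pasted directly into the SankeyMatic input box.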

Colours and opacity can be customised, for example for individual flows:

A [8] C #DD0000.9
B [4] C #00CC00.4
D [4] C

and to set node colours, and optionally the colour of adjoining flows:

:C #990099 >>
A [8] C
B [4] C
C [6] D
C [6] E
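
Going the other way, the flow lines can be read back into structured records. This is a rough sketch whose pattern is inferred only from the examples on this page (a six-digit hex colour optionally followed by an opacity such as `.9`); it is not SankeyMatic's own parser, and node lines such as `:C #990099 >>` are simply skipped. The names `FLOW_RE` and `parse_flow` are hypothetical.

```python
import re

# Pattern inferred from the examples above only: "Source [value] Target",
# optionally followed by "#RRGGBB" and an opacity such as ".9".
FLOW_RE = re.compile(
    r"^(?P<source>.+?)\s*\[(?P<value>[\d.]+)\]\s*(?P<target>.+?)"
    r"(?:\s+(?P<color>#[0-9A-Fa-f]{6})(?P<opacity>\.\d+)?)?$"
)


def parse_flow(line):
    """Return a dict for a flow line, or None if the line is not a flow."""
    m = FLOW_RE.match(line.strip())
    if m is None:
        return None
    return {
        "source": m["source"],
        "target": m["target"],
        "value": float(m["value"]),
        "color": m["color"],  # None when no colour is given
        "opacity": float(m["opacity"]) if m["opacity"] else None,
    }


text = """:C #990099 >>
A [8] C #DD0000.9
D [4] C"""
flows = [f for f in map(parse_flow, text.splitlines()) if f is not None]
print(flows)
```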

d3-sankey format

d3-sankey is the original d3 Sankey diagram plugin. It expects data in JSON format with a list of nodes and a list of links.

The nodes must be objects but they have no required properties -- the user can choose whether to identify them by their numeric index, an id, or another property.

The links must be objects with the properties:

  • link.source - the link's source node
  • link.target - the link's target node
  • link.value - the link's numeric value
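
As an illustration, the SankeyMatic example above could be written in this structure as follows. This is a sketch only: the `name` property on nodes is an arbitrary choice (nodes have no required properties), links refer to nodes by numeric index, and `links_are_valid` is a hypothetical helper, not part of d3-sankey.

```python
import json

# Nodes have no required properties; "name" is an illustrative choice.
nodes = [{"name": "A"}, {"name": "B"}, {"name": "C"}]

# Here links identify their source and target by index into `nodes`.
links = [
    {"source": 0, "target": 2, "value": 8},
    {"source": 1, "target": 2, "value": 4},
]

graph = {"nodes": nodes, "links": links}


def links_are_valid(graph):
    """Check that every link points at an existing node and has a value >= 0."""
    n = len(graph["nodes"])
    return all(
        0 <= link["source"] < n and 0 <= link["target"] < n and link["value"] >= 0
        for link in graph["links"]
    )


print(links_are_valid(graph))
print(json.dumps(graph, indent=2))
```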

The Sankey plugin assigns these properties:

  • node.sourceLinks - the array of outgoing links which have this node as their source
  • node.targetLinks - the array of incoming links which have this node as their target
  • node.value - the node’s value; the sum of link.value for the node’s incoming links
  • node.index - the node’s zero-based index within the array of nodes
  • node.depth - the node’s zero-based graph depth, derived from the graph topology
  • node.height - the node’s zero-based graph height, derived from the graph topology
  • node.x0 - the node’s minimum horizontal position, derived from node.depth
  • node.x1 - the node’s maximum horizontal position (node.x0 + sankey.nodeWidth)
  • node.y0 - the node’s minimum vertical position
  • node.y1 - the node’s maximum vertical position (node.y1 - node.y0 is proportional to node.value)
  • link.y0 - the link’s vertical starting position (at source node)
  • link.y1 - the link’s vertical end position (at target node)
  • link.width - the link’s width (proportional to link.value)
  • link.index - the zero-based index of link within the array of links
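
To make the topology-derived properties concrete, here is a small Python sketch that computes a few of them (index, sourceLinks, targetLinks, value and depth) for index-based links. It is not d3-sankey's code. Two assumptions are made in the sketch: node values are taken as the larger of the incoming and outgoing totals so that nodes without incoming links still get a non-zero value, and depth is the longest path from a node with no incoming links. The function name `annotate` is made up.

```python
def annotate(nodes, links):
    """Add some derived properties to plain dicts (illustrative only)."""
    for i, node in enumerate(nodes):
        node["index"] = i
        node["sourceLinks"] = []  # outgoing links
        node["targetLinks"] = []  # incoming links
    for i, link in enumerate(links):
        link["index"] = i
        nodes[link["source"]]["sourceLinks"].append(link)
        nodes[link["target"]]["targetLinks"].append(link)
    for node in nodes:
        incoming = sum(link["value"] for link in node["targetLinks"])
        outgoing = sum(link["value"] for link in node["sourceLinks"])
        # Assumption: use the larger total so that source nodes are not zero.
        node["value"] = max(incoming, outgoing)
    # Depth: sweep forward from nodes with no incoming links; a node visited
    # again later is pushed to the greater depth (longest path).
    current = {n["index"] for n in nodes if not n["targetLinks"]}
    depth = 0
    while current:
        if depth > len(nodes):
            raise ValueError("links contain a cycle")
        following = set()
        for i in current:
            nodes[i]["depth"] = depth
            following.update(link["target"] for link in nodes[i]["sourceLinks"])
        current = following
        depth += 1
    return nodes


nodes = [{"name": n} for n in "ABCD"]
links = [
    {"source": 0, "target": 2, "value": 8},
    {"source": 1, "target": 2, "value": 4},
    {"source": 2, "target": 3, "value": 12},
]
annotate(nodes, links)
print([(n["name"], n["value"], n["depth"]) for n in nodes])
```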

d3-sankey-diagram format

d3-sankey-diagram is another d3-based Sankey diagram plugin that is used by default within floWeaver. It adds some extra features over the original d3-sankey plugin that make its data model a bit more complicated:

  • Link types: you can have multiple types of link between the same two nodes, so each link has a type property
  • Node directions: to handle loops, nodes have a direction property. Currently only "left" and "right" are handled, but in principle this could be extended to include other directions.
  • Ports: in simple Sankey diagrams, links enter the left side of nodes in one block, and leave the right side in one block. But sometimes you want more control, such as to group links by type (see example) or to expand detail. This is done by optionally replacing the link source and target with (source, sourcePort) and (target, targetPort) pairs: all links entering/leaving the same port are grouped together.
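
The grouping behaviour of ports can be sketched in a few lines. The property names `type`, `sourcePort` and `targetPort` follow the description above; the example data, the port name and the helper `group_by_target_port` are invented for illustration and are not taken from d3-sankey-diagram itself.

```python
from collections import defaultdict

# Illustrative links: two enter node C through a port named "metal",
# one enters C without naming a port.
links = [
    {"source": "A", "target": "C", "targetPort": "metal", "type": "steel", "value": 8},
    {"source": "B", "target": "C", "targetPort": "metal", "type": "scrap", "value": 4},
    {"source": "D", "target": "C", "type": "energy", "value": 2},
]


def group_by_target_port(links):
    """Group links by (target, targetPort); links without a port share None."""
    groups = defaultdict(list)
    for link in links:
        key = (link["target"], link.get("targetPort"))
        groups[key].append(link)
    return dict(groups)


for key, members in group_by_target_port(links).items():
    print(key, [m["type"] for m in members])
```

Links that share a key would be drawn together as one block on the node.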

Existing Sankey diagram formats: with positions

The tools above focus on automatic layout of Sankey diagrams, so their formats do not include details of node coordinates etc. But once the diagram is first drawn you usually want to edit and tweak it, so it is also necessary to save Sankey diagram data with coordinates, curvatures, text placements etc.

e!Sankey

e!Sankey is commercial software for drawing Sankey diagrams.

TODO: how does it structure saved data?

Circular Sankey

Circular Sankey is a web-based tool for creating Sankey diagrams. It reads data in a text-based format with a sheet for nodes and a sheet for links.

Nodes:

[name] [color] [orientation] [width] [height] [x_position] [y_position]
[A] [(0,191,255)] [270] [40.00] [35.31] [98] [242]
[B] [(0,191,255)] [0] [40.00] [82.40] [320] [68]

Links:

[source] [value] [flow color] [ab] [target]
[A]  [15]  [(0,191,255)] [ab] [B]

where

  • a is: 'C': Cubic Bezier ...
  • b is: 'n': no arrow, '1': arrow type 1, '2': arrow type 2, ...

The same columns can also be read from an Excel workbook with two sheets.
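
Because every field is wrapped in square brackets, the rows are easy to split. The sketch below assumes that the first row of each block is the header and that fields never contain a closing bracket; `parse_rows` and `parse_color` are hypothetical helper names, not part of Circular Sankey.

```python
import re

# One field is whatever sits between "[" and the next "]".
FIELD_RE = re.compile(r"\[([^\]]*)\]")


def parse_rows(text):
    """Turn bracketed rows into dicts keyed by the header row."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = FIELD_RE.findall(lines[0])
    return [dict(zip(header, FIELD_RE.findall(line))) for line in lines[1:]]


def parse_color(field):
    """Convert a colour field such as "(0,191,255)" to a tuple of ints."""
    return tuple(int(part) for part in field.strip("()").split(","))


node_text = """
[name] [color] [orientation] [width] [height] [x_position] [y_position]
[A] [(0,191,255)] [270] [40.00] [35.31] [98] [242]
[B] [(0,191,255)] [0] [40.00] [82.40] [320] [68]
"""
node_rows = parse_rows(node_text)
print(node_rows[0]["name"], parse_color(node_rows[0]["color"]))
```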

Draft "Captain Sankey" format

This is a JSON-based format. For example, a node:

    {
      "id": "A",
      "metadata": {},
      "geometry": {
        "x": 599.25,
        "y": 122.31878372426897,
        "w": 0,
        "h": 50
      },
      "title": {
        "label": "Use",
        "vanchor": "top",
        "hanchor": "left"
      },
      "style": {
        "hidden": false,
        "direction": "r"
      }
    }

and a link:

    {
      "source": "A",
      "target": "B",
      "type": "Scrap",
      "data": {
        "value": 375
      },
      "head": {
        "x": 0,
        "y": 13.125,
        "t": 18.75,
        "r": 20
      },
      "tail": {
        "x": 0,
        "y": 9.375,
        "t": 18.75,
        "r": 20
      },
      "style": {
        "color": "#7BC"
      }
    }

The geometry, head and tail properties are optional until the diagram has been laid out.
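
A consumer of this format therefore has to cope with both cases. A minimal sketch, assuming nodes and links have been loaded into Python dicts (for example with `json.load`); the helper names are invented for illustration:

```python
LAYOUT_KEYS = ("geometry", "head", "tail")


def node_is_laid_out(node):
    """A node counts as laid out once it carries a geometry block."""
    return "geometry" in node


def link_is_laid_out(link):
    """A link counts as laid out once both ends have positions."""
    return "head" in link and "tail" in link


def strip_layout(obj):
    """Return a copy holding only the logical data, without layout keys."""
    return {k: v for k, v in obj.items() if k not in LAYOUT_KEYS}


link = {
    "source": "A",
    "target": "B",
    "type": "Scrap",
    "data": {"value": 375},
    "head": {"x": 0, "y": 13.125, "t": 18.75, "r": 20},
    "tail": {"x": 0, "y": 9.375, "t": 18.75, "r": 20},
}
print(link_is_laid_out(link), strip_layout(link))
```

Stripping the layout keys recovers the purely logical description that the layout-free formats above start from.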

Existing flow data formats

These tools focus on modelling rather than visualisation (although they can also draw Sankey diagrams).

STAN

STAN is a tool for Material Flow Analysis. It stores its data in a compressed XML-based format. It includes other data such as units and material types, but the relevant parts are the Process and Flow objects.

A Process has explicit ProcessInputs and ProcessOutputs:

  <Process>
    <ProcessID>4</ProcessID>
    <MfaSystemID>1</MfaSystemID>
    <MainProcessID>1</MainProcessID>
    <Position>3</Position>
    <WithStock>false</WithStock>
    <ApplyAllTKs>false</ApplyAllTKs>
    <ProcessType>2</ProcessType>
    <CalcBalance>true</CalcBalance>
    <MatchCode>Ki</MatchCode>
    <Name>Kiln</Name>
  </Process>
  <ProcessInput>
    <ProcessInputID>5</ProcessInputID>
    <ProcessID>4</ProcessID>
    <Position>1</Position>
  </ProcessInput>
  <ProcessOutput>
    <ProcessOutputID>4</ProcessOutputID>
    <ProcessID>4</ProcessID>
    <Position>1</Position>
  </ProcessOutput>
  <ProcessOutput>
    <ProcessOutputID>5</ProcessOutputID>
    <ProcessID>4</ProcessID>
    <Position>2</Position>
  </ProcessOutput>
  <ProcessOutput>
    <ProcessOutputID>8</ProcessOutputID>
    <ProcessID>4</ProcessID>
    <Position>3</Position>
  </ProcessOutput>

The Flow links a ProcessInput and a ProcessOutput, rather than directly linking Processes:

  <Flow>
    <FlowID>8</FlowID>
    <ProcessInputID>9</ProcessInputID>
    <ProcessOutputID>8</ProcessOutputID>
    <Position>8</Position>
    <MatchCode>F8</MatchCode>
    <Name>Clinker</Name>
    <NullValueText>?</NullValueText>
  </Flow>

This is similar to the idea of "ports" discussed above.
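
To recover a plain "process to process" link from this structure, each Flow has to be joined to its ProcessOutput and ProcessInput. The sketch below uses Python's standard `xml.etree.ElementTree`. Several things in it are assumptions: the `<Data>` wrapper element, the second process (ID 6) and the ProcessInput with ID 9 are made up so that the example is self-contained, and it assumes that a flow runs from the process owning the ProcessOutput to the process owning the ProcessInput.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample. <Data>, process 6 and ProcessInput 9 are invented
# for this sketch; the other IDs follow the excerpts above.
XML = """
<Data>
  <Process><ProcessID>4</ProcessID><Name>Kiln</Name></Process>
  <Process><ProcessID>6</ProcessID><Name>Downstream (made up)</Name></Process>
  <ProcessOutput><ProcessOutputID>8</ProcessOutputID><ProcessID>4</ProcessID></ProcessOutput>
  <ProcessInput><ProcessInputID>9</ProcessInputID><ProcessID>6</ProcessID></ProcessInput>
  <Flow>
    <FlowID>8</FlowID>
    <ProcessInputID>9</ProcessInputID>
    <ProcessOutputID>8</ProcessOutputID>
    <Name>Clinker</Name>
  </Flow>
</Data>
"""


def resolve_flows(root):
    """Join each Flow to the names of the processes at either end."""
    names = {p.findtext("ProcessID"): p.findtext("Name") for p in root.iter("Process")}
    outputs = {
        o.findtext("ProcessOutputID"): o.findtext("ProcessID")
        for o in root.iter("ProcessOutput")
    }
    inputs = {
        i.findtext("ProcessInputID"): i.findtext("ProcessID")
        for i in root.iter("ProcessInput")
    }
    flows = []
    for flow in root.iter("Flow"):
        flows.append({
            "name": flow.findtext("Name"),
            # Assumed direction: from the output's process to the input's.
            "source": names[outputs[flow.findtext("ProcessOutputID")]],
            "target": names[inputs[flow.findtext("ProcessInputID")]],
        })
    return flows


root = ET.fromstring(XML)
print(resolve_flows(root))
```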

The actual Flow and Stock values are stored separately:

  <FlowValue>
    <FlowValueID>8</FlowValueID>
    <FlowID>8</FlowID>
    <FlowLayerID>1</FlowLayerID>
    <PeriodID>1</PeriodID>
    <MFNumUnitID>19</MFNumUnitID>
    <MFDenomUnitID>16</MFDenomUnitID>
    <MFUNumUnitID>19</MFUNumUnitID>
    <MFUDenomUnitID>16</MFUDenomUnitID>
    <MFInput>7197</MFInput>
    <MFUncertInput>647.73</MFUncertInput>
    <MFCalc>7479.19503893667</MFCalc>
    <MFUncertCalc>548.41918967126992</MFUncertCalc>
  </FlowValue>

  <Stock>
    <StockID>2</StockID>
    <ProcessID>4</ProcessID>
    <FlowLayerID>1</FlowLayerID>
    <PeriodID>1</PeriodID>
  </Stock>
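
Since values live in their own records, reading a flow's value means looking it up by flow, layer and period. A minimal sketch using `xml.etree.ElementTree` follows. The `<Data>` wrapper is made up, and the reading of `MFInput` as the entered value and `MFCalc` as the calculated value (each with an uncertainty) is an assumption based on the element names only.

```python
import xml.etree.ElementTree as ET

# The <Data> wrapper is invented; the FlowValue fields are a subset of
# the excerpt above.
XML = """
<Data>
  <FlowValue>
    <FlowValueID>8</FlowValueID>
    <FlowID>8</FlowID>
    <FlowLayerID>1</FlowLayerID>
    <PeriodID>1</PeriodID>
    <MFInput>7197</MFInput>
    <MFUncertInput>647.73</MFUncertInput>
    <MFCalc>7479.19503893667</MFCalc>
    <MFUncertCalc>548.41918967126992</MFUncertCalc>
  </FlowValue>
</Data>
"""


def flow_values(root):
    """Index FlowValue records by (FlowID, FlowLayerID, PeriodID)."""
    values = {}
    for fv in root.iter("FlowValue"):
        key = (
            fv.findtext("FlowID"),
            fv.findtext("FlowLayerID"),
            fv.findtext("PeriodID"),
        )
        values[key] = {
            # Assumed meanings, inferred from the element names.
            "input": float(fv.findtext("MFInput")),
            "input_uncertainty": float(fv.findtext("MFUncertInput")),
            "calculated": float(fv.findtext("MFCalc")),
            "calculated_uncertainty": float(fv.findtext("MFUncertCalc")),
        }
    return values


values = flow_values(ET.fromstring(XML))
print(values[("8", "1", "1")])
```

Keying on layer and period as well as the flow ID allows for one flow carrying several values, which is the apparent reason for storing values separately.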

STAN appears to store a separate Shape for each Process and Flow:

  <Shape>
    <ShapeID>12</ShapeID>
    <DiagramID>0</DiagramID>
    <ShapeGuid>0fc4ae76-df48-4fda-8891-162ced2cfc43</ShapeGuid>
    <ShapeType>1</ShapeType>
    <IsDeleted>false</IsDeleted>
    <ProcessID>4</ProcessID>
  </Shape>

  <Shape>
    <ShapeID>30</ShapeID>
    <DiagramID>0</DiagramID>
    <ShapeGuid>4ec19052-b95e-4903-a2d6-9b269dd2c5f3</ShapeGuid>
    <ShapeType>6</ShapeType>
    <IsDeleted>false</IsDeleted>
    <FlowID>8</FlowID>
  </Shape>
  • ...?