Data Conventions

jimdale edited this page May 3, 2024 · 1 revision

viewser relies on a set of data conventions to ensure that the data it returns is accepted by the ViEWS packages that consume it.

Time-unit and space-unit double-indexed pandas dataframes

The main data format used by the ViEWS team is the doubly-indexed pandas dataframe. The first and most slowly varying index is a time unit (usually month or year), and the second and most quickly varying index is a space unit (usually country or priogrid). The index is ordered by time unit, and by space unit within each time unit. Note that not all countries exist for all time units, so the set of space units can differ from one time unit to the next.


 month_id   country_id │ variable_a │ variable_b │ variable_c
 ──────────────────────┼────────────┼────────────┼────────────
 121          10       │ 0.1        │ a          │ 1
              11       │ 0.1        │ a          │ 2
              14       │ 0.2        │ b          │ 5
              20       │ 0.2        │ b          │ 3
 122          10       │ 0.3        │ a          │ 2
              15       │ 0.3        │ b          │ 3
              20       │ 0.4        │ c          │ 2

                       ...
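As a minimal sketch, a frame with this layout could be built by hand in pandas as follows. The rows copy the example table above; the values are illustrative only, and viewser itself is not called here.

```python
import pandas as pd

# Rows mirror the example table: (month_id, country_id, a, b, c).
rows = [
    (121, 10, 0.1, "a", 1),
    (121, 11, 0.1, "a", 2),
    (121, 14, 0.2, "b", 5),
    (121, 20, 0.2, "b", 3),
    (122, 10, 0.3, "a", 2),
    (122, 15, 0.3, "b", 3),
    (122, 20, 0.4, "c", 2),
]
columns = ["month_id", "country_id", "variable_a", "variable_b", "variable_c"]

# Time unit first, space unit second, then sort so that the index is
# ordered by time unit and by space unit within each time unit.
df = (
    pd.DataFrame(rows, columns=columns)
    .set_index(["month_id", "country_id"])
    .sort_index()
)

# All space units present in one time unit.
month_121 = df.loc[121]

# One space unit across all time units in which it exists.
country_10 = df.xs(10, level="country_id")
```

Because country 14 appears only in month 121 and country 15 only in month 122, the two months carry different sets of space units, which is the situation the note above describes.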

Data with this indexing scheme is returned by viewser, and is expected by most ViEWS 3 tools, including stepshift.views.StepshiftedModels and views_partitioning.DataPartitioner.
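Before handing a hand-built frame to these tools, it may help to check that it has the expected shape. The helper below, `follows_index_convention`, is a hypothetical illustration and not part of viewser or any ViEWS package. It only checks structure (two index levels, unique and sorted); it cannot tell whether the first level really is a time unit.

```python
import pandas as pd


def follows_index_convention(df: pd.DataFrame) -> bool:
    """Hypothetical structural check; not a viewser API."""
    index = df.index
    return bool(
        isinstance(index, pd.MultiIndex)
        and index.nlevels == 2
        and index.is_unique
        and index.is_monotonic_increasing
    )


# A small frame indexed by (time unit, space unit), already sorted.
good = pd.DataFrame(
    {"variable_a": [0.1, 0.1, 0.3]},
    index=pd.MultiIndex.from_tuples(
        [(121, 10), (121, 11), (122, 10)],
        names=["month_id", "country_id"],
    ),
)

# Same rows in a different order: the index is no longer sorted.
unsorted = good.iloc[[2, 0, 1]]

# Same data with the index moved into ordinary columns.
flat = good.reset_index()

print(follows_index_convention(good))
print(follows_index_convention(unsorted))
print(follows_index_convention(flat))
```

An unsorted frame can usually be brought back into line with `df.sort_index()`, and a flat one with `df.set_index([...])` on its time-unit and space-unit columns.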