Extract, clean, resample and enumerate load profile and survey data from a local file hierarchy retrieved from the South African Domestic Electrical Load (DEL) database.

wiebket/delprocess

South African Domestic Electrical Load Study: Data Processing

About this package

This package contains tools to process primary data from the South African Domestic Electrical Load (DEL) database. It requires access to a csv or feather file hierarchy extracted from the original General_LR4 database produced during the NRS Load Research study.

Notes on data access:

Data can be accessed and set up as follows:

  1. From Data First at the University of Cape Town (UCT). On-site access to the complete 5-minute data is available through their secure server room.
  2. For those with access to the original SQL database, delretrieve can be used to retrieve the data and create the file hierarchy for further processing.
  3. Several datasets with aggregated views are available online and can be accessed for academic purposes. If you use them, you will not need to install this package.

Package structure

delprocess
    |-- delprocess
        |-- data
            |-- geometa
                |-- 2016_Boundaries_Local
                |-- ...
            |-- specs
                |-- app_broken_00.txt
                |-- appliance_00.txt
                |-- appliance_94.txt	
                |-- behaviour_00.txt
                |-- binned_base_00.txt
                |-- binned_base_94.txt
                |-- dist_base_00.txt
                |-- dist_base_94.txt	
        |-- __init.py__
        |-- command_line.py
        |-- loadprofiles.py
        |-- plotprofiles.py
        |-- support.py
        |-- surveys.py
    |-- MANIFEST.in
    |-- README.md
    |-- setup.py

Setup instructions

Ensure that Python 3 is installed on your computer. A simple way of getting it is to install Anaconda. Once Python has been installed, the delprocess package can be installed.

  1. Clone this repository from github.
  2. Navigate to the root directory (delprocess) and run python setup.py install. On Windows, run this from the Anaconda Prompt or another shell with access to Python.
  3. You will be asked to confirm the directories that contain your data. Paste the full path name when prompted. You can change this setting later by editing the file your_home_dir/del_data/usr/store_path.txt.

This package only works if your data follows exactly the del_data directory hierarchy created by the delretrieve package:

your_home_dir/del_data
    |-- observations
        |-- profiles
            |-- raw
                |-- unit
                    |-- GroupYear
        |-- tables
            |-- ...
    |-- survey_features
    |-- usr
        |-- specs (automatically copied from delprocess/data/specs during setup)
        |-- store_path.txt (generated during setup)
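Before running the pipeline, it can help to confirm that your data root matches this layout. The sketch below is not part of delprocess. It makes three assumptions:

* The data root is your_home_dir/del_data.
* Only the fixed directory levels from the tree above need checking, because unit and GroupYear are placeholders.
* The names missing_dirs and EXPECTED_DIRS are hypothetical.

```python
from pathlib import Path

# Fixed sub-directories from the del_data tree above (hypothetical check list).
# "unit" and "GroupYear" vary per dataset, so they are not checked.
EXPECTED_DIRS = [
    "observations/profiles/raw",
    "observations/tables",
    "survey_features",
    "usr/specs",
]

def missing_dirs(root):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]

if __name__ == "__main__":
    data_root = Path.home() / "del_data"  # assumption: default data root
    gaps = missing_dirs(data_root)
    if gaps:
        print(f"Missing under {data_root}: {', '.join(gaps)}")
    else:
        print("Directory hierarchy looks complete.")
```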

Data processing

This package runs a processing pipeline from the command line, or it can be accessed directly from Python with import delprocess.

Modules: surveys, loadprofiles, plotprofiles

Timeseries data (DEL Metering data)

From the command line

  1. Execute delprocess_profiles -i [interval] from the command line (equivalent to loadprofiles.saveReducedProfiles())
  2. Options: -s [data start year] and -e [data end year] are optional arguments. If omitted, you will be prompted to enter them on the command line. Years must be between 1994 and 2014 inclusive.
  3. Additional command line options: -c or --csv: format and save output as csv files (default is feather).
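The options above can be sketched with argparse to show how they fit together. This is a hypothetical re-creation, not delprocess's own command_line.py. The following are assumptions:

* The dest names are invented.
* The interval is treated as a required free-form string.
* The start year must not be after the end year.

The 1994 to 2014 limit and the prompting behaviour come from the steps above.

```python
import argparse

MIN_YEAR, MAX_YEAR = 1994, 2014  # valid data years stated in this README

def year(value):
    """Convert value to int and reject years outside 1994-2014."""
    y = int(value)
    if not MIN_YEAR <= y <= MAX_YEAR:
        raise argparse.ArgumentTypeError(
            f"year must be between {MIN_YEAR} and {MAX_YEAR} inclusive, got {y}")
    return y

def build_parser():
    """Hypothetical parser mirroring the documented delprocess_profiles options."""
    p = argparse.ArgumentParser(prog="delprocess_profiles")
    # Assumption: the interval is a required free-form string.
    p.add_argument("-i", dest="interval", required=True, help="resampling interval")
    p.add_argument("-s", dest="start", type=year, help="data start year")
    p.add_argument("-e", dest="end", type=year, help="data end year")
    p.add_argument("-c", "--csv", action="store_true",
                   help="save output as csv files instead of feather")
    return p

def resolve_years(args, ask=input):
    """Prompt for any year missing from the command line, then check the order."""
    start = args.start if args.start is not None else year(ask("Data start year: "))
    end = args.end if args.end is not None else year(ask("Data end year: "))
    if start > end:
        raise ValueError(f"start year {start} is after end year {end}")
    return start, end
```

With this parser, omitting -s and -e leads to a prompt for each year, matching the behaviour described in step 2.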

In python

Run delprocess.loadprofiles.saveReducedProfiles()

Additional profile processing methods:

loadRawProfiles(year, month, unit) 
reduceRawProfiles(year, unit, interval)
loadReducedProfiles(year, unit, interval)
genX(year_range, drop_0=False, **kwargs)
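This README does not spell out how reduceRawProfiles aggregates readings. The standard-library sketch below is an illustration only. It shows one common way to reduce 5-minute readings to a coarser interval: average all readings that fall in each fixed-length bin. The function name resample_mean, the use of the mean and the midnight-aligned bins are all assumptions, not the package's implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def resample_mean(readings, interval):
    """Average (timestamp, value) readings into fixed-length bins.

    Bins are aligned to midnight of each reading's day, so `interval`
    should divide 24 hours evenly (e.g. 30 or 60 minutes).
    Returns {bin_start: mean_value} in chronological order.
    """
    step = int(interval.total_seconds())
    bins = defaultdict(list)
    for ts, value in readings:
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        offset = int((ts - midnight).total_seconds()) // step * step
        bins[midnight + timedelta(seconds=offset)].append(value)
    return {start: sum(v) / len(v) for start, v in sorted(bins.items())}
```

For example, twelve 5-minute readings with values 1 to 12 starting at midnight reduce to two 30-minute values, 3.5 and 9.5.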

Data output

All files are saved in your_home_dir/del_data/resampled_profiles/[interval].

Feather file format

Feather is the default format for temporary storage of the large metering dataset, as it is a fast and efficient file format for storing and retrieving data frames. It is compatible with both R and Python. Feather files should be kept for working purposes only, as the format is not suitable for archiving. All feather files have been built with feather.__version__ = 0.4.0. If your feather package is a later version, you may have trouble reading the files and will need to reconstruct them from the raw MSSQL database.
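To check whether your installed feather is newer than 0.4.0, you can read the installed version without importing the package. This sketch makes two assumptions: feather was installed from the PyPI distribution feather-format, and the helpers parse and newer_than_built are hypothetical names.

```python
import re
from importlib.metadata import PackageNotFoundError, version

BUILT_WITH = "0.4.0"  # feather version used to build the stored files (from this README)

def parse(v):
    """Turn a version string like '0.4.1rc1' into a comparable tuple (0, 4, 1)."""
    return tuple(int(n) for n in re.findall(r"\d+", v)[:3])

def newer_than_built(installed, built=BUILT_WITH):
    """True if the installed version is later than the one the files were built with."""
    return parse(installed) > parse(built)

if __name__ == "__main__":
    try:
        installed = version("feather-format")  # assumption: PyPI distribution name
    except PackageNotFoundError:
        print("feather-format is not installed")
    else:
        if newer_than_built(installed):
            print(f"feather {installed} is newer than {BUILT_WITH}; "
                  "reading the stored files may fail")
```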

Survey data (DEL Survey data)

From the command line

If you know what survey data you want for your analysis, it is easiest to extract it from the command line.

  1. Create a pair of spec files *_94.txt and *_00.txt with your specifications
  2. Execute delprocess_surveys -f [filename] (equivalent to running genS())
  3. Options: -s [data start year] and -e [data end year] as optional arguments: if omitted you will be prompted to add them on the command line. Must be between 1994 and 2014 inclusive.

In python

Import the package to use the following functions:

searchQuestions(searchterm)
searchAnswers(searchterm)
genS(spec_files, year_start, year_end)

The search is case-insensitive and is implemented as a simple str.contains(searchterm, case=False) over all entries in the Question column of the questions.csv data file. The searchterm must be a single string, but it can contain several words separated by whitespace. The search function removes the whitespace and joins the words, so word order matters. For example, 'hot water' will yield results, but 'water hot' will not!
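The sketch below reproduces the matching rule described above in plain Python, to show why word order matters. In pandas, the equivalent is applying str.contains(term, case=False) to the Question column. The question strings are hypothetical stand-ins for entries in questions.csv, and search is a hypothetical function name.

```python
def search(searchterm, questions):
    """Case-insensitive substring search with whitespace removed from the term.

    The words of `searchterm` are joined without spaces before matching,
    so 'hot water' becomes 'hotwater' and word order matters.
    """
    term = "".join(searchterm.split()).lower()
    return [q for q in questions if term in q.lower()]

# Hypothetical entries standing in for the Question column of questions.csv.
questions = ["hotwaterSource", "geyserNumber", "fridgefreezerNumber"]
```

Here search("hot water", questions) returns ["hotwaterSource"], while search("water hot", questions) returns an empty list.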

Data output

All files are saved in .csv format in your_home_dir/del_data/survey_features/.

Spec file format

Spec file templates are copied to your_home_dir/del_data/usr/specs during setup. These can be used directly to retrieve standard responses for appliance, behavioural and demographic related questions, or be adapted to create custom datasets from the household survey data.

The spec file is a dictionary of lists and dictionaries. It is loaded as a JSON file; all inputs must be strings, with key:value pairs separated by commas. The spec file must contain the following keys:

| key | value |
|---|---|
| year_range | list: the year range for which the specs are valid; must be ["1994", "1999"] or ["2000", "2014"] |
| features | list of user-defined variable names, e.g. ["fridge_freezer", "geyser"] |
| searchlist | list of database question search terms, e.g. ["fridgefreezerNumber", "geyserNumber"] |
| transform | dict of simple arithmetic transformations. Keys must be variables in the features list, while the transformation variables must come from searchlist, e.g. {"fridge_freezer": "x['fridgefreezerNumber'] - x['fridgefreezerBroken']"} |
| bins | dict of lists specifying bin edges for numerical data. Keys must be variables in the features list, e.g. {"floor_area": ["0", "50", "80"]} |
| labels | dict of lists specifying bin labels for numerical data. Keys must be variables in the features list, e.g. {"floor_area": ["0-50", "50-80"]} |
| cut | dict of dicts specifying how bin edges are treated. Keys must be variables in the features list. right indicates whether bins include their rightmost edge; include_lowest indicates whether the first interval is left-inclusive, e.g. {"monthly_income": {"right": "False", "include_lowest": "True"}} |
| replace | dict of dicts specifying the coding used to replace feature values. Keys must be variables in the features list, e.g. {"water_access": {"1": "nearby river/dam/borehole"}} |
| geo | string specifying the level of geographic detail: "Municipality", "District" or "Province" |

If no transform, bins, labels, cut, replace or geo is required, set the corresponding value to an empty dict {}.
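The sketch below assembles a spec from the examples in the table. It then checks the format rules the table states:

* All required keys are present.
* year_range is one of the two valid ranges.
* Dict keys are in features.
* Transform variables come from searchlist.

The names check_spec and example_spec are hypothetical. The search term "floorArea" is invented for illustration. This is not a working spec for the DEL survey; the templates in your_home_dir/del_data/usr/specs remain the reference.

```python
import json
import re

REQUIRED_KEYS = ["year_range", "features", "searchlist", "transform",
                 "bins", "labels", "cut", "replace", "geo"]
VALID_RANGES = (["1994", "1999"], ["2000", "2014"])

# Illustrative spec built from the table's examples ("floorArea" is hypothetical).
example_spec = """
{
  "year_range": ["2000", "2014"],
  "features": ["fridge_freezer", "floor_area"],
  "searchlist": ["fridgefreezerNumber", "fridgefreezerBroken", "floorArea"],
  "transform": {"fridge_freezer": "x['fridgefreezerNumber'] - x['fridgefreezerBroken']"},
  "bins": {"floor_area": ["0", "50", "80"]},
  "labels": {"floor_area": ["0-50", "50-80"]},
  "cut": {"floor_area": {"right": "False", "include_lowest": "True"}},
  "replace": {},
  "geo": "Province"
}
"""

def check_spec(text):
    """Return a list of rule violations for a spec file's JSON text."""
    spec = json.loads(text)
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in spec]
    if problems:
        return problems
    if spec["year_range"] not in VALID_RANGES:
        problems.append(f"invalid year_range: {spec['year_range']}")
    features = set(spec["features"])
    # Keys of these dicts must be variables from the features list.
    for key in ("transform", "bins", "labels", "cut", "replace"):
        for name in spec[key]:
            if name not in features:
                problems.append(f"{key}: '{name}' is not in features")
    # Variables referenced as x['...'] in a transform must come from searchlist.
    for name, expr in spec["transform"].items():
        for var in re.findall(r"x\['([^']+)'\]", expr):
            if var not in spec["searchlist"]:
                problems.append(f"transform {name}: '{var}' is not in searchlist")
    return problems
```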

Creating a custom spec file

To create a custom spec file, the following process is recommended:

  1. Copy an existing spec file template and delete all values (but keep the keys and formatting!)
  2. Use the searchQuestions() function to find all the questions that relate to a variable that you are interested in. Use this to construct your searchlist.
  3. Use the searchAnswers() function to get the responses to your search.
  4. Interrogate the responses to decide if any transform, bins and replacements are needed.
  5. If bins are needed, decide whether labels and cut are required.
  6. Decide whether high level geographic information should be added to the responses and update geo accordingly.
  7. Save the file as name_94.txt or name_00.txt.
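Steps 4 and 5 depend on how bins, labels and cut interact. The cut keys right and include_lowest share their names with parameters of pandas.cut. The standard-library sketch below therefore follows pandas.cut's interval conventions. Whether delprocess passes these values to pandas.cut unchanged is an assumption. The helpers to_bool and bin_label are hypothetical names.

```python
def to_bool(s):
    """Spec files store booleans as the strings "True"/"False"."""
    return str(s).strip().lower() == "true"

def bin_label(value, edges, labels, right=True, include_lowest=False):
    """Return the label of the bin containing value, or None if it falls outside.

    Follows pandas.cut conventions: with right=True bins are (a, b];
    with right=False they are [a, b). include_lowest=True additionally
    closes the first bin on the left when right=True.
    """
    edges = [float(e) for e in edges]  # spec values are strings
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        if right:
            inside = lo < value <= hi or (include_lowest and i == 0 and value == lo)
        else:
            inside = lo <= value < hi
        if inside:
            return labels[i]
    return None
```

With the floor_area example from the table, a value of 50 falls in "0-50" under the default right=True. With right set to "False" in cut, the same value falls in "50-80".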

NB: The surveys changed in 2000, and the questions differ between 1994 to 1999 and 2000 to 2014. Survey data is therefore extracted in two batches and requires two spec files, with search terms matched to the corresponding questionnaire. For example, the best search term to retrieve household income for 1994 to 1999 is 'income', while for 2000 to 2014 it is 'earn per month'.
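Because of this split, a year range that spans 1999 and 2000 needs both a *_94.txt and a *_00.txt spec file. The sketch below shows how a range breaks into the two batches, using the year ranges allowed for year_range in a spec file. The helpers spec_suffix and split_years are hypothetical names.

```python
def spec_suffix(year):
    """Return the spec-file suffix for a survey year: '_94' or '_00'."""
    year = int(year)
    if 1994 <= year <= 1999:
        return "_94"
    if 2000 <= year <= 2014:
        return "_00"
    raise ValueError(f"no DEL survey spec for year {year}")

def split_years(year_start, year_end):
    """Group an inclusive year range into the two survey batches."""
    batches = {}
    for y in range(int(year_start), int(year_end) + 1):
        batches.setdefault(spec_suffix(y), []).append(y)
    return batches
```

For example, split_years(1998, 2001) groups 1998 and 1999 under "_94" and 2000 and 2001 under "_00".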

Acknowledgements

Citation

Toussaint, Wiebke. delprocess: Data Processing of the South African Domestic Electrical Load Study, version 1.01. Zenodo. https://doi.org/10.5281/zenodo.3605422 (2019).

Funding

This code has been developed by the Energy Research Centre at the University of Cape Town with funding from the South African National Energy Development Initiative under the CESAR programme.
