
I am trying to convert the AMPDS2 dataset to work with the REDD dataset #965

Open
NortonGuilherme opened this issue Mar 19, 2023 · 0 comments

I am trying to use the AMPds2 dataset with the REDD dataset: train on REDD and then test on AMPds2. I followed the user guide (https://github.com/nilmtk/nilmtk/blob/master/docs/manual/user_guide/disaggregation_and_metrics.ipynb) and imported both the REDD and AMPds2 datasets successfully. However, I am running into an error in the "predict" function.

I am getting an error at this point in the code:

gt[i][meter] = next(meter.load(physical_quantity = 'power', ac_type = 'active', sample_period=sample_period))
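For context, `meter.load(...)` in nilmtk returns a generator of DataFrame chunks, so `next(...)` pulls the first chunk; if nothing matches the requested window, the generator can be empty and `next()` raises `StopIteration`. A toy illustration of that generator pattern (not nilmtk code):

```python
# Toy stand-in for nilmtk's meter.load(...): a generator of data chunks.
# next(...) retrieves the first chunk; an empty generator raises StopIteration.
def load_chunks(data, chunk_size):
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

first = next(load_chunks([1, 2, 3, 4, 5], 2))
print(first)  # [1, 2]

try:
    next(load_chunks([], 2))
except StopIteration:
    print("no data for this meter/window")
```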

I am not sure how to modify this to work with the AMPds2 and REDD datasets. Is there a way to convert the AMPds2 dataset to work with REDD? Or could you provide guidance on how to modify this code to work with both datasets?
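One idea I have been exploring: since the ground-truth columns come from AMPds2 meters and the prediction columns come from REDD meters, both frames could be keyed by plain appliance-label strings before aligning them (as far as I know nilmtk's `ElecMeter` has a `label()` method; `FakeMeter` below is just a self-contained stub, not nilmtk code):

```python
import pandas as pd

# Sketch: align two DataFrames whose columns are meter objects from
# DIFFERENT datasets by renaming both to plain appliance-label strings.
# FakeMeter.label() stands in for nilmtk's ElecMeter.label().
class FakeMeter:
    def __init__(self, name):
        self.name = name
    def label(self):
        return self.name

redd_fridge = FakeMeter("Fridge")   # meter object from the training dataset
amp_fridge = FakeMeter("Fridge")    # meter object from the test dataset

pred = pd.DataFrame({redd_fridge: [100.0, 120.0]})
gt = pd.DataFrame({amp_fridge: [95.0, 118.0]})

# gt[pred.columns] would fail here: the column keys are different objects.
pred.columns = [m.label() for m in pred.columns]
gt.columns = [m.label() for m in gt.columns]

aligned = gt[pred.columns]          # now matches on the shared label "Fridge"
print(aligned.columns.tolist())     # ['Fridge']
```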

Here is the code I have used so far:

from __future__ import print_function, division

import os
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pytz
from matplotlib import rcParams
from six import iteritems

import nilmtk.utils
from nilmtk import DataSet, TimeFrame, MeterGroup, HDFDataStore
from nilmtk.legacy.disaggregate import CombinatorialOptimisation, FHMM

# Load REDD dataset

train = DataSet('redd.h5')
train.set_window(start="2011-04-21", end="2011-04-30")
elec = train.buildings[1].elec

# Load AMPds2 dataset

ampds2 = DataSet('AMPds2.h5')
ampds2.set_window(start="2013-10-01", end="2013-11-05")
test_elec = ampds2.buildings[1].elec

# Function to predict

def predict(clf, test_elec, sample_period, timezone):
    pred = {}
    gt = {}

    for i, chunk in enumerate(test_elec.mains().load(physical_quantity='power', ac_type='apparent', sample_period=sample_period)):
        chunk_drop_na = chunk.dropna()
        pred[i] = clf.disaggregate_chunk(chunk_drop_na)
        gt[i] = {}

        for meter in test_elec.submeters().meters:
            gt[i][meter] = next(meter.load(physical_quantity='power', ac_type='active', sample_period=sample_period))
        gt[i] = pd.DataFrame({k: v.squeeze() for k, v in iteritems(gt[i]) if len(v)},
                             index=next(iter(gt[i].values())).index).dropna()

    gt_overall = pd.concat(gt)
    gt_overall.index = gt_overall.index.droplevel()
    pred_overall = pd.concat(pred)
    pred_overall.index = pred_overall.index.droplevel()

    # Having the same order of columns
    gt_overall = gt_overall[pred_overall.columns]

    # Intersection of index
    gt_index = gt_overall.index.tz_convert(timezone)
    pred_index = pred_overall.index.tz_convert(timezone)
    common_index = gt_index.intersection(pred_index)

    gt_overall = gt_overall.loc[common_index]
    pred_overall = pred_overall.loc[common_index]
    appliance_labels = [m for m in gt_overall.columns.values]
    gt_overall.columns = appliance_labels
    pred_overall.columns = appliance_labels
    return gt_overall, pred_overall

classifiers = {'CO': CombinatorialOptimisation(), 'FHMM': FHMM()}
predictions = {}
sample_period = 1800

os.environ['OMP_NUM_THREADS'] = '8'
for clf_name, clf in classifiers.items():
    print("*" * 20)
    print(clf_name)
    print("*" * 20)
    start = time.time()
    # Note that we have given the sample period to downsample the data to 30 minutes.
    # If instead of top_5 we wanted to train on all appliances, we would write
    # fhmm.train(train_elec, sample_period=sample_period)
    clf.train(top_5_train_elec, sample_period=sample_period)
    end = time.time()
    print("Runtime =", end - start, "seconds.")
    gt, predictions[clf_name] = predict(clf, test_elec, sample_period, train.metadata['timezone'])
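One thing I noticed while debugging: the mains load above uses ac_type='apparent' while the submeter load uses ac_type='active', and REDD and AMPds2 do not necessarily expose the same power types. A defensive pattern would be to prefer one type and fall back to whatever the meter actually reports (the available-types list below is just an illustrative input, not a nilmtk call):

```python
# Prefer 'active' power if the meter has it, otherwise fall back to the
# first ac_type the meter actually exposes. `available` is whatever list
# the dataset reports for a given meter.
def pick_ac_type(available, preferred=("active", "apparent")):
    for ac in preferred:
        if ac in available:
            return ac
    return available[0]

print(pick_ac_type(["apparent", "reactive"]))  # 'apparent'
print(pick_ac_type(["active", "apparent"]))    # 'active'
```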


CO


Training model for submeter 'ElecMeter(instance=5, building=1, dataset='REDD', appliances=[Appliance(type='fridge', instance=1)])'
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=2.
warnings.warn(
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=2.
warnings.warn(
Training model for submeter 'ElecMeter(instance=9, building=1, dataset='REDD', appliances=[Appliance(type='light', instance=1)])'
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
warnings.warn(
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
warnings.warn(
Training model for submeter 'ElecMeter(instance=8, building=1, dataset='REDD', appliances=[Appliance(type='sockets', instance=2)])'
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=2.
warnings.warn(
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=2.
warnings.warn(
Training model for submeter 'ElecMeter(instance=11, building=1, dataset='REDD', appliances=[Appliance(type='microwave', instance=1)])'
Training model for submeter 'MeterGroup(meters=
ElecMeter(instance=10, building=1, dataset='REDD', appliances=[Appliance(type='washer dryer', instance=1)])
ElecMeter(instance=20, building=1, dataset='REDD', appliances=[Appliance(type='washer dryer', instance=1)])
)'
Loading data for meter ElecMeterID(instance=10, building=1, dataset='REDD')
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
warnings.warn(
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
warnings.warn(
Loading data for meter ElecMeterID(instance=20, building=1, dataset='REDD')
Done loading data all meters for this chunk.
Done training!
Runtime = 2.4630203247070312 seconds.
Estimating power demand for 'ElecMeter(instance=5, building=1, dataset='REDD', appliances=[Appliance(type='fridge', instance=1)])'
Estimating power demand for 'ElecMeter(instance=9, building=1, dataset='REDD', appliances=[Appliance(type='light', instance=1)])'
Estimating power demand for 'ElecMeter(instance=8, building=1, dataset='REDD', appliances=[Appliance(type='sockets', instance=2)])'
Estimating power demand for 'ElecMeter(instance=11, building=1, dataset='REDD', appliances=[Appliance(type='microwave', instance=1)])'
Estimating power demand for 'MeterGroup(meters=
ElecMeter(instance=10, building=1, dataset='REDD', appliances=[Appliance(type='washer dryer', instance=1)])
ElecMeter(instance=20, building=1, dataset='REDD', appliances=[Appliance(type='washer dryer', instance=1)])
)'
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=8.
warnings.warn(
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=8.
warnings.warn(
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\nilmtk\nilmtk\feature_detectors\cluster.py:70: ConvergenceWarning: Number of distinct clusters (1) found smaller than n_clusters (2). Possibly due to duplicate points in X.
k_means.fit(X)

KeyError Traceback (most recent call last)
Cell In[8], line 17
15 end = time.time()
16 print("Runtime =", end-start, "seconds.")
---> 17 gt, predictions[clf_name] = predict(clf, test_elec, sample_period, train.metadata['timezone'])

Cell In[5], line 24, in predict(clf, test_elec, sample_period, timezone)
21 pred_overall.index = pred_overall.index.droplevel()
23 # Having the same order of columns
---> 24 gt_overall = gt_overall[pred_overall.columns]
26 #Intersection of index
27 gt_index = gt_overall.index.tz_convert(timezone)

File ~\miniconda3\envs\nilmtk_env\lib\site-packages\pandas\core\frame.py:3001, in DataFrame.__getitem__(self, key)
2999 if is_iterator(key):
3000 key = list(key)
-> 3001 indexer = self.loc._convert_to_indexer(key, axis=1, raise_missing=True)
3003 # take() does not accept boolean indexers
3004 if getattr(indexer, "dtype", None) == bool:

File ~\miniconda3\envs\nilmtk_env\lib\site-packages\pandas\core\indexing.py:1285, in _NDFrameIndexer._convert_to_indexer(self, obj, axis, is_setter, raise_missing)
1282 else:
1283 # When setting, missing keys are not allowed, even with .loc:
1284 kwargs = {"raise_missing": True if is_setter else raise_missing}
-> 1285 return self._get_listlike_indexer(obj, axis, **kwargs)[1]
1286 else:
1287 try:

File ~\miniconda3\envs\nilmtk_env\lib\site-packages\pandas\core\indexing.py:1091, in _NDFrameIndexer._get_listlike_indexer(self, key, axis, raise_missing)
1088 else:
1089 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
-> 1091 self._validate_read_indexer(
1092 keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
1093 )
1094 return keyarr, indexer

File ~\miniconda3\envs\nilmtk_env\lib\site-packages\pandas\core\indexing.py:1175, in _NDFrameIndexer._validate_read_indexer(self, key, indexer, axis, raise_missing)
1173 if missing:
1174 if missing == len(indexer):
-> 1175 raise KeyError(
1176 "None of [{key}] are in the [{axis}]".format(
1177 key=key, axis=self.obj._get_axis_name(axis)
1178 )
1179 )
1181 # We (temporarily) allow for some missing keys with .loc, except in
1182 # some cases (e.g. setting) in which "raise_missing" will be False
1183 if not (self.name == "loc" and not raise_missing):

KeyError: "None of [Index([ ElecMeter(instance=5, building=1, dataset='REDD', appliances=[Appliance(type='fridge', instance=1)]),\n ElecMeter(instance=9, building=1, dataset='REDD', appliances=[Appliance(type='light', instance=1)]),\n ElecMeter(instance=8, building=1, dataset='REDD', appliances=[Appliance(type='sockets', instance=2)]),\n ElecMeter(instance=11, building=1, dataset='REDD', appliances=[Appliance(type='microwave', instance=1)]),\n MeterGroup(meters=\n ElecMeter(instance=10, building=1, dataset='REDD', appliances=[Appliance(type='washer dryer', instance=1)])\n ElecMeter(instance=20, building=1, dataset='REDD', appliances=[Appliance(type='washer dryer', instance=1)])\n)],\n dtype='object')] are in the [columns]"
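For what it's worth, the KeyError can be reproduced without nilmtk: pred_overall's columns are the REDD ElecMeter objects the model was trained on, while gt_overall's columns are AMPds2 ElecMeter objects, and pandas matches columns by the objects themselves, not by their printed representation. A minimal standalone sketch (`Meter` is just a stand-in for ElecMeter):

```python
import pandas as pd

# Minimal reproduction of the failure mode: indexing a DataFrame with
# column keys that belong to a DIFFERENT set of objects raises KeyError,
# even if the objects print identically.
class Meter:
    def __init__(self, desc):
        self.desc = desc
    def __repr__(self):
        return self.desc

gt_col = Meter("ElecMeter(fridge)")    # ground truth keyed by AMPds2 meters
pred_col = Meter("ElecMeter(fridge)")  # predictions keyed by REDD meters

gt_overall = pd.DataFrame({gt_col: [1.0, 2.0]})
pred_overall = pd.DataFrame({pred_col: [1.1, 1.9]})

try:
    gt_overall[pred_overall.columns]   # same repr, different objects
except KeyError:
    print("KeyError, as in the traceback above")
```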
