
I am trying to convert the AMPDS2 dataset to work with the REDD dataset #965

Open
NortonGuilherme opened this issue Mar 19, 2023 · 0 comments

I am trying to use the AMPds2 dataset with the REDD dataset: train on REDD and then test on AMPds2. I followed the user guide (https://github.com/nilmtk/nilmtk/blob/master/docs/manual/user_guide/disaggregation_and_metrics.ipynb) and imported both the REDD and AMPds2 datasets successfully. However, I am running into an error in the "predict" function.

I am getting an error at this point in the code:

gt[i][meter] = next(meter.load(physical_quantity = 'power', ac_type = 'active', sample_period=sample_period))
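For context, `meter.load(...)` in nilmtk returns a generator of DataFrame chunks, so `next(...)` pulls the first chunk; if nothing matches the requested window, the generator can be empty and `next()` raises `StopIteration`. A toy illustration of that generator pattern (not nilmtk code):

```python
# Toy stand-in for nilmtk's meter.load(...): a generator of data chunks.
# next(...) retrieves the first chunk; an empty generator raises StopIteration.
def load_chunks(data, chunk_size):
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

first = next(load_chunks([1, 2, 3, 4, 5], 2))
print(first)  # [1, 2]

try:
    next(load_chunks([], 2))
except StopIteration:
    print("no data for this meter/window")
```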

I am not sure how to modify this to work with the AMPds2 and REDD datasets. Is there a way to convert the AMPds2 dataset to work with REDD? Or could you provide guidance on how to modify this code to work with both datasets?
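One idea I have been exploring: since the ground-truth columns come from AMPds2 meters and the prediction columns come from REDD meters, both frames could be keyed by plain appliance-label strings before aligning them (as far as I know nilmtk's `ElecMeter` has a `label()` method; `FakeMeter` below is just a self-contained stub, not nilmtk code):

```python
import pandas as pd

# Sketch: align two DataFrames whose columns are meter objects from
# DIFFERENT datasets by renaming both to plain appliance-label strings.
# FakeMeter.label() stands in for nilmtk's ElecMeter.label().
class FakeMeter:
    def __init__(self, name):
        self.name = name
    def label(self):
        return self.name

redd_fridge = FakeMeter("Fridge")   # meter object from the training dataset
amp_fridge = FakeMeter("Fridge")    # meter object from the test dataset

pred = pd.DataFrame({redd_fridge: [100.0, 120.0]})
gt = pd.DataFrame({amp_fridge: [95.0, 118.0]})

# gt[pred.columns] would fail here: the column keys are different objects.
pred.columns = [m.label() for m in pred.columns]
gt.columns = [m.label() for m in gt.columns]

aligned = gt[pred.columns]          # now matches on the shared label "Fridge"
print(aligned.columns.tolist())     # ['Fridge']
```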

Here is the code I have used so far:

from __future__ import print_function, division

import os
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pytz
from matplotlib import rcParams
from six import iteritems

import nilmtk.utils
from nilmtk import DataSet, TimeFrame, MeterGroup, HDFDataStore
from nilmtk.legacy.disaggregate import CombinatorialOptimisation, FHMM

# Load REDD dataset

train = DataSet('redd.h5')
train.set_window(start="2011-04-21", end="2011-04-30")
elec = train.buildings[1].elec

# Load AMPds2 dataset

ampds2 = DataSet('AMPds2.h5')
ampds2.set_window(start="2013-10-01", end="2013-11-05")
test_elec = ampds2.buildings[1].elec

# Function to predict

def predict(clf, test_elec, sample_period, timezone):
    pred = {}
    gt = {}

    for i, chunk in enumerate(test_elec.mains().load(physical_quantity='power', ac_type='apparent', sample_period=sample_period)):
        chunk_drop_na = chunk.dropna()
        pred[i] = clf.disaggregate_chunk(chunk_drop_na)
        gt[i] = {}

        for meter in test_elec.submeters().meters:
            gt[i][meter] = next(meter.load(physical_quantity='power', ac_type='active', sample_period=sample_period))
        gt[i] = pd.DataFrame({k: v.squeeze() for k, v in iteritems(gt[i]) if len(v)},
                             index=next(iter(gt[i].values())).index).dropna()

    gt_overall = pd.concat(gt)
    gt_overall.index = gt_overall.index.droplevel()
    pred_overall = pd.concat(pred)
    pred_overall.index = pred_overall.index.droplevel()

    # Having the same order of columns
    gt_overall = gt_overall[pred_overall.columns]

    # Intersection of index
    gt_index = gt_overall.index.tz_convert(timezone)
    pred_index = pred_overall.index.tz_convert(timezone)
    common_index = gt_index.intersection(pred_index)

    gt_overall = gt_overall.loc[common_index]
    pred_overall = pred_overall.loc[common_index]
    appliance_labels = [m for m in gt_overall.columns.values]
    gt_overall.columns = appliance_labels
    pred_overall.columns = appliance_labels
    return gt_overall, pred_overall

classifiers = {'CO': CombinatorialOptimisation(), 'FHMM': FHMM()}
predictions = {}
sample_period = 1800

os.environ['OMP_NUM_THREADS'] = '8'
for clf_name, clf in classifiers.items():
    print("*" * 20)
    print(clf_name)
    print("*" * 20)
    start = time.time()
    # Note that we have given the sample period to downsample the data to 30 minutes.
    # If instead of top_5 we wanted to train on all appliances, we would write
    # fhmm.train(train_elec, sample_period=sample_period)
    clf.train(top_5_train_elec, sample_period=sample_period)
    end = time.time()
    print("Runtime =", end - start, "seconds.")
    gt, predictions[clf_name] = predict(clf, test_elec, sample_period, train.metadata['timezone'])
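One thing I noticed while debugging: the mains load above uses ac_type='apparent' while the submeter load uses ac_type='active', and REDD and AMPds2 do not necessarily expose the same power types. A defensive pattern would be to prefer one type and fall back to whatever the meter actually reports (the available-types list below is just an illustrative input, not a nilmtk call):

```python
# Prefer 'active' power if the meter has it, otherwise fall back to the
# first ac_type the meter actually exposes. `available` is whatever list
# the dataset reports for a given meter.
def pick_ac_type(available, preferred=("active", "apparent")):
    for ac in preferred:
        if ac in available:
            return ac
    return available[0]

print(pick_ac_type(["apparent", "reactive"]))  # 'apparent'
print(pick_ac_type(["active", "apparent"]))    # 'active'
```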


CO


Training model for submeter 'ElecMeter(instance=5, building=1, dataset='REDD', appliances=[Appliance(type='fridge', instance=1)])'
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=2.
warnings.warn(
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=2.
warnings.warn(
Training model for submeter 'ElecMeter(instance=9, building=1, dataset='REDD', appliances=[Appliance(type='light', instance=1)])'
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
warnings.warn(
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
warnings.warn(
Training model for submeter 'ElecMeter(instance=8, building=1, dataset='REDD', appliances=[Appliance(type='sockets', instance=2)])'
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=2.
warnings.warn(
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=2.
warnings.warn(
Training model for submeter 'ElecMeter(instance=11, building=1, dataset='REDD', appliances=[Appliance(type='microwave', instance=1)])'
Training model for submeter 'MeterGroup(meters=
ElecMeter(instance=10, building=1, dataset='REDD', appliances=[Appliance(type='washer dryer', instance=1)])
ElecMeter(instance=20, building=1, dataset='REDD', appliances=[Appliance(type='washer dryer', instance=1)])
)'
Loading data for meter ElecMeterID(instance=10, building=1, dataset='REDD')
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
warnings.warn(
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
warnings.warn(
Loading data for meter ElecMeterID(instance=20, building=1, dataset='REDD')
Done loading data all meters for this chunk.
Done training!
Runtime = 2.4630203247070312 seconds.
Estimating power demand for 'ElecMeter(instance=5, building=1, dataset='REDD', appliances=[Appliance(type='fridge', instance=1)])'
Estimating power demand for 'ElecMeter(instance=9, building=1, dataset='REDD', appliances=[Appliance(type='light', instance=1)])'
Estimating power demand for 'ElecMeter(instance=8, building=1, dataset='REDD', appliances=[Appliance(type='sockets', instance=2)])'
Estimating power demand for 'ElecMeter(instance=11, building=1, dataset='REDD', appliances=[Appliance(type='microwave', instance=1)])'
Estimating power demand for 'MeterGroup(meters=
ElecMeter(instance=10, building=1, dataset='REDD', appliances=[Appliance(type='washer dryer', instance=1)])
ElecMeter(instance=20, building=1, dataset='REDD', appliances=[Appliance(type='washer dryer', instance=1)])
)'
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=8.
warnings.warn(
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=8.
warnings.warn(
C:\Users\norto\miniconda3\envs\nilmtk_env\lib\site-packages\nilmtk\nilmtk\feature_detectors\cluster.py:70: ConvergenceWarning: Number of distinct clusters (1) found smaller than n_clusters (2). Possibly due to duplicate points in X.
k_means.fit(X)

KeyError Traceback (most recent call last)
Cell In[8], line 17
15 end = time.time()
16 print("Runtime =", end-start, "seconds.")
---> 17 gt, predictions[clf_name] = predict(clf, test_elec, sample_period, train.metadata['timezone'])

Cell In[5], line 24, in predict(clf, test_elec, sample_period, timezone)
21 pred_overall.index = pred_overall.index.droplevel()
23 # Having the same order of columns
---> 24 gt_overall = gt_overall[pred_overall.columns]
26 #Intersection of index
27 gt_index = gt_overall.index.tz_convert(timezone)

File ~\miniconda3\envs\nilmtk_env\lib\site-packages\pandas\core\frame.py:3001, in DataFrame.__getitem__(self, key)
2999 if is_iterator(key):
3000 key = list(key)
-> 3001 indexer = self.loc._convert_to_indexer(key, axis=1, raise_missing=True)
3003 # take() does not accept boolean indexers
3004 if getattr(indexer, "dtype", None) == bool:

File ~\miniconda3\envs\nilmtk_env\lib\site-packages\pandas\core\indexing.py:1285, in _NDFrameIndexer._convert_to_indexer(self, obj, axis, is_setter, raise_missing)
1282 else:
1283 # When setting, missing keys are not allowed, even with .loc:
1284 kwargs = {"raise_missing": True if is_setter else raise_missing}
-> 1285 return self._get_listlike_indexer(obj, axis, **kwargs)[1]
1286 else:
1287 try:

File ~\miniconda3\envs\nilmtk_env\lib\site-packages\pandas\core\indexing.py:1091, in _NDFrameIndexer._get_listlike_indexer(self, key, axis, raise_missing)
1088 else:
1089 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
-> 1091 self._validate_read_indexer(
1092 keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
1093 )
1094 return keyarr, indexer

File ~\miniconda3\envs\nilmtk_env\lib\site-packages\pandas\core\indexing.py:1175, in _NDFrameIndexer._validate_read_indexer(self, key, indexer, axis, raise_missing)
1173 if missing:
1174 if missing == len(indexer):
-> 1175 raise KeyError(
1176 "None of [{key}] are in the [{axis}]".format(
1177 key=key, axis=self.obj._get_axis_name(axis)
1178 )
1179 )
1181 # We (temporarily) allow for some missing keys with .loc, except in
1182 # some cases (e.g. setting) in which "raise_missing" will be False
1183 if not (self.name == "loc" and not raise_missing):

KeyError: "None of [Index([ ElecMeter(instance=5, building=1, dataset='REDD', appliances=[Appliance(type='fridge', instance=1)]),\n ElecMeter(instance=9, building=1, dataset='REDD', appliances=[Appliance(type='light', instance=1)]),\n ElecMeter(instance=8, building=1, dataset='REDD', appliances=[Appliance(type='sockets', instance=2)]),\n ElecMeter(instance=11, building=1, dataset='REDD', appliances=[Appliance(type='microwave', instance=1)]),\n MeterGroup(meters=\n ElecMeter(instance=10, building=1, dataset='REDD', appliances=[Appliance(type='washer dryer', instance=1)])\n ElecMeter(instance=20, building=1, dataset='REDD', appliances=[Appliance(type='washer dryer', instance=1)])\n)],\n dtype='object')] are in the [columns]"
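For what it's worth, the KeyError can be reproduced without nilmtk: pred_overall's columns are the REDD ElecMeter objects the model was trained on, while gt_overall's columns are AMPds2 ElecMeter objects, and pandas matches columns by the objects themselves, not by their printed representation. A minimal standalone sketch (`Meter` is just a stand-in for ElecMeter):

```python
import pandas as pd

# Minimal reproduction of the failure mode: indexing a DataFrame with
# column keys that belong to a DIFFERENT set of objects raises KeyError,
# even if the objects print identically.
class Meter:
    def __init__(self, desc):
        self.desc = desc
    def __repr__(self):
        return self.desc

gt_col = Meter("ElecMeter(fridge)")    # ground truth keyed by AMPds2 meters
pred_col = Meter("ElecMeter(fridge)")  # predictions keyed by REDD meters

gt_overall = pd.DataFrame({gt_col: [1.0, 2.0]})
pred_overall = pd.DataFrame({pred_col: [1.1, 1.9]})

try:
    gt_overall[pred_overall.columns]   # same repr, different objects
except KeyError:
    print("KeyError, as in the traceback above")
```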
