Modifying AMPDS2 Dataset for Appliance-Level Analysis #977

Open · NortonGuilherme opened this issue Mar 26, 2024 · 3 comments

@NortonGuilherme commented Mar 26, 2024

I am currently engaged in academic research focused on Non-Intrusive Load Monitoring (NILM) and am using the AMPDS2 dataset for my analysis. My objective is to study appliance-level energy consumption patterns and their impact on overall energy usage. To this end, I am looking to modify the AMPDS2 dataset in a specific way: zeroing out the energy consumption data for certain appliances (e.g., refrigerators) and then removing these zeroed values from the total energy sum. This modification is crucial for my study, as it allows me to analyze the impact of individual appliances on total energy consumption more accurately.

Despite my efforts, I've encountered challenges in identifying the correct approach to achieve this within the AMPDS2 dataset structure, especially considering its storage in the HDF5 format. I am seeking guidance or suggestions from the community on how to effectively modify appliance data within the AMPDS2 dataset. Specifically, I am interested in:

- Techniques or tools recommended for directly editing or manipulating data within HDF5 files, particularly for zeroing out specific appliance data.
- Strategies for ensuring that modifications to individual appliance data are accurately reflected in the total energy consumption calculations.
- Any scripts, code snippets, or utilities within NILMTK or external tools that could facilitate this process.
My work aims to contribute valuable insights into appliance-level energy consumption and efficiency, and I believe that the modifications described above are essential for isolating the impact of specific appliances. I would greatly appreciate any advice, resources, or examples that could assist me in this endeavor.

Thank you in advance for your time and assistance. I am eager to learn from your expertise and contribute to the NILMTK community with my findings.

@jeduapf commented Mar 27, 2024

Hello Norton, I believe you're Brazilian, so I can recommend my previous work on the subject at the link TCC. It has several mistakes and is missing comments in places, but feel free to ask and I will try to help.

As for the HDF5 format, it isn't that difficult to read and write in Python. Here's a simple guide.
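For the zeroing part of your question, a minimal sketch of the idea (assuming the usual NILMTK key layout `building1/elec/meterN`, that meter1 is the whole-house mains, and a made-up meter number for the fridge; check the dataset metadata for the real one):

```python
import pandas as pd

path = 'AMPds2.h5'

# Each NILMTK meter is stored as a PyTables table, so pandas can read it directly
mains = pd.read_hdf(path, key='/building1/elec/meter1')
fridge = pd.read_hdf(path, key='/building1/elec/meter8')  # meter number is a guess

# Remove the fridge's contribution from the total, then zero the fridge itself
mains[('power', 'active')] = mains[('power', 'active')] - fridge[('power', 'active')]
fridge[('power', 'active')] = 0.0

# Persist the changes; note that plain to_hdf writes no NILMTK metadata,
# so if NILMTK must load the result later, overwrite these keys in a copy
# of the original file rather than starting from an empty one
fridge.to_hdf('AMPds2_modified.h5', key='/building1/elec/meter8', format='table')
mains.to_hdf('AMPds2_modified.h5', key='/building1/elec/meter1', format='table')
```

This keeps the total consistent with the appliance you removed, which is the part NILMTK will not do for you automatically.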

Hope I could help!

@NortonGuilherme (Author)

Hi, yes, I'm Brazilian and I'm also writing my TCC. What I can't access in AMPDS2 is when I get to this part:
```python
import h5py
import pandas as pd
import numpy as np

caminho_arquivo = r'C:\Users\norton.santos\Downloads\AMPds2.h5'

with h5py.File(caminho_arquivo, 'r') as arquivo:
    # Open the table node for building 1, meter 1
    dataset = arquivo['building1/elec/meter1/table']

    # Load the compound (structured) array into memory
    dados = np.array(dataset)

    # Build a DataFrame from the structured array's named fields
    df = pd.DataFrame(data=dados, columns=dataset.dtype.names)

    print(df.head())
```

The problem is that I always get the same error message, `OSError: Can't synchronously read data (can't open directory)` or `ValueError: Per-column arrays must each be 1-dimensional`. I have looked for other ways to access the table.
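One alternative would be letting pandas open the file instead of raw h5py, since pandas delegates to PyTables, which understands the `table` layout NILMTK writes (a minimal sketch, untested here):

```python
import pandas as pd

caminho_arquivo = r'C:\Users\norton.santos\Downloads\AMPds2.h5'

# read_hdf goes through PyTables, which knows how to unpack the compound
# 'table' node that raw h5py exposes as nested 2-D fields (the source of
# the "Per-column arrays must each be 1-dimensional" error)
df = pd.read_hdf(caminho_arquivo, key='/building1/elec/meter1')
print(df.head())
```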

@NortonGuilherme (Author) commented Mar 28, 2024

I tried the following solution: https://github.com/nilmtk/nilmtk/blob/master/nilmtk/dataset_converters/ampds/convert_ampds.py, i.e. rebuilding the AMPDS2 dataset from the CSV data. However, there is some problem with doing it this way, because I cannot run CO or FHMM on the result. The modified code I used follows, in case anyone can help:

```python
import numpy as np
import pandas as pd
from os.path import join, isfile, isdir
from os import listdir
from nilmtk.datastore import Key
from nilmtk.measurement import LEVEL_NAMES
from nilmtk.utils import check_directory_exists, get_datastore, get_module_directory
from nilm_metadata import convert_yaml_to_hdf5

# Initial configuration
input_path = r'C:\Users\norto\Downloads\dataverse_files'  # Update this to your CSV directory
output_filename = r'C:\Users\norto\Downloads\dataverse_files\ampds2_modificado.h5'  # Output HDF5 path
columnNameMapping = {
    'V': ('voltage', ''),
    'I': ('current', ''),
    'f': ('frequency', ''),
    'DPF': ('power factor', 'real'),
    'APF': ('power factor', 'apparent'),
    'P': ('power', 'active'),
    'Pt': ('energy', 'active'),
    'Q': ('power', 'reactive'),
    'Qt': ('energy', 'reactive'),
    'S': ('power', 'apparent'),
    'St': ('energy', 'apparent')
}
TIMESTAMP_COLUMN_NAME = "unix_ts"
TIMEZONE = "America/Vancouver"

def convert_ampds(input_path, output_filename, format='HDF'):
    check_directory_exists(input_path)
    files = [f for f in listdir(input_path)
             if isfile(join(input_path, f)) and f.endswith('.csv') and not f.endswith('.swp')]
    files.sort()

    assert isdir(input_path)
    store = get_datastore(output_filename, format, mode='w')

    for i, csv_file in enumerate(files):
        key = Key(building=1, meter=(i + 1))
        print('Loading file #', (i + 1), ' : ', csv_file, '. Please wait...')
        df = pd.read_csv(join(input_path, csv_file))

        # Normalise the timestamp column name, accepting either capitalisation
        timestamp_column = (TIMESTAMP_COLUMN_NAME.lower()
                            if TIMESTAMP_COLUMN_NAME.lower() in df.columns
                            else TIMESTAMP_COLUMN_NAME.upper())

        df.columns = [x.replace(" ", "") for x in df.columns]
        df.index = pd.to_datetime(df[timestamp_column], unit='s', utc=True)
        df = df.drop(columns=[timestamp_column])
        df = df.tz_convert(TIMEZONE)

        # Drop unknown columns (with a warning) so the MultiIndex below
        # lines up one-to-one with the columns that remain
        unmapped = [x for x in df.columns if x not in columnNameMapping]
        for x in unmapped:
            print(f"Warning: column '{x}' is not mapped. Dropping it...")
        df = df.drop(columns=unmapped)

        df.columns = pd.MultiIndex.from_tuples(
            [columnNameMapping[x] for x in df.columns], names=LEVEL_NAMES)

        df = df.apply(pd.to_numeric, errors='ignore')
        df = df.dropna()
        df = df.astype(np.float32)
        store.put(str(key), df)
        print("Done with file #", (i + 1))

    store.close()
    metadata_path = join(get_module_directory(), 'dataset_converters', 'ampds', 'metadata')
    print('Processing metadata...')
    convert_yaml_to_hdf5(metadata_path, output_filename)

convert_ampds(input_path, output_filename)
```
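As a sanity check on the converted file, something along these lines should load it and train CO (a sketch; the import path is version-dependent: in NILMTK 0.4+ the classic algorithms live in `nilmtk.legacy.disaggregate`, in older releases in `nilmtk.disaggregate`):

```python
from nilmtk import DataSet
# NILMTK 0.4+; on older versions use `from nilmtk.disaggregate import ...`
from nilmtk.legacy.disaggregate import CombinatorialOptimisation

ds = DataSet(r'C:\Users\norto\Downloads\dataverse_files\ampds2_modificado.h5')
elec = ds.buildings[1].elec  # fails here if the metadata step did not run

co = CombinatorialOptimisation()
co.train(elec.submeters())   # train on the appliance-level meters only
```

If `ds.buildings` is empty or the meters have no appliance labels, the problem is most likely the `convert_yaml_to_hdf5` metadata step rather than the data tables themselves.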
