tracevec

Training word embedding models (Word2Vec and Doc2Vec) on the electrical consumption of various home appliances.

Requirements

  • Python 3.8.10+
  • Pip / Anaconda
  • Jupyter Notebook

Other necessary dependencies are:

numpy, scipy, torch, pandas, seaborn, matplotlib, scikit_learn, gensim and matplotlib_venn

Dependencies and their version details are listed in the requirements.txt file. They can be easily installed with the setup.py script:

$ git clone https://github.com/mpinta/tracevec
$ cd tracevec
$ python setup.py install
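
After installation, it can help to confirm that the listed packages are importable before opening the notebook. The following is a minimal sketch and not part of the repository: the helper name `missing_modules` is hypothetical, and the import names are assumed to match the package names above, except `scikit_learn`, which is imported as `sklearn`.

```python
from importlib.util import find_spec

# Import names for the dependencies listed above (assumption: scikit_learn
# is imported as "sklearn"; the others import under their package name).
REQUIRED = [
    "numpy", "scipy", "torch", "pandas", "seaborn",
    "matplotlib", "sklearn", "gensim", "matplotlib_venn",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "none")
```

The check only looks up whether each module can be located; it does not import the packages or verify the versions pinned in requirements.txt.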

Usage

The project consists of five connected parts:

  1. Training word embedding models (using the Gensim topic modelling library)
  2. Clustering (of Doc2Vec vectors into clusters)
  3. Classification (of the electrical device type using Doc2Vec vectors)
  4. Prediction (of the next electricity consumption category using Word2Vec vectors)
  5. RNN forecasting (of the next electricity consumption category using an RNN with GRU)
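
Parts 1 and 4 rely on consumption readings being expressed as discrete categories that can play the role of words. The notebook defines the actual encoding; the sketch below only illustrates the idea under stated assumptions: the watt thresholds, the category labels and the helper name `to_category_tokens` are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical boundaries (in watts) separating consumption categories.
THRESHOLDS = [5.0, 50.0, 500.0]
# One label per bin; there is one more label than there are thresholds.
LABELS = ["off", "low", "medium", "high"]


def to_category_tokens(readings, thresholds=THRESHOLDS, labels=LABELS):
    """Map each power reading to the label of the bin it falls into.

    A reading equal to a threshold is placed in the higher bin.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    return [labels[bisect_right(thresholds, watts)] for watts in readings]


# One appliance trace becomes one "sentence" of category tokens.
trace = [0.0, 3.2, 48.0, 120.5, 900.0, 2.1]
sentence = to_category_tokens(trace)
print(sentence)  # ['off', 'off', 'low', 'medium', 'high', 'off']
```

A list of such token lists, one per trace, has the shape that Gensim's Word2Vec accepts as its `sentences` argument.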

First, prepare your Pip or Anaconda environment and make sure all of the above dependencies are installed. Then open the tracevec.ipynb notebook, which stores and describes all the results of our training and model analysis. You can also run and modify the code yourself, as it is thoroughly commented. Our trained Word2Vec and Doc2Vec models are in the models directory (skip the model training part if you don't want to create new ones).

Datasets

All data sets required to run the code are included in the repository. If you are running the code without the included data sets, you only need to clone the tracebase repository, which is the project's main data set, into the datasets directory. All the other derived data sets (consumptions, samples, forecast-train and forecast-test) are created step by step by the notebook code itself. The tracebase data set is not our property and is used only as a dependency (submodule); we appreciate the work done by its authors. Make sure to initialize the submodule with:

$ git submodule init
$ git submodule update
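
A quick way to confirm that the submodule was fetched is to check that its directory is populated, since an uninitialized submodule leaves an empty directory behind. This is a minimal sketch, assuming the submodule is checked out at `datasets/tracebase` (inferred from the text above, not verified against the repository); the helper name `submodule_ready` is hypothetical.

```python
from pathlib import Path


def submodule_ready(path):
    """Return True if `path` is a directory containing at least one entry."""
    directory = Path(path)
    return directory.is_dir() and any(directory.iterdir())


if __name__ == "__main__":
    # Assumed location of the tracebase submodule, relative to the repo root.
    print(submodule_ready("datasets/tracebase"))
```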

Publications

The code was originally used in the following publication:

Pintarič, Matic (2022).
S strojnim učenjem podprta analiza vzorcev vektorizirane porabe električne energije [Machine-learning-supported analysis of patterns in vectorized electricity consumption].
Maribor: University of Maribor, Faculty of Electrical Engineering and Computer Science.

Acknowledgements

Contains information from the tracebase data set, which is made available at http://www.tracebase.org under the Open Database License (ODbL).