GitHub - rpytel1/log-strategy: Project conducted for the Seminar in Machine Learning for Software Engineering. The aim of our research was to explore possible directions for Deep Learning solutions to log detection in a snippet of code.

Data Extraction and Preprocessing

Run the App.java class of the parser in DataPreprocessor/TermExtractor/src/ with the "-Xmx4g -Xmx8g -XX:+UseG1GC" JVM options. Wait for py4j to open a gateway.
Afterwards, run DataMiner.py in the DataPreprocessor directory, with INPUT_PATH set to a directory containing all Java classes and passed as an argument.
You can find the results in the results directory, in filteredCode2 and filteredRNN.
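
Before starting the extraction it can help to confirm that INPUT_PATH really contains Java sources. The following is a minimal sketch, not part of DataMiner.py; the function name find_java_files is hypothetical, and it assumes the classes are ordinary .java files somewhere below INPUT_PATH.

```python
from pathlib import Path


def find_java_files(input_path):
    """Return every .java file below input_path (hypothetical helper).

    Assumption: the Java classes are plain files with a .java suffix,
    possibly nested in subdirectories.
    """
    root = Path(input_path)
    if not root.is_dir():
        raise NotADirectoryError(f"INPUT_PATH is not a directory: {root}")
    # rglob walks the tree recursively; sorting keeps the order stable.
    return sorted(p for p in root.rglob("*.java") if p.is_file())
```

Calling len(find_java_files(INPUT_PATH)) gives a quick count of the classes that the miner would have available.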

Already Custom Preprocessed Data can be downloaded from

https://drive.google.com/drive/folders/1z2A7IRtdZ6gCILysk_ai9eGGG4po6ZHa?usp=sharing

Generate Code Vectors from Custom Trained code2vec

Preprocess your data as described in the section "Data Extraction and Preprocessing".
Split your data from result/filteredCode2Vec/ into train, test and validation sets.
Move the split data to the directory /code2vec, into the folders train, test and val.
Go to code2vec/JavaExtractor/JPredict/src/main/java/JavaExtractor and run App.java. Wait for py4j to open a gateway.
Go to code2vec/preprocess.sh and run it.
Move the produced files from data/preprocessed_code to JackTheLoggerNet/data/preprocessed_code/ (make sure that preprocessed_code.dict.c2v is the dictionary on which the model was trained).
Run

python3 extract_code_vectors.py --resume PATH_TO_MODEL --file my_dataset.train.c2v

Code vectors will be in jan_train.txt
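
The split step above can be sketched as follows. This is only an illustration under stated assumptions: the helper split_dataset is hypothetical, the source directory is assumed to be a flat folder with one file per sample, and the 80/10/10 ratio and fixed seed are arbitrary choices, not values taken from the project.

```python
import random
import shutil
from pathlib import Path


def split_dataset(source_dir, target_dir, ratios=(0.8, 0.1, 0.1), seed=42):
    """Copy files from source_dir into train/test/val folders (hypothetical helper).

    Assumptions: source_dir is flat (one file per sample) and the ratios
    are given in the order (train, test, val).
    """
    files = sorted(p for p in Path(source_dir).iterdir() if p.is_file())
    # A seeded shuffle makes the split reproducible between runs.
    random.Random(seed).shuffle(files)
    n_train = int(len(files) * ratios[0])
    n_test = int(len(files) * ratios[1])
    parts = {
        "train": files[:n_train],
        "test": files[n_train:n_train + n_test],
        "val": files[n_train + n_test:],  # the remainder goes to validation
    }
    for name, members in parts.items():
        out = Path(target_dir) / name
        out.mkdir(parents=True, exist_ok=True)
        for f in members:
            shutil.copy2(f, out / f.name)
    return {name: len(members) for name, members in parts.items()}
```

The function copies rather than moves, so the original result folder stays intact if the split has to be repeated.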

code2vec Pretrained Codevectors

To generate the code vectors with code2vec, run code2vec.py in the code2vec folder with these arguments:

   python3 code2vec.py --load PATH-TO-REPO/log-strategy/code2vec/models/java14_model/saved_model_iter8.release\
        --inputData PATH-TO-REPO/log-strategy/DataPreprocessor/data/filteredCode2Vec/ --representation

This creates a single .txt file which can be used as a training or testing set. Files that are inside the --inputData path are included in the .txt file.

To balance the data use the

Alternatively, you can download the pretrained code2vec train and test sets used in the paper from the links stated in the repository's issues.

Training (NN)

In the config files, adjust the paths to the training and testing sets, whether you want to use a GPU, and other settings. To train the neural network approaches, navigate to JackTheLoggerNet and invoke:

python3 train.py -c config/config_test.json

Available configurations:

char-based approaches are in config/char_based/
word-based approaches are in config/word_config/
code2vec (custom) approach is in config/code_2_vec/
for single layer NN using pretrained vectors use config/config_singlenn.json
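
Instead of editing a configuration by hand, a local copy with adjusted values can be generated. The sketch below assumes only that the configs are JSON files; the helper write_config_variant and the dotted-key notation are hypothetical, and the key names in the usage note are placeholders that must be replaced with the keys actually present in the chosen config.

```python
import json
from pathlib import Path


def write_config_variant(base_config, out_config, overrides):
    """Copy a JSON config and replace selected values (hypothetical helper).

    overrides maps dotted keys such as "a.b.c" to new values; missing
    intermediate objects are created.
    """
    config = json.loads(Path(base_config).read_text())
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            # Descend into nested objects, creating them when absent.
            node = node.setdefault(key, {})
        node[leaf] = value
    Path(out_config).write_text(json.dumps(config, indent=4))
    return config
```

For example, write_config_variant("config/config_test.json", "config/config_local.json", {"some.nested.key": "value"}) would write a copy that can then be passed to train.py with -c; "some.nested.key" is a placeholder, not a key known to exist in the project's configs.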

Train SVM and RFC

Extract the methods first and preprocess them, as explained in the "Data Extraction and Preprocessing" section
Go to Classifier/Trainer
1. Set the variable TRAINING_DATA_PATH to a .txt file with your labeled code vectors for training
2. Set the variable POSITIVE_RATIO to the desired amount of positive labels in your train data
Run Trainer.py
1. The trained classifiers are saved to result/Classifier/
To evaluate your classifier, go to Classifier/Evaluation:
1. Set TEST_DATA_PATH to a .txt file with your labeled code vectors for testing
Run Evaluation.py
1. The evaluation will contain accuracy, Jaccard index, precision, recall and balanced accuracy
2. The evaluation results are saved to "/result/Classifier/Classifier_Evaluation_Statistics.txt"
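
For reference, the listed metrics can be written out for the binary case as below. This is a sketch of the standard definitions, not the code in Evaluation.py, which may compute them differently (for example through a library). It assumes labels are 0 and 1, with 1 as the positive class; the function name binary_metrics is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Standard binary classification metrics from two 0/1 label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of failing when a class is absent.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)        # true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "jaccard": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": recall,
        "balanced_accuracy": (recall + specificity) / 2,
    }
```

Balanced accuracy is the mean of recall and specificity, which makes it more informative than plain accuracy when the positive ratio in the data is far from one half.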

Visualization (NN)

To observe how the learning rate changes during training, open TensorBoard. If training runs in the cloud, open another session to the machine that tunnels port 6006, then invoke:

tensorboard --logdir saved/

Then open localhost:6006 to observe learning rates and more.

Testing (NN)

To run testing of neural network approaches invoke:

python3 test.py -r saved/MODELNAME/RUNTIMESTAMP/model_best.pth

Where MODELNAME = the name of the model you trained (see config)

and RUNTIMESTAMP = a timestamp in the form 1016_174555 for the 16th of October, 17:45:55

The results will be printed on screen.
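
When several runs exist, the most recent checkpoint can be located from the RUNTIMESTAMP folder names. The sketch below assumes the saved/MODELNAME/RUNTIMESTAMP/model_best.pth layout described above; the helper names are hypothetical. Because the timestamp carries no year, a placeholder year is used for parsing, so runs that span a new year would be ordered incorrectly.

```python
from datetime import datetime
from pathlib import Path


def parse_run_timestamp(name):
    """Parse a run folder name such as '1016_174555' (MMDD_HHMMSS).

    The name has no year, so the placeholder year 2000 is prepended;
    only month, day and time of the result are meaningful.
    """
    return datetime.strptime("2000" + name, "%Y%m%d_%H%M%S")


def latest_checkpoint(saved_dir, model_name, filename="model_best.pth"):
    """Return the checkpoint path of the newest run (hypothetical helper)."""
    runs = []
    for run_dir in (Path(saved_dir) / model_name).iterdir():
        try:
            stamp = parse_run_timestamp(run_dir.name)
        except ValueError:
            continue  # skip entries that are not timestamped run folders
        checkpoint = run_dir / filename
        if checkpoint.is_file():
            runs.append((stamp, checkpoint))
    if not runs:
        raise FileNotFoundError(f"no {filename} found for {model_name}")
    return max(runs, key=lambda run: run[0])[1]
```

The returned path can be passed to test.py with the -r option.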

Requirements

Python >= 3.6
Java >= 8
Maven >= 2
Python libraries specified in requirements.txt

To install all of the required libraries for Python run:

pip3 install -r requirements.txt
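
A quick preflight check of the listed requirements could look like the sketch below. It is an assumption-laden illustration: the helper check_requirements is hypothetical, it assumes the Java and Maven executables are named java and mvn, and it only tests that they are on the PATH, not that their versions satisfy the minimums above.

```python
import shutil
import sys


def check_requirements(min_python=(3, 6), tools=("java", "mvn")):
    """Report whether Python is recent enough and the tools are on the PATH."""
    report = {"python": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable cannot be found.
        report[tool] = shutil.which(tool) is not None
    return report
```

Any entry that comes back False points at a requirement to install or add to the PATH before running the steps above.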
