FeXT AutoEncoder: Extraction of Images Features

1. Project Overview

This project covers the training, evaluation, and application of a convolutional autoencoder designed for image feature extraction. The architecture is based on the well-known VGG16 model, a deep learning model frequently employed as a backbone in tasks such as image reconstruction, anomaly detection, and feature extraction. The FEXT AutoEncoder mirrors the structure of VGG16 and comprises two primary components, the encoder and the decoder. The encoder is responsible for extracting relevant features from the images and compressing this information into a representation with lower dimensionality than the original 256x256x3 images, while the decoder reconstructs the image from that representation. The FEXT AutoEncoder model has been trained and tested on the Flickr 30K dataset, typically used for image captioning, and it can be trained on any image dataset of your choice, as long as the inputs have been properly shaped.

2. Applications

This autoencoder can be used in a variety of tasks. It can be used for image reconstruction, where the goal is to recreate an input image after it has been encoded. It can also be used for anomaly detection, where the goal is to identify data points that do not conform to expected behavior. Finally, it can be used for feature extraction, where the goal is to identify and extract meaningful features from input data. This project focuses on the latter purpose: the idea is to transform the images into a vectorized representation that holds the most important information.

3. FEXT AutoEncoder model

3.1 Encoder module

The encoder part of the FEXT autoencoder is composed of convolutional layers that progressively downsample the input image, extracting features at various scales. This part of the network is responsible for transforming the input data into a lower-dimensional representation. Input images are resized to 256x256x3 to be compatible with the encoder input, and the output of the encoder is a compressed feature tensor of shape 8x8x512.
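The shape arithmetic above can be checked with a few lines of plain Python. This is only a sketch: it assumes the encoder follows the standard VGG16 layout (five convolutional blocks with 64, 128, 256, 512 and 512 filters, each ending in a 2x2 pooling step that halves height and width). The name `trace_encoder_shapes` is hypothetical and not part of the project.

```python
# Channel widths of the five standard VGG16 blocks (assumed to carry over to FEXT).
VGG16_BLOCK_FILTERS = (64, 128, 256, 512, 512)


def trace_encoder_shapes(input_shape=(256, 256, 3), block_filters=VGG16_BLOCK_FILTERS):
    """Return the feature-map shape after each block, assuming size-preserving
    convolutions followed by one 2x2 pooling step per block."""
    height, width, _ = input_shape
    shapes = []
    for filters in block_filters:
        height //= 2  # 2x2 pooling halves the spatial size
        width //= 2
        shapes.append((height, width, filters))
    return shapes


shapes = trace_encoder_shapes()
for index, shape in enumerate(shapes, start=1):
    print(f"block {index}: {shape}")

# Compare the number of values before and after encoding.
input_values = 256 * 256 * 3
encoded_values = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(f"compression factor: {input_values / encoded_values:.0f}x")
```

Under these assumptions the last block yields 8x8x512, so the encoded representation holds 32,768 values against 196,608 in the input image.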

3.2 Decoder module

The decoder part of the FEXT autoencoder is composed of layers that perform the opposite operation to the encoder, upsampling the lower-dimensional representation back to the original input size. This part of the network is responsible for reconstructing the input data from its lower-dimensional representation, producing an output of shape 256x256x3.
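To make the idea of upsampling concrete, here is a minimal pure-Python sketch of nearest-neighbour 2x upsampling on a single-channel grid. It is an illustration only: the actual decoder may rely on a different operation (for example transposed convolutions), and the helper `upsample2x` is a hypothetical name, not a function from the project.

```python
def upsample2x(grid):
    """Nearest-neighbour upsampling: repeat every value twice along both axes."""
    upsampled = []
    for row in grid:
        wide_row = [value for value in row for _ in range(2)]
        upsampled.append(wide_row)
        upsampled.append(list(wide_row))  # copy so the rows stay independent
    return upsampled


small = [[1, 2],
         [3, 4]]
print(upsample2x(small))

# Five doublings bring an 8x8 feature map back to 256x256.
grid = [[0] * 8 for _ in range(8)]
for _ in range(5):
    grid = upsample2x(grid)
print(len(grid), len(grid[0]))
```

Five doubling steps mirror the five halving steps assumed for a VGG16-style encoder, which is why an 8x8 map returns to the 256x256 input resolution.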

4. Installation

The installation process is designed for simplicity, using .bat scripts to automatically create a virtual environment with all necessary dependencies. Please ensure that Anaconda or Miniconda is installed on your system before proceeding.

To set up a CPU-only environment, run setup/create_cpu_environment.bat. This script installs the base version of TensorFlow, which is lighter and does not include CUDA libraries.
For GPU support, which is necessary for model training on a GPU, use setup/create_gpu_environment.bat. This script includes all required CUDA dependencies to enable GPU utilization.

4.1 Additional Package for XLA Acceleration

XLA (Accelerated Linear Algebra) is a compiler designed to optimize computations for speed and efficiency, and it is particularly beneficial when working with TensorFlow and other machine learning frameworks that support it. Enabling XLA can bring significant performance improvements in numerical computations, especially for large-scale machine learning models. XLA integration is directly available in TensorFlow but may require enabling specific settings or flags.

To enable XLA acceleration globally across your system, you need to set an environment variable named XLA_FLAGS. The value of this variable should be --xla_gpu_cuda_data_dir=path\to\XLA, where path\to\XLA must be replaced with the actual directory path that leads to the folder containing the nvvm subdirectory. It is crucial that this path directs to the location where the file libdevice.10.bc resides, as this file is essential for the optimal functioning of XLA. This setup ensures that XLA can efficiently interface with the necessary CUDA components for GPU acceleration.
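As an alternative to a system-wide variable, the same flag can be set for a single Python process before TensorFlow is imported. The sketch below is a hedged illustration: the helper names (`build_xla_flags`, `has_libdevice`, `enable_xla_for_process`) are hypothetical, and the check simply searches the nvvm subdirectory for libdevice.10.bc rather than assuming an exact layout.

```python
import os
from pathlib import Path


def build_xla_flags(cuda_data_dir):
    """Return the XLA_FLAGS value for a directory that contains nvvm/."""
    return f"--xla_gpu_cuda_data_dir={cuda_data_dir}"


def has_libdevice(cuda_data_dir):
    """Check whether libdevice.10.bc can be found somewhere under <dir>/nvvm."""
    nvvm_dir = Path(cuda_data_dir) / "nvvm"
    return nvvm_dir.is_dir() and any(nvvm_dir.rglob("libdevice.10.bc"))


def enable_xla_for_process(cuda_data_dir):
    """Set XLA_FLAGS for the current process only.

    Call this before importing TensorFlow. Note that it overwrites any
    XLA_FLAGS value that is already set.
    """
    if not has_libdevice(cuda_data_dir):
        raise FileNotFoundError(f"libdevice.10.bc not found under {cuda_data_dir}")
    os.environ["XLA_FLAGS"] = build_xla_flags(cuda_data_dir)
```

A typical call would be `enable_xla_for_process(r"path\to\XLA")` at the very top of a script, with the placeholder replaced by the real directory on your machine.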

5. How to use

The project is organized into subfolders, each dedicated to specific tasks. The FEXT/utils folder houses crucial components utilized by various scripts. It's critical to avoid modifying these files, as doing so could compromise the overall integrity and functionality of the program.

Data: this folder contains the data used for model training (images are loaded from FEXT/data/images). Run the Jupyter notebook FEXT/data/data_validation.ipynb to conduct an Exploratory Data Analysis (EDA) of the image dataset; the results are saved in FEXT/data/validation.

Training: this folder contains the files needed for model training and evaluation. The training model checkpoints are saved in FEXT AutoEncoder/training/checkpoints. Run FEXT/training/model_training.py to start the training process for the autoencoder, or launch the Jupyter notebook FEXT/training/model_evaluation.py to evaluate the performance of pretrained model checkpoints using different metrics.

Inference: run FEXT/inference/features_extraction.py to use a pretrained model to extract compressed vectorized features from the images located in FEXT/inference/images. The resulting .csv file is saved in the same directory.
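Once the features have been exported, they can be read back with the standard library. The sketch below is an assumption-heavy illustration: the column layout of the exported .csv is not documented here, so it assumes a header row followed by rows holding an image name and then one float per feature. The function `load_features` and the sample data are hypothetical.

```python
import csv
import io


def load_features(csv_file):
    """Read rows of the assumed form: image name, then one float per feature."""
    features = {}
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row (assumed to be present)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        name, *values = row
        features[name] = [float(value) for value in values]
    return features


# Tiny in-memory stand-in for the exported file.
sample = io.StringIO("image,f0,f1,f2\ncat.jpg,0.1,0.5,0.0\ndog.jpg,0.3,0.2,0.9\n")
features = load_features(sample)
print(sorted(features))
print(len(features["cat.jpg"]))
```

For a real file, pass an open file handle instead of the `io.StringIO` object, for example `open(path, newline="")`, and adjust the parsing to the actual column layout.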

5.1 Configurations

For customization, you can modify the main script parameters via the FEXT/config/configurations.py file.

Category	Setting	Description
Advanced settings	MIXED_PRECISION	Use mixed precision (float16/float32) for faster training
	USE_TENSORBOARD	Activate/deactivate TensorBoard logging
	XLA_STATE	Use XLA (Accelerated Linear Algebra) for faster training
	ML_DEVICE	Select the training device (CPU or GPU)
	NUM_PROCESSORS	Number of processors (cores) to use; 1 disables multiprocessing
Training settings	EPOCHS	Number of training epochs
	LEARNING_RATE	Learning rate of the model
	BATCH_SIZE	Size of batches for model training
Model settings	IMG_SHAPE	Full shape of the images as (height, width, channels)
	SAVE_MODEL_PLOT	Generate/save 2D model graph (as .png file)
Data settings	TRAIN_SAMPLES	Number of images for model training
	TEST_SAMPLES	Number of samples for validation data
	IMG_AUGMENT	Perform data augmentation on images (affects training time)
General settings	SEED	Global random seed
	SPLIT_SEED	Seed for dataset splitting
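For orientation, a configuration file covering the settings above might look like the following. This is a hypothetical sketch, not the project's actual file: apart from IMG_SHAPE, which matches the 256x256x3 input described earlier, every value and value type shown here is an illustrative assumption.

```python
# Hypothetical FEXT/config/configurations.py; values are illustrative only.

# Advanced settings
MIXED_PRECISION = False   # mixed float16/float32 training
USE_TENSORBOARD = False   # TensorBoard logging on/off
XLA_STATE = False         # XLA acceleration on/off
ML_DEVICE = "GPU"         # "CPU" or "GPU"
NUM_PROCESSORS = 1        # 1 disables multiprocessing

# Training settings
EPOCHS = 10
LEARNING_RATE = 1e-4
BATCH_SIZE = 32

# Model settings
IMG_SHAPE = (256, 256, 3)  # (height, width, channels)
SAVE_MODEL_PLOT = True     # save the model graph as a .png file

# Data settings
TRAIN_SAMPLES = 1000
TEST_SAMPLES = 200
IMG_AUGMENT = False        # augmentation increases training time

# General settings
SEED = 42
SPLIT_SEED = 42
```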

License

This project is licensed under the terms of the MIT license. See the LICENSE file for details.
