
Audio processing, Video processing and Computer Vision (UC3M - C2.350.16508)




Audio processing, Video processing and Computer vision - UC3M

Table of contents

  Description and Installation
  1. Scale-Space Blob Detector
  2. Melanoma Segmentation
  3. Melanoma Classification with CNNs
  4. Object Detection with Faster-RCNN
  5. Feature Selection for Audio Classification
  6. Audio Speech Recognition with DeepSpeech2


Audio processing, Video processing and Computer Vision Laboratories (UC3M - C2.350.16508).


Create a Python 3.6 virtual environment and run the following command:

pip install -r requirements.txt

Alternatively, install only the requirements of a specific project by giving its folder name:

pip install -r <PROJECT NAME>/requirements.txt

Installing PyTorch for CUDA 11.3 (use either pip or conda):


pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 -f


conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
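Once either command has finished, a short Python check can confirm which PyTorch build is active and whether it sees a GPU. This is a minimal sketch that only reads standard `torch` attributes; `describe_torch_setup` is a hypothetical helper name, not part of the repository.

```python
import torch


def describe_torch_setup():
    """Collect basic facts about the installed PyTorch build (hypothetical helper)."""
    info = {
        "torch_version": torch.__version__,
        # CUDA version PyTorch was compiled against (None for CPU-only builds)
        "built_with_cuda": torch.version.cuda,
        # True only if a compatible driver and GPU are visible at runtime
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_setup().items():
        print(f"{key}: {value}")
```

With the CUDA 11.3 wheels, `built_with_cuda` should report `11.3`; `cuda_available` can still be `False` if no compatible GPU driver is present.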


1. Scale-Space Blob Detector

Scale-space blob detector based on the Laplacian of Gaussian (LoG) filter. Full guideline here.
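The core of the LoG approach can be sketched in a few lines of NumPy/SciPy: filter the image at a geometric sequence of scales, normalise each response by sigma squared, and keep points that are maxima in both space and scale. This is an illustrative sketch, not the lab's implementation: the defaults for `sigma0`, `k`, `n_scales` and `threshold` are hypothetical, and squaring the normalised response (so bright and dark blobs are both detected) is one common choice among several.

```python
import numpy as np
from scipy import ndimage


def detect_blobs(image, sigma0=2.0, k=1.25, n_scales=10, threshold=0.01):
    """Detect blobs with a scale-normalised Laplacian of Gaussian.

    image: 2-D array (grayscale, ideally scaled to [0, 1]).
    Returns an (N, 4) array with one row per blob: (row, col, radius, response).
    """
    image = np.asarray(image, dtype=float)
    # Geometric sequence of scales: sigma0, sigma0*k, sigma0*k**2, ...
    sigmas = sigma0 * k ** np.arange(n_scales)

    # Scale space: squared, scale-normalised LoG response for every sigma.
    # Multiplying by sigma**2 makes responses comparable across scales;
    # squaring gives positive peaks for both bright and dark blobs.
    scale_space = np.stack(
        [(s ** 2 * ndimage.gaussian_laplace(image, sigma=s)) ** 2 for s in sigmas]
    )

    # Non-maximum suppression over a 3x3x3 neighbourhood in (scale, row, col)
    local_max = ndimage.maximum_filter(scale_space, size=3, mode="constant")
    peaks = (scale_space == local_max) & (scale_space > threshold)

    scale_idx, rows, cols = np.nonzero(peaks)
    # In 2-D the normalised LoG peaks for a disc of radius sigma * sqrt(2)
    radii = sigmas[scale_idx] * np.sqrt(2)
    return np.column_stack([rows, cols, radii, scale_space[peaks]])
```

Drawing a circle of radius `blobs[i, 2]` centred at `(blobs[i, 1], blobs[i, 0])` gives the usual visualisation. A geometric scale sequence is used because blob sizes vary multiplicatively rather than additively.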

2. Melanoma Segmentation

Pre-processing, segmentation and post-processing for melanoma images using thresholding and clustering techniques. Full guideline here.
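The thresholding route can be illustrated with Otsu's method wrapped in simple pre- and post-processing (clustering, for example k-means on pixel colours, is the alternative the lab mentions and is not shown here). This is a minimal NumPy/SciPy sketch that assumes a grayscale image scaled to [0, 1] in which the lesion is darker than the surrounding skin; `smooth_sigma` and `open_iterations` are hypothetical parameters.

```python
import numpy as np
from scipy import ndimage


def otsu_threshold(gray, bins=256):
    """Otsu's threshold for values in [0, 1]: maximise between-class variance."""
    hist, edges = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    centers = (edges[:-1] + edges[1:]) / 2
    # Class sizes when splitting after bin i: low = bins <= i, high = bins > i
    w_low = np.cumsum(hist)
    w_high = np.cumsum(hist[::-1])[::-1]
    mean_low = np.cumsum(hist * centers) / np.maximum(w_low, 1)
    mean_high = (np.cumsum((hist * centers)[::-1]) / np.maximum(w_high[::-1], 1))[::-1]
    between_var = w_low[:-1] * w_high[1:] * (mean_low[:-1] - mean_high[1:]) ** 2
    return centers[np.argmax(between_var)]


def segment_lesion(gray, smooth_sigma=2.0, open_iterations=2):
    """Return a boolean mask of the (assumed darker) lesion in a [0, 1] image."""
    # Pre-processing: Gaussian smoothing reduces noise before thresholding
    smoothed = ndimage.gaussian_filter(np.asarray(gray, dtype=float), sigma=smooth_sigma)
    # Segmentation: pixels darker than the Otsu threshold are lesion candidates
    mask = smoothed < otsu_threshold(smoothed)
    # Post-processing: drop small specks, fill holes, keep the largest region
    mask = ndimage.binary_opening(mask, iterations=open_iterations)
    mask = ndimage.binary_fill_holes(mask)
    labels, n_regions = ndimage.label(mask)
    if n_regions == 0:
        return mask
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0  # label 0 is the background
    return labels == np.argmax(sizes)
```

Keeping only the largest connected component encodes the assumption that each image contains a single lesion; images with several lesions would need a different post-processing rule.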

3. Melanoma Classification with CNNs

Evaluation of several CNN architectures for classifying skin lesions into three classes (no melanoma, melanoma, keratosis). Full lab here.

4. Object Detection with Faster-RCNN

Faster-RCNN implementation for object detection and classification using a subset of the PASCAL VOC 2012 dataset. Full lab here.

5. Feature Selection for Audio Classification

Feature extraction and selection for classifying audio recordings of dogs and cats with an SVM. Full guideline here.
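The overall flow (frame-level features, summary statistics per recording, feature selection, SVM) can be sketched with NumPy and scikit-learn. The three descriptors below (RMS energy, zero-crossing rate, spectral centroid), the frame sizes and the ANOVA F-test selector are illustrative stand-ins, not necessarily the features or selection method the lab uses.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

FRAME_LEN, HOP = 1024, 512  # hypothetical framing parameters (in samples)


def extract_features(signal, sr):
    """Summarise a mono waveform as mean/std of three frame-level descriptors."""
    x = np.asarray(signal, dtype=float)
    x = np.pad(x, (0, max(0, FRAME_LEN - len(x))))  # ensure at least one frame
    n_frames = 1 + (len(x) - FRAME_LEN) // HOP
    idx = np.arange(FRAME_LEN)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = x[idx]  # shape (n_frames, FRAME_LEN)

    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Fraction of adjacent sample pairs whose sign differs
    zcr = np.mean(np.signbit(frames[:, 1:]) != np.signbit(frames[:, :-1]), axis=1)
    # Spectral centroid: magnitude-weighted mean frequency of each windowed frame
    spectrum = np.abs(np.fft.rfft(frames * np.hanning(FRAME_LEN), axis=1))
    freqs = np.fft.rfftfreq(FRAME_LEN, d=1.0 / sr)
    centroid = spectrum @ freqs / (spectrum.sum(axis=1) + 1e-12)

    per_frame = np.stack([rms, zcr, centroid], axis=1)
    return np.concatenate([per_frame.mean(axis=0), per_frame.std(axis=0)])


def build_classifier(k=4):
    """Standardise, keep the k best features (ANOVA F-test), classify with an RBF SVM."""
    # Keeping selection inside the pipeline re-fits it on each training fold,
    # so the held-out fold never influences which features are kept.
    return make_pipeline(
        StandardScaler(),
        SelectKBest(score_func=f_classif, k=k),
        SVC(kernel="rbf", C=1.0, gamma="scale"),
    )


def evaluate(X, y, k=4, folds=5):
    """Mean cross-validated accuracy of the pipeline for a given k."""
    return cross_val_score(build_classifier(k), X, y, cv=folds).mean()
```

With one feature row per recording in `X` and labels in `y` (for example 0 = cat, 1 = dog, an arbitrary encoding), calling `evaluate(X, y, k)` for several values of `k` is a simple way to study how many features are actually needed.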

6. Audio Speech Recognition with DeepSpeech2

Comparison of three speech recognition architectures based on DeepSpeech2 that differ in how the GRU layer is implemented. Full lab here.
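To make concrete what "how the GRU layer is implemented" can involve, here is a from-scratch NumPy sketch of a bidirectional GRU of the kind DeepSpeech2 stacks after its convolutional front end. It is illustrative only and deliberately avoids PyTorch so the gate equations are visible: the three variants compared in the lab are not described here, so the `merge` option (summing versus concatenating the two directions) is a hypothetical example of one implementation choice that can be altered.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRULayer:
    """Single-direction GRU over a (time, features) sequence.

    Gate equations follow the formulation documented for torch.nn.GRU, in
    which the reset gate scales the recurrent term of the candidate state.
    """

    def __init__(self, input_size, hidden_size, rng):
        bound = 1.0 / np.sqrt(hidden_size)
        # Rows are stacked as [reset r, update z, candidate n]
        self.W = rng.uniform(-bound, bound, (3 * hidden_size, input_size))
        self.U = rng.uniform(-bound, bound, (3 * hidden_size, hidden_size))
        self.b_x = np.zeros(3 * hidden_size)
        self.b_h = np.zeros(3 * hidden_size)
        self.hidden_size = hidden_size

    def __call__(self, x):
        H = self.hidden_size
        h = np.zeros(H)
        outputs = []
        for x_t in x:  # iterate over time steps
            gx = self.W @ x_t + self.b_x
            gh = self.U @ h + self.b_h
            r = sigmoid(gx[:H] + gh[:H])
            z = sigmoid(gx[H:2 * H] + gh[H:2 * H])
            n = np.tanh(gx[2 * H:] + r * gh[2 * H:])
            h = (1.0 - z) * n + z * h  # interpolate between candidate and old state
            outputs.append(h)
        return np.stack(outputs)


class BiGRU:
    """Bidirectional GRU; `merge` (hypothetical option) selects how directions combine."""

    def __init__(self, input_size, hidden_size, merge="sum", seed=0):
        rng = np.random.default_rng(seed)
        self.forward_layer = GRULayer(input_size, hidden_size, rng)
        self.backward_layer = GRULayer(input_size, hidden_size, rng)
        self.merge = merge

    def __call__(self, x):
        h_fwd = self.forward_layer(x)
        # Run the second GRU on the time-reversed input, then re-align in time
        h_bwd = self.backward_layer(x[::-1])[::-1]
        if self.merge == "sum":
            return h_fwd + h_bwd  # output size: hidden_size
        if self.merge == "concat":
            return np.concatenate([h_fwd, h_bwd], axis=1)  # 2 * hidden_size
        raise ValueError(f"unknown merge mode: {self.merge!r}")
```

Stacking several such layers over spectrogram frames gives the recurrent part of a DeepSpeech2-style model. In practice `torch.nn.GRU` (which uses cuDNN on the GPU) replaces the Python loop; the loop here only exposes where an implementation can differ.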