Visual speaker recognition method using HMM for classification and regression trees for features extraction

joseamoroso/Visual-Speaker-Recognition

Visual Speaker Recognition biometric

This work presents a biometric system for speaker recognition that relies on visual-only speech features. It uses the AV DATASET, which is openly available.

The libraries used by all programs are listed in requirements.txt and can be installed with pip using the following command:

pip install -r requirements.txt

Files and directory structure

├── auxiliars
│   ├── faceDetection.py            # Detects faces in a frame
│   └── lipsExtraction.py           # Extracts the lip coordinates; saves the lip points (txt file) and images showing the points and the curves around them
├── AVOriginalDataset               # Folder containing the AV dataset mentioned above
│   ├── Phrases
│   │   └── **/*                    # Folders containing .mp4 videos and CSV timestamps
│   └── Digits
│       └── **/*                    # Folders containing .mp4 videos and CSV timestamps
├── AVSegmentedDataset              # Folder containing the AV dataset, segmented by utterance
│   ├── Digits
│   │   ├── Normal                  # Folders containing .mp4 files segmented by utterance and speech mode
│   │   ├── Whispered
│   │   └── Silent
│   └── Phrases
│       ├── Normal
│       ├── Whispered
│       └── Silent
├── LipsFrames                      # Folder containing all generated lip images
│   └── **/*.jpg
├── modelsFaceRecognition           # Files required by the computer vision functions (if not included, they can be found online)
│   ├── haarcascade_frontalface_alt.xml
│   ├── opencv_face_detector_uint8.pb
│   ├── opencv_face_detector.pbtxt
│   └── shape_predictor_68_face_landmarks.dat
├── AV_lips_coordinates_v0.txt      # File containing a dictionary with the lip coordinates of all utterances (generated by lipsCoordExtraction.py)
├── featuresProcessing.py           # Functions that process the coordinates
├── hmm.py                          # Program that uses the features to generate HMMs
├── README.md
├── lipsCoordExtraction.py          # Program that generates the lip coordinates for all utterances, as well as the lip images
├── requirements.txt                # Libraries needed to run the programs
└── segmentVideos.py                # Script that segments the original videos into utterances using the CSV timestamps provided with each video

Videos segmentation

First, run segmentVideos.py, which generates one video per speech mode and utterance. The script uses the timestamps provided with the dataset to cut out each utterance.

Note: By default the script only segments the phrases. To segment the digits, change the paths used in the script.
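As a rough illustration of timestamp-based segmentation, the sketch below reads utterance boundaries from CSV text and builds one ffmpeg command per utterance. It is not the code in segmentVideos.py: the column names (`label`, `start`, `end`, in seconds), the output naming scheme, and the use of ffmpeg are all assumptions made for the example, and the real CSV files in the dataset may be laid out differently.

```python
import csv
import io
from pathlib import Path


def read_segments(csv_text):
    """Parse rows of (label, start, end); times are assumed to be in seconds."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["label"], float(row["start"]), float(row["end"])) for row in reader]


def build_ffmpeg_commands(video_path, segments, out_dir):
    """Return one ffmpeg argument list per utterance; nothing is executed here."""
    video_path, out_dir = Path(video_path), Path(out_dir)
    commands = []
    for index, (label, start, end) in enumerate(segments):
        if end <= start:
            continue  # skip malformed rows
        # Hypothetical naming scheme: <video>_<index>_<label>.mp4
        out_file = out_dir / f"{video_path.stem}_{index:02d}_{label}.mp4"
        commands.append([
            "ffmpeg", "-y",
            "-i", str(video_path),
            "-ss", f"{start:.3f}",  # start of the utterance
            "-to", f"{end:.3f}",    # end of the utterance
            str(out_file),
        ])
    return commands


# Made-up timestamps standing in for one of the dataset's CSV files.
example_csv = "label,start,end\nhello,0.50,1.75\nworld,2.00,3.10\n"
segments = read_segments(example_csv)
commands = build_ffmpeg_commands("speaker01.mp4", segments, "out")
for command in commands:
    print(" ".join(command))
```

Each argument list could then be passed to `subprocess.run` once the paths point at real files.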

Lips coordinates extraction

Now we execute:

python3 lipsCoordExtraction.py

This generates files containing the lip coordinates for each frame of every video. To change the number of coordinates, the face detection algorithm, or the dataset, edit the commented sections of the code.
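To make the "number of coordinates" option more concrete, here is a minimal sketch that picks the lip points out of a 68-point facial landmark list, the layout used by shape_predictor_68_face_landmarks.dat, where indices 48 to 59 trace the outer lip contour and 60 to 67 the inner one (0-based). The function name `select_lip_points` and the `include_inner` switch are hypothetical and are not taken from lipsCoordExtraction.py; the landmarks are synthetic so the example runs without a detector.

```python
# Mouth indices in the 68-point landmark layout (0-based).
OUTER_LIP = range(48, 60)  # 12 points on the outer lip contour
INNER_LIP = range(60, 68)  # 8 points on the inner lip contour


def select_lip_points(landmarks, include_inner=True):
    """Return the lip (x, y) points from a list of 68 landmarks."""
    if len(landmarks) != 68:
        raise ValueError("expected 68 landmarks")
    indices = list(OUTER_LIP)
    if include_inner:
        indices += list(INNER_LIP)
    return [landmarks[i] for i in indices]


# Synthetic landmarks standing in for a real detector's output.
landmarks = [(i, 2 * i) for i in range(68)]
outer_only = select_lip_points(landmarks, include_inner=False)
all_lips = select_lip_points(landmarks)
print(len(outer_only), len(all_lips))
```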

Features processing (lips coordinates)

The functions used for feature processing (in this case, normalization of the lip coordinates) are in featuresProcessing.py. They are called directly, when required, by the HMM generation step.
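The README does not say which normalization featuresProcessing.py applies, so the following is only one plausible sketch: it centres the lip points of a frame on their centroid and divides by the mouth width, which makes the result independent of where the face sits in the image and how large it appears. The function name `normalize_lip_points` and the choice of mouth width as the scale are assumptions.

```python
import math


def normalize_lip_points(points):
    """Centre the points on their centroid and scale by the mouth width."""
    count = len(points)
    cx = sum(x for x, _ in points) / count
    cy = sum(y for _, y in points) / count
    centered = [(x - cx, y - cy) for x, y in points]
    xs = [x for x, _ in centered]
    width = max(xs) - min(xs)
    if width == 0:
        raise ValueError("degenerate lip shape: zero width")
    return [(x / width, y / width) for x, y in centered]


# The same made-up lip shape, once as is and once shifted and enlarged.
frame = [(100, 200), (140, 190), (180, 200), (140, 215)]
moved = [(2 * x + 30, 2 * y - 10) for x, y in frame]

normalized_frame = normalize_lip_points(frame)
normalized_moved = normalize_lip_points(moved)

# Both versions should map to the same normalized coordinates.
same = all(
    math.isclose(ax, bx, abs_tol=1e-9) and math.isclose(ay, by, abs_tol=1e-9)
    for (ax, ay), (bx, by) in zip(normalized_frame, normalized_moved)
)
print(same)
```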

Generate HMMs from features

Identify speakers using HMM for each utterance
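A common way to identify a speaker with HMMs is to keep one model per speaker and pick the model that assigns the highest likelihood to the observed sequence. The sketch below illustrates that idea with a scaled forward algorithm on a toy discrete-observation HMM. It is a simplification and an assumption: hmm.py may well use continuous features and a library implementation, and the model parameters and speaker names here are invented.

```python
import math


def log_likelihood(observations, start, trans, emit):
    """Log-likelihood of a symbol sequence under a discrete HMM (scaled forward algorithm)."""
    if not observations:
        raise ValueError("empty observation sequence")
    n_states = len(start)
    alpha = [start[s] * emit[s][observations[0]] for s in range(n_states)]
    total = 0.0
    for t in range(len(observations)):
        if t > 0:
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states))
                * emit[s][observations[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        total += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return total


def identify(observations, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(observations, *models[name]))


# Toy models (start, transition, emission); the values are invented.
shared_trans = [[0.7, 0.3], [0.4, 0.6]]
models = {
    "speaker_a": ([0.5, 0.5], shared_trans, [[0.9, 0.1], [0.8, 0.2]]),
    "speaker_b": ([0.5, 0.5], shared_trans, [[0.1, 0.9], [0.2, 0.8]]),
}

print(identify([0, 0, 1, 0, 0], models))
print(identify([1, 1, 0, 1], models))
```

In practice the symbols would be replaced by the processed lip features of an utterance, with one trained model per enrolled speaker.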
