Visual Speaker Recognition biometric

This work presents a biometric system for speaker recognition that relies on visual-only speech features. It uses the open AV DATASET.

The libraries used by all programs are listed in requirements.txt and can be installed with pip:

pip install -r requirements.txt

Files and directory structure

├── auxiliars
│   ├── faceDetection.py                # Detects faces in a frame.
│   └── lipsExtraction.py               # Extracts lip coordinates; saves the lip points (txt file) and images with the points and the curves around them.
├── AVOriginalDataset                   # Folder containing the AV dataset mentioned above.
│   ├── Phrases
│   │   └── **/*                        # Folders containing .mp4 videos and CSV timestamps.
│   └── Digits
│       └── **/*                        # Folders containing .mp4 videos and CSV timestamps.
├── AVSegmentedDataset                  # Folder containing the AV dataset, segmented by utterance.
│   ├── Digits
│   │   ├── Normal                      # Folders containing .mp4 segments, by utterance and speech mode.
│   │   ├── Whispered
│   │   └── Silent
│   └── Phrases
│       ├── Normal
│       ├── Whispered
│       └── Silent
├── LipsFrames                          # Folder containing all generated lip images.
│   └── **/*.jpg
├── modelsFaceRecognition               # Files required by the computer vision functions (if not included, they can be found online).
│   ├── haarcascade_frontalface_alt.xml
│   ├── opencv_face_detector_uint8.pb
│   ├── opencv_face_detector.pbtxt
│   └── shape_predictor_68_face_landmarks.dat
├── AV_lips_coordinates_v0.txt          # File containing a dictionary with the lip coordinates of all utterances (generated by lipsCoordExtraction.py).
├── featuresProcessing.py               # Functions that process the coordinates.
├── hmm.py                              # Program that uses the features to generate HMMs.
├── README.md
├── lipsCoordExtraction.py              # Program that generates the lip coordinates for all utterances, plus the lip images.
├── requirements.txt                    # Libraries needed to run the programs.
└── segmentVideos.py                    # Script that segments the original videos into utterances using the CSV timestamps of each video.

Video segmentation

First, run segmentVideos.py. It generates one video per utterance, grouped by speech mode, using the timestamps provided with the dataset to cut each utterance.

Note: As written, the script segments only the phrases. To segment the digits, change the paths used in the script.
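The timestamp-based segmentation can be sketched as follows. This is only an illustration, not the code in segmentVideos.py: the CSV column names (utterance, start, end, with times in seconds) and the helper utterance_frame_ranges are assumptions, since the dataset's actual CSV layout is not described here.

```python
import csv
import io


def utterance_frame_ranges(csv_text, fps):
    """Map each utterance to an inclusive (first_frame, last_frame) range.

    Assumes a header row with 'utterance', 'start' and 'end' columns,
    where start/end are times in seconds (hypothetical layout).
    """
    ranges = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        first = int(round(float(row["start"]) * fps))
        # The end time is exclusive, so the last frame is one before it.
        last = int(round(float(row["end"]) * fps)) - 1
        ranges.append((row["utterance"], first, max(first, last)))
    return ranges


# Made-up timestamps for two utterances in a 30 fps video.
example = "utterance,start,end\nphrase01,0.0,2.0\nphrase02,2.5,4.0\n"
print(utterance_frame_ranges(example, fps=30))
```

Each resulting frame range could then be written out as a separate .mp4 segment with whichever video library the script uses.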

Lip coordinate extraction

Next, run:

python3 lipsCoordExtraction.py

This generates files containing the lip coordinates in each frame of every video. To choose the number of coordinates, the face detection algorithm, and the dataset, edit the commented sections of the code.
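As a rough sketch of what selecting lip coordinates involves, the example below picks the mouth points out of a 68-point landmark list, the layout produced by shape_predictor_68_face_landmarks.dat. In that layout, indices 48 to 59 trace the outer lip contour and 60 to 67 the inner one. The function lip_points and the dummy input are hypothetical; the actual selection logic in lipsExtraction.py may differ.

```python
# Mouth indices in the standard 68-point facial landmark layout (0-based).
OUTER_LIP = range(48, 60)  # 12 points on the outer lip contour
INNER_LIP = range(60, 68)  # 8 points on the inner lip contour


def lip_points(landmarks, include_inner=True):
    """Return the (x, y) lip landmarks from a list of 68 face landmarks."""
    if len(landmarks) != 68:
        raise ValueError("expected 68 landmarks")
    indices = list(OUTER_LIP)
    if include_inner:
        indices += list(INNER_LIP)
    return [landmarks[i] for i in indices]


# Dummy landmarks standing in for a real detector's output.
fake_landmarks = [(i, 2 * i) for i in range(68)]
outer_only = lip_points(fake_landmarks, include_inner=False)
print(len(outer_only), outer_only[0], outer_only[-1])
```

With real data, the landmarks list would come from the face and landmark detectors rather than from generated values.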

Feature processing (lip coordinates)

The functions used for feature processing (in this case, normalization of the lip coordinates) are in featuresProcessing.py. They are called directly, when required, during the HMM generation step.
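One common way to normalize lip coordinates is to center them on their centroid and divide by the lip width, which removes the effect of where the mouth sits in the frame and how far the speaker is from the camera. The sketch below shows that idea only; it is an assumption about what "normalization" could mean here, and normalize_lip_points is a hypothetical name, not necessarily a function in featuresProcessing.py.

```python
def normalize_lip_points(points):
    """Center (x, y) points on their centroid and scale by lip width."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    xs = [x for x, _ in points]
    width = max(xs) - min(xs)
    if width == 0:
        raise ValueError("lip width is zero; cannot normalize")
    # Both axes use the same scale so the lip shape is not distorted.
    return [((x - cx) / width, (y - cy) / width) for x, y in points]


# Four made-up points forming a 4 x 2 rectangle.
rectangle = [(10, 20), (14, 20), (14, 22), (10, 22)]
print(normalize_lip_points(rectangle))
```

The normalized points of each frame can then be flattened into one feature vector per frame before being passed to the HMM step.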
