Emotion Recognition with Facial Attention and Objective Activation Functions

This project explored the effect of introducing channel and spatial attention mechanisms, namely SEN-Net, ECA-Net, and CBAM, to existing CNN vision-based models such as VGGNet, ResNet, and ResNetV2 to perform the Facial Emotion Recognition (FER) task.

The project showed not only that attention can significantly improve the performance of these models, but also that combining it with a different activation function can increase their performance further.

The paper for this project was published at the 29th International Conference on Neural Information Processing (ICONIP 2022): Paper Link, Original Dissertation

Development

Pre-Processing

Face Detection

This project uses a state-of-the-art face detector built on top of the YOLO framework [Reference: YOLO5Face].

The YOLO architecture [Reference: YOLO] was chosen for its efficient one-stage object detection, which achieves accuracy comparable to that of two-stage detectors at a significantly lower computational cost.

CNN Models

VGGNet

The Visual Geometry Group Network (VGGNet) architecture [Reference: VGGNet] showed that it is possible to increase the depth of a CNN by using small-sized (3×3) kernels. By increasing the depth of the network to 16–19 layers, VGGNet achieved a significant improvement over prior architectures.
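
The trade-off behind small kernels can be checked with a little arithmetic. The sketch below is illustrative only: the function names and the channel count of 64 are assumptions, not code from this repository. It compares three stacked 3×3 convolutions against a single 7×7 convolution that covers the same receptive field.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel: int, channels: int) -> int:
    """Weights of one kernel x kernel convolution with `channels`
    input and output channels (biases ignored)."""
    return kernel * kernel * channels * channels


channels = 64  # illustrative channel count, not taken from the project

# Three stacked 3x3 layers see the same 7x7 window as one 7x7 layer...
field_of_three = stacked_receptive_field(3)

# ...but need fewer weights: 27 * C^2 instead of 49 * C^2.
stacked_cost = 3 * conv_weights(3, channels)
single_cost = conv_weights(7, channels)

print(field_of_three, stacked_cost, single_cost)
```

The stacked version also applies a non-linearity after every layer, which is part of why deeper stacks of small kernels can be more expressive than one large kernel.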

ResNet V1

The ResNet architecture [Reference: ResNetV1] introduced the concept of residual learning and its computational unit, the “residual” block. ResNet aimed to solve two issues found in deep CNN architectures: the vanishing gradient problem and the degradation problem.
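
As a rough illustration of residual learning, the sketch below uses NumPy matrix products as a stand-in for convolutions and omits batch normalization. The function names and shapes are assumptions for this example and are not taken from cnn_models.py.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F is two weight layers with a ReLU between.

    Matrix products stand in for convolutions; batch norm is omitted.
    """
    out = relu(x @ w1)    # first weight layer + activation
    out = out @ w2        # second weight layer
    return relu(out + x)  # skip connection adds the input back


x = np.array([[1.0, 2.0, 0.5]])
zeros = np.zeros((3, 3))

# With all-zero weights the block reduces to relu(x): the block only
# has to learn a correction F(x) on top of the identity mapping.
print(residual_block(x, zeros, zeros))
```

Because the skip connection passes the input through unchanged, gradients also have a direct path around the weight layers, which is the usual explanation for why residual networks are easier to train at depth.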

ResNet V2

After the release of ResNet, it was discovered that the degradation problem was still present when the depth of the network exceeded 200 layers. This inspired the development of ResNet V2, which addressed both the vanishing gradient and the degradation problem by using pre-activation in the residual blocks, that is, placing batch normalization and the activation function before each convolution instead of after it. The new version of ResNet improved accuracy for ultra-deep networks, including networks as deep as 1001 layers.
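
The difference in ordering can be sketched as follows. This is a simplified illustration that assumes matrix products in place of convolutions and leaves out batch normalization; the function names are hypothetical and do not come from this repository.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def post_activation_block(x, w1, w2):
    """ResNet V1 ordering: weight -> ReLU -> weight -> add -> ReLU."""
    out = relu(x @ w1) @ w2
    return relu(x + out)  # the activation sits on the skip path


def pre_activation_block(x, w1, w2):
    """ResNet V2 ordering: ReLU -> weight -> ReLU -> weight -> add."""
    out = relu(x) @ w1
    out = relu(out) @ w2
    return x + out  # nothing is applied after the addition


x = np.array([[-1.0, 2.0, -0.5]])
zeros = np.zeros((3, 3))

# With zero weights, V2 passes x through untouched (a clean identity
# path), while V1 still clips the negative entries.
print(post_activation_block(x, zeros, zeros))
print(pre_activation_block(x, zeros, zeros))
```

Keeping the skip path free of any operation is what lets the signal propagate directly between blocks in very deep stacks.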

Attention Modules

SEN-Net

SEN-Net (Squeeze-and-Excitation Network, also written SENet) incorporates a channel-wise attention mechanism into CNNs to adaptively recalibrate channel-wise feature responses. It introduces two additional operations called "squeeze" and "excitation." The squeeze operation aggregates spatial information globally, and the excitation operation models the interdependencies between channels. By explicitly emphasizing informative channels and suppressing less important ones, SENet enables CNNs to focus on more discriminative features.
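
A minimal NumPy sketch of the squeeze and excitation steps is given below. It assumes a single channels-last feature map of shape (H, W, C) and a bottleneck of two dense layers without biases; the names `se_block`, `w_reduce` and `w_expand` are illustrative and are not the API of attention_modules.py.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def se_block(x, w_reduce, w_expand):
    """Squeeze-and-excitation on one feature map.

    x:        (H, W, C) feature map
    w_reduce: (C, C // r) weights of the bottleneck layer
    w_expand: (C // r, C) weights restoring the channel count
    """
    squeezed = x.mean(axis=(0, 1))                 # squeeze: global average pool -> (C,)
    hidden = np.maximum(squeezed @ w_reduce, 0.0)  # excitation: reduce + ReLU
    scale = sigmoid(hidden @ w_expand)             # per-channel weights in (0, 1)
    return x * scale                               # rescale every channel


rng = np.random.default_rng(0)
feature_map = rng.normal(size=(4, 4, 8))
w_reduce = rng.normal(size=(8, 2))  # reduction ratio r = 4 (illustrative)
w_expand = rng.normal(size=(2, 8))

out = se_block(feature_map, w_reduce, w_expand)
print(out.shape)
```

The output has the same shape as the input, so a block like this can be dropped after any convolutional stage without changing the rest of the network.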

ECA-Net

ECA-Net (Efficient Channel Attention Network) introduces the "Efficient Channel Attention" module, which aims to capture channel-wise dependencies effectively while minimizing computational overhead. It utilizes a 1D convolutional kernel to model the interdependencies between channels. By adaptively recalibrating channel-wise feature responses, ECA-Net can enhance the discriminative power of CNNs.
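
The idea can be sketched in NumPy as below, assuming a channels-last feature map of shape (H, W, C). The kernel-size rule follows the adaptive formula from the ECA-Net paper with its default γ = 2 and b = 1; whether this repository uses the same rule is an assumption, and the function names are illustrative.

```python
import math

import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive odd kernel size from the ECA-Net paper (assumed defaults)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_block(x, kernel):
    """Efficient channel attention on one (H, W, C) feature map.

    kernel: 1D weights of odd length k, shared across all channels.
    """
    squeezed = x.mean(axis=(0, 1))       # global average pool -> (C,)
    k = len(kernel)
    padded = np.pad(squeezed, k // 2)    # zero padding keeps length C
    # Slide the 1D kernel over neighbouring channels (cross-correlation).
    mixed = np.array([padded[i:i + k] @ kernel for i in range(squeezed.size)])
    return x * sigmoid(mixed)            # rescale every channel


rng = np.random.default_rng(0)
feature_map = rng.normal(size=(4, 4, 64))
k = eca_kernel_size(64)
out = eca_block(feature_map, rng.normal(size=k))
print(k, out.shape)
```

The module only adds k weights per insertion point, which matches the small parameter overhead of the ECA-Net rows in the results table compared with SEN-Net and CBAM.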

CBAM

CBAM (Convolutional Block Attention Module) is a versatile attention module that combines both channel and spatial attention mechanisms. It includes two sub-modules, one for channel attention and one for spatial attention, which the original CBAM design applies in sequence (channel first, then spatial). The channel attention sub-module models the interdependencies between channels, while the spatial attention sub-module captures the interdependencies between spatial locations within a feature map. By adaptively attending to relevant spatial locations and informative channels, CBAM enhances the representational power of CNNs.
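
The sketch below shows one way to express both steps in NumPy for a single channels-last feature map of shape (H, W, C). It follows the structure described in the CBAM paper (shared MLP over average- and max-pooled descriptors, then a 7×7 convolution over channel-pooled maps); biases are omitted, and all names and shapes are assumptions rather than the code in attention_modules.py.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w_reduce, w_expand):
    """Per-channel weights from average- and max-pooled descriptors."""
    def shared_mlp(v):
        return np.maximum(v @ w_reduce, 0.0) @ w_expand

    avg = x.mean(axis=(0, 1))
    mx = x.max(axis=(0, 1))
    return sigmoid(shared_mlp(avg) + shared_mlp(mx))  # (C,)


def spatial_attention(x, kernel):
    """Per-location weights; kernel has shape (k, k, 2) with odd k."""
    pooled = np.stack([x.mean(axis=2), x.max(axis=2)], axis=2)  # (H, W, 2)
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(pooled, ((pad, pad), (pad, pad), (0, 0)))
    h, w = pooled.shape[:2]
    out = np.empty((h, w))
    for i in range(h):          # naive "same" convolution, kept simple
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + k, j:j + k, :] * kernel)
    return sigmoid(out)  # (H, W)


def cbam(x, w_reduce, w_expand, kernel):
    """Channel attention first, then spatial attention."""
    x = x * channel_attention(x, w_reduce, w_expand)
    return x * spatial_attention(x, kernel)[:, :, None]


rng = np.random.default_rng(0)
feature_map = rng.normal(size=(4, 4, 8))
out = cbam(feature_map,
           rng.normal(size=(8, 2)),      # reduction ratio r = 4 (illustrative)
           rng.normal(size=(2, 8)),
           rng.normal(size=(7, 7, 2)))
print(out.shape)
```

The channel step reuses the bottleneck idea from SEN-Net, so the extra parameters of the spatial step are limited to one small convolution kernel per insertion point.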

Results

FER Datasets

The project was evaluated on three datasets of different sizes: the small JAFFE dataset, the medium-sized CK+ dataset, and the large FER2013 dataset.

Evaluation of CNN architectures with ELU on CK+, JAFFE and FER2013

Architecture	Parameters	CK+ Accuracy	JAFFE Accuracy	FER2013 Accuracy
VGG-16	39.92M	87.91%	64.44%	60.66%
VGG-19	42.87M	90.66%	68.89%	60.92%
ResNet-50	23.49M	87.91%	73.33%	58.61%
ResNet-101	42.46M	88.46%	60.00%	58.67%
ResNet-152	58.08M	85.71%	15.66%	59.36%
ResNetV2-50	23.48M	88.46%	77.78%	58.72%
ResNetV2-101	42.44M	88.62%	62.22%	59.07%
ResNetV2-152	58.05M	89.01%	66.67%	59.40%

CNNs with Different Attention Mechanisms

Architecture	Parameters	CK+ Accuracy	JAFFE Accuracy	FER2013 Accuracy
VGG-16	39.92M	87.91%	64.44%	60.66%
VGG-16 + SEN-Net	39.95M	88.46%	68.89%	63.05%
VGG-16 + ECA-Net	39.92M	89.01%	73.33%	62.72%
VGG-16 + CBAM	39.95M	89.56%	75.56%	63.46%
VGG-19	42.87M	90.66%	68.89%	60.92%
VGG-19 + SEN-Net	45.26M	91.21%	73.33%	63.23%
VGG-19 + ECA-Net	45.23M	91.76%	75.56%	63.49%
VGG-19 + CBAM	45.26M	92.31% (↑ 1.65%)	77.78%	64.07% (↑ 3.15%)
ResNet-50	23.49M	87.91%	73.33%	58.61%
ResNet-50 + SEN-Net	26.02M	89.01%	75.56%	58.84%
ResNet-50 + ECA-Net	23.65M	90.11%	77.78%	59.73%
ResNet-50 + CBAM	26.02M	91.21%	82.22%	59.90%
ResNet-101	42.46M	88.46%	60.00%	58.67%
ResNet-101 + SEN-Net	47.24M	89.01%	68.89%	58.92%
ResNet-101 + ECA-Net	42.81M	89.56%	73.33%	60.15%
ResNet-101 + CBAM	47.24M	90.11%	75.56%	60.92%
ResNet-152	58.08M	85.71%	15.66%	59.36%
ResNet-152 + SEN-Net	64.71M	88.46%	15.66%	59.73%
ResNet-152 + ECA-Net	58.60M	89.56%	15.66%	60.92%
ResNet-152 + CBAM	64.71M	90.11%	15.66%	61.54%
ResNetV2-50	23.48M	88.46%	77.78%	58.72%
ResNetV2-50 + SEN-Net	26.01M	88.66%	82.22%	59.36%
ResNetV2-50 + ECA-Net	23.64M	88.91%	82.22%	59.73%
ResNetV2-50 + CBAM	26.01M	89.01%	84.44% (↑ 6.55%)	60.15%
ResNetV2-101	42.44M	88.62%	62.22%	59.07%
ResNetV2-101 + SEN-Net	47.22M	89.01%	68.89%	59.73%
ResNetV2-101 + ECA-Net	42.79M	89.56%	70.83%	60.15%
ResNetV2-101 + CBAM	47.22M	90.66%	73.33%	60.92%
ResNetV2-152	58.05M	89.01%	66.67%	59.40%
ResNetV2-152 + SEN-Net	64.68M	89.56%	68.89%	60.72%
ResNetV2-152 + ECA-Net	58.57M	89.82%	73.33%	61.54%
ResNetV2-152 + CBAM	64.69M	90.11%	77.78%	62.05%

Conclusions

This project aimed to classify the seven basic human emotions using deep learning models. The project focused on implementing and modifying popular CNN models such as VGGNet, ResNet, and ResNetV2. By replacing the internal activation function with ELU, the models effectively addressed the bias-shift problem and improved their performance.
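
For reference, the effect of ELU on the mean activation can be illustrated with a few lines of plain Python. This is a sketch with the common default α = 1.0 and a made-up symmetric input, not the layer used in this repository.

```python
import math


def relu(x):
    return max(x, 0.0)


def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]  # illustrative zero-mean input

relu_mean = sum(relu(v) for v in inputs) / len(inputs)
elu_mean = sum(elu(v) for v in inputs) / len(inputs)

# ReLU discards the negative half, shifting the mean upward; ELU keeps
# bounded negative outputs, so the mean stays closer to zero.
print(relu_mean, elu_mean)
```

A mean activation closer to zero is the mechanism usually cited for how ELU reduces the bias shift passed on to the next layer.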

The project also explored the concept of attention to refine extracted features and enhance model generalization. Three attention modules, SEN-Net, ECA-Net, and CBAM, were implemented to improve the CNNs' performance. SEN-Net and ECA-Net focused on channel attention, while CBAM utilized both channel and spatial attention. It was observed that CBAM achieved the most significant performance improvement, benefiting from attention in both aspects.

To verify the impact of attention on CNN performance, extensive experiments were conducted on three datasets of different sizes: CK+, JAFFE, and FER2013. Results demonstrated that each attention module outperformed the baseline models on all datasets, showcasing the generalization improvement. Notably, CBAM consistently achieved the best performance across all experiments. Overall, attention modules improved the baseline models by 1.65% for CK+, 6.55% for JAFFE, and 3.15% for FER2013, achieving accuracies of 92.31%, 84.44%, and 64.07%, respectively. The accuracy achieved on FER2013 even matched human performance.

In conclusion, this project successfully demonstrated the effectiveness of attention modules in enhancing the performance of CNNs for emotion recognition tasks. The findings highlight the importance of attention in refining feature representation and improving generalization across datasets of various sizes. By combining deep learning models with attention mechanisms, significant performance gains can be achieved in emotion recognition applications.

References

YOLO
- YOLO5
- YOLO5Face
Activation Functions
- SELU, ELU
CNN Architectures
- VGGNet, ResNetV1, ResNetV2
Attention Modules
- SEN-Net, ECA-Net, and CBAM
Datasets
- CK+, JAFFE, FER2013

AndrzejMiskow/FER-with-Attention-and-Objective-Activation-Functions
