Table of Contents
The purpose of the project is to integrate an ONNX-based Face Detection CNN Model into a Universal Windows Platform application.
In this project, I use Windows ML, ONNX Runtime, and Emgu CV to run inference with the Ultra-lightweight face detection model. The Windows ML implementation is in UltraFaceDetector.cs, the ONNX Runtime implementation is in UltraFaceDetector2.cs, and the Emgu CV implementation is in UltraFaceDetector3.cs (the default). The table below summarizes their performance.
| Class | Model | FPS |
| --- | --- | --- |
| UltraFaceDetector | version-RFB-320.onnx | 17 |
| UltraFaceDetector2 | version-RFB-320.onnx | 11 |
| UltraFaceDetector3 | version-RFB-320_without_postprocessing.onnx | 100 |
- Emgu.CV v4.5.1
- Emgu.CV.Bitmap v4.5.1
- Emgu.CV.runtime.windows v4.5.1
- Microsoft.ML.OnnxRuntime v1.6.0
- Microsoft.Toolkit.Uwp v6.1.1
- Sentry v3.0.8
- Microsoft.NETCore.UniversalWindowsPlatform v6.2.10
- Microsoft.Xaml.Behaviors.Uwp.Managed v2.0.1
- Microsoft.NET.Test.Sdk v16.6.1
- xunit v2.4.1
- xunit.analyzers v0.10.0
- xunit.runner.console v2.4.1
- xunit.runner.visualstudio v2.4.1 (Important)
- Windows 10 (Version 1809 or higher)
- Windows 10 SDK (Build 16299 or higher)
- Visual Studio Community 2019
- Clone the repository and open src/FaceDetection.sln in Visual Studio
- Install required NuGet Packages
- In the Solution Explorer window, right-click the solution and select Restore NuGet Packages
- Change the project configuration
- In the Solution Explorer window, right-click the solution and select Configuration Manager
- Under Active solution platform, select x86
- (Optional) Run the tests
- On the menu bar, select Test, then Processor Architecture for AnyCPU Projects, then x86
- In the Solution Explorer window, right-click the UnitTest project and select Run Tests
- Run main application
- Build and run FaceDetection project
- Click the Image button (top button on the right) and select any image
- Click the Detect button (bottom button on the right) to enable/disable the face detection function
- Click the Camera button (middle button on the right) to enable/disable camera streaming
- Click the Detect button to enable/disable the face detection function
- This function is automatically enabled when the face detection function is enabled
- To change the face detection implementation, set the _faceDetectorClass property of MainPageViewModel to the corresponding class. There are three implementation classes: UltraFaceDetector, UltraFaceDetector2, and UltraFaceDetector3.
- The application uses Sentry for full-stack error monitoring
- In a Release build, the application reports any error to my Sentry dashboard
The user can perform three actions: opening an image, starting the camera stream, and detecting faces on the image/camera frame. These actions are not fully independent; some depend on others. The Use Case diagram above briefly describes all of them.
This Data Flow diagram shows how data moves through the application. Whenever the user loads an image or turns on the camera, the image or camera frame is converted into a uniform format, SoftwareBitmap. The FrameModel stores this data for further processing.
Whenever the FrameModel receives new data, it notifies the MainViewModel to preview the new data (image) on the preview layer of the screen.
When the user enables the face detection function, the Face Detector takes the current data in the FrameModel as input and performs face detection on it. The Face Detector's outputs are the bounding boxes of all detected faces. These bounding boxes are displayed on the Canvas layer, on top of the preview layer.
The Face Detector's outputs also serve as input to the Distance Estimator. The distance from each face to the camera is then displayed on the Canvas layer.
By defining each module's inputs and outputs this way, every module has a clear responsibility, and coupling is reduced in the class design phase.
The sequence diagrams below describe the data flow step by step.
A new face detector must implement the IFaceDetector interface shown below. After detection finishes, the FaceDetected event should be raised.
public class FaceDetectedEventArgs : EventArgs
{
    // Bounding boxes of all detected faces
    public IReadOnlyList<FaceBoundingBox> BoundingBoxes;
    // Size of the original input frame, used to scale boxes for display
    public Size OriginalSize;
}

public delegate void FaceDetectedEventHandler(object sender, FaceDetectedEventArgs eventArgs);

public interface IFaceDetector
{
    // Raised when a detection pass finishes
    event FaceDetectedEventHandler FaceDetected;
    // Supplies the detector's configuration (model path, thresholds, etc.)
    void LoadConfig(IConfig config);
    Task LoadModel();
    bool IsModelLoaded();
    // Runs detection on a single frame
    Task Detect(Mat input);
}
A face detector usually comes with configurations such as the model file path, the confidence threshold, and the intersection-over-union threshold.
The configuration class that stores these model configurations should implement IConfig, as below:
public interface IConfig
{
Task ReadAsync(StorageFile file);
}
The new face detector class's constructor can receive the corresponding configuration class for later use.
All configuration classes that implement the IConfig interface should be registered with the AppConfig instance. AppConfig is defined as a singleton, so any class can retrieve the configurations of the face detectors, the main application, etc. from anywhere without loading them again.
The most convenient approach I found is to convert the HDF5 model to the SavedModel format first, then use tf2onnx to convert the SavedModel to ONNX. This approach is also recommended by the tf2onnx team. You can find utility functions for this in ModelConverters/hdf5_to_savedmodel.py.
The tf2onnx team also recommends converting a frozen-graph model to the SavedModel format first and then using tf2onnx to convert the SavedModel to ONNX. You can find utility functions for this in ModelConverters/frozen_to_savedmodel_.py.
Opset stands for operator set. Convolution, for example, is an operator, and when people design a new model they might create a new operator. By default, tf2onnx uses opset 9 to generate the graph, which sometimes lacks an operator the model requires. In that case, retry the conversion with a higher opset.
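The two-step conversion above can be sketched as shell commands. The file names here are placeholders and the exact flags may differ across tf2onnx versions; see the scripts in ModelConverters/ for the project's own utilities:

```sh
# Step 1 (hypothetical paths): load the HDF5 (Keras) model and
# re-save it in the SavedModel format
python -c "import tensorflow as tf; tf.keras.models.load_model('model.h5').save('saved_model')"

# Step 2: convert the SavedModel to ONNX with tf2onnx,
# raising --opset if the default opset lacks a required operator
python -m tf2onnx.convert --saved-model saved_model --output model.onnx --opset 11
```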
Deep neural networks in TensorFlow are represented as graphs where every node is a transformation of its inputs (such as Convolution or MaxPooling).
OpenCV needs an extra configuration file to import object detection models from TensorFlow. It is based on a text version of the same serialized graph in protocol buffer (protobuf) format.
Follow this link to generate that extra file from TensorFlow object detection models.
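As a sketch, OpenCV ships helper scripts for exactly this in its samples/dnn directory; for an SSD-based detector the invocation looks roughly like the following (all paths are placeholders):

```sh
# Generate the text graph (.pbtxt) that OpenCV's dnn module needs
# alongside the frozen TensorFlow graph
python tf_text_graph_ssd.py \
    --input frozen_inference_graph.pb \
    --config pipeline.config \
    --output graph.pbtxt
```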
Windows ML is a high-performance API for deploying hardware-accelerated ML inference on Windows devices. ONNX Runtime is a cross-platform inference and training machine-learning accelerator compatible with deep learning frameworks such as PyTorch and TensorFlow/Keras, as well as classical machine learning libraries such as scikit-learn.
For the NuGet package, Windows ML is built into Microsoft.ai.machinelearning.dll. It does not contain an embedded ONNX Runtime; instead, the ONNX Runtime is built into onnxruntime.dll. Follow this link for more details.
The pinhole camera model gives a uniform relationship between the object and its image. Using this relationship, we form the 3 equations below (refresh GitHub if you cannot see the equations):
where f (pixels) is the focal length, d (cm) is the distance between the camera and the face, R (cm) is the real face height, and r (pixels) is the face height on the screen.
First, I position my face in front of the camera at a fixed distance d. Then I use the application to detect my face at that distance and record the height r of the detected bounding box. I also measure my real face height R. Finally, I calculate the focal length f using the second equation above.
In the application, I use the third equation to estimate the distance between the face and the camera. In that equation, f and R are fixed, and r is the height of the bounding box given by the Face Detector.
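As an illustration, the calibration and estimation steps can be sketched in Python; the numbers below are made up for the example, not measured values from the project:

```python
def calibrate_focal_length(d_cm, r_px, face_height_cm):
    """Second equation: f = r * d / R, giving the focal length in pixels."""
    return r_px * d_cm / face_height_cm

def estimate_distance(f_px, face_height_cm, r_px):
    """Third equation: d = f * R / r, giving the distance in cm."""
    return f_px * face_height_cm / r_px

# Calibration: a face of real height R = 20 cm, placed d = 50 cm from the
# camera, is detected as an r = 200 px tall bounding box
f = calibrate_focal_length(50, 200, 20)  # f = 500 px

# Estimation: the same face later appears as a 100 px tall bounding box
d = estimate_distance(f, 20, 100)        # d = 100 cm
```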
This approach has limitations. When people look down or up, their apparent face height changes, which affects the estimated distance. The camera's focal length also has to be recalculated for each new device.
Future work could implement facial landmark detection to measure the distance between the eyes, then use a linear relationship between the eye distance and the face height/width to estimate the distance between the face and the camera.
Contributions make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/FeatureName`)
- Commit your Changes (`git commit -m 'Add some FeatureName'`)
- Push to the Branch (`git push origin feature/FeatureName`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Tung Dao - LinkedIn
Project Link: https://github.com/dao-duc-tung/face-detection-uwp-onnx