
UVAGaze: Unsupervised 1-to-2 Views Adaptation for Gaze Estimation

arXiv | Python 3.8 | PyTorch 1.12.1 | CUDA 12.1 | License: CC BY-NC

Our paper has been accepted by AAAI 2024.

Figure: Overview of the proposed Unsupervised 1-to-2 Views Adaptation framework for adapting a single-view estimator to flexible dual views.

Figure: The proposed architecture.


Results

Figure: Experimental results.

This repository contains the official PyTorch implementation of the following paper:

UVAGaze: Unsupervised 1-to-2 Views Adaptation for Gaze Estimation
Ruicong Liu and Feng Lu

Abstract: Gaze estimation has become a subject of growing interest in recent research. Most of the current methods rely on single-view facial images as input. Yet, it is hard for these approaches to handle large head angles, leading to potential inaccuracies in the estimation. To address this issue, adding a second-view camera can help better capture eye appearance. However, existing multi-view methods have two limitations. 1) They require multi-view annotations for training, which are expensive. 2) More importantly, during testing, the exact positions of the multiple cameras must be known and match those used in training, which limits the application scenario. To address these challenges, we propose a novel 1-view-to-2-views (1-to-2 views) adaptation solution in this paper, the Unsupervised 1-to-2 Views Adaptation framework for Gaze estimation (UVAGaze). Our method adapts a traditional single-view gaze estimator for flexibly placed dual cameras. Here, the "flexibly" means we place the dual cameras in arbitrary places regardless of the training data, without knowing their extrinsic parameters. Specifically, the UVAGaze builds a dual-view mutual supervision adaptation strategy, which takes advantage of the intrinsic consistency of gaze directions between both views. In this way, our method can not only benefit from common single-view pre-training, but also achieve more advanced dual-view gaze estimation. The experimental results show that a single-view estimator, when adapted for dual views, can achieve much higher accuracy, especially in cross-dataset settings, with a substantial improvement of 47.0%.

Resources

Material related to our paper is available via the following links:

System requirements

  • Only Linux is tested.
  • 64-bit Python 3.8 installation.
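
The badges above additionally list PyTorch 1.12.1 and CUDA 12.1. A minimal environment sketch, assuming a fresh conda environment (the repository does not pin a full dependency list, so the extra packages below are assumptions; check the scripts' imports):

conda create -n uvagaze python=3.8 -y
conda activate uvagaze
pip install torch==1.12.1 torchvision==0.13.1   # versions taken from the badges above
pip install numpy pyyaml opencv-python          # assumed extras; adjust as needed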

Playing with pre-trained networks and training

Data preparation

Please download the pre-trained models and the ETH-MV dataset first. Preparing the Gaze360 dataset is optional. Assume the pre-trained models, ETH-MV, and Gaze360 are stored under ${DATA_DIR}. The structure of ${DATA_DIR} is as follows:

- ${DATA_DIR}
    - AAAI24-UVAGaze-pretrain
        - 10.log
        - 10_100k.log
        - 20.log
        - 20_100k.log
        - Iter_10_eth.pt
        - Iter_20_gaze360.pt       
    - eth-mv
        - Image
        - Label
        - Label_train
        - Label_test
        - Label100k
        - Label100k_train
        - Label100k_test
    - Gaze360
        - Image
        - Label

Please run the following commands to register the pre-trained models and the data.

cp ${DATA_DIR}/AAAI24-UVAGaze-pretrain/* pretrain
mkdir data
ln -s ${DATA_DIR}/eth-mv data
ln -s ${DATA_DIR}/Gaze360 data
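
Optionally, you can run a quick sanity check that the layout above is in place (the paths follow the directory structure listed earlier):

ls pretrain/Iter_10_eth.pt pretrain/Iter_20_gaze360.pt
ls data/eth-mv/Label_train data/eth-mv/Label_test
ls data/Gaze360/Label    # only if Gaze360 has been prepared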

1-to-2 Views Adaptation (Training)

run.sh provides a complete training and testing procedure.

We provide two optional flags, --stb and --pre. They enable two different network components, which are described in our paper.

--source and --target specify the pre-training dataset and the dataset to adapt to, respectively. We recommend using gaze360 or eth-mv-train as --source and eth-mv as --target. Please see config.yaml for the dataset configuration.

--pairID specifies the index of the dual-camera pair to adapt to, ranging from 0 to 8.

--i specifies the index of the person held out as the testing set. We recommend setting it to -1 so that all persons are used for training.

--pic specifies the number of image pairs used for adaptation.

We also provide other arguments for adjusting the hyperparameters of the UVAGaze architecture; they are described in our paper.

For example, run code like:

python3 adapt.py --i -1 --cams 18 --pic 256 --bs 32  --pairID 0 --savepath eth2eth --source eth-mv-train --target eth-mv --gpu 0 --stb --pre
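
To adapt to every dual-camera pair, a hypothetical convenience loop over the nine pairs (not part of run.sh; it simply reuses the settings from the example above):

for PAIR in $(seq 0 8); do
    python3 adapt.py --i -1 --cams 18 --pic 256 --bs 32 --pairID ${PAIR} --savepath eth2eth --source eth-mv-train --target eth-mv --gpu 0 --stb --pre
done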

Test

--i, --savepath, --target, and --pairID are the same as in training. In addition to eth-mv, eth-mv100k (a subset of ETH-MV) is recommended as --target for faster testing.

For example, run code like:

python3 test_pair.py --pairID 0 --savepath eth2eth --target eth-mv100k --gpu 0

Note: the result printed by test_pair.py is NOT the final result for the specific dual-camera pair; it contains evaluation results on the FULL testing set.

Run calc_metric.py to obtain the four metrics on the pair we adapt to. These four metrics are the final results described in our paper.

python3 calc_metric.py --pairID 0 --savepath eth2eth --source eth-mv-train --target eth-mv100k

We provide the evaluation result of the baseline model. It is printed after running the above command as the line starting with "base pair: ...". The output should look like:

base pair: Mono err: xx; Dual-S err: xx; Dual-A err: xx; HP err: xx
1: ...
2: ...
...
10: ...

The improvement brought by our method can be seen by comparing the adapted results with the baseline.

Please refer to run.sh for a complete procedure from training to testing.
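
For orientation, here is a minimal end-to-end sketch assembled from the commands above (run.sh is the authoritative version and its exact flags may differ):

python3 adapt.py --i -1 --cams 18 --pic 256 --bs 32 --pairID 0 --savepath eth2eth --source eth-mv-train --target eth-mv --gpu 0 --stb --pre
python3 test_pair.py --pairID 0 --savepath eth2eth --target eth-mv100k --gpu 0
python3 calc_metric.py --pairID 0 --savepath eth2eth --source eth-mv-train --target eth-mv100k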

Citation

If this work or code is helpful in your research, please cite:

@inproceedings{liu2024uvagaze,
  title={UVAGaze: Unsupervised 1-to-2 Views Adaptation for Gaze Estimation},
  author={Liu, Ruicong and Lu, Feng},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={4},
  pages={3693--3701},
  year={2024}
}

If you are using our ETH-MV dataset, please also cite the original paper of ETH-XGaze:

@inproceedings{zhang2020eth,
  title={Eth-xgaze: A large scale dataset for gaze estimation under extreme head pose and gaze variation},
  author={Zhang, Xucong and Park, Seonwook and Beeler, Thabo and Bradley, Derek and Tang, Siyu and Hilliges, Otmar},
  booktitle={Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part V 16},
  pages={365--381},
  year={2020},
  organization={Springer}
}

Contact

For any questions, including those about the algorithm and datasets, feel free to contact me by email: liuruicong(at)buaa.edu.cn
