How to Boost Face Recognition with StyleGAN?

ICCV 2023

Artem Sevastopolsky

TUM

University of Naples Federico II, TUM

Abstract

State-of-the-art face recognition systems require huge amounts of labeled training data. Given the priority of privacy in face recognition applications, the data is limited to celebrity web crawls, which suffer from issues such as skewed distributions of ethnicities and limited numbers of identities. On the other hand, the self-supervised revolution in the industry motivates research on adapting the related techniques to face recognition. One of the most popular practical tricks is to augment the dataset with samples drawn from high-resolution, high-fidelity generative models (e.g. StyleGAN-like) while preserving the identity. We show that a simple approach based on fine-tuning an encoder for StyleGAN improves upon state-of-the-art face recognition and performs better than training on synthetic face identities. We also collect large-scale unlabeled datasets with controllable ethnic composition, AfricanFaceSet-5M (5 million images of different people) and AsianFaceSet-3M (3 million images of different people), and show that pretraining on each of them improves recognition of the respective ethnicity (as well as others), while combining all unlabeled datasets yields the largest performance increase. Our self-supervised strategy is most useful when labeled training data is limited, which can be beneficial for more tailored face recognition tasks and when facing privacy concerns. Evaluation is performed on the standard RFW dataset and a new large-scale RB-WebFace benchmark.

Main Contributions

We show that the proposed pretraining increases the face recognition performance on the standard large-scale RFW benchmark.

The StyleGAN pretraining provides the most significant improvement of face recognition systems when only limited amounts of labeled data are available.

The system is pretrained on random face collections. We release 5M African and 3M Asian faces gathered from random frames of YouTube videos.

A new fairness-oriented test benchmark, RB-WebFace, is proposed; it contains 72K people, 360K positive pairs, and 648M negative pairs.
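
Benchmarks built from positive and negative pairs are commonly scored by thresholding a similarity between the two face embeddings of each pair. The following is a minimal sketch of that idea, assuming cosine similarity and the true-positive rate at a fixed false-positive rate as the metric. The function names, the metric choice, and the toy data are illustrative assumptions and are not taken from the RB-WebFace protocol.

```python
import numpy as np


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (n, d) arrays of embeddings."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


def tpr_at_fpr(pos_scores, neg_scores, target_fpr):
    """Return (TPR, threshold) such that at most target_fpr of negatives pass.

    A pair is accepted when its score is strictly greater than the threshold.
    """
    pos_scores = np.asarray(pos_scores, dtype=float)
    neg_sorted = np.sort(np.asarray(neg_scores, dtype=float))[::-1]
    # Number of negative pairs we are allowed to accept.
    k = int(np.floor(target_fpr * len(neg_sorted)))
    # At most k negatives score strictly above the k-th largest negative score.
    threshold = neg_sorted[k] if k < len(neg_sorted) else -np.inf
    tpr = float(np.mean(pos_scores > threshold))
    return tpr, float(threshold)


if __name__ == "__main__":
    # Toy data (assumption): random embeddings stand in for a face encoder.
    rng = np.random.default_rng(0)
    anchors = rng.normal(size=(1000, 64))
    # Positive pairs: the same vector plus small noise.
    positives = anchors + 0.3 * rng.normal(size=anchors.shape)
    # Negative pairs: unrelated random vectors.
    negatives = rng.normal(size=anchors.shape)

    pos = cosine_similarity(anchors, positives)
    neg = cosine_similarity(anchors, negatives)
    tpr, thr = tpr_at_fpr(pos, neg, target_fpr=0.01)
    print(f"TPR at FPR<=1%: {tpr:.3f} (threshold {thr:.3f})")
```

With hundreds of millions of negative pairs, the sorting step would typically be replaced by a streaming or histogram-based quantile estimate; the full sort here is only for clarity.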

Citation

BibTeX:

@article{sevastopolsky2023boost,
  title={How to Boost Face Recognition with StyleGAN?},
  author={Sevastopolsky, Artem and Malkov, Yury and Durasov, Nikita and Verdoliva, Luisa and Nie{\ss}ner, Matthias},
  journal={International Conference on Computer Vision (ICCV)},
  year={2023}
}