Towards Real-world Video Face Restoration: A New Benchmark

1Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences.
2The Chinese University of Hong Kong.    3Shanghai AI Laboratory.

FOS Face in the Wild

Samples from FOS-V, a real video test dataset. (The media may take a few seconds to load.)

Samples from FOS-real, a real image test dataset. (The media may take a few seconds to load.)

Samples from FOS-syn, a synthetic image test dataset.

Abstract

Blind face restoration (BFR) on images has progressed significantly over the last several years, while real-world video face restoration (VFR), which is more challenging due to the more complex face motions involved, such as changing gaze directions and facial orientations, remains unsolved. Typical BFR methods are evaluated on privately synthesized datasets or self-collected real-world low-quality face images, which offer limited coverage of real-world video frames. In this work, we introduce new real-world datasets named FOS, with a taxonomy of "Full, Occluded, and Side" faces drawn mainly from video frames, to study the applicability of current methods to videos. Compared with existing test datasets, the FOS datasets cover more diverse degradations and involve face samples from more complex scenarios, which helps to revisit current face restoration approaches more comprehensively. Given the established datasets, we benchmark both state-of-the-art BFR methods and video super-resolution (VSR) methods to comprehensively study current approaches, identifying their potential and limitations in VFR tasks. In addition, we study the effectiveness of commonly used image quality assessment (IQA) metrics and face IQA (FIQA) metrics through a subjective user study. With extensive experimental results and detailed analysis provided, we gain insights from the successes and failures of both current BFR and VSR methods. These results also pose challenges to current face restoration approaches, which we hope will stimulate future advances in VFR research.



Statistics


The clip lengths of the FOS-V test set range from 50 to 1500 frames; most clips are between 50 and 250 frames long.


Downloads

FOS Test Datasets

Name Size Samples Description
FOS-real 369 MB Download from 百度网盘 | OneDrive
├ fos-real_aligned.zip 320 MB 4253 Aligned 512x512 FOS-real images.
├ fos_real.zip 49 MB 4253 Raw FOS-real images with a size of 128x128.
├ fos_real.pathlist - - Path list file of FOS-real.
└ fos_real_158.pathlist - - Path list file of FOS-real(#158) for user study.

Name Size Samples Description
FOS-V 11.35 GB - Download from 百度网盘 | OneDrive
├ fos_v_clips.zip 489 MB 3,316 Real-world FOS-V clips with a size of 128x128.
├ fos_v_frames_interval5_aligned.zip 8.82 GB 3,316 Aligned 512x512 frames of FOS-V with an interval of 5 frames.
├ fos_v_108_frames_interval1_aligned.zip 2.05 GB - Aligned 512x512 frames of FOS-V(#108) with an interval of 1 frame for the user study in the paper.
└ fos_v_108.pathlist - - Path list file of FOS-V(#108) for user study in the paper.

Name Size Samples Description
FOS-syn 1.48 GB - Download from 百度网盘 | OneDrive
├ fos_syn.zip 346 MB 3,150 Synthesized 128x128 LQ images based on a subset of CelebA-HQ Test(5k).
├ fos_syn_gt.zip 1.14 GB 3,150 A subset of CelebA-HQ Test(5k).
└ fos_syn.pathlist - - Path list file of FOS-syn.
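Each subset above ships with a `.pathlist` file. Assuming the common convention of one relative image path per line (the exact format is not documented here, so treat this as an illustrative sketch), the entries can be resolved against an extracted dataset root like so:

```python
from pathlib import Path

def load_pathlist(pathlist_file, root):
    """Read a pathlist file (assumed format: one relative image path
    per line, blank lines ignored) and resolve each entry against the
    dataset root directory where the matching zip was extracted."""
    lines = Path(pathlist_file).read_text().splitlines()
    return [Path(root) / line.strip() for line in lines if line.strip()]
```

For example, `load_pathlist("fos_real.pathlist", "fos_real/")` would yield the full paths of the raw FOS-real images.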

Data Source

All the real-world data of the FOS test datasets are derived from the following two sources.

  • Videos from the publicly available datasets YTCeleb and YTFace.
  • Self-collected video data from YouTube, named YTW, for which we provide the source metadata in the form of YouTube video IDs as YTW meta.

  • All collected raw videos were processed by face tracking and cropping to form FOS-V. The processing scripts can be found on our GitHub page.
    Here we provide the processing metadata of each clip, used to obtain FOS-V from the raw video collections, as FOS-V meta. Face detections and timestamps are provided in each clip's metadata file.
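The cropping step described above can be sketched as follows. This is a minimal illustration only: the (x, y, w, h) bounding-box format and the 1.3 expansion factor are assumptions for the example, not the exact settings of the FOS processing scripts.

```python
def square_crop(bbox, frame_w, frame_h, scale=1.3):
    """Turn a face detection (x, y, w, h) into a square crop window
    clamped to the frame boundaries. Returns (left, top, side).
    The bbox format and expansion factor are illustrative assumptions."""
    x, y, w, h = bbox
    cx, cy = x + w / 2, y + h / 2                      # face center
    side = min(int(round(max(w, h) * scale)), frame_w, frame_h)
    # shift the window back inside the frame if it overhangs an edge
    left = max(0, min(int(round(cx - side / 2)), frame_w - side))
    top = max(0, min(int(round(cy - side / 2)), frame_h - side))
    return left, top, side
```

For instance, `square_crop((100, 100, 50, 60), 640, 360)` returns `(86, 91, 78)`: a 78-pixel square centered on the face, fully inside the 640x360 frame.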



    YTW meta: 百度网盘 | OneDrive
    FOS-V meta: 百度网盘  | OneDrive
    For more details about the datasets, please refer to the paper or this README.

    Benchmarking Results

    Metrics Study


    SROCC vs. PLCC results based on subjective scores and the quantitative performance of 6 methods on FOS-real(#158) and FOS-V(#108), evaluated with 10 IQA/FIQA algorithms and the proposed stability evaluation metric VIDD.
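    Both correlations in the figure measure how well a metric's scores track human judgments: PLCC is the Pearson correlation on the raw scores, and SROCC is the Pearson correlation on their ranks. A minimal stdlib sketch (the MOS and metric values below are made up for illustration, not taken from the paper):

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank-order correlation (SROCC): Pearson on the ranks.
    This minimal version assumes no tied scores."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank + 1.0
        return r
    return pearson(ranks(x), ranks(y))

# Hypothetical example: mean opinion scores vs. one metric's outputs
mos = [1.2, 2.5, 3.1, 4.0, 4.8]
metric = [0.10, 0.35, 0.30, 0.70, 0.90]
```

    Here `spearman(mos, metric)` gives 0.9, since the metric swaps the rank of one pair of samples relative to the subjective scores.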


    Agreement

    • The FOS datasets are available for download only for non-commercial research purposes. The copyright remains with the original owners of the videos. A complete version of the license can be found here; we refer to the license of VoxCeleb.
    • All images/videos in the FOS datasets are obtained from the Internet and are not the property of our institutions. Our institutions are not responsible for the content or the meaning of these videos.
    • You agree not to reproduce, duplicate, copy, sell, trade, resell or exploit for any commercial purposes, any portion of the videos and any portion of derived data. You agree not to further copy, publish or distribute any portion of the FOS datasets.
    • The distribution of identities in the FOS datasets may not be representative of the global human population. Please be careful of unintended societal, gender, racial and other biases when training or deploying models trained on this data.