🔥Key highlights include:
We first extract and cache the key and value tokens of the reference images just once. These cached KV tokens can then be reused in each of the T timesteps of the main denoising process (see the first sketch after these highlights).
We also experimented with other designs for integrating the reference images; however, they were either ineffective (channel concatenation, cross-attention) or inefficient (spatial concatenation).
We found that training with an identity loss degrades the image quality of the diffusion model. The reason may be that the one-step model prediction from a large timestep is out-of-distribution for the pretrained ArcFace model. Our solution is simple and effective: downscaling the identity loss when a larger timestep is sampled for training (see the second sketch below).
Our ReF-LDM achieves strong identity similarity by effectively leveraging the reference images.
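
The sketch below illustrates the cached-KV idea from the first highlight: the key/value projections of the reference tokens are computed once, and the main branch simply attends to those cached tokens at every denoising step. All names and shapes here (`CachedKVAttention`, `cache_reference_kv`, the token counts) are hypothetical stand-ins, not the actual ReF-LDM code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CachedKVAttention(nn.Module):
    """Attention layer whose main-branch tokens also attend to key/value
    tokens pre-computed from the reference images (illustrative sketch)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.to_out = nn.Linear(dim, dim)

    @torch.no_grad()
    def cache_reference_kv(self, ref_tokens):
        # Project the reference tokens ONCE; the result does not depend on
        # the timestep, so it can be reused at every denoising step.
        return self.to_k(ref_tokens), self.to_v(ref_tokens)

    def forward(self, x, cached_kv=None):
        b, n, d = x.shape
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        if cached_kv is not None:
            ref_k, ref_v = cached_kv
            # Append the cached reference KV so the main latent can attend
            # to the reference images without recomputing their features.
            k = torch.cat([k, ref_k.expand(b, -1, -1)], dim=1)
            v = torch.cat([v, ref_v.expand(b, -1, -1)], dim=1)

        def heads(t):  # (b, n, d) -> (b, num_heads, n, d_head)
            return t.view(b, t.shape[1], self.num_heads, -1).transpose(1, 2)

        out = F.scaled_dot_product_attention(heads(q), heads(k), heads(v))
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)

# Cache once, then reuse across all T denoising steps
# (the attention layer stands in for a full denoising step here).
attn = CachedKVAttention(dim=320)
ref_tokens = torch.randn(1, 3 * 32 * 32, 320)   # tokens from 3 reference images (made-up shape)
cached_kv = attn.cache_reference_kv(ref_tokens)
x = torch.randn(2, 32 * 32, 320)                # main denoising latent tokens
for t in range(50):                             # T timesteps
    x = attn(x, cached_kv=cached_kv)
```

Because the cached tensors are timestep-independent, the reference branch runs only once instead of T times, which is what makes this design efficient compared with recomputing the reference features at every step.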
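For the timestep-scaled identity loss described in the third highlight, one possible realization is sketched below. The linear down-weighting schedule and the `id_embed_fn` face-embedding function are assumptions for illustration only; the highlight does not specify the exact schedule or loss form used in the paper.

```python
import torch
import torch.nn.functional as F

def identity_loss_weight(t, t_max=1000):
    # Downscale the identity loss for larger timesteps, where the one-step
    # x0 prediction is far from a clean face image. The linear schedule is
    # an illustrative assumption, not the paper's exact weighting.
    return 1.0 - t.float() / t_max

def training_loss(pred_x0_img, target_img, noise_pred, noise, id_embed_fn, t, t_max=1000):
    """Diffusion loss plus a timestep-scaled identity loss.
    `id_embed_fn` stands in for a frozen face-recognition encoder
    (e.g. an ArcFace-style model); the name is hypothetical."""
    diffusion_loss = F.mse_loss(noise_pred, noise)
    # Cosine distance between face embeddings of the decoded one-step
    # prediction and the ground-truth image.
    id_dist = 1.0 - F.cosine_similarity(id_embed_fn(pred_x0_img),
                                        id_embed_fn(target_img), dim=-1)
    id_loss = (identity_loss_weight(t, t_max) * id_dist).mean()
    return diffusion_loss + id_loss
```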
@inproceedings{hsiao2024refldm,
title={ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration},
author={Chi-Wei Hsiao and Yu-Lun Liu and Cheng-Kun Yang and Sheng-Po Kuo and Yucheun Kevin Jou and Chia-Ping Chen},
booktitle={Advances in Neural Information Processing Systems},
year={2024}
}