AnyFace: Free-style Text-to-Face Synthesis and Manipulation

by Jianxin Sun, et al.

Existing text-to-image synthesis methods are generally applicable only to words that appear in the training dataset. However, human faces are too variable to be described with a limited vocabulary. This paper therefore proposes AnyFace, the first free-style text-to-face method, which enables much wider open-world applications such as the metaverse, social media, cosmetics, and forensics. AnyFace uses a novel two-stream framework for face image synthesis and manipulation given arbitrary descriptions of the human face. Specifically, one stream performs text-to-face generation and the other performs face image reconstruction. Facial text and image features are extracted using the CLIP (Contrastive Language-Image Pre-training) encoders, and a collaborative Cross Modal Distillation (CMD) module is designed to align the linguistic and visual features across the two streams. Furthermore, a Diverse Triplet Loss (DT loss) is developed to model fine-grained features and improve facial diversity. Extensive experiments on Multi-modal CelebA-HQ and CelebAText-HQ demonstrate significant advantages of AnyFace over state-of-the-art methods. AnyFace achieves high-quality, high-resolution, and high-diversity face synthesis and manipulation results without any constraints on the number or content of input captions.
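
The abstract does not give the formulation of the DT loss, so the following is only a rough illustration of the general idea behind a triplet-style objective on text and image embeddings. It is a sketch of the standard triplet margin loss with cosine distance, not the paper's Diverse Triplet Loss. The function names, the margin value, and the toy two-dimensional vectors (standing in for CLIP features) are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss using cosine distance (1 - similarity).

    This is the generic textbook form, NOT the DT loss from the paper.
    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin` (a hypothetical value chosen here).
    """
    d_pos = 1.0 - cosine_similarity(anchor, positive)
    d_neg = 1.0 - cosine_similarity(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy stand-ins for embeddings: a text feature (anchor), the feature of the
# matching face image (positive), and that of a mismatched face (negative).
text_feature = [1.0, 0.0]
matching_image_feature = [1.0, 0.0]
mismatched_image_feature = [0.0, 1.0]

well_aligned = triplet_loss(text_feature, matching_image_feature, mismatched_image_feature)
misaligned = triplet_loss(text_feature, mismatched_image_feature, matching_image_feature)
```

In this toy setup the loss vanishes when the caption embedding already sits closer to its matching face than to the mismatched one, and grows when the two are swapped. That is the behaviour a triplet objective rewards, whatever additional terms the DT loss may include.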




