Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images

by Cuican Yu, et al.

Generating 3D faces from textual descriptions has a multitude of applications, such as gaming, movies, and robotics. Recent progress has demonstrated the success of unconditional 3D face generation and text-to-3D shape generation. However, due to the limited availability of paired text-3D face data, text-driven 3D face generation remains an open problem. In this paper, we propose a text-guided 3D face generation method, referred to as TG-3DFace, for generating realistic 3D faces using text guidance. Specifically, we adopt an unconditional 3D face generation framework and equip it with text conditions, so that it learns text-guided 3D face generation from text-2D face data only. On top of that, we propose two text-to-face cross-modal alignment techniques, global contrastive learning and a fine-grained alignment module, to facilitate high semantic consistency between generated 3D faces and input texts. In addition, we present directional classifier guidance during the inference process, which encourages creativity for out-of-domain generations. Compared to existing methods, TG-3DFace creates more realistic and aesthetically pleasing 3D faces, boosting multi-view consistency (MVIC) by 9% over Latent3D. The rendered face images generated by TG-3DFace achieve better FID and CLIP scores than text-to-2D face/image generation models, demonstrating the superiority of our method in generating realistic and semantically consistent textures.
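The abstract names global contrastive learning as one of the text-to-face alignment techniques but does not give its formulation. As a rough illustration only, the sketch below shows a generic symmetric contrastive (InfoNCE-style) loss of the kind commonly used to align text and image embeddings. The function names, the temperature value of 0.07, and the toy embeddings are all assumptions made for this example and are not taken from the paper.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def contrastive_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Pair k consists of text_emb[k] and image_emb[k]; every other
    combination in the batch is treated as a negative.
    The temperature is a hypothetical default, not a value from the paper.
    """
    t = [l2_normalize(v) for v in text_emb]
    i = [l2_normalize(v) for v in image_emb]
    n = len(t)
    # logits[r][c] = scaled cosine similarity between text r and image c
    logits = [
        [sum(a * b for a, b in zip(t[r], i[c])) / temperature for c in range(n)]
        for r in range(n)
    ]
    # Text-to-image direction: each text should pick out its own image.
    t2i = -sum(log_softmax(logits[r])[r] for r in range(n)) / n
    # Image-to-text direction: each image should pick out its own text.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    i2t = -sum(log_softmax(columns[c])[c] for c in range(n)) / n
    return 0.5 * (t2i + i2t)

# Toy batch: two text embeddings and two rendered-face embeddings.
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned_faces = [[1.0, 0.0], [0.0, 1.0]]
swapped_faces = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = contrastive_loss(texts, aligned_faces)
loss_swapped = contrastive_loss(texts, swapped_faces)
```

In this toy batch the loss is close to zero when each text embedding matches its paired face embedding and large when the pairs are swapped, which is the behaviour a global alignment objective relies on.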




Text2FaceGAN: Face Generation from Fine Grained Textual Descriptions

Powerful generative adversarial networks (GAN) have been developed to au...

Face0: Instantaneously Conditioning a Text-to-Image Model on a Face

We present Face0, a novel way to instantaneously condition a text-to-ima...

TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision

In this paper, we investigate an open research task of generating contro...

Towards Open-World Text-Guided Face Image Generation and Manipulation

The existing text-guided image synthesis methods can only produce limite...

TextCLIP: Text-Guided Face Image Generation And Manipulation Without Adversarial Training

Text-guided image generation aimed to generate desired images conditione...

Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions

The past few years have witnessed renewed interest in NLP tasks at the i...

CPNet: Exploiting CLIP-based Attention Condenser and Probability Map Guidance for High-fidelity Talking Face Generation

Recently, talking face generation has drawn ever-increasing attention fr...
