Chenfei Wu

research

∙ 09/18/2023

LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models

Graphic layout generation, a growing research field, plays a significant...

0 Zecheng Tang, et al. ∙

research

∙ 08/26/2023

ORES: Open-vocabulary Responsible Visual Synthesis

Avoiding synthesizing specific visual concepts is an essential challenge...

0 Minheng Ni, et al. ∙

research

∙ 08/19/2023

GameEval: Evaluating LLMs on Conversational Games

The rapid advancements in large language models (LLMs) have presented ch...

0 Dan Qiao, et al. ∙

research

∙ 08/16/2023

DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

Controllable video generation has gained significant attention in recent...

0 Shengming Yin, et al. ∙

research

∙ 05/31/2023

ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

Two-Tower Vision-Language (VL) models have shown promising improvements ...

0 Xiao Xu, et al. ∙

research

∙ 04/26/2023

Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining

Medical artificial general intelligence (MAGI) enables one foundation mo...

17 Bingqian Lin, et al. ∙

research

∙ 04/20/2023

Learning to Program with Natural Language

Large Language Models (LLMs) have shown remarkable performance in variou...

0 Yiduo Guo, et al. ∙

research

∙ 04/17/2023

Low-code LLM: Visual Programming over LLMs

Effectively utilizing LLMs for complex tasks is challenging, often invol...

0 Yuzhe Cai, et al. ∙

research

∙ 03/29/2023

TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

Artificial Intelligence (AI) has made incredible progress recently. On t...

0 Yaobo Liang, et al. ∙

research

∙ 03/08/2023

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

ChatGPT is attracting a cross-field interest as it provides a language i...

0 Chenfei Wu, et al. ∙

research

∙ 02/21/2023

Learning 3D Photography Videos via Self-supervised Diffusion on Single Images

3D photography renders a static image into a video with appealing 3D vis...

0 Xiaodong Wang, et al. ∙

research

∙ 07/20/2022

NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis

In this paper, we present NUWA-Infinity, a generative model for infinite...

4 Chenfei Wu, et al. ∙

research

∙ 06/17/2022

Bridge-Tower: Building Bridges Between Encoders in Vision-Language Representation Learning

Vision-Language (VL) models with the Two-Tower architecture have dominat...

9 Xiao Xu, et al. ∙

research

∙ 06/01/2022

DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder

Recently most successful image synthesis models are multi stage process ...

0 Jie Shi, et al. ∙

research

∙ 03/30/2022

VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers

Breakthroughs in transformer-based models have revolutionized not only t...

0 Estelle Aflalo, et al. ∙

research

∙ 02/10/2022

NÜWA-LIP: Language Guided Image Inpainting with Defect-free VQGAN

Language guided image inpainting aims to fill in the defective regions o...

1 Minheng Ni, et al. ∙

research

∙ 11/24/2021

NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion

This paper presents a unified multimodal pre-trained model called NÜWA t...

26 Chenfei Wu, et al. ∙

research

∙ 09/22/2021

KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation

Self-supervised vision-and-language pretraining (VLP) aims to learn tran...

5 Yongfei Liu, et al. ∙

research

∙ 06/18/2021

GEM: A General Evaluation Benchmark for Multimodal Tasks

In this paper, we present GEM as a General Evaluation benchmark for Mult...

0 Lin Su, et al. ∙

research

∙ 04/30/2021

GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions

Generating videos from text is a challenging task due to its high comput...

14 Chenfei Wu, et al. ∙

research

∙ 05/24/2019

Deep Reason: A Strong Baseline for Real-World Visual Reasoning

This paper presents a strong baseline for real-world visual reasoning (G...

0 Chenfei Wu, et al. ∙

