Graphic layout generation, a growing research field, plays a significant...
Avoiding synthesizing specific visual concepts is an essential challenge...
The rapid advancements in large language models (LLMs) have presented ch...
Controllable video generation has gained significant attention in recent...
Two-Tower Vision-Language (VL) models have shown promising improvements ...
Medical artificial general intelligence (MAGI) enables one foundation mo...
Large Language Models (LLMs) have shown remarkable performance in variou...
Effectively utilizing LLMs for complex tasks is challenging, often invol...
Artificial Intelligence (AI) has made incredible progress recently. On t...
ChatGPT is attracting a cross-field interest as it provides a language i...
3D photography renders a static image into a video with appealing 3D vis...
In this paper, we present NUWA-Infinity, a generative model for infinite...
Vision-Language (VL) models with the Two-Tower architecture have dominat...
Recently, most successful image synthesis models are multi-stage process ...
Breakthroughs in transformer-based models have revolutionized not only t...
Language guided image inpainting aims to fill in the defective regions o...
This paper presents a unified multimodal pre-trained model called NÜWA t...
Self-supervised vision-and-language pretraining (VLP) aims to learn tran...
In this paper, we present GEM as a General Evaluation benchmark for Mult...
Generating videos from text is a challenging task due to its high comput...
This paper presents a strong baseline for real-world visual reasoning (G...