Deep Reason: A Strong Baseline for Real-World Visual Reasoning

05/24/2019
by Chenfei Wu, et al.

This paper presents a strong baseline for real-world visual reasoning (GQA), which achieves a score of 60.93 on GQA, a large dataset with 22M questions involving spatial understanding and multi-step inference. To help further research in this area, we identify three crucial components that improve performance: multi-source features, a fine-grained encoder, and a score-weighted ensemble. We provide a series of analyses of their impact on performance.
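
The abstract does not spell out how the score-weighted ensemble is computed. As an illustration only, the sketch below assumes one common reading: each model's answer probabilities are weighted by that model's validation score, and the answer with the largest combined weight wins. The function name `score_weighted_ensemble`, the inputs, and the normalization are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def score_weighted_ensemble(
    predictions: List[Dict[str, float]],
    val_scores: List[float],
) -> str:
    """Pick the answer with the highest score-weighted probability.

    Assumption (not from the paper): `predictions[i]` maps candidate answers
    to model i's probabilities, and `val_scores[i]` is model i's validation
    score, used directly as its ensemble weight after normalization.
    """
    if not predictions or len(predictions) != len(val_scores):
        raise ValueError("need one validation score per model")
    total = sum(val_scores)
    if total <= 0:
        raise ValueError("validation scores must sum to a positive value")

    combined: Dict[str, float] = {}
    for probs, score in zip(predictions, val_scores):
        weight = score / total  # normalize so the weights sum to 1
        for answer, p in probs.items():
            combined[answer] = combined.get(answer, 0.0) + weight * p

    # Return the answer with the largest combined weight.
    return max(combined, key=lambda answer: combined[answer])


# Hypothetical usage: two models that disagree on a yes/no question.
model_a = {"yes": 0.9, "no": 0.1}
model_b = {"yes": 0.2, "no": 0.8}
answer = score_weighted_ensemble([model_a, model_b], [0.6, 0.4])
```

Under this reading, a model with a higher validation score pulls the final answer toward its own prediction; swapping the two scores in the usage example flips the result.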
