Project author: jokieleung
Project description:
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Primary language:
Project URL: git://github.com/jokieleung/awesome-visual-question-answering.git
Awesome Visual Question Answering:

A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Contributing
Please feel free to send me pull requests or email (leungjokie@gmail.com) to add links.
Markdown format:
- [Paper Name](link) - Author 1 et al, **Conference Year**. [[code]](link)
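As a rough illustration of the entry format above, a small Python helper could assemble one line from its parts. The function name `format_entry` and its parameters are hypothetical and are not part of this repository; this is only a sketch of the template, and the URLs below are placeholders.

```python
def format_entry(title, paper_url, first_author, venue, year, code_url=None):
    """Build one list entry following the Markdown format shown above.

    Hypothetical helper for illustration only; not provided by this repository.
    """
    # Paper link, first author, and bolded "Conference Year".
    entry = f"- [{title}]({paper_url}) - {first_author} et al, **{venue} {year}**."
    # The [[code]] link is optional and is appended only when a URL is given.
    if code_url:
        entry += f" [[code]]({code_url})"
    return entry


if __name__ == "__main__":
    # Placeholder values mirroring the template; replace with real links.
    print(format_entry("Paper Name", "https://example.com/paper",
                       "Author 1", "Conference", "Year",
                       code_url="https://example.com/code"))
```

Entries without a public implementation can simply omit `code_url`, which leaves the line ending after the conference and year.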
Change Log
- Mar. 3rd, 2019: The first version released.
Table of Contents
Papers
Survey
2022
EMNLP 2022
NeurIPS 2022
ACL 2022
CVPR 2022
ICLR 2022
AAAI 2022
IJCAI 2022
BMVC 2022
2021
NeurIPS 2021
EMNLP 2021
ICCV 2021
- Just Ask: Learning To Answer Questions From Millions of Narrated Videos - Antoine Yang et al, ICCV 2021.
- Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments - Difei Gao et al, ICCV 2021.
- On The Hidden Treasure of Dialog in Video Question Answering - Deniz Engin et al, ICCV 2021.
- Unshuffling Data for Improved Generalization in Visual Question Answering - Damien Teney et al, ICCV 2021.
- TRAR: Routing the Attention Spans in Transformer for Visual Question Answering - Yiyi Zhou et al, ICCV 2021.
- Greedy Gradient Ensemble for Robust Visual Question Answering - Xinzhe Han et al, ICCV 2021.
- Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos - Heeseung Yun et al, ICCV 2021.
- Weakly Supervised Relative Spatial Reasoning for Visual Question Answering - Pratyay Banerjee et al, ICCV 2021.
- Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering - Qingxing Cao et al, ICCV 2021.
- Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering - Corentin Dancette et al, ICCV 2021.
- Auto-Parsing Network for Image Captioning and Visual Question Answering - Xu Yang et al, ICCV 2021.
- Unified Questioner Transformer for Descriptive Question Generation in Goal-Oriented Visual Dialogue - Shoya Matsumori et al, ICCV 2021.
ACL 2021
SIGIR 2021
CVPR 2021
- Separating Skills and Concepts for Novel Visual Question Answering - Spencer Whitehead et al, CVPR 2021.
- Roses Are Red, Violets Are Blue… but Should VQA Expect Them To? - Corentin Kervadec et al, CVPR 2021. [code]
- Predicting Human Scanpaths in Visual Question Answering - Xianyu Chen et al, CVPR 2021.
- Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules - Aisha Urooj et al, CVPR 2021.
- TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption - Zhengyuan Yang et al, CVPR 2021.
- Counterfactual VQA: A Cause-Effect Look at Language Bias - Yulei Niu et al, CVPR 2021. [code]
- KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA - Kenneth Marino et al, CVPR 2021.
- Perception Matters: Detecting Perception Failures of VQA Models Using Metamorphic Testing - Yuanyuan Yuan et al, CVPR 2021.
- How Transferable Are Reasoning Patterns in VQA? - Corentin Kervadec et al, CVPR 2021.
- Domain-Robust VQA With Diverse Datasets and Methods but No Target Labels - Mingda Zhang et al, CVPR 2021.
- Learning Better Visual Dialog Agents With Pretrained Visual-Linguistic Representation - Tao Tu et al, CVPR 2021.
ICLR 2021
NAACL-HLT 2021
AAAI 2021
2020
EMNLP 2020
NeurIPS 2020
ECCV 2020
CVPR 2020
- Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text - Difei Gao et al, CVPR 2020. [code]
- On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering - Xinyu Wang et al, CVPR 2020.
- In Defense of Grid Features for Visual Question Answering - Huaizu Jiang et al, CVPR 2020.
- Counterfactual Samples Synthesizing for Robust Visual Question Answering - Long Chen et al, CVPR 2020.
- Counterfactual Vision and Language Learning - Ehsan Abbasnejad et al, CVPR 2020.
- Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA - Ronghang Hu et al, CVPR 2020.
- Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing - Vedika Agarwal et al, CVPR 2020.
- SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions - Ramprasaath R. Selvaraju et al, CVPR 2020.
- TA-Student VQA: Multi-Agents Training by Self-Questioning - Peixi Xiong et al, CVPR 2020.
- VQA With No Questions-Answers Training - Ben-Zion Vatashsky et al, CVPR 2020.
- Hierarchical Conditional Relation Networks for Video Question Answering - Thao Minh Le et al, CVPR 2020.
- Modality Shifting Attention Network for Multi-Modal Video Question Answering - Junyeong Kim et al, CVPR 2020.
- Webly Supervised Knowledge Embedding Model for Visual Reasoning - Wenbo Zheng et al, CVPR 2020.
- Differentiable Adaptive Computation Time for Visual Reasoning - Cristobal Eyzaguirre et al, CVPR 2020.
ACL 2020
WACV 2020
AAAI 2020
2019
ACL 2019
ICCV 2019
NeurIPS 2019
CVPR 2019
- Deep Modular Co-Attention Networks for Visual Question Answering - Zhou Yu et al, CVPR 2019. [code]
- Information Maximizing Visual Question Generation - Ranjay Krishna et al, CVPR 2019. [code]
- Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence - Amir Zadeh et al, CVPR 2019. [code]
- Learning to Compose Dynamic Tree Structures for Visual Contexts - Kaihua Tang et al, CVPR 2019. [code]
- Transfer Learning via Unsupervised Task Discovery for Visual Question Answering - Hyeonwoo Noh et al, CVPR 2019. [code]
- Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph - Yao-Hung Hubert Tsai et al, CVPR 2019. [code]
- Explainable and Explicit Visual Reasoning over Scene Graphs - Jiaxin Shi et al, CVPR 2019. [code]
- MUREL: Multimodal Relational Reasoning for Visual Question Answering - Remi Cadene et al, CVPR 2019. [code]
- Image-Question-Answer Synergistic Network for Visual Dialog - Dalu Guo et al, CVPR 2019. [code]
- RAVEN: A Dataset for Relational and Analogical Visual rEasoNing - Chi Zhang et al, CVPR 2019. [project page]
- Cycle-Consistency for Robust Visual Question Answering - Meet Shah et al, CVPR 2019.
- It’s Not About the Journey; It’s About the Destination: Following Soft Paths Under Question-Guidance for Visual Reasoning - Monica Haurilet et al, CVPR 2019.
- OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge - Kenneth Marino et al, CVPR 2019.
- Visual Question Answering as Reading Comprehension - Hui Li et al, CVPR 2019.
- Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering - Peng Gao et al, CVPR 2019.
- Explicit Bias Discovery in Visual Question Answering Models - Varun Manjunatha et al, CVPR 2019.
- Answer Them All! Toward Universal Visual Question Answering Models - Robik Shrestha et al, CVPR 2019.
- Visual Query Answering by Entity-Attribute Graph Matching and Reasoning - Peixi Xiong et al, CVPR 2019.
AAAI 2019
OTHER
2018
NIPS 2018
AAAI 2018
IJCAI 2018
CVPR 2018
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering - Peter Anderson et al, CVPR 2018. [code(author)] [code(pythiaV0.1)] [code(PyTorch Reimplementation)]
- Tips and Tricks for Visual Question Answering: Learnings From the 2017 Challenge - Damien Teney et al, CVPR 2018. [code]
- Learning by Asking Questions - Ishan Misra et al, CVPR 2018. [code]
- Embodied Question Answering - Abhishek Das et al, CVPR 2018. [code]
- VizWiz Grand Challenge: Answering Visual Questions From Blind People - Danna Gurari et al, CVPR 2018. [code]
- Textbook Question Answering Under Instructor Guidance With Memory Networks - Juzheng Li et al, CVPR 2018. [code]
- IQA: Visual Question Answering in Interactive Environments - Daniel Gordon et al, CVPR 2018. [sample video]
- Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering - Aishwarya Agrawal et al, CVPR 2018. [code]
- Learning Answer Embeddings for Visual Question Answering - Hexiang Hu et al, CVPR 2018. [code]
- DVQA: Understanding Data Visualizations via Question Answering - Kushal Kafle et al, CVPR 2018. [code]
- Cross-Dataset Adaptation for Visual Question Answering - Wei-Lun Chao et al, CVPR 2018. [code]
- Two Can Play This Game: Visual Dialog With Discriminative Question Generation and Answering - Unnat Jain et al, CVPR 2018. [code]
- Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering - Duy-Kien Nguyen et al, CVPR 2018. [code]
- Visual Question Generation as Dual Task of Visual Question Answering - Yikang Li et al, CVPR 2018. [code]
- Focal Visual-Text Attention for Visual Question Answering - Junwei Liang et al, CVPR 2018. [code]
- Motion-Appearance Co-Memory Networks for Video Question Answering - Jiyang Gao et al, CVPR 2018. [code]
- Visual Question Answering With Memory-Augmented Networks - Chao Ma et al, CVPR 2018. [code]
- Visual Question Reasoning on General Dependency Tree - Qingxing Cao et al, CVPR 2018. [code]
- Differential Attention for Visual Question Answering - Badri Patro et al, CVPR 2018. [code]
- Learning Visual Knowledge Memory Networks for Visual Question Answering - Zhou Su et al, CVPR 2018. [code]
- IVQA: Inverse Visual Question Answering - Feng Liu et al, CVPR 2018. [code]
- Customized Image Narrative Generation via Interactive Visual Question Generation and Answering - Andrew Shin et al, CVPR 2018. [code]
ACM MM 2018
ECCV 2018
- Visual Question Answering as a Meta Learning Task - Damien Teney et al, ECCV 2018. [code]
- Question-Guided Hybrid Convolution for Visual Question Answering - Peng Gao et al, ECCV 2018. [code]
- Goal-Oriented Visual Question Generation via Intermediate Rewards - Junjie Zhang et al, ECCV 2018. [code]
- Multimodal Dual Attention Memory for Video Story Question Answering - Kyung-Min Kim et al, ECCV 2018. [code]
- A Joint Sequence Fusion Model for Video Question Answering and Retrieval - Youngjae Yu et al, ECCV 2018. [code]
- Deep Attention Neural Tensor Network for Visual Question Answering - Yalong Bai et al, ECCV 2018. [code]
- Question Type Guided Attention in Visual Question Answering - Yang Shi et al, ECCV 2018. [code]
- Learning Visual Question Answering by Bootstrapping Hard Attention - Mateusz Malinowski et al, ECCV 2018. [code]
- Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering - Medhini Narasimhan et al, ECCV 2018. [code]
- Visual Question Generation for Class Acquisition of Unknown Objects - Kohei Uehara et al, ECCV 2018. [code]
OTHER
2017-2015
OTHER
For VQA papers from 2015 to 2017, please check the awesome-vqa list by JamesChuanggg, which does not appear to have been maintained for a long time. I really appreciate his work and will merge it into this list in the future. Stay tuned…
ICCV 2017
VQA Challenge Leaderboard
I will collect implementations of the leaderboard entries in the future. Stay tuned…
test-std 2018
VQA-CP
Licenses

To the extent possible under law, Jokie Leung has waived all copyright and related or neighboring rights to this work.
Reference and Acknowledgement
I really appreciate their contributions to this area.