Clarify or Answer: Reinforcement Learning for Agentic VQA with Context Under-specification

Jan 20, 2026
Zhen Cao, Bingbing Wen, Lucy Lu Wang
Abstract
We propose a reinforcement learning framework for agentic visual question answering (VQA) under context under-specification, enabling agents to decide when to ask for clarification of missing information and when to answer directly.
Type
Publication
arXiv preprint

We introduce a reinforcement learning framework for agentic VQA that explicitly models whether an agent should ask for clarification or answer directly when faced with underspecified visual and textual context.