Clarify or Answer: Reinforcement Learning for Agentic VQA with Context Under-specification
Jan 20, 2026
Zhen Cao
Bingbing Wen
Lucy Lu Wang

Abstract
We propose a reinforcement learning framework for agentic visual question answering (VQA) with under-specified context, enabling agents to decide whether to ask for clarification of missing information or to answer directly.
Type
Publication
arXiv preprint
We introduce a reinforcement learning framework for agentic VQA that explicitly models whether an agent should ask for clarification or answer directly when faced with underspecified visual and textual context.
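The clarify-or-answer decision can be illustrated with a toy reward function. This is only a minimal sketch under stated assumptions, not the method from the paper: the `Episode` fields, the `toy_reward` function, the reward values, and the `clarify_cost` parameter are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical action labels; the paper's actual action space is not described here.
CLARIFY = "clarify"
ANSWER = "answer"


@dataclass
class Episode:
    """One toy interaction (all fields are illustrative assumptions)."""
    underspecified: bool          # is the question missing needed context?
    action: str                   # CLARIFY or ANSWER
    answer_correct: bool = False  # only meaningful when action == ANSWER


def toy_reward(ep: Episode, clarify_cost: float = 0.2) -> float:
    """Assumed reward shape: clarifying pays off only when context is
    actually missing, and always carries a small interaction cost."""
    if ep.action == CLARIFY:
        # Useful clarification earns 1.0 minus the cost; a needless one only pays the cost.
        return (1.0 if ep.underspecified else 0.0) - clarify_cost
    if ep.action == ANSWER:
        # Direct answers are rewarded for correctness and penalized otherwise.
        return 1.0 if ep.answer_correct else -1.0
    raise ValueError(f"unknown action: {ep.action!r}")


if __name__ == "__main__":
    for ep in [
        Episode(underspecified=True, action=CLARIFY),
        Episode(underspecified=False, action=CLARIFY),
        Episode(underspecified=False, action=ANSWER, answer_correct=True),
        Episode(underspecified=True, action=ANSWER, answer_correct=False),
    ]:
        print(ep, toy_reward(ep))
```

Under these assumed values, an agent trained against such a signal would be pushed to ask only when the context is genuinely missing, since a needless clarification costs reward and a wrong direct answer costs more.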