Publications

(2026). SusBench: An Online Benchmark for Evaluating Dark Pattern Susceptibility of Computer-Use Agents. IUI 2026.
(2025). Asking the Missing Piece: Context-Driven Clarification for Ambiguous VQA. NeurIPS 2025 FoRLM.
(2025). Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations? NeurIPS 2025 D&B.
(2025). Tensorized Clustered LoRA Merging for Multi-Task Interference. arXiv.
(2025). MARVEL: Modular Abstention for Reliable and Versatile Expert LLMs. ICML 2025.
(2025). MMMG: A Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation. arXiv.
(2025). Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs. ACL 2025.
(2025). AutoScale: Automatic Prediction of Compute-optimal Data Composition for Training LLMs. COLM 2025.
(2025). Know Your Limits: A Survey of Abstention in Large Language Models. TACL 2025.