Bingbing Wen 🎓

PhD Student

University of Washington

👋 About Me

I am a final-year Ph.D. candidate at the University of Washington, where I am advised by Prof. Bill Howe and Prof. Lucy Lu Wang. I am also a member of the UW RAISE Center and collaborate closely with Prof. Yulia Tsvetkov.

My research focuses on the efficiency and reliability of foundation models and agentic systems, aiming to reduce computational overhead while enhancing model trustworthiness. My work is structured around three core pillars:

  • Data-Centric Optimization: I develop methods for optimal data mixture selection and curation, designing fine-grained preference signals that align models beyond simple correctness. (COLM 2025)
  • Agent Workflows & Modular Architectures: I design adaptive systems that optimize tool use and dynamic routing mechanisms. My work explores how reinforcement learning can orchestrate collaboration among specialized experts to streamline complex agent workflows. (TACL 2025, ICML 2025, IUI 2026, CoA)
  • Reliability-Aware Evaluation: I design frameworks for selective prediction and abstention, enabling models to quantify uncertainty and conserve resources by avoiding unnecessary computation on low-confidence samples. (ACL 2025, EMNLP 2024, NeurIPS 2025)

Education

PhD in Information Science (Natural Language Processing)

University of Washington

MS in Computational Science & Engineering (Artificial Intelligence)

University of Hong Kong

BS in Control Science & Engineering (Robotics)

Zhejiang University

Research Interests

Developing data‑ and compute‑efficient methods that enable foundation models to learn, adapt, and allocate resources optimally across tasks and data sources, from training through inference.

Featured Publications
Clarify or Answer: Reinforcement Learning for Agentic VQA with Context Under-specification

Reinforcement learning for agentic VQA that balances clarification and answering under underspecified context.

MARVEL: Modular Abstention for Reliable and Versatile Expert LLMs

A modular abstention framework for reliable expert LLMs that enables selective abstention from uncertain questions.

AutoScale: Automatic Prediction of Compute-optimal Data Composition for Training LLMs

Automatic prediction of compute-optimal data composition for efficient LLM training.

Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs

Exploring psychological insights to address overconfidence in LLMs by comparing with human confidence patterns.

Know Your Limits: A Survey of Abstention in Large Language Models

A comprehensive survey of abstention mechanisms in large language models, covering theory, implementation, and evaluation.

Recent Publications
(2026). SusBench: An Online Benchmark for Evaluating Dark Pattern Susceptibility of Computer-Use Agents. IUI 2026.
(2025). Asking the Missing Piece: Context-Driven Clarification for Ambiguous VQA. NeurIPS 2025 FoRLM.
(2025). Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations? NeurIPS 2025 D&B.
(2025). Tensorized Clustered LoRA Merging for Multi-Task Interference. arXiv.
📰 News

1/2026 Our paper on reinforcement learning for agentic VQA has been released on arXiv.

1/2026 Our benchmark on dark pattern susceptibility of computer-use agents has been accepted by IUI 2026.

9/2025 Our paper on spurious correlations in MLLMs has been accepted by NeurIPS 2025!

7/2025 I presented our survey on abstention in LLMs (oral) and our work on confidence calibration (poster) at ACL 2025!

7/2025 Our paper about modular abstention has been accepted by ICML 2025!

6/2025 I will start a summer research internship at Apple!

5/2025 Our paper about optimal data mixing in pretraining has been accepted by COLM 2025!

5/2025 Our paper about confidence calibration has been accepted by ACL 2025!

2/2025 Our paper about abstention survey in LLMs has been accepted by TACL 2025!