Benchmarking dark pattern susceptibility of computer-use agents in realistic UI environments.

Jan 1, 2026 • 1 min read

Context-driven clarification strategies for ambiguous VQA, presented at a reasoning-focused NeurIPS workshop.

Dec 1, 2025 • 1 min read

Benchmarking LVLM robustness to spurious correlations and studying generalization beyond the SpuriVerse.

Sep 1, 2025 • 1 min read

A modular framework for reliable expert LLMs that enables selective abstention on uncertain questions.

Jul 1, 2025 • 1 min read

Exploring psychological insights to address overconfidence in LLMs by comparing their confidence patterns with those of humans.

May 1, 2025 • 1 min read

Automatic prediction of compute-optimal data composition for efficient LLM training.

May 1, 2025 • 1 min read

Behavioral analysis and mitigation strategies for overconfidence in large language models.

Dec 1, 2024 • 1 min read

Characterizing LLM abstention behavior in science QA with context perturbations.

Oct 1, 2024 • 1 min read

Demonstrating that open-weight models can match ChatGPT in low-resource, laboratory-scale AI deployments.

Jun 1, 2024 • 1 min read

A generative model for animal motion synthesis under limited data constraints.

Mar 1, 2024 • 1 min read