Agentic Systems

Agentic Abstention: Do Agents Know When to Stop Instead of Act?

A benchmark and analysis of when tool-using LLM agents should stop and abstain rather than continue acting.

Clarify or Answer: Reinforcement Learning for Agentic VQA with Context Under-specification

Reinforcement learning for agentic visual question answering (VQA) that balances asking for clarification against answering when the context is underspecified.

SusBench: An Online Benchmark for Evaluating Dark Pattern Susceptibility of Computer-Use Agents

A benchmark of how susceptible computer-use agents are to dark patterns in realistic UI environments.