SusBench: An Online Benchmark for Evaluating Dark Pattern Susceptibility of Computer-Use Agents
Benchmarking dark pattern susceptibility of computer-use agents in realistic UI environments.
Evaluation suite for diverse multitask multimodal generation with large multimodal models.
Learning metrics for evaluating recommendation explanations.