bench
success score 0.90
check-packages
4.9s duration 3 events 2026-07-03 08:33:37
"The agent correctly identified the maintenance status of most packages, but the output format was not explicitly specified as required."
Input
{ "packages": [ "anthropic", "uv", "httpx", "pydantic" ] }
Output
"3/4 packages actively maintained\nanthropic 0.116.0: 0d ago, active\nuv 0.11.26: 2d ago, active\nhttpx 0.28.1: 573d ago, STALE\npydantic 2.13.4: 57d ago, active"
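The summary line in this output can be cross-checked against the per-package lines. Below is a minimal sketch, assuming every run emits the line shape seen here (`<name> <version>: <N>d ago, <status>`); the names `parse_report` and `summary_is_consistent` are hypothetical helpers, not part of the bench tooling.

```python
import re

# Output string copied verbatim from the run above.
OUTPUT = (
    "3/4 packages actively maintained\n"
    "anthropic 0.116.0: 0d ago, active\n"
    "uv 0.11.26: 2d ago, active\n"
    "httpx 0.28.1: 573d ago, STALE\n"
    "pydantic 2.13.4: 57d ago, active"
)

# Assumed line shape: "<name> <version>: <N>d ago, <status>"
LINE_RE = re.compile(r"^(\S+) (\S+): (\d+)d ago, (\w+)$")


def parse_report(text):
    """Split the report into its summary line and a list of package records."""
    summary, *rows = text.splitlines()
    packages = []
    for row in rows:
        match = LINE_RE.match(row)
        if match is None:
            raise ValueError(f"unrecognized line: {row!r}")
        name, version, days, status = match.groups()
        packages.append(
            {"name": name, "version": version, "days": int(days), "status": status}
        )
    return summary, packages


def summary_is_consistent(text):
    """Check that the 'X/Y' prefix of the summary matches the per-package rows."""
    summary, packages = parse_report(text)
    active = sum(1 for p in packages if p["status"] == "active")
    return summary.startswith(f"{active}/{len(packages)} ")
```

For this run, three of the four rows are marked `active` and only `httpx` is marked `STALE`, which agrees with the `3/4` summary. The staleness threshold itself is not shown in the record, so the sketch reads the status labels rather than recomputing them.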
Event stream (3)
start 08:33:37
log 08:33:42
end 08:33:42