bench
success score 0.90
check-packages
1.8s duration 3 events 2026-07-03 08:51:18
"The agent correctly identified the maintenance status of most packages, but the output format could be improved for easier parsing."
Input
{ "packages": [ "typer", "rich", "httpx", "anthropic" ] }
Output
"3/4 packages actively maintained\ntyper 0.26.8: 6d ago, active\nrich 15.0.0: 82d ago, active\nhttpx 0.28.1: 573d ago, STALE\nanthropic 0.116.0: 0d ago, active"
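The reviewer note above says the output could be easier to parse. As a rough sketch of what a consumer has to do with the current format, the snippet below turns the text report into structured records. The line pattern is inferred from this single sample run, and the function name `parse_report` and the field names (`name`, `version`, `days_since_release`, `status`) are hypothetical, not part of the agent or the bench tool.

```python
import json
import re

# Report text copied from the Output field of this run.
raw = (
    "3/4 packages actively maintained\n"
    "typer 0.26.8: 6d ago, active\n"
    "rich 15.0.0: 82d ago, active\n"
    "httpx 0.28.1: 573d ago, STALE\n"
    "anthropic 0.116.0: 0d ago, active"
)

# Assumed line shape: "<name> <version>: <N>d ago, <status>"
LINE_RE = re.compile(
    r"^(?P<name>\S+) (?P<version>\S+): (?P<days>\d+)d ago, (?P<status>\w+)$"
)


def parse_report(text):
    """Split the report into its summary line and one record per package."""
    summary, *rows = text.splitlines()
    packages = []
    for row in rows:
        match = LINE_RE.match(row)
        if match is None:
            # Fail loudly rather than silently skipping an unexpected line.
            raise ValueError(f"unrecognized report line: {row!r}")
        packages.append(
            {
                "name": match["name"],
                "version": match["version"],
                "days_since_release": int(match["days"]),
                # Normalize case so "STALE" and "active" compare uniformly.
                "status": match["status"].lower(),
            }
        )
    return {"summary": summary, "packages": packages}


report = parse_report(raw)
print(json.dumps(report, indent=2))
```

If the agent emitted a JSON object of this shape directly, the regular expression would not be needed, which may be what the reviewer comment is asking for.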
Event stream (3)
start 08:51:18
log 08:51:20
end 08:51:20