bench
success score 0.90
check-packages
14.4s duration, 3 events, 2026-07-03 08:32:45
"The agent correctly identified the maintenance status of most packages, but may have minor inaccuracies in staleness thresholds."
Input
{ "packages": [ "pydantic", "numpy", "httpx", "anthropic" ] }
Output
"3/4 packages actively maintained\npydantic 2.13.4: 57d ago, active\nnumpy 2.5.0: 11d ago, active\nhttpx 0.28.1: 573d ago, STALE\nanthropic 0.116.0: 0d ago, active"
Event stream (3)
start 08:32:45
log 08:32:59
end 08:32:59
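
The grader's note about staleness thresholds can be checked against the Output above with a small script. This is a minimal sketch, not the agent's actual logic: the 365-day cutoff (`STALE_AFTER_DAYS`) and the helper names `parse_report` and `classify` are assumptions made for illustration, since this record does not show the threshold that check-packages uses.

```python
import re

# Assumed cutoff: the real threshold used by check-packages is not shown in this record.
STALE_AFTER_DAYS = 365

# Matches rows such as "httpx 0.28.1: 573d ago, STALE".
ROW_RE = re.compile(
    r"^(?P<name>\S+) (?P<version>\S+): (?P<days>\d+)d ago, (?P<status>\w+)$"
)


def parse_report(report: str):
    """Split a report into its summary line and (name, version, days, status) rows."""
    summary, *rows = report.split("\n")
    parsed = []
    for row in rows:
        m = ROW_RE.match(row)
        if m is None:
            raise ValueError(f"unrecognized row: {row!r}")
        parsed.append((m["name"], m["version"], int(m["days"]), m["status"]))
    return summary, parsed


def classify(days: int, stale_after: int = STALE_AFTER_DAYS) -> str:
    """Label a package by days since its last release (hypothetical rule)."""
    return "STALE" if days > stale_after else "active"


# The Output string from this run, copied verbatim.
report = (
    "3/4 packages actively maintained\n"
    "pydantic 2.13.4: 57d ago, active\n"
    "numpy 2.5.0: 11d ago, active\n"
    "httpx 0.28.1: 573d ago, STALE\n"
    "anthropic 0.116.0: 0d ago, active"
)

summary, rows = parse_report(report)
active = sum(1 for _, _, days, _ in rows if classify(days) == "active")
recomputed_summary = f"{active}/{len(rows)} packages actively maintained"

# Rows whose reported status disagrees with the assumed threshold.
mismatches = [name for name, _, days, status in rows if classify(days) != status]

print(recomputed_summary)
print("mismatches:", mismatches)
```

Under the assumed cutoff, every reported status and the summary line are reproduced. Any cutoff between 57 and 572 days gives the same labels for this input, so this run alone cannot pin down the agent's threshold.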