bench
success score 0.90
check-packages
duration 14.2s, 3 events, started 2026-07-03 08:51:02
"The agent correctly identified the maintenance status of most packages, but lacked detailed information about the staleness of httpx."
Input
{ "packages": [ "pydantic", "typer", "numpy", "httpx" ] }
Output
"3/4 packages actively maintained\npydantic 2.13.4: 57d ago, active\ntyper 0.26.8: 6d ago, active\nnumpy 2.5.0: 11d ago, active\nhttpx 0.28.1: 573d ago, STALE"
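The output string above packs a summary line and one row per package into a single newline-separated report. As a minimal sketch, assuming every row follows the `name version: Nd ago, status` shape seen in this one run, the report could be parsed back into records and the `3/4` summary cross-checked. The names `PackageStatus`, `LINE_RE`, and `parse_report` are hypothetical helpers for illustration and are not part of the agent or the bench tool.

```python
import re
from typing import NamedTuple


class PackageStatus(NamedTuple):
    """One row of the report (hypothetical structure, inferred from the output)."""
    name: str
    version: str
    days_since_release: int
    status: str


# Row shape assumed from this run: "<name> <version>: <N>d ago, <status>"
LINE_RE = re.compile(
    r"^(?P<name>\S+) (?P<version>\S+): (?P<days>\d+)d ago, (?P<status>\w+)$"
)


def parse_report(report: str) -> tuple[str, list[PackageStatus]]:
    """Split the report into its summary line and parsed package rows."""
    summary, *rows = report.split("\n")
    parsed = []
    for row in rows:
        match = LINE_RE.match(row)
        if match is None:
            raise ValueError(f"unrecognized row: {row!r}")
        parsed.append(
            PackageStatus(
                match["name"],
                match["version"],
                int(match["days"]),
                match["status"],
            )
        )
    return summary, parsed


# The output string recorded for this run.
report = (
    "3/4 packages actively maintained\n"
    "pydantic 2.13.4: 57d ago, active\n"
    "typer 0.26.8: 6d ago, active\n"
    "numpy 2.5.0: 11d ago, active\n"
    "httpx 0.28.1: 573d ago, STALE"
)

summary, packages = parse_report(report)
active = [p for p in packages if p.status == "active"]
stale = [p for p in packages if p.status == "STALE"]

# Cross-check the summary line against the parsed rows.
assert summary.startswith(f"{len(active)}/{len(packages)}")
```

The sketch only reads back what the report already states. It does not reproduce whatever staleness threshold the agent applied, since that threshold is not shown in this run.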
Event stream (3)
start 08:51:02
log 08:51:16
end 08:51:16