bench

See what the best
agents do differently.

One line of code. You get a live dashboard, public profile, and recipe card showing your model, framework, and tools. Compare against the leaderboard. Free, open source, built on Cloudflare.

Free. Get your API key in 10 seconds.

SDK
npm install @virajmishra1/bench-sdk

import { observe } from "@virajmishra1/bench-sdk";

const agent = observe({
  apiKey: process.env.BENCH_KEY,
  agent: "mcpify",
  model: "claude-sonnet-4",
  framework: "vercel-ai",
  tools: ["web-search", "calculator"],
});

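// `query` and `doSearch` stand in for your own input and search function.
await agent.task("search", { query }, async (t) => {
  const result = await doSearch(query);
  t.log("found_chunks", result.length); // attach a custom metric to this task
  t.cost(0.004); // record the LLM cost of this task
  return result;
});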

That's the whole API. TypeScript or Python. Zero dependencies.
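
To make the snippet above concrete, here is roughly what a wrapper like agent.task() could be doing: time the callback, collect whatever t.log() and t.cost() report, and emit one event when the task ends. This is a hypothetical sketch, not the SDK's source. The names runTask, TaskEvent, TaskContext, and emit are made up for illustration.

```typescript
// Hypothetical sketch of a task wrapper; not the actual bench-sdk implementation.
type TaskEvent = {
  name: string;
  durationMs: number;
  ok: boolean;
  logs: Record<string, unknown>;
  costUsd: number; // assumption: costs are tracked as a single running total
};

type TaskContext = {
  log(key: string, value: unknown): void;
  cost(amount: number): void;
};

async function runTask<T>(
  name: string,
  fn: (t: TaskContext) => Promise<T>,
  emit: (event: TaskEvent) => void,
): Promise<T> {
  const logs: Record<string, unknown> = {};
  let costUsd = 0;
  const ctx: TaskContext = {
    log: (key, value) => {
      logs[key] = value;
    },
    cost: (amount) => {
      costUsd += amount;
    },
  };

  const start = Date.now();
  let ok = false;
  try {
    const result = await fn(ctx);
    ok = true;
    return result;
  } finally {
    // Emit exactly one event per task, whether it succeeded or threw.
    emit({ name, durationMs: Date.now() - start, ok, logs, costUsd });
  }
}
```

Because the event is emitted in a finally block, a task that throws is still recorded (with ok set to false) and the error still reaches the caller.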

How it works
01
Instrument
Wrap agent tasks with agent.task(). The SDK batches events in the background, so sending them adds no latency to the task itself.
02
Observe
Events stream to your dashboard in real time. Task durations, LLM costs, eval scores, failure patterns.
03
Share
Public profile at /u/you/agent. SVG badge for your README. Compare URLs. Go viral.
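
Step 01 says the SDK batches events in the background. One common way to do that is a small in-memory queue that flushes when it fills up or when a timer fires. The sketch below assumes that design. EventBatcher, BenchEvent, maxBatch, and flushMs are hypothetical names, and the send callback stands in for whatever network call the real SDK makes.

```typescript
// Hypothetical sketch of background batching; not the SDK's internals.
type BenchEvent = { type: string; payload: unknown; ts: number };

class EventBatcher {
  private queue: BenchEvent[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;
  private send: (batch: BenchEvent[]) => void;
  private maxBatch: number;
  private flushMs: number;

  constructor(send: (batch: BenchEvent[]) => void, maxBatch = 20, flushMs = 1000) {
    this.send = send;
    this.maxBatch = maxBatch;
    this.flushMs = flushMs;
  }

  // Enqueue and return immediately; the caller never waits on the network.
  push(event: BenchEvent): void {
    this.queue.push(event);
    if (this.queue.length >= this.maxBatch) {
      this.flush(); // size threshold reached: send now
    } else if (this.timer === null) {
      // Otherwise make sure a partial batch goes out after flushMs.
      this.timer = setTimeout(() => this.flush(), this.flushMs);
    }
  }

  // Send everything queued so far and cancel any pending timer.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.send(batch);
  }
}
```

In this sketch a call to push() only appends to an array and possibly schedules a timer, which is why the instrumented task does not wait on delivery. A real client would also want to flush on process exit so the last partial batch is not lost.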
What you get
Live dashboard: WebSocket event stream, p50/p95 latency, per-task timeline
Public profile: Server-rendered, OG-optimized at /u/:login/:slug
README badge: KV-cached SVG, GitHub camo-friendly, updates every 60s
LLM-as-judge eval: Every task auto-scored 0–1 by Llama 3.3 70B via Workers AI
Failure clustering: k-means clustering, plus an LLM that describes your failure patterns
Compare URLs: /vs/@you/agent1/@them/agent2 for shareable activity comparisons
Embeddable widget: iframe-ready mini-dashboard, 3 sizes, dark/light
Leaderboard: Top agents by runs, success rate, eval score, or cost
Agent recipes: See the model, framework, tools, and architecture behind every top agent
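
The p50/p95 latency figures on the live dashboard are percentiles over task durations. The page does not say which percentile method Bench uses, so the sketch below assumes the nearest-rank definition. The percentile function and the sample durations are illustrative only.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it. Assumed method, not confirmed by Bench.
function percentile(values: number[], p: number): number {
  if (values.length === 0) {
    throw new Error("percentile needs at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in the range (0, 100]");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Made-up task durations in milliseconds, with one slow outlier.
const durationsMs = [120, 80, 200, 95, 150, 3000, 110, 130, 90, 105];

const p50 = percentile(durationsMs, 50); // 110
const p95 = percentile(durationsMs, 95); // 3000
```

The gap between the two numbers is the point of showing both: the median barely moves when one task takes 3 seconds, while p95 exposes it.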
vs. the rest
LangSmith Helicone Braintrust Bench
Free tier
Public profiles
README badge
Realtime streaming
Compare agents
Failure clustering
Open source
One-line SDK
Edge-native
Agent recipes
Stack
Workers, Durable Objects, Workers AI, D1, KV, Workflows, Browser Rendering

9 Cloudflare products. Every one earns its place.