# Benchmarks and Methodology

How to evaluate runtime performance with reproducible, comparable measurements.

## Benchmark principles

- Measure on known hardware and record its specs in the report.
- Run warm and cold scenarios separately and label each run; see the harness sketch after this list.
- Report the exact command, dataset, and config used.
- Include failure rate, not only latency.
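
A minimal harness sketch applying these principles, assuming a placeholder `BENCH_CMD` workload command, a placeholder `RUNS` count, and GNU `date` for nanosecond timestamps; substitute the command you actually want to measure:

```bash
#!/usr/bin/env bash
# Sketch: run a workload in separate cold and warm passes and record the
# failure rate alongside per-run latency. BENCH_CMD and RUNS are placeholders.
set -u
BENCH_CMD=${BENCH_CMD:-"amem --version"}   # placeholder workload command
RUNS=${RUNS:-20}

run_scenario() {
  local label=$1 failures=0 start end
  for _ in $(seq 1 "$RUNS"); do
    start=$(date +%s%N)                    # nanoseconds; requires GNU date
    if ! $BENCH_CMD >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    end=$(date +%s%N)
    echo "$label $(( (end - start) / 1000000 ))" >> timings.txt   # "<scenario> <ms>"
  done
  echo "$label: $failures/$RUNS runs failed"
}

run_scenario cold   # approximates a cold start only if caches were cleared beforehand
run_scenario warm   # repeated runs against warmed caches
```

The per-run timings land in `timings.txt`, which the reporting sketch further down can consume.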

## Suggested benchmark template

### Capture system context

Record this at the top of each benchmark report.

```bash
uname -a
rustc --version
node --version
python --version
```
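
A small sketch that automates this, writing the context block to the top of a report file; `benchmark-report.md` is a placeholder path, and the tool list should match the runtimes you actually benchmark:

```bash
# Sketch: write the system context at the top of a fresh benchmark report.
REPORT=benchmark-report.md   # placeholder output path
{
  echo "## System context"
  echo '```'
  uname -a
  rustc --version
  node --version
  python --version
  echo '```'
} > "$REPORT"
```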

### Runtime smoke benchmark

Use `--version` and `--help` invocations as baseline health and timing checks.

```bash
time amem --version
time agentic-vision-mcp serve --help
time acb-mcp serve --help
```
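
For more stable numbers than a single `time` call, the same commands can be repeated with warmup runs; a sketch assuming the hyperfine benchmarking tool is installed:

```bash
# Sketch: repeat each smoke command with warmup runs and export raw results.
hyperfine --warmup 3 --runs 20 \
  --export-json smoke-results.json \
  'amem --version' \
  'agentic-vision-mcp serve --help' \
  'acb-mcp serve --help'
```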

### Reporting schema

| Field | Example |
|---|---|
| Runtime | AgenticVision |
| Scenario | warm perceive on recurring domain |
| Host | Apple M-series / Linux x86_64 |
| Dataset/workload | 25 URLs / 3 sessions |
| Latency p50/p95 | 180ms / 410ms |
| Failure rate | 0.8% |
| Notes | warm cache enabled |
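
A sketch for deriving the latency fields from the raw per-run timings collected by the harness above, using nearest-rank percentiles; it assumes `timings.txt` holds `<scenario> <milliseconds>` lines:

```bash
# Sketch: compute nearest-rank p50/p95 latency per scenario from timings.txt.
percentile() {   # usage: percentile 0.95  -- reads sorted ms values on stdin
  awk -v p="$1" '{ v[NR] = $1 } END { idx = int(NR * p); if (idx < NR * p) idx++; print v[idx] }'
}

for scenario in cold warm; do
  sorted=$(grep "^$scenario " timings.txt | awk '{print $2}' | sort -n)
  p50=$(echo "$sorted" | percentile 0.50)
  p95=$(echo "$sorted" | percentile 0.95)
  echo "$scenario: p50=${p50}ms p95=${p95}ms"
done
```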

## Comparability

Use the same command set and workload size when comparing branches or releases. If the code under test and the benchmark change at the same time, differences cannot be attributed to either, and the results are not actionable.
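
One way to hold the command set and workload fixed is to drive every branch through the same wrapper with identical parameters; `run-bench.sh`, the branch names, and the values below are placeholders (the wrapper could be the harness sketch above saved to a file):

```bash
# Sketch: benchmark two branches with an identical, pinned workload.
for branch in main candidate; do   # placeholder branch names
  git checkout "$branch"
  BENCH_CMD="amem --version" RUNS=20 ./run-bench.sh > "results-$branch.txt"
done
```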