Performance

Benchmarks and Methodology

How to evaluate runtime performance with reproducible, comparable measurements.

Benchmark principles

  • measure on known hardware and record it
  • run warm and cold scenarios separately (see the sketch after this list)
  • report command, dataset, and config used
  • include failure rate, not only latency
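
A minimal shell sketch of keeping cold and warm runs separate, using amem --version from the smoke benchmark below as a stand-in workload; the run count is arbitrary.

# Cold run: the first invocation after startup (report it on its own).
time amem --version

# Warm runs: repeat the same command and report these timings separately.
for i in 1 2 3 4 5; do
  time amem --version
done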

Suggested benchmark template

Capture system context

Record this at the top of each benchmark report.

uname -a
rustc --version
node --version
python --version
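
If you want this context captured automatically, a small sketch that writes it to a report header file; the report-header.txt filename is just an example.

{
  echo "System context ($(date -u))"
  uname -a
  rustc --version
  node --version
  python --version
} > report-header.txt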

Runtime smoke benchmark

Use version and help invocations as baseline health-check timings.

time amem --version
time agentic-vision-mcp serve --help
time acb-mcp serve --help
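
To produce the latency percentiles and failure rate used in the reporting schema below, one option is a plain shell loop; this is a sketch, not a required harness, and the run count of 100 is arbitrary. Note that %N requires GNU date (gdate from coreutils on macOS).

runs=100
fails=0
: > timings_ms.txt

for i in $(seq "$runs"); do
  start=$(date +%s%N)                                     # nanoseconds since epoch
  amem --version > /dev/null 2>&1 || fails=$((fails + 1))
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 )) >> timings_ms.txt   # elapsed milliseconds
done

sort -n timings_ms.txt > sorted_ms.txt
p50=$(awk -v n="$runs" 'NR == int(n * 0.50 + 0.5)' sorted_ms.txt)
p95=$(awk -v n="$runs" 'NR == int(n * 0.95 + 0.5)' sorted_ms.txt)
echo "p50: ${p50}ms  p95: ${p95}ms  failures: ${fails}/${runs}"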

Reporting schema

Field               Example
Runtime             AgenticVision
Scenario            warm perceive on recurring domain
Host                Apple M-series / Linux x86_64
Dataset/workload    25 URLs / 3 sessions
Latency p50/p95     180 ms / 410 ms
Failures            0.8%
Notes               warm cache enabled
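
If you prefer machine-readable reports, the same row can be recorded as a small JSON object; the file name and field names here simply mirror the table above and are not a fixed schema.

cat > benchmark-report.json <<'EOF'
{
  "runtime": "AgenticVision",
  "scenario": "warm perceive on recurring domain",
  "host": "Apple M-series / Linux x86_64",
  "workload": "25 URLs / 3 sessions",
  "latency_ms": { "p50": 180, "p95": 410 },
  "failure_rate": 0.008,
  "notes": "warm cache enabled"
}
EOF
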
Comparability

Use the same command set and workload size when comparing branches or releases. If both change at once, a difference in the numbers cannot be attributed to either change.
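
As an illustration, a sketch of running an identical command set against two builds; the baseline and candidate directories are placeholders for the releases being compared.

# Same command set and workload for both builds under comparison.
for bin in ./baseline/amem ./candidate/amem; do
  echo "== $bin =="
  time "$bin" --version
done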