Agentra Labs Docs · Public Documentation

AgenticMemory

Benchmarks

Performance measurements for AgenticMemory's core operations across various graph sizes. All benchmarks use the Rust engine directly; Python SDK overhead is negligible for I/O-bound operations and adds approximately 5-15 microseconds per FFI call for compute-bound operations.

Test Environment

| Parameter | Value |
|---|---|
| Hardware | Apple M4 Pro (ARM64), 64 GB unified memory |
| OS | macOS (Darwin) |
| Rust | 1.90.0 (release profile, --release) |
| Benchmark framework | criterion.rs 0.5 |
| Iterations | 100 per measurement (minimum), with statistical warm-up |
| Feature vectors | 128-dimensional, f32 |

All benchmarks are run with cargo bench using release-mode compilation with link-time optimization. Results represent the median of 100 iterations after warm-up, with 95% confidence intervals.

Summary Results

Headline numbers measured at 10K nodes, 50K edges:

| Operation | Median | Description |
|---|---|---|
| Add node | 276 ns | Insert a single node into the in-memory graph |
| Add edge | 1.2 ms | Insert an edge with adjacency list update |
| Traverse (depth 5) | 3.4 ms | BFS traversal from a single node, depth limit 5 |
| Similarity search (top-10) | 9 ms | Brute-force cosine similarity across all vectors |
| File write | 32.6 ms | Serialize 10K nodes + 50K edges to .amem with LZ4 |
| File read | 3.7 ms | Deserialize the same file into memory |

Detailed Results by Graph Size

Add Node

Time to insert a single cognitive event (fact with 200-character content and metadata) into the in-memory graph.

| Graph Size | Median | Std Dev | Notes |
|---|---|---|---|
| 1K nodes | 248 ns | 12 ns | Cache-hot, all data fits in L1 |
| 10K nodes | 276 ns | 18 ns | Marginal increase from hash map growth |
| 100K nodes | 312 ns | 25 ns | Occasional hash map resize amortized |

Node insertion is O(1) amortized. Performance is dominated by the hash map insertion for the node ID and the timestamp syscall.
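The insertion path can be sketched in a few lines of Rust. This is an illustrative stand-in, not AgenticMemory's actual types or API: a hash map keyed by node ID plus a timestamp captured at insert time, matching the two dominant costs described above.

```rust
use std::collections::HashMap;
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical simplified node record; field names are illustrative.
struct Node {
    content: String,
    created_at_ns: u64,
}

struct Graph {
    nodes: HashMap<u64, Node>,
    next_id: u64,
}

impl Graph {
    // Amortized O(1): one hash-map insert plus one timestamp syscall.
    fn add_node(&mut self, content: String) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        let created_at_ns = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("clock before epoch")
            .as_nanos() as u64;
        self.nodes.insert(id, Node { content, created_at_ns });
        id
    }
}

fn main() {
    let mut g = Graph { nodes: HashMap::new(), next_id: 0 };
    let id = g.add_node("a fact about the world".to_string());
    println!("inserted node {} ({} total)", id, g.nodes.len());
}
```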

Add Edge

Time to insert a single edge and update both source and target adjacency lists.

| Graph Size | Median | Std Dev | Notes |
|---|---|---|---|
| 1K nodes, 5K edges | 0.9 ms | 0.08 ms | Adjacency lists are small |
| 10K nodes, 50K edges | 1.2 ms | 0.11 ms | Average 5 edges per node |
| 100K nodes, 500K edges | 1.8 ms | 0.15 ms | Adjacency list growth |

Edge insertion is O(1) amortized for the edge itself, with O(degree) for adjacency list management. The higher absolute time compared to node insertion comes from updating two adjacency lists and validating both endpoints.
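A minimal sketch of the edge path, assuming a simplified layout (hypothetical types, not the real API): both endpoints are validated, the edge is appended, and its index is pushed onto two adjacency lists.

```rust
use std::collections::HashMap;

// Illustrative layout, not the real API: edges live in a Vec, and each
// node keeps outgoing/incoming lists of edge indices.
struct Graph {
    nodes: HashMap<u64, String>,
    edges: Vec<(u64, u64)>,
    outgoing: HashMap<u64, Vec<usize>>,
    incoming: HashMap<u64, Vec<usize>>,
}

impl Graph {
    fn add_edge(&mut self, src: u64, dst: u64) -> Result<usize, &'static str> {
        // Endpoint validation: two hash-map lookups.
        if !self.nodes.contains_key(&src) || !self.nodes.contains_key(&dst) {
            return Err("unknown endpoint");
        }
        let idx = self.edges.len();
        self.edges.push((src, dst));
        // Two adjacency-list pushes; an occasional Vec reallocation makes
        // this O(degree) in the worst case, amortized O(1).
        self.outgoing.entry(src).or_default().push(idx);
        self.incoming.entry(dst).or_default().push(idx);
        Ok(idx)
    }
}

fn main() {
    let mut g = Graph {
        nodes: HashMap::from([(1, "a".into()), (2, "b".into())]),
        edges: Vec::new(),
        outgoing: HashMap::new(),
        incoming: HashMap::new(),
    };
    let idx = g.add_edge(1, 2).unwrap();
    println!("edge {} links {:?}", idx, g.edges[idx]);
}
```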

Graph Traversal (BFS)

Time for a breadth-first traversal from a single starting node. Graph has average degree 10 (5 outgoing, 5 incoming edges per node).

| Graph Size | Depth 3 | Depth 5 | Depth 7 |
|---|---|---|---|
| 1K nodes | 0.4 ms | 0.8 ms | 1.1 ms |
| 10K nodes | 1.2 ms | 3.4 ms | 8.7 ms |
| 100K nodes | 3.1 ms | 12.4 ms | 45.2 ms |

Traversal time depends on the number of nodes visited, which grows exponentially with depth (bounded by graph size and degree). The visited-set check (hash set lookup) is the primary cost per node.
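The traversal described above can be sketched as a depth-limited BFS; the `visited` hash-set check on every enqueue is the cost the benchmark identifies. Types and names are illustrative, not the library's API.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Depth-limited breadth-first traversal over an adjacency map.
fn bfs(adj: &HashMap<u64, Vec<u64>>, start: u64, max_depth: u32) -> Vec<u64> {
    let mut visited = HashSet::from([start]);
    let mut queue = VecDeque::from([(start, 0u32)]);
    let mut order = Vec::new();
    while let Some((node, depth)) = queue.pop_front() {
        order.push(node);
        if depth == max_depth {
            continue; // depth limit reached; do not expand further
        }
        for &next in adj.get(&node).into_iter().flatten() {
            // HashSet::insert returns true only for unseen nodes --
            // this lookup is the dominant per-node cost.
            if visited.insert(next) {
                queue.push_back((next, depth + 1));
            }
        }
    }
    order
}

fn main() {
    let adj = HashMap::from([(1u64, vec![2, 3]), (2, vec![4])]);
    println!("{:?}", bfs(&adj, 1, 2)); // prints [1, 2, 3, 4]
}
```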

Similarity Search

Brute-force cosine similarity search across all 128-dimensional feature vectors. Returns the top-k results.

| Graph Size | Top-5 | Top-10 | Top-50 |
|---|---|---|---|
| 1K nodes | 0.9 ms | 1.0 ms | 1.1 ms |
| 10K nodes | 8.2 ms | 9.0 ms | 9.4 ms |
| 100K nodes | 82 ms | 84 ms | 87 ms |

Similarity search is O(N * D), where N is the node count and D is the vector dimension. The contiguous vector layout enables SIMD auto-vectorization: on ARM, the compiler emits NEON instructions that process four f32 values per instruction. Top-k selection adds minimal overhead (a binary heap, O(N log k)).
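A minimal sketch of the search loop, assuming L2-normalized vectors so the dot product equals cosine similarity (names are illustrative, not the library's API). A `Reverse`-wrapped min-heap keeps only the current top k, giving the O(N log k) selection mentioned above.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

// Wrapper so f32 scores get a total order (via f32::total_cmp).
#[derive(PartialEq)]
struct Scored(f32, usize);
impl Eq for Scored {}
impl Ord for Scored {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0)
    }
}
impl PartialOrd for Scored {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// O(N * D) scoring with an O(N log k) min-heap for top-k selection.
fn top_k(query: &[f32], vectors: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut heap: BinaryHeap<Reverse<Scored>> = BinaryHeap::with_capacity(k + 1);
    for (i, v) in vectors.iter().enumerate() {
        // Tight dot-product loop over contiguous f32 data; this is the
        // part the compiler can auto-vectorize.
        let score: f32 = query.iter().zip(v).map(|(a, b)| a * b).sum();
        heap.push(Reverse(Scored(score, i)));
        if heap.len() > k {
            heap.pop(); // evict the current minimum in O(log k)
        }
    }
    let mut out: Vec<(usize, f32)> =
        heap.into_iter().map(|Reverse(Scored(s, i))| (i, s)).collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1)); // best match first
    out
}

fn main() {
    let query = vec![1.0, 0.0];
    let vectors = vec![vec![0.0, 1.0], vec![1.0, 0.0], vec![0.6, 0.8]];
    println!("{:?}", top_k(&query, &vectors, 2)); // index 1 first, then 2
}
```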

At 100K nodes, the cluster map index (when enabled) reduces search to approximately 15-20 ms by scanning only relevant clusters.

File Write

Time to serialize the complete graph to a .amem file, including LZ4 content compression and vector block construction.

| Graph Size | Median | File Size | Throughput |
|---|---|---|---|
| 1K nodes, 5K edges | 4.1 ms | 0.7 MB | 170 MB/s |
| 10K nodes, 50K edges | 32.6 ms | 7.1 MB | 218 MB/s |
| 100K nodes, 500K edges | 310 ms | 71 MB | 229 MB/s |

Write time is dominated by LZ4 compression of the content block and sequential writes. The format's sequential layout (no random seeks during write) maximizes throughput.

File Read

Time to deserialize a .amem file into the in-memory graph, including LZ4 decompression.

| Graph Size | Median | File Size | Throughput |
|---|---|---|---|
| 1K nodes | 0.5 ms | 0.7 MB | 1.4 GB/s |
| 10K nodes | 3.7 ms | 7.1 MB | 1.9 GB/s |
| 100K nodes | 34 ms | 71 MB | 2.1 GB/s |

Read performance benefits from LZ4's fast decompression (>3 GB/s) and the sequential file layout. Throughput increases at larger sizes because the fixed overhead (header parsing, allocation) is amortized.

Memory-Mapped Read (MmapReader)

Time to open a file and access a single random node via memory-mapped I/O.

| Graph Size | Open | Random Node Access | Notes |
|---|---|---|---|
| 1K nodes | 0.1 ms | 0.3 µs | Entire file in page cache |
| 10K nodes | 0.1 ms | 0.4 µs | Entire file in page cache |
| 100K nodes | 0.2 ms | 0.5 µs | May trigger page fault |

Memory-mapped access avoids reading the entire file upfront. Node access is a direct pointer dereference after the initial page fault. This is ideal for applications that read a small subset of nodes from a large brain.

Comparison Context

These benchmarks are intended for understanding AgenticMemory's performance profile. Direct comparisons with other systems require careful methodology because they solve different problems.

Key architectural differences from vector databases:

  • AgenticMemory is an embedded library, not a client-server system. There is no network round-trip.
  • The graph structure (edges, traversal) is not present in pure vector databases.
  • The binary file format is optimized for single-writer workloads, not concurrent multi-writer access.

Key architectural differences from graph databases (Neo4j, etc.):

  • AgenticMemory is a file, not a server. No query language parsing, no transaction management.
  • The fixed-size node records and contiguous layout are more cache-friendly than pointer-heavy graph representations.
  • The trade-off is less flexibility: no ad-hoc queries, no schema migrations, no ACID transactions.

Reproducing Benchmarks

Prerequisites

```shell
# Rust toolchain
rustup update stable

# Clone and build
git clone https://github.com/anthropic/agentic-memory.git
cd agentic-memory
```

Running All Benchmarks

```shell
cargo bench
```

Results are written to target/criterion/ with HTML reports including statistical analysis, throughput charts, and regression detection.

Running Specific Benchmarks

```shell
# Only node operations
cargo bench -- add_node

# Only I/O benchmarks
cargo bench -- file_write
cargo bench -- file_read

# Only search benchmarks
cargo bench -- similarity_search
```

Generating a Report

```shell
cargo bench -- --save-baseline my_hardware
```

The HTML report at target/criterion/report/index.html includes:

  • Distribution plots for each benchmark
  • Mean, median, and standard deviation
  • Throughput calculations
  • Change detection vs. previous runs

Custom Graph Sizes

The benchmarks accept environment variables to control graph size:

```shell
BENCH_NODES=50000 BENCH_EDGES=250000 cargo bench
```

Profiling

For detailed profiling, use cargo-flamegraph:

```shell
cargo install flamegraph
cargo flamegraph --bench core_benchmarks -- --bench
```

This generates an SVG flamegraph showing where time is spent during benchmark execution.


v0.2 Query Expansion Benchmarks

All v0.2 query types benchmarked on 100K-node synthetic graphs (300K edges, 3 edges/node average). Measured with Criterion (100 samples) except where noted.

All Query Types at 100K Nodes

| Category | Query | Latency | Notes |
|---|---|---|---|
| Retrieval | BM25 text search (fast path) | 1.58 ms | Uses TermIndex |
| Retrieval | BM25 text search (slow path) | 122 ms | Full scan fallback for v0.1 files |
| Retrieval | Hybrid search (BM25 + vector) | 10.83 ms | RRF fusion |
| Structure | PageRank (alpha=0.85) | 34.3 ms | Iterative convergence |
| Structure | Degree centrality | 20.7 ms | Normalized degree |
| Structure | Betweenness centrality | 10.1 s | Brandes' algorithm, sampled |
| Structure | Shortest path (BFS) | 104 µs | Bidirectional BFS |
| Structure | Shortest path (Dijkstra) | 17.6 ms | Binary heap |
| Cognitive | Belief revision | 53.4 ms | Counterfactual cascade |
| Cognitive | Gap detection | 297 s | Single-run measurement |
| Cognitive | Analogical query | 229 s | Single-run measurement |
| Maintenance | Consolidation (dry run) | 43.6 s | Single-run measurement |
| Maintenance | Drift detection | 68.4 ms | Supersedes chain analysis |

Scaling from 10K to 100K Nodes

| Query | 10K | 100K | Scaling Ratio |
|---|---|---|---|
| BM25 (fast) | 186 µs | 1.58 ms | 8.5x |
| Hybrid | 1.00 ms | 10.83 ms | 10.8x |
| PageRank | 2.53 ms | 34.3 ms | 13.6x |
| Degree centrality | 1.73 ms | 20.7 ms | 12.0x |
| Betweenness centrality | 6.43 s | 10.1 s | 1.6x |
| BFS shortest path | 7.9 µs | 104 µs | 13.2x |
| Dijkstra shortest path | 888 µs | 17.6 ms | 19.8x |
| Belief revision | 6.26 ms | 53.4 ms | 8.5x |
| Gap detection | 1.53 s | 297 s | 194x |
| Analogical | 2.40 s | 229 s | 95x |
| Consolidation | 352 ms | 43.6 s | 124x |
| Drift detection | 5.84 ms | 68.4 ms | 11.7x |

Performance Tiers

Queries divide into three tiers at 100K nodes:

  • Interactive (<100 ms): BM25, hybrid, PageRank, degree, BFS, Dijkstra, belief revision, drift -- suitable for per-query use during conversations
  • Periodic (1-60 s): Betweenness centrality, consolidation -- run once per session or on a schedule
  • Offline (>60 s): Gap detection, analogical reasoning -- designed for batch analysis of large graphs; both complete in <3s at 10K nodes

BM25 Index Acceleration

| Graph Size | Fast Path (TermIndex) | Slow Path (full scan) | Speedup |
|---|---|---|---|
| 10K nodes | 186 µs | 8.59 ms | 46x |
| 100K nodes | 1.58 ms | 122 ms | 77x |

The inverted index speedup grows with graph size because the fast path cost depends on posting list size (sub-linear in n) while the slow path is always O(n).
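The sub-linear fast path can be illustrated with a toy inverted index (a simplified stand-in, not AgenticMemory's actual TermIndex structure): a query gathers candidates from its terms' posting lists and scores only those, instead of scanning all n nodes.

```rust
use std::collections::HashMap;

// Toy inverted index: each term maps to a posting list of node IDs.
struct TermIndex {
    postings: HashMap<String, Vec<u64>>,
}

impl TermIndex {
    // Returns the deduplicated union of the query terms' posting lists.
    // Work is proportional to posting-list size, not total node count;
    // only these candidates would then be scored with BM25.
    fn candidates(&self, query_terms: &[&str]) -> Vec<u64> {
        let mut ids: Vec<u64> = query_terms
            .iter()
            .filter_map(|t| self.postings.get(*t))
            .flatten()
            .copied()
            .collect();
        ids.sort_unstable();
        ids.dedup();
        ids
    }
}

fn main() {
    let index = TermIndex {
        postings: HashMap::from([
            ("rust".to_string(), vec![1, 2]),
            ("graph".to_string(), vec![2, 3]),
        ]),
    };
    println!("{:?}", index.candidates(&["rust", "graph"])); // [1, 2, 3]
}
```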