AgenticMemory

.amem File Format Specification

This document describes the binary layout of .amem files -- the on-disk format for AgenticMemory brain data. The format is designed for fast random access, memory-mapped I/O, and compact storage.

Design Goals

  • Zero-copy reads: Node and edge records are fixed-size, enabling direct memory access without deserialization.
  • Memory-mapped friendly: The layout is aligned for mmap() access. The OS handles paging, so brain files larger than available RAM work efficiently.
  • Compact: Content is LZ4-compressed. Feature vectors are stored as contiguous float arrays. No JSON overhead, no field names repeated per record.
  • Atomic writes: Writes update the header last. A crash during a write leaves the previous valid state intact.
  • Forward compatible: The version field and reserved header bytes allow format evolution without breaking existing readers.

Overview

An .amem file consists of six contiguous sections:

+------------------+
|   Magic + Header |  64 bytes
+------------------+
|   Node Records   |  node_count * 64 bytes
+------------------+
|   Edge Records   |  edge_count * 13 bytes
+------------------+
|   Content Block  |  variable (LZ4 compressed)
+------------------+
|  Feature Vectors |  node_count * dimension * 4 bytes
+------------------+
|     Indexes      |  variable
+------------------+
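
As a rough illustration of the diagram above, the following sketch computes where the two fixed-size record sections begin and end from the record counts alone. The helper name section_offsets is hypothetical. The variable-size sections are not computed here, because their positions are recorded in the header rather than derived.

```python
HEADER_SIZE = 64       # bytes, fixed in version 1
NODE_RECORD_SIZE = 64  # bytes per node record
EDGE_RECORD_SIZE = 13  # bytes per edge record


def section_offsets(node_count: int, edge_count: int) -> dict:
    """Return byte offsets of the fixed-size sections (hypothetical helper)."""
    nodes_start = HEADER_SIZE
    edges_start = nodes_start + node_count * NODE_RECORD_SIZE
    edges_end = edges_start + edge_count * EDGE_RECORD_SIZE
    return {"nodes": nodes_start, "edges": edges_start, "edges_end": edges_end}
```

For example, section_offsets(2, 3) places the node records at byte 64, the edge records at byte 192, and the end of the edge records at byte 231.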

Section 1: Header (64 bytes)

The header occupies the first 64 bytes of the file.

Offset  Size  Type     Field                 Description
------  ----  -------  --------------------  -----------
0       4     [u8; 4]  magic                 Magic bytes: 0x41 0x4D 0x45 0x4D (ASCII "AMEM").
4       2     u16      version               Format version. Current: 1.
6       2     u16      flags                 Bitfield. Bit 0: has vectors. Bit 1: has indexes. Bit 2: content compressed.
8       4     u32      node_count            Total number of node records.
12      4     u32      edge_count            Total number of edge records.
16      2     u16      dimension             Feature vector dimension. Default: 128.
18      2     u16      session_count         Number of distinct sessions.
20      8     u64      content_offset        Byte offset to the start of the content block.
28      8     u64      content_length        Length of the content block in bytes (compressed).
36      8     u64      vector_offset         Byte offset to the start of the feature vector block.
44      8     u64      index_offset          Byte offset to the start of the index block.
52      4     u32      content_uncompressed  Uncompressed size of the content block.
56      8     [u8; 8]  reserved              Reserved for future use. Must be zero.

Validation rules:

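  • magic must be the byte sequence 0x41 0x4D 0x45 0x4D ("AMEM"), i.e. 0x414D454D when the four bytes are written in file order. Compare bytes rather than a little-endian u32, which would read as 0x4D454D41.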
  • version must be <= 1 for this specification.
  • node_count and edge_count must be consistent with file size: the file must be at least 64 + node_count * 64 + edge_count * 13 bytes long.
  • dimension must be a positive integer (typically 128).
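
The header table and validation rules above can be sketched with Python's struct module. This is a minimal illustration, not a reference reader: the names Header and parse_header are hypothetical, and the size check only covers the fixed-size node and edge sections.

```python
import struct
from typing import NamedTuple, Optional

HEADER_FORMAT = "<4sHHIIHHQQQQI8s"            # little-endian, no padding
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 64 bytes


class Header(NamedTuple):
    magic: bytes
    version: int
    flags: int
    node_count: int
    edge_count: int
    dimension: int
    session_count: int
    content_offset: int
    content_length: int
    vector_offset: int
    index_offset: int
    content_uncompressed: int
    reserved: bytes


def parse_header(buf: bytes, file_size: Optional[int] = None) -> Header:
    """Unpack and validate the 64-byte header (hypothetical helper)."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("buffer is shorter than the 64-byte header")
    header = Header(*struct.unpack_from(HEADER_FORMAT, buf, 0))
    if header.magic != b"AMEM":
        raise ValueError("bad magic bytes")
    if header.version > 1:
        raise ValueError("unsupported format version")
    if header.dimension == 0:
        raise ValueError("dimension must be positive")
    # Node and edge records must fit inside the file.
    size = len(buf) if file_size is None else file_size
    if size < HEADER_SIZE + header.node_count * 64 + header.edge_count * 13:
        raise ValueError("record counts are inconsistent with file size")
    return header
```

Passing file_size separately lets a caller validate a header read from the first 64 bytes without loading the whole file.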

Section 2: Node Records

Immediately following the header. Each node record is a fixed 64 bytes.

Offset  Size  Type      Field            Description
------  ----  --------  ---------------  -----------
0       1     u8        event_type       Event type enum: 0=Fact, 1=Decision, 2=Inference, 3=Correction, 4=Skill, 5=Episode.
1       3     [u8; 3]   padding          Alignment padding. Must be zero.
4       4     u32       session          Session ID.
8       4     f32       confidence       Confidence score (IEEE 754 single-precision).
12      8     i64       timestamp        Unix timestamp in seconds (UTC).
20      8     u64       content_offset   Offset within the decompressed content block where this node's content starts.
28      4     u32       content_length   Length of this node's content in bytes (decompressed).
32      8     u64       vector_offset    Offset within the vector block. Set to u64::MAX if no vector is present.
40      8     u64       metadata_offset  Offset within the content block for JSON-encoded metadata. u64::MAX if no metadata.
48      4     u32       metadata_length  Length of metadata in bytes. 0 if no metadata.
52      12    [u8; 12]  reserved         Reserved. Must be zero.

Total: 64 bytes per node.

Notes:

  • Node IDs are implicit -- node N is at offset 64 + (N * 64) from the start of the file.
  • content_offset and metadata_offset are offsets into the decompressed content block, not the raw file.
  • vector_offset is a byte offset into the vector block section.
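
A minimal sketch of reading one node record by its implicit ID, assuming the whole file (or a memory map of it) is available as a bytes-like object. The helper name read_node and the choice to map the u64::MAX sentinel to None are illustrative assumptions.

```python
import struct

HEADER_SIZE = 64
NODE_FORMAT = "<B3xIfqQIQQI12x"           # 'x' skips padding and reserved bytes
NODE_SIZE = struct.calcsize(NODE_FORMAT)  # 64 bytes
U64_MAX = 0xFFFF_FFFF_FFFF_FFFF           # sentinel for "not present"


def read_node(buf: bytes, node_id: int) -> dict:
    """Decode node record `node_id` from a whole-file buffer (hypothetical helper)."""
    offset = HEADER_SIZE + node_id * NODE_SIZE  # node IDs are implicit
    (event_type, session, confidence, timestamp, content_offset,
     content_length, vector_offset, metadata_offset,
     metadata_length) = struct.unpack_from(NODE_FORMAT, buf, offset)
    return {
        "event_type": event_type,
        "session": session,
        "confidence": confidence,
        "timestamp": timestamp,
        "content_offset": content_offset,
        "content_length": content_length,
        # Map the u64::MAX sentinel to None for convenience.
        "vector_offset": None if vector_offset == U64_MAX else vector_offset,
        "metadata_offset": None if metadata_offset == U64_MAX else metadata_offset,
        "metadata_length": metadata_length,
    }
```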

Section 3: Edge Records

Immediately following the node records. Each edge record is a fixed 13 bytes.

Offset  Size  Type  Field      Description
------  ----  ----  ---------  -----------
0       4     u32   source     Source node ID.
4       4     u32   target     Target node ID.
8       1     u8    edge_type  Edge type enum: 0=CausedBy, 1=Supports, 2=Contradicts, 3=Supersedes, 4=RelatedTo, 5=PartOf, 6=TemporalNext.
9       4     f32   weight     Edge weight (IEEE 754 single-precision).

Total: 13 bytes per edge.

Notes:

  • Edges are sorted by source node ID for efficient adjacency lookups.
  • Both source and target must be valid node IDs (less than node_count).
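
Because edges are sorted by source node ID, the outgoing edges of a node can be located with a binary search over the packed 13-byte records. The sketch below is one way to do that; the helper name outgoing_edges and the edges_start parameter (the byte offset of the first edge record) are illustrative assumptions.

```python
import struct

EDGE_FORMAT = "<IIBf"                     # source, target, edge_type, weight
EDGE_SIZE = struct.calcsize(EDGE_FORMAT)  # 13 bytes, no padding


def outgoing_edges(buf: bytes, edges_start: int, edge_count: int, source: int) -> list:
    """Return (target, edge_type, weight) for every edge leaving `source`."""
    # Binary search for the first record whose source is >= the requested ID.
    lo, hi = 0, edge_count
    while lo < hi:
        mid = (lo + hi) // 2
        (mid_source,) = struct.unpack_from("<I", buf, edges_start + mid * EDGE_SIZE)
        if mid_source < source:
            lo = mid + 1
        else:
            hi = mid
    # Scan forward while the source still matches.
    result = []
    while lo < edge_count:
        s, target, edge_type, weight = struct.unpack_from(
            EDGE_FORMAT, buf, edges_start + lo * EDGE_SIZE)
        if s != source:
            break
        result.append((target, edge_type, weight))
        lo += 1
    return result
```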

Section 4: Content Block

A single LZ4-compressed block containing all node content and metadata strings, concatenated end-to-end.

Structure (after decompression):

[node_0_content][node_1_content]...[node_N_content][node_0_metadata][node_1_metadata]...

All content is UTF-8 encoded. Metadata is JSON-encoded UTF-8 (a flat object with string keys and string values).
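
Given a decompressed content block, a node's text and metadata are plain slices addressed by the offsets and lengths in its node record. The sketch below assumes decompression has already happened elsewhere and uses hypothetical helper names node_text and node_metadata.

```python
import json

U64_MAX = 0xFFFF_FFFF_FFFF_FFFF  # sentinel: no metadata


def node_text(content: bytes, content_offset: int, content_length: int) -> str:
    """Slice one node's UTF-8 content out of the decompressed block."""
    end = content_offset + content_length
    if end > len(content):
        raise ValueError("content range exceeds decompressed block")
    return content[content_offset:end].decode("utf-8")


def node_metadata(content: bytes, metadata_offset: int, metadata_length: int) -> dict:
    """Decode one node's JSON metadata, or return {} when none is stored."""
    if metadata_offset == U64_MAX or metadata_length == 0:
        return {}
    end = metadata_offset + metadata_length
    if end > len(content):
        raise ValueError("metadata range exceeds decompressed block")
    return json.loads(content[metadata_offset:end].decode("utf-8"))
```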

Compression:

  • Algorithm: LZ4 (frame format).
  • The header field content_length stores the compressed size.
  • The header field content_uncompressed stores the decompressed size.
  • If flags bit 2 is not set, the content block is stored uncompressed (for very small brains where compression overhead is not worthwhile).

Rationale for LZ4: LZ4 decompression runs at memory bandwidth speeds (typically 3--5 GB/s), making it effectively free compared to I/O. The compression ratio for natural language text is typically 2--3x.

Section 5: Feature Vectors

A contiguous block of IEEE 754 single-precision floating-point values. Each node's vector is dimension floats (default: 128 floats = 512 bytes per vector).

Layout:

[node_0_vector: f32 * dimension][node_1_vector: f32 * dimension]...[node_N_vector]

Notes:

  • Vectors are stored in node ID order.
  • If a node has no vector (vector_offset is u64::MAX), the corresponding slot contains all zeros.
  • The contiguous layout is critical for SIMD-accelerated similarity search -- the CPU can scan vectors without pointer chasing.
  • Total size: node_count * dimension * 4 bytes.
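
A minimal sketch of reading one vector slot and comparing two vectors, assuming the vector block has been sliced out of the file as a bytes-like object. The helper names are hypothetical, and cosine similarity is used only as an example metric; this document does not state which metric AgenticMemory uses.

```python
import math
import struct


def read_vector(vector_block: bytes, node_id: int, dimension: int) -> tuple:
    """Return the `dimension` little-endian f32 values stored for `node_id`."""
    offset = node_id * dimension * 4  # slots are laid out in node ID order
    return struct.unpack_from(f"<{dimension}f", vector_block, offset)


def cosine_similarity(a, b) -> float:
    """Cosine similarity; an all-zero slot (no vector) scores 0.0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

A production reader would more likely view the block as a typed array and use SIMD-friendly routines; the pure-Python loop here only shows the addressing.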

Section 6: Indexes

The index section contains auxiliary data structures for accelerating queries. It is present only if flags bit 1 is set.

Type Index (Bitmap)

A bitmap index that maps event types to node IDs. Each event type has a bitset of node_count bits, where bit N is set if node N has that event type.

Offset  Size    Type                                    Description
------  ------  --------------------------------------  -----------
0       4       u32                                     index_type: 0x01 (type index).
4       4       u32                                     num_types: Number of event types (6).
8       varies  [u8; ceil(node_count / 8)] * num_types  Packed bitsets, one per event type.
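
The following sketch looks up all nodes of one event type in the bitmap. Two details are assumptions not stated in this document: that bitsets are stored in event type order (type 0 first), and that bit N lives in byte N // 8 at bit position N % 8 counting from the least significant bit. The helper name nodes_of_type is hypothetical.

```python
import struct


def nodes_of_type(index: bytes, node_count: int, event_type: int) -> list:
    """Return the node IDs whose bit is set in the bitset for `event_type`."""
    index_type, num_types = struct.unpack_from("<II", index, 0)
    if index_type != 0x01:
        raise ValueError("not a type index")
    if not 0 <= event_type < num_types:
        raise ValueError("event type not covered by this index")
    stride = (node_count + 7) // 8   # ceil(node_count / 8) bytes per bitset
    start = 8 + event_type * stride  # ASSUMPTION: bitsets ordered by type value
    bits = index[start:start + stride]
    # ASSUMPTION: least-significant-bit-first ordering within each byte.
    return [n for n in range(node_count) if (bits[n // 8] >> (n % 8)) & 1]
```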

Session Index

Maps session IDs to their constituent node ID ranges.

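Offset  Size    Type                                                                Description
------  ------  ------------------------------------------------------------------  -----------
0       4       u32                                                                 index_type: 0x02 (session index).
4       4       u32                                                                 num_sessions: Number of sessions.
8       varies  [(session_id: u32, start_node: u32, end_node: u32)] * num_sessions  Session-to-node-range mapping.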
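
A minimal sketch of decoding the session index into a dictionary, assuming each entry is three packed u32 values (12 bytes) with no padding. Whether end_node is inclusive or exclusive is not stated here, so the hypothetical helper parse_session_index returns the raw pair unchanged.

```python
import struct


def parse_session_index(index: bytes) -> dict:
    """Map session_id -> (start_node, end_node) as stored in the index."""
    index_type, num_sessions = struct.unpack_from("<II", index, 0)
    if index_type != 0x02:
        raise ValueError("not a session index")
    ranges = {}
    for i in range(num_sessions):
        # ASSUMPTION: entries are tightly packed, 12 bytes each.
        session_id, start_node, end_node = struct.unpack_from(
            "<III", index, 8 + i * 12)
        ranges[session_id] = (start_node, end_node)
    return ranges
```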

Time Index

A sorted array of (timestamp, node_id) pairs for efficient time-range queries.

Offset  Size    Type                                           Description
------  ------  ---------------------------------------------  -----------
0       4       u32                                            index_type: 0x03 (time index).
4       4       u32                                            num_entries: Number of entries.
8       varies  [(timestamp: i64, node_id: u32)] * num_entries  Sorted by timestamp ascending.
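
Since entries are sorted by timestamp, a time-range query reduces to two binary searches. The sketch below assumes entries are tightly packed at 12 bytes each and treats both range bounds as inclusive; the helper name nodes_in_time_range is hypothetical.

```python
import bisect
import struct

ENTRY = struct.Struct("<qI")  # (timestamp: i64, node_id: u32), 12 bytes


def nodes_in_time_range(index: bytes, start_ts: int, end_ts: int) -> list:
    """Return node IDs with start_ts <= timestamp <= end_ts, in index order."""
    index_type, num_entries = struct.unpack_from("<II", index, 0)
    if index_type != 0x03:
        raise ValueError("not a time index")
    entries = [ENTRY.unpack_from(index, 8 + i * ENTRY.size)
               for i in range(num_entries)]
    timestamps = [ts for ts, _ in entries]
    lo = bisect.bisect_left(timestamps, start_ts)   # first entry >= start_ts
    hi = bisect.bisect_right(timestamps, end_ts)    # one past last entry <= end_ts
    return [node_id for _, node_id in entries[lo:hi]]
```

Decoding every entry up front keeps the example short; a reader working on a memory-mapped file could instead binary search the packed records in place.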

Cluster Map

Pre-computed cluster assignments for approximate nearest-neighbor search. Nodes are grouped into clusters based on their feature vectors.

Offset  Size    Type                                                 Description
------  ------  ---------------------------------------------------  -----------
0       4       u32                                                  index_type: 0x04 (cluster map).
4       4       u32                                                  num_clusters: Number of clusters.
8       4       u32                                                  dimension: Vector dimension.
12      varies  [f32; dimension] * num_clusters                      Cluster centroid vectors.
varies  varies  [(cluster_id: u32, node_id: u32)] * node_count       Node-to-cluster assignments, sorted by cluster_id.
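
A minimal sketch of decoding the cluster map, assuming the assignment pairs follow the centroids immediately with no padding and that node_count is taken from the file header. The helper name parse_cluster_map is hypothetical.

```python
import struct


def parse_cluster_map(index: bytes, node_count: int):
    """Return (centroids, members) where members maps cluster_id -> [node_id]."""
    index_type, num_clusters, dimension = struct.unpack_from("<III", index, 0)
    if index_type != 0x04:
        raise ValueError("not a cluster map")
    pos = 12
    centroids = []
    for _ in range(num_clusters):
        centroids.append(struct.unpack_from(f"<{dimension}f", index, pos))
        pos += dimension * 4
    # ASSUMPTION: assignments start right after the last centroid.
    members = {}
    for i in range(node_count):
        cluster_id, node_id = struct.unpack_from("<II", index, pos + i * 8)
        members.setdefault(cluster_id, []).append(node_id)
    return centroids, members
```

An approximate nearest-neighbor query would typically compare the query vector against the centroids first and then scan only the members of the closest clusters.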

Version Compatibility

Version 1 (Current)

The initial release format as described in this document.

Readers must:

  • Reject files whose magic bytes are not 0x41 0x4D 0x45 0x4D ("AMEM").
  • Reject files where version > 1.
  • Handle the absence of vectors (flags bit 0 unset) gracefully.
  • Handle the absence of indexes (flags bit 1 unset) by falling back to linear scans.

Forward Compatibility

Future versions may:

  • Add new event types (values >= 6 in the event_type field). Old readers should treat unknown types as opaque.
  • Add new edge types (values >= 7 in the edge_type field). Old readers should treat unknown types as opaque.
  • Add new index types. Old readers should skip unknown index types.
  • Increase the header size. Old readers should use content_offset to find the content block rather than hardcoding offsets.
  • Utilize the reserved header bytes for new fields.
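
One way a reader might treat unknown enum values as opaque is to decode known values into an enum and pass anything else through unchanged. The sketch below shows this for event types; the names EventType and decode_event_type are illustrative, not part of the format.

```python
from enum import IntEnum


class EventType(IntEnum):
    FACT = 0
    DECISION = 1
    INFERENCE = 2
    CORRECTION = 3
    SKILL = 4
    EPISODE = 5


def decode_event_type(value: int):
    """Return an EventType for known values, or the raw int for unknown ones."""
    try:
        return EventType(value)
    except ValueError:
        # Unknown type from a newer writer: keep it opaque instead of failing.
        return value
```

The same pattern would apply to edge_type values of 7 and above.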

Byte Order

All multi-byte integers and floats are stored in little-endian byte order.
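
As a small illustration of the byte-order rule, the sketch below encodes a u32 in little-endian form and shows why the magic bytes should be compared as bytes: read as a little-endian u32, "AMEM" does not equal 0x414D454D. Both helper names are hypothetical.

```python
import struct


def encode_u32(value: int) -> bytes:
    """Encode an unsigned 32-bit integer as it would appear on disk."""
    return struct.pack("<I", value)  # '<' selects little-endian


def magic_as_le_u32() -> int:
    """Interpret the four magic bytes as a little-endian u32."""
    return struct.unpack("<I", b"AMEM")[0]
```

Here encode_u32(1) yields the bytes 01 00 00 00, and magic_as_le_u32() yields 0x4D454D41.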

Size Estimates

For a brain with N nodes, M edges, and 128-dimensional vectors:

Component        Size
---------------  ----
Header           64 bytes
Node records     N * 64 bytes
Edge records     M * 13 bytes
Content block    ~40% of raw content size (LZ4)
Feature vectors  N * 512 bytes
Indexes          ~N * 20 bytes (approximate)

Example: A brain with 100,000 nodes, 500,000 edges, and average content of 200 bytes per node:

Component        Size
---------------  ----
Header           64 B
Node records     6.1 MB
Edge records     6.2 MB
Content block    ~8 MB (20 MB raw, ~2.5x compression)
Feature vectors  48.8 MB
Indexes          ~1.9 MB
Total            ~71 MB
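
The arithmetic behind the example can be reproduced with a short sketch. The function name estimate_size_bytes is hypothetical, and the compression ratio and per-node index cost are the approximate figures from the tables above, not guarantees.

```python
def estimate_size_bytes(nodes: int, edges: int, avg_content_bytes: int,
                        dimension: int = 128, compression_ratio: float = 2.5,
                        index_bytes_per_node: int = 20) -> dict:
    """Estimate per-section sizes in bytes using the formulas above."""
    sizes = {
        "header": 64,
        "node_records": nodes * 64,
        "edge_records": edges * 13,
        # Approximate: depends on how well the content compresses.
        "content_block": round(nodes * avg_content_bytes / compression_ratio),
        "feature_vectors": nodes * dimension * 4,
        # Approximate: roughly 20 bytes of index data per node.
        "indexes": nodes * index_bytes_per_node,
    }
    sizes["total"] = sum(sizes.values())
    return sizes
```

For 100,000 nodes, 500,000 edges, and 200 bytes of content per node, the total comes to 74,100,064 bytes, which is about 71 when divided by 2^20 and matches the table.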