AgenticCodebase

File Format Specification

The .acb binary format stores a complete code concept graph in a single file. This document describes the on-disk layout, section structure, and design rationale.

Design Goals

  1. O(1) random access. Look up any code unit by ID without scanning the file.
  2. Compact. LZ4 compression for variable-length strings. Fixed-size records for units and edges.
  3. Memory-mappable. The format supports mmap() for zero-copy access to unit and edge tables.
  4. Forward-compatible. New header fields use the reserved header space, and new sections are appended after existing ones. Older readers skip unknown sections.

File Layout

Offset 0x00   ┌───────────────────────────────┐
              │        Header (128 B)         │
              ├───────────────────────────────┤
              │    Unit Table (96N bytes)     │  N = unit_count
              ├───────────────────────────────┤
              │    Edge Table (40M bytes)     │  M = edge_count
              ├───────────────────────────────┤
              │ String Pool (LZ4 compressed)  │  Variable size
              ├───────────────────────────────┤
              │  Feature Vectors (f32 array)  │  N * dim * 4 bytes
              └───────────────────────────────┘
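
The section boundaries in the diagram can be derived from a handful of header values. The following is a minimal sketch, not part of the format itself: section_layout is a hypothetical helper, and it takes the string pool and feature offsets from the header fields rather than assuming the sections are tightly packed.

```python
HEADER_SIZE = 128       # fixed header size
UNIT_RECORD_SIZE = 96   # bytes per unit record
EDGE_RECORD_SIZE = 40   # bytes per edge record


def section_layout(unit_count, edge_count, string_pool_offset,
                   string_pool_size, feature_offset, dimension):
    """Return {section: (start_offset, size_in_bytes)} for an .acb file.

    Hypothetical helper for illustration only.
    """
    unit_start = HEADER_SIZE
    edge_start = unit_start + UNIT_RECORD_SIZE * unit_count
    return {
        "header": (0, HEADER_SIZE),
        "units": (unit_start, UNIT_RECORD_SIZE * unit_count),
        "edges": (edge_start, EDGE_RECORD_SIZE * edge_count),
        # The pool size is the compressed size stored in the header.
        "strings": (string_pool_offset, string_pool_size),
        # One f32 (4 bytes) per dimension per unit.
        "features": (feature_offset, unit_count * dimension * 4),
    }
```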

Header (128 bytes)

Offset  Size  Type      Field               Description
0x00    4     [u8; 4]   magic               Magic bytes: ACB\0
0x04    4     u32       version             Format version (currently 1)
0x08    8     u64       unit_count          Number of code units
0x10    8     u64       edge_count          Number of edges
0x18    8     u64       string_pool_offset  Byte offset of string pool section
0x20    8     u64       string_pool_size    Compressed size of string pool
0x28    8     u64       feature_offset      Byte offset of feature vector section
0x30    4     u32       dimension           Feature vector dimensionality
0x34    8     u64       timestamp           Compilation timestamp (Unix epoch)
0x3C    52    [u8; 52]  reserved            Reserved for future fields

Total: 128 bytes (fixed).
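
As an illustration, the named header fields could be decoded along the following lines. This is a sketch under stated assumptions: the byte order is assumed to be little-endian (this document does not specify endianness), AcbHeader and parse_header are hypothetical names, and only the fields at offsets 0x00 through 0x3B are decoded. The rest of the 128-byte header is treated as opaque.

```python
import struct
from collections import namedtuple

HEADER_SIZE = 128

# Assumption: little-endian, no implicit padding ("<").
# 4s = magic, I = version, 5 x Q = unit_count, edge_count,
# string_pool_offset, string_pool_size, feature_offset,
# I = dimension, Q = timestamp. Covers offsets 0x00 through 0x3B.
HEADER_FIELDS = struct.Struct("<4sIQQQQQIQ")

AcbHeader = namedtuple(
    "AcbHeader",
    "magic version unit_count edge_count string_pool_offset "
    "string_pool_size feature_offset dimension timestamp",
)


def parse_header(buf):
    """Decode the named header fields from the first 128 bytes of a file."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("truncated header")
    return AcbHeader(*HEADER_FIELDS.unpack_from(buf, 0))
```

Because the packed layout has no implicit padding, timestamp sits at 0x34 exactly as listed in the table, even though that offset is not 8-byte aligned.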

Unit Table

Starts immediately after the header at offset 128. Each unit record is 96 bytes.

Offset  Size  Type      Field         Description
0x00    8     u64       id            Unique unit identifier
0x04    4     u32       name_offset   Offset into decompressed string pool
0x08    4     u32       name_length   Length of name string
0x0C    4     u32       qname_offset  Qualified name offset
0x10    4     u32       qname_length  Qualified name length
0x14    1     u8        unit_type     UnitType enum discriminant
0x15    1     u8        language      Language enum discriminant
0x16    1     u8        visibility    Visibility enum discriminant
0x17    1     u8        flags         Bit flags (is_async, is_generator, etc.)
0x18    4     u32       file_offset   File path offset in string pool
0x1C    4     u32       file_length   File path length
0x20    4     u32       start_line    Span start line
0x24    4     u32       start_col     Span start column
0x28    4     u32       end_line      Span end line
0x2C    4     u32       end_col       Span end column
0x30    4     u32       complexity    Cyclomatic complexity
0x34    4     f32       stability     Stability score (0.0 - 1.0)
0x38    4     u32       sig_offset    Signature string offset (0 if none)
0x3C    4     u32       sig_length    Signature string length
0x40    4     u32       doc_offset    Doc summary offset (0 if none)
0x44    4     u32       doc_length    Doc summary length
0x48    24    [u8; 24]  reserved      Reserved for future fields

Total: 96 bytes per unit.
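
Fixed-size records are what make constant-time lookup possible: the position of a record depends only on its index. The sketch below shows one way to fetch the raw bytes of a unit record through mmap. The function names are hypothetical, and index is assumed to be the zero-based position of the record in the unit table. Whether that position always equals the unit's id field is not stated in this document.

```python
import mmap

HEADER_SIZE = 128
UNIT_RECORD_SIZE = 96


def unit_record_offset(index):
    """Byte offset of the unit record at a zero-based table position."""
    return HEADER_SIZE + index * UNIT_RECORD_SIZE


def read_unit_record(path, index):
    """Return the raw 96 bytes of one unit record without scanning the file."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = unit_record_offset(index)
        end = start + UNIT_RECORD_SIZE
        # Bounds check against the file length only; a fuller reader
        # would also compare index against unit_count from the header.
        if index < 0 or end > len(mm):
            raise IndexError("unit index out of range")
        # Slicing copies the record out of the mapping so it stays
        # valid after the mapping is closed.
        return mm[start:end]
```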

Edge Table

Starts immediately after the unit table, at offset 128 + 96 * unit_count. Each edge record is 40 bytes.

Offset  Size  Type     Field      Description
0x00    8     u64      source_id  Source unit ID
0x08    8     u64      target_id  Target unit ID
0x10    1     u8       edge_type  EdgeType enum discriminant
0x11    7     [u8; 7]  padding    Alignment padding
0x18    8     f64      weight     Edge weight (0.0 - 1.0)
0x20    8     [u8; 8]  reserved   Reserved

Total: 40 bytes per edge.
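
A minimal sketch of decoding edge records from an in-memory buffer, assuming little-endian byte order (not specified in this document). The names EDGE_RECORD, edge_table_offset, and read_edges are hypothetical. Padding and reserved bytes are skipped rather than returned.

```python
import struct

HEADER_SIZE = 128
UNIT_RECORD_SIZE = 96

# Assumption: little-endian.
# Q = source_id, Q = target_id, B = edge_type, 7x = padding,
# d = weight (f64), 8x = reserved. Total: 40 bytes.
EDGE_RECORD = struct.Struct("<QQB7xd8x")


def edge_table_offset(unit_count):
    """The edge table follows the unit table directly."""
    return HEADER_SIZE + UNIT_RECORD_SIZE * unit_count


def read_edges(buf, unit_count, edge_count):
    """Yield (source_id, target_id, edge_type, weight) tuples."""
    start = edge_table_offset(unit_count)
    for i in range(edge_count):
        yield EDGE_RECORD.unpack_from(buf, start + i * EDGE_RECORD.size)
```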

String Pool

The string pool contains all variable-length text: unit names, qualified names, file paths, signatures, and documentation summaries. It is stored as a single contiguous, LZ4-compressed buffer.

On read, the entire pool is decompressed into memory. String references in unit records use (offset, length) pairs into this decompressed buffer.
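
Resolving a string reference is then a bounds-checked slice of the decompressed buffer. The sketch below assumes the pool has already been decompressed (the LZ4 step is not shown) and that strings are UTF-8 encoded, which this document does not state. resolve_string is a hypothetical helper.

```python
def resolve_string(pool, offset, length):
    """Return the string at (offset, length) in the decompressed pool.

    Assumption: strings are UTF-8 encoded.
    """
    end = offset + length
    if offset < 0 or length < 0 or end > len(pool):
        raise ValueError("string reference out of range")
    return bytes(pool[offset:end]).decode("utf-8")
```

A reference with length 0 resolves to an empty string, which matches the "0 if none" convention used for the signature and doc summary offsets.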

Compression

LZ4 block compression is used for the string pool. Typical compression ratios on source code metadata:

  • English identifiers: ~2.5x compression
  • File paths with common prefixes: ~3-4x compression
  • Documentation text: ~2-3x compression

LZ4 decompression runs at 3-5 GB/s on modern hardware, making the decompression cost negligible.

Feature Vectors

Feature vectors are stored as a flat array of f32 values, one vector per unit. The vector for the unit at zero-based index i starts at byte offset feature_offset + i * dimension * 4, so the whole section occupies unit_count * dimension * 4 bytes.

Default dimension is 64, configurable at compile time. Vectors are not compressed since f32 values compress poorly.
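
The offset arithmetic can be sketched as follows, again assuming little-endian f32 values and a zero-based unit index. feature_vector is a hypothetical helper that reads from an in-memory buffer.

```python
import struct


def feature_vector(buf, feature_offset, dimension, index):
    """Return the feature vector for one unit as a tuple of floats.

    Assumption: little-endian f32 values.
    """
    start = feature_offset + index * dimension * 4
    # struct.error is raised if the buffer is too short for the request.
    return struct.unpack_from(f"<{dimension}f", buf, start)
```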

Versioning

The version field in the header enables forward compatibility:

  • Version 1 (current): Base format as described in this document.
  • Future versions will maintain backward compatibility by appending new sections after existing ones and using reserved header fields.

Readers should check the version field and reject files with unsupported versions rather than attempting to parse unknown formats.
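
One way a reader might apply this rule is sketched below. It checks the magic bytes and the version before touching any other part of the file. The names are hypothetical, and the version field is assumed to be little-endian.

```python
import struct

MAGIC = b"ACB\0"
SUPPORTED_VERSIONS = {1}


def check_version(buf):
    """Return the format version, or raise ValueError if it is unsupported."""
    if len(buf) < 8:
        raise ValueError("file too short to contain magic and version")
    if bytes(buf[:4]) != MAGIC:
        raise ValueError("not an .acb file (bad magic bytes)")
    # Assumption: little-endian u32 at offset 0x04.
    (version,) = struct.unpack_from("<I", buf, 4)
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported .acb format version: {version}")
    return version
```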

Checksum

The current format does not include checksums. File integrity can be verified using external tools (e.g., blake3sum). A checksum field may be added in a future version using reserved header space.
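
Where a command-line tool is not available, a digest can also be computed in a few lines of code and stored next to the file. The sketch below uses BLAKE2b from Python's standard hashlib as a stand-in, since BLAKE3 is not in the standard library. Its output is therefore not comparable with blake3sum. acb_digest is a hypothetical helper.

```python
import hashlib


def acb_digest(path, chunk_size=1 << 20):
    """Return a hex BLAKE2b digest of the file, read in 1 MiB chunks."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```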