.acomm File Format Specification
This document describes the binary layout of .acomm files -- the on-disk format for AgenticComm communication data. The format is designed for fast sequential reads, compact storage, and forward compatibility.
Design Goals
- Compact: Messages are serialized with bincode and compressed with flate2 (gzip). Typical agent conversations compress at 3:1 or better.
- Atomic writes: The checksum footer is written last. A crash during write leaves the previous valid state intact. Readers verify the checksum and fall back to the last known-good state on mismatch.
- Forward compatible: Unknown sections are skipped using the section table offsets. Older readers can safely open files written by newer versions.
- Indexed: Channel-to-message and timestamp indexes are stored inline for fast lookup without full deserialization.
- Bounded: Maximum file size is 2 GB. Maximum message count is 10 million. These limits prevent unbounded growth and ensure predictable performance.
File Layout
An .acomm file consists of seven contiguous sections:
+-------------------------+
| Magic + Header | 96 bytes
+-------------------------+
| Section Table | variable (6 entries * 24 bytes = 144 bytes)
+-------------------------+
| Channel Section | variable (bincode serialized)
+-------------------------+
| Message Section | variable (bincode + flate2 compressed)
+-------------------------+
| Subscription Section | variable (bincode serialized)
+-------------------------+
| Index Section | variable (bincode serialized)
+-------------------------+
| Footer | 40 bytes (checksum + metadata)
+-------------------------+
Section 1: Header (96 bytes)
The header occupies the first 96 bytes of the file.
| Offset | Size | Type | Field | Description |
|---|---|---|---|---|
| 0 | 8 | [u8; 8] | magic | Magic bytes: 0x41 0x43 0x4F 0x4D 0x4D 0x30 0x30 0x31 (ASCII "ACOMM001"). |
| 8 | 2 | u16 | version | Format version. Current: 1. |
| 10 | 4 | u32 | flags | Bitfield. See flags table below. |
| 14 | 2 | u16 | section_count | Number of sections in the section table. Current: 6. |
| 16 | 8 | u64 | channel_count | Total number of channels. |
| 24 | 8 | u64 | message_count | Total number of messages (active + archived). |
| 32 | 8 | u64 | subscription_count | Total number of subscriptions. |
| 40 | 8 | u64 | dead_letter_count | Number of messages in the dead letter queue. |
| 48 | 8 | u64 | created_at | Store creation timestamp (Unix seconds, UTC). |
| 56 | 8 | u64 | modified_at | Last modification timestamp (Unix seconds, UTC). |
| 64 | 8 | u64 | total_size | Total file size in bytes. Used for quick validation. |
| 72 | 24 | [u8; 24] | reserved | Reserved for future use. Must be zero. |
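The table above maps directly onto a fixed-width unpack. The following is a minimal sketch, not a reference reader: it assumes little-endian byte order with no padding between fields (the specification does not state byte order), and `parse_header`, `HEADER_FORMAT`, and `HEADER_FIELDS` are hypothetical names chosen for illustration.

```python
import struct

# Assumption: little-endian, tightly packed. The spec does not state byte order.
# 8s = magic, H = version, I = flags, H = section_count,
# 7Q = the seven u64 counters/timestamps/size, 24s = reserved.
HEADER_FORMAT = "<8sHIH7Q24s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 + 2 + 4 + 2 + 56 + 24 = 96

HEADER_FIELDS = (
    "magic", "version", "flags", "section_count",
    "channel_count", "message_count", "subscription_count",
    "dead_letter_count", "created_at", "modified_at", "total_size",
    "reserved",
)


def parse_header(buf):
    """Unpack the first 96 bytes of an .acomm file into a field dict."""
    if len(buf) < HEADER_SIZE:
        raise ValueError(f"need {HEADER_SIZE} header bytes, got {len(buf)}")
    values = struct.unpack_from(HEADER_FORMAT, buf, 0)
    return dict(zip(HEADER_FIELDS, values))
```

Summing the format widths gives 96 bytes, which matches the header size stated above.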
Flags Bitfield
| Bit | Name | Description |
|---|---|---|
| 0 | COMPRESSED | Message section is flate2 compressed. |
| 1 | INDEXED | Index section is present. |
| 2 | HAS_DEAD_LETTERS | Dead letter messages are included in the message section. |
| 3 | HAS_SIGNATURES | At least one message carries a cryptographic signature. |
| 4 | HAS_METADATA | At least one message carries metadata. |
| 5 | ENCRYPTED | Message content is encrypted at rest. |
| 6-31 | -- | Reserved. Must be zero. |
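The bitfield can be modelled as a flag enum. This sketch assumes bit 0 is the least significant bit of the u32; `HeaderFlags` and `decode_flags` are hypothetical names, not part of the format.

```python
from enum import IntFlag


class HeaderFlags(IntFlag):
    # Assumption: bit 0 is the least significant bit of the flags field.
    COMPRESSED = 1 << 0
    INDEXED = 1 << 1
    HAS_DEAD_LETTERS = 1 << 2
    HAS_SIGNATURES = 1 << 3
    HAS_METADATA = 1 << 4
    ENCRYPTED = 1 << 5


KNOWN_MASK = 0b111111  # bits 0-5; bits 6-31 are reserved


def decode_flags(raw):
    """Split a raw u32 into (known flags, reserved bits that were set)."""
    known = HeaderFlags(raw & KNOWN_MASK)
    reserved = raw & ~KNOWN_MASK & 0xFFFFFFFF
    return known, reserved
```

Returning the reserved bits separately lets a caller decide whether to log them or ignore them.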
Validation Rules
- magic must be exactly ACOMM001 (8 bytes).
- version must be <= 1 for this specification. Readers encountering a higher version should either upgrade or refuse to open the file with a clear error message.
- section_count must match the number of entries in the section table.
- channel_count and message_count must be consistent with their respective sections.
- total_size must match the actual file size.
- Writers must set reserved bytes to zero. Readers must ignore reserved bytes and must not reject files with non-zero reserved bytes, for forward compatibility.
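The rules above can be sketched as a single check over an already-decoded header. This is an illustration under assumptions: the header is a plain dict keyed by the field names from the header table, and `validate_header` is a hypothetical helper. The count-consistency rule is omitted because it needs the decoded sections.

```python
MAGIC = b"ACOMM001"
MAX_SUPPORTED_VERSION = 1


def validate_header(header, actual_file_size, table_entry_count):
    """Return a list of rule violations; an empty list means the header passed."""
    problems = []
    if header["magic"] != MAGIC:
        problems.append("bad magic")
    if header["version"] > MAX_SUPPORTED_VERSION:
        problems.append(f"unsupported version {header['version']}")
    if header["section_count"] != table_entry_count:
        problems.append("section_count does not match section table")
    if header["total_size"] != actual_file_size:
        problems.append("total_size does not match file size")
    # Reserved bytes are deliberately not checked: readers must ignore them.
    # channel_count / message_count need the decoded sections, so this
    # sketch leaves them to a later stage.
    return problems
```

Collecting every violation, rather than stopping at the first, makes the resulting error message more useful when a file is damaged in several places.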
Section 2: Section Table
Immediately following the header. Each entry is 24 bytes. The table has section_count entries (currently 6).
| Offset | Size | Type | Field | Description |
|---|---|---|---|---|
| 0 | 4 | u32 | section_type | Section type identifier. See table below. |
| 4 | 4 | u32 | flags | Section-specific flags. Currently reserved (must be zero). |
| 8 | 8 | u64 | offset | Byte offset from file start to the beginning of this section. |
| 16 | 8 | u64 | length | Length of this section in bytes. |
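Each entry is a fixed 24-byte record, so the table can be read in one pass. A minimal sketch, assuming little-endian byte order (not stated in the specification); `SectionEntry` and `parse_section_table` are hypothetical names.

```python
import struct
from typing import NamedTuple

HEADER_SIZE = 96
ENTRY_FORMAT = "<IIQQ"  # assumption: little-endian; u32, u32, u64, u64
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 24


class SectionEntry(NamedTuple):
    section_type: int
    flags: int
    offset: int
    length: int


def parse_section_table(buf, section_count):
    """Read section_count entries that start right after the 96-byte header."""
    end = HEADER_SIZE + section_count * ENTRY_SIZE
    if len(buf) < end:
        raise ValueError("buffer too short for section table")
    table = buf[HEADER_SIZE:end]
    return [SectionEntry(*fields) for fields in struct.iter_unpack(ENTRY_FORMAT, table)]
```

With the current six entries, the table ends at byte 96 + 144 = 240.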
Section Types
| Value | Name | Description |
|---|---|---|
| 1 | CHANNELS | Channel definitions and configurations. |
| 2 | MESSAGES | Message records (compressed). |
| 3 | SUBSCRIPTIONS | Pub/sub subscription records. |
| 4 | INDEXES | Lookup indexes (channel-message, timestamp, topic, sender, correlation). |
| 5 | DEAD_LETTERS | Failed messages that exhausted retries. |
| 6 | ARCHIVE | Archived messages (retained but past retention policy). |
| 7-255 | -- | Reserved for future sections. Unknown section types must be skipped. |
Forward Compatibility
When a reader encounters an unknown section type:
- Log a warning (not an error).
- Skip the section using the offset and length fields.
- Continue processing remaining sections.
This allows older readers to open files written by newer versions that introduce additional sections.
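The skip behaviour described above might look like the following. This is a sketch under assumptions: entries are given as hypothetical `(section_type, offset, length)` tuples, and `read_known_sections` and the `acomm` logger name are illustrative only.

```python
import logging

log = logging.getLogger("acomm")  # hypothetical logger name

KNOWN_SECTIONS = {
    1: "CHANNELS", 2: "MESSAGES", 3: "SUBSCRIPTIONS",
    4: "INDEXES", 5: "DEAD_LETTERS", 6: "ARCHIVE",
}


def read_known_sections(buf, entries):
    """Slice out known sections; warn about and skip unknown section types.

    entries is an iterable of (section_type, offset, length) tuples.
    """
    sections = {}
    for section_type, offset, length in entries:
        name = KNOWN_SECTIONS.get(section_type)
        if name is None:
            # A warning, not an error: the file may come from a newer writer.
            log.warning("skipping unknown section type %d (%d bytes)", section_type, length)
            continue
        sections[name] = buf[offset:offset + length]
    return sections
```

Because every entry carries its own offset and length, skipping an unknown section never requires understanding its contents.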
Section 3: Channel Section
Contains all channel records, serialized sequentially with bincode.
Channel Record Layout
Each channel is serialized as a bincode struct. The section is a length-prefixed sequence:
[channel_count: u64]
[channel_0: bincode bytes]
[channel_1: bincode bytes]
...
[channel_N: bincode bytes]
Each channel record includes:
| Field | Type | Description |
|---|---|---|
| id | u64 | Channel ID. |
| name_len | u32 | Length of channel name. |
| name | [u8; name_len] | Channel name (UTF-8). |
| channel_type | u8 | Channel type enum value. |
| owner_len | u32 | Length of owner participant ID. |
| owner | [u8; owner_len] | Owner participant ID (UTF-8). |
| participant_count | u32 | Number of participants. |
| participants | [Participant; participant_count] | Participant records (see below). |
| config | ChannelConfig | Serialized channel configuration. |
| state | u8 | Channel state enum value. |
| created_at | u64 | Creation timestamp. |
| modified_at | u64 | Last modification timestamp. |
| message_count | u64 | Number of messages in this channel. |
| description_len | u32 | Length of description (0 if none). |
| description | [u8; description_len] | Description (UTF-8), omitted if length is 0. |
| tag_count | u32 | Number of tags. |
| tags | [Tag; tag_count] | Tags (length-prefixed strings). |
Participant Sub-record
| Field | Type | Description |
|---|---|---|
| id_len | u32 | Length of participant ID. |
| id | [u8; id_len] | Participant ID (UTF-8). |
| role | u8 | Role enum value (0=Owner, 1=Member, 2=Observer). |
| joined_at | u64 | Join timestamp. |
| has_identity | u8 | 1 if identity_id is present, 0 otherwise. |
| identity_id_len | u32 | Length of identity ID (only if has_identity=1). |
| identity_id | [u8; identity_id_len] | Identity ID string (only if has_identity=1). |
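The participant layout mixes fixed-width fields with length-prefixed strings and one conditional field. The sketch below decodes it as tabulated, assuming little-endian integers and no padding (neither is stated in the specification); `parse_participant` and `_read_prefixed` are hypothetical helpers.

```python
import struct


def _read_prefixed(buf, pos):
    """Read a u32 length prefix followed by that many bytes; return (bytes, new_pos)."""
    (n,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    data = buf[pos:pos + n]
    if len(data) != n:
        raise ValueError("truncated length-prefixed field")
    return data, pos + n


def parse_participant(buf, pos=0):
    """Decode one participant sub-record; return (record dict, position after it)."""
    raw_id, pos = _read_prefixed(buf, pos)
    # role (u8), joined_at (u64), has_identity (u8): 10 bytes, assumed unpadded.
    role, joined_at, has_identity = struct.unpack_from("<BQB", buf, pos)
    pos += 10
    identity_id = None
    if has_identity == 1:
        raw_identity, pos = _read_prefixed(buf, pos)
        identity_id = raw_identity.decode("utf-8")
    record = {
        "id": raw_id.decode("utf-8"),
        "role": role,
        "joined_at": joined_at,
        "identity_id": identity_id,
    }
    return record, pos
```

Returning the end position lets a caller decode `participant_count` records back to back.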
Section 4: Message Section
Contains all active messages, serialized with bincode and optionally compressed with flate2.
Compression Envelope
When the COMPRESSED flag is set in the header:
[uncompressed_size: u64] -- 8 bytes, original size before compression
[compressed_data: bytes] -- flate2 gzip compressed bincode data
When the COMPRESSED flag is not set:
[raw_data: bytes] -- uncompressed bincode data
Message Record Sequence
After decompression (or directly if uncompressed), the message section contains:
[message_count: u64]
[message_0: bincode bytes]
[message_1: bincode bytes]
...
[message_N: bincode bytes]
Messages are stored in creation order (ascending by created_at). This enables efficient temporal range queries on the raw data.
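Opening the message section involves unwrapping the compression envelope and then reading the record count. A minimal sketch, assuming the compressed payload is a standard gzip stream and that the u64 fields are little-endian; `open_message_section` is a hypothetical helper and it stops short of decoding individual records.

```python
import gzip
import struct


def open_message_section(section, compressed):
    """Return (message_count, record_bytes) for a raw message section."""
    if compressed:
        # Envelope: 8-byte uncompressed size, then the gzip stream.
        (expected_size,) = struct.unpack_from("<Q", section, 0)
        payload = gzip.decompress(section[8:])
        if len(payload) != expected_size:
            raise ValueError(
                f"uncompressed_size says {expected_size}, got {len(payload)} bytes"
            )
    else:
        payload = section
    (message_count,) = struct.unpack_from("<Q", payload, 0)
    return message_count, payload[8:]
```

Checking the decompressed length against `uncompressed_size` catches a truncated or mismatched envelope before any record is decoded.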
Message Record Layout
Each message is a bincode-serialized struct with the fields defined in Data Structures. Key considerations:
- content is stored as a length-prefixed byte array (UTF-8). The content is part of the compressed block, so individual message content is not independently accessible without decompressing the entire section.
- signature, if present, is stored as a length-prefixed byte array.
- metadata, if present, is serialized as a bincode map.
- Optional fields use bincode's Option encoding (1 byte tag: 0=None, 1=Some followed by value).
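The Option tag is the simplest of these encodings to illustrate. The sketch below decodes a hypothetical optional u64 field, assuming little-endian integers; `read_option_u64` is an illustrative name and does not correspond to a specific message field.

```python
import struct


def read_option_u64(buf, pos=0):
    """Decode an Option<u64>: tag byte 0 = None, 1 = Some followed by the value.

    Returns (value or None, position after the field).
    """
    tag = buf[pos]
    if tag == 0:
        return None, pos + 1
    if tag == 1:
        (value,) = struct.unpack_from("<Q", buf, pos + 1)  # assumption: little-endian
        return value, pos + 9
    raise ValueError(f"invalid Option tag {tag}")
```

An absent field therefore costs one byte, and a present u64 costs nine.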
Section 5: Subscription Section
Contains all pub/sub subscriptions.
[subscription_count: u64]
[subscription_0: bincode bytes]
[subscription_1: bincode bytes]
...
[subscription_N: bincode bytes]
Each subscription record follows the Subscription struct layout from Data Structures.
Section 6: Index Section
Contains lookup indexes for fast queries without full message scanning.
Index Types
The index section contains multiple sub-indexes, each prefixed with a type tag and length:
[index_count: u32]
[index_type_0: u32] [index_length_0: u64] [index_data_0: bytes]
[index_type_1: u32] [index_length_1: u64] [index_data_1: bytes]
...
Channel-Message Index (type = 1)
Maps channel IDs to the message IDs they contain.
[entry_count: u64]
For each entry:
[channel_id: u64]
[message_id_count: u64]
[message_ids: [u64; count]] -- sorted ascending
Timestamp Index (type = 2)
Sorted array of (timestamp, message_id) pairs for binary search.
[entry_count: u64]
For each entry:
[timestamp: u64]
[message_id: u64]
Entries are sorted by timestamp ascending. Ties are broken by message_id ascending.
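Because the pairs are sorted by (timestamp, message_id), a time-range query reduces to two binary searches. A minimal sketch, assuming little-endian u64 fields; `parse_timestamp_index` and `messages_in_range` are hypothetical helpers.

```python
import struct
from bisect import bisect_left, bisect_right

U64_MAX = 2**64 - 1


def parse_timestamp_index(data):
    """Decode [entry_count: u64] followed by (timestamp, message_id) u64 pairs."""
    (count,) = struct.unpack_from("<Q", data, 0)
    return [struct.unpack_from("<QQ", data, 8 + 16 * i) for i in range(count)]


def messages_in_range(entries, start, end):
    """Return message IDs whose timestamp t satisfies start <= t <= end."""
    # Tuples compare lexicographically, matching the on-disk sort order.
    lo = bisect_left(entries, (start, 0))
    hi = bisect_right(entries, (end, U64_MAX))
    return [message_id for _, message_id in entries[lo:hi]]
```

The sentinel message IDs (0 and the u64 maximum) make both bounds inclusive even when several messages share a timestamp.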
Topic Index (type = 3)
Maps exact topic strings to message IDs.
[entry_count: u64]
For each entry:
[topic_len: u32]
[topic: [u8; topic_len]]
[message_id_count: u64]
[message_ids: [u64; count]] -- sorted ascending
Sender Index (type = 4)
Maps sender participant IDs to their message IDs.
[entry_count: u64]
For each entry:
[sender_len: u32]
[sender: [u8; sender_len]]
[message_id_count: u64]
[message_ids: [u64; count]] -- sorted ascending
Correlation Index (type = 5)
Maps correlation IDs to all messages in the thread.
[entry_count: u64]
For each entry:
[correlation_id_len: u32]
[correlation_id: [u8; len]]
[message_id_count: u64]
[message_ids: [u64; count]] -- sorted ascending
Section 7: Footer (40 bytes)
The footer is the last 40 bytes of the file. It is written last to ensure atomic updates.
| Offset | Size | Type | Field | Description |
|---|---|---|---|---|
| 0 | 32 | [u8; 32] | checksum | SHA-256 hash of all preceding bytes (header through index section). |
| 32 | 8 | [u8; 8] | footer_magic | Footer magic bytes: 0x41 0x43 0x45 0x4E 0x44 0x30 0x30 0x31 (ASCII "ACEND001"). |
Checksum Verification
On file load:
- Read the last 40 bytes to extract checksum and footer magic.
- Verify footer magic is ACEND001.
- Compute SHA-256 of the first (file_size - 40) bytes, that is, everything before the footer.
- Compare computed hash with stored checksum.
- If mismatch: reject the file with an integrity error. Do not silently load corrupt data.
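The load-time steps above can be sketched over an in-memory copy of the file. `verify_checksum` and `IntegrityError` are hypothetical names used for illustration; a real reader would likely stream the hash rather than hold the whole file in memory.

```python
import hashlib

FOOTER_SIZE = 40
FOOTER_MAGIC = b"ACEND001"


class IntegrityError(Exception):
    """Raised when the footer magic or checksum does not match."""


def verify_checksum(data):
    """Raise IntegrityError unless the 40-byte footer matches the preceding bytes."""
    if len(data) < FOOTER_SIZE:
        raise IntegrityError("file is shorter than the footer")
    body, footer = data[:-FOOTER_SIZE], data[-FOOTER_SIZE:]
    stored_checksum, magic = footer[:32], footer[32:]
    if magic != FOOTER_MAGIC:
        raise IntegrityError("bad footer magic")
    if hashlib.sha256(body).digest() != stored_checksum:
        raise IntegrityError("checksum mismatch")
```

The function returns nothing on success, so a caller cannot accidentally ignore a failed check.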
Atomic Write Protocol
When writing a .acomm file:
- Write to a temporary file (<path>.acomm.tmp).
- Write header, section table, all sections.
- Compute SHA-256 of everything written so far.
- Write footer (checksum + magic).
- Flush and sync the temporary file.
- Atomically rename the temporary file to the target path.
This ensures that readers never see a partially-written file. If the process crashes during write, the temporary file is left behind and can be cleaned up on the next successful write.
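A minimal sketch of the write protocol follows. It assumes the caller has already serialized the header, section table, and sections into `body`, and that the temporary path is the target path with `.tmp` appended; `write_atomically` is a hypothetical helper.

```python
import hashlib
import os

FOOTER_MAGIC = b"ACEND001"


def write_atomically(target_path, body):
    """Write body plus footer to a temporary file, sync it, then rename into place."""
    tmp_path = target_path + ".tmp"  # assumption: e.g. store.acomm -> store.acomm.tmp
    with open(tmp_path, "wb") as f:
        f.write(body)                            # header, section table, sections
        f.write(hashlib.sha256(body).digest())   # 32-byte checksum of everything so far
        f.write(FOOTER_MAGIC)                    # 8-byte footer magic, written last
        f.flush()
        os.fsync(f.fileno())                     # push the data to disk before renaming
    os.replace(tmp_path, target_path)            # replaces any existing target file
```

`os.replace` overwrites an existing target in a single step, so a reader opening the target path sees either the old file or the new one. For full durability on some filesystems the containing directory may also need to be synced, which this sketch omits.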
Size Limits
| Limit | Value | Rationale |
|---|---|---|
| Maximum file size | 2 GB (2,147,483,648 bytes) | Keeps memory-mapped operations predictable. |
| Maximum message count | 10,000,000 | Prevents excessive decompression time. |
| Maximum channel count | 100,000 | Keeps channel index manageable. |
| Maximum subscription count | 1,000,000 | Prevents subscription matching overhead. |
| Maximum message content | 1 MB (1,048,576 bytes) | Prevents single messages from dominating the file. |
| Maximum channel name | 128 bytes | Keeps indexes compact. |
| Maximum topic length | 256 bytes | Keeps topic matching fast. |
When a limit is reached, the store returns an error with a descriptive message. The store does not silently truncate or drop data.
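A sketch of how a writer might enforce some of these limits before accepting a message. It assumes the byte limits are inclusive and are measured on the UTF-8 encoding; `check_new_message` and `LimitExceeded` are hypothetical names.

```python
MAX_MESSAGE_COUNT = 10_000_000
MAX_CONTENT_BYTES = 1_048_576
MAX_TOPIC_BYTES = 256


class LimitExceeded(Exception):
    """Raised instead of truncating or dropping data."""


def check_new_message(content, topic, current_message_count):
    """Raise LimitExceeded if adding this message would break a size limit."""
    if current_message_count >= MAX_MESSAGE_COUNT:
        raise LimitExceeded(f"store already holds {current_message_count} messages")
    content_size = len(content.encode("utf-8"))  # limits are in bytes, not characters
    if content_size > MAX_CONTENT_BYTES:
        raise LimitExceeded(f"content is {content_size} bytes, limit is {MAX_CONTENT_BYTES}")
    topic_size = len(topic.encode("utf-8"))
    if topic_size > MAX_TOPIC_BYTES:
        raise LimitExceeded(f"topic is {topic_size} bytes, limit is {MAX_TOPIC_BYTES}")
```

Measuring the encoded length matters for non-ASCII text: a 200-character topic can exceed 256 bytes once encoded.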
Versioning
The version field in the header enables format evolution:
| Version | Status | Changes |
|---|---|---|
| 1 | Current | Initial format. All sections as described above. |
Version Upgrade Protocol
When a reader encounters version > 1:
- If the reader supports the version, proceed normally.
- If the reader does not support the version, check if the section table contains only known section types.
- If all section types are known, attempt to read the file (the new version may only add new sections that can be skipped).
- If unknown section types are present, skip them and read known sections.
- If the header structure itself has changed (different header size), reject with a version error.
Version Downgrade
Files written by version N can be read by readers supporting version N or higher. Downgrade (opening a version-2 file with a version-1 reader) is supported only if the version-2 file uses no features beyond version-1 and all unknown sections can be safely skipped.
File Locking
When multiple processes access the same .acomm file:
- Writers acquire an exclusive lock on <path>.acomm.lock (advisory file lock).
- Readers do not require a lock (they read the last atomically-written version).
- The lock file contains the PID of the holding process and a timestamp.
- Stale locks (process PID no longer running) are automatically recovered after 60 seconds.
Lock File Format
PID: <process_id>
STARTED: <unix_timestamp>
HOSTNAME: <hostname>
Plain text, human-readable. This allows manual inspection and recovery.
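Since the lock file is plain `KEY: value` text, it can be parsed with a few lines. In this sketch `parse_lock_file` and `is_stale` are hypothetical helpers, and the staleness rule is one reading of the text above: a lock counts as stale when its process is no longer running and the lock is at least 60 seconds old. How the caller determines `pid_alive` is platform-specific and left out.

```python
import time

STALE_AFTER_SECONDS = 60


def parse_lock_file(text):
    """Parse the three KEY: value lines of a lock file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {
        "pid": int(fields["PID"]),
        "started": int(fields["STARTED"]),
        "hostname": fields["HOSTNAME"],
    }


def is_stale(lock, pid_alive, now=None):
    """Assumed rule: stale if the holder is gone and the lock is >= 60 s old."""
    now = time.time() if now is None else now
    return (not pid_alive) and (now - lock["started"] >= STALE_AFTER_SECONDS)
```

Comparing the recorded hostname with the local one before trusting a PID check would be a sensible extra guard on shared filesystems.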
Example File Sizes
| Scenario | Channels | Messages | Uncompressed | Compressed | Ratio |
|---|---|---|---|---|---|
| Small session | 3 | 100 | 48 KB | 18 KB | 2.7:1 |
| Day of work | 10 | 5,000 | 2.1 MB | 680 KB | 3.1:1 |
| Week of work | 25 | 50,000 | 19 MB | 5.8 MB | 3.3:1 |
| Large project | 100 | 500,000 | 180 MB | 52 MB | 3.5:1 |
| Maximum capacity | 100,000 | 10,000,000 | ~1.8 GB | ~500 MB | 3.6:1 |