# Engine Persistence and Disk Usage

# Overview

The Engine persists its state through two mechanisms:

  1. Command logs: Sequential records of all received commands
  2. Snapshots: Point-in-time captures of complete engine state

# Command Logs

# Structure

  • Every command received by the Engine is written to command-log files, in order
  • Minimum activity: One clock-tick every 10ms
  • Each order consumes approximately 500 bytes

# Daily Disk Usage Scenarios

| Scenario | Command Log Usage |
|---|---|
| Minimum daily (no orders) | ~300 MB |
| +100,000 orders | +50 MB |
| +1,000,000 orders | +500 MB |
| +10,000,000 orders | +5 GB |
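
The scenarios above follow from a fixed clock-tick baseline plus roughly 500 bytes per order. A minimal sketch of that arithmetic, assuming decimal megabytes (1 MB = 10^6 bytes) and treating both figures as the approximations given in this document; the function and constant names are illustrative only:

```python
# Approximate figures taken from this document; real usage will vary.
BASELINE_BYTES_PER_DAY = 300 * 10**6  # ~300 MB/day of clock ticks with no orders
BYTES_PER_ORDER = 500                 # ~500 bytes per order


def estimate_command_log_bytes(orders_per_day: int) -> int:
    """Estimate daily command-log growth in bytes for a given order count."""
    if orders_per_day < 0:
        raise ValueError("orders_per_day must be non-negative")
    return BASELINE_BYTES_PER_DAY + orders_per_day * BYTES_PER_ORDER


if __name__ == "__main__":
    for orders in (0, 100_000, 1_000_000, 10_000_000):
        total_mb = estimate_command_log_bytes(orders) / 10**6
        print(f"{orders:>10,} orders/day -> ~{total_mb:,.0f} MB/day")
```

At one clock tick every 10 ms there are 8,640,000 ticks per day, so the ~300 MB baseline implies roughly 35 bytes per tick.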

# Snapshots

# Characteristics

  • Contains complete engine state
  • Generated every ~8 minutes by the Snapshotter process
  • Each snapshot is indexed to a specific command in the command log
  • Minimum size: 1.5 MB per snapshot
  • Approximately 300 MB per day for minimal usage

# Size Factors

Snapshot size varies based on:

  • Number of instruments
  • Transaction volume
  • Active Orders
  • Price tick frequency
  • Overall system activity

Size range: 1.5 MB to several hundred MB per snapshot
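
For capacity planning, daily snapshot usage can be estimated from the snapshot interval and an average snapshot size. The sketch below assumes the approximate 8-minute interval stated above and a caller-supplied average size; the function name is illustrative:

```python
MINUTES_PER_DAY = 24 * 60


def estimate_snapshot_bytes_per_day(avg_snapshot_bytes: float,
                                    interval_minutes: float = 8.0) -> float:
    """Estimate bytes of snapshots written per day.

    interval_minutes defaults to the ~8 minute Snapshotter interval
    described in this document (an approximation).
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    snapshots_per_day = MINUTES_PER_DAY / interval_minutes
    return snapshots_per_day * avg_snapshot_bytes
```

With the minimum snapshot size of 1.5 MB this gives 180 snapshots and about 270 MB per day, in line with the approximate 300 MB figure for minimal usage.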

# State Recovery Process

flowchart TD
A[Start Engine] --> B{Snapshot Available?}

    B -->|Yes| C[Load Most Recent Snapshot]
    B -->|No| D[Find commandLog.0]
    
    C --> E[Find Commands Since Snapshot]
    D --> F[Process All Command Files]
    
    E --> G[Replay Commands]
    F --> G
    
    G --> H[Engine Ready]
    
    style A fill:#f9f,stroke:#333
    style H fill:#9f9,stroke:#333
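
The selection step in the flowchart can be sketched as follows. This is an illustration only, not the Engine's implementation: it inspects file names and the complete.meta marker described later in this document, and does not read the proprietary binary contents. The function name `plan_recovery` is hypothetical.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

_SNAPSHOT_RE = re.compile(r"snapshot\.([0-9]+)")
_COMMAND_LOG_RE = re.compile(r"commandLog\.([0-9]+)")


def plan_recovery(persistence_dir: Path) -> Tuple[Optional[Path], List[Path]]:
    """Return (snapshot directory to load or None, command logs to replay)."""
    snapshots: List[Tuple[int, Path]] = []
    logs: List[Tuple[int, Path]] = []
    for entry in persistence_dir.iterdir():
        snap = _SNAPSHOT_RE.fullmatch(entry.name)
        log = _COMMAND_LOG_RE.fullmatch(entry.name)
        # A snapshot only counts if its complete.meta marker exists.
        if snap and entry.is_dir() and (entry / "complete.meta").is_file():
            snapshots.append((int(snap.group(1)), entry))
        elif log and entry.is_file():
            logs.append((int(log.group(1)), entry))
    logs.sort(key=lambda item: item[0])

    if snapshots:
        snap_index, snap_dir = max(snapshots, key=lambda item: item[0])
        # Replay only the command logs that start after the snapshot index.
        return snap_dir, [path for index, path in logs if index > snap_index]

    # No usable snapshot: a full replay has to start from commandLog.0.
    if not logs or logs[0][0] != 0:
        raise FileNotFoundError("no valid snapshot and no commandLog.0 to replay from")
    return None, [path for _, path in logs]
```

Calling `plan_recovery(Path("persistence"))` on a directory with a valid snapshot returns that snapshot's directory and the command logs whose index is greater than the snapshot index, in replay order.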

# File Management

# Archival Process

The Snapshotter automatically:

  • Moves older command logs to archive directory
  • Moves older snapshots to archive directory
  • Never modifies archived files

# Recommended Management Practices

  1. File System Management:

    • Regularly compress archived files
    • Transfer archives to secondary storage
    • Remove intermediate snapshots as needed
    • When removing snapshots, retain them at a minimum frequency (e.g., one per day)
  2. Storage Planning:

    • Monitor growth rate of command logs
    • Track snapshot size trends
    • Ensure adequate primary storage capacity
    • Implement archive retention policy
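
As one way to carry out the compression step, the sketch below gzips every file under an archive directory and removes the uncompressed original. The archive directory location is an assumption (this document does not name it), and compressed files would need to be decompressed before the Engine could read them again.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def compress_archived_files(archive_dir: Path) -> List[Path]:
    """Gzip each uncompressed file under archive_dir; return the new .gz paths."""
    compressed: List[Path] = []
    # sorted() materializes the listing before any files are changed.
    for path in sorted(archive_dir.rglob("*")):
        if not path.is_file() or path.suffix == ".gz":
            continue  # skip directories and files that are already compressed
        target = path.with_name(path.name + ".gz")
        with path.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()  # remove the original only after the copy succeeded
        compressed.append(target)
    return compressed
```

Because the Snapshotter never modifies archived files, a job like this can run on a schedule without coordinating with the Engine, provided it only touches the archive directory.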

# Technical Considerations

# Disk Space Planning

  • Active systems may generate multiple GB daily
  • Consider both command logs and snapshots in capacity planning
  • Monitor growth rate during peak trading periods

# Performance Impact

  • Startup time depends on:
    • Size of latest snapshot
    • Number of commands since snapshot
    • Storage system performance

# Recovery Capabilities

  • Any snapshot provides full system state
  • Intermediate snapshots are safe to delete
  • Maintain at least one recent snapshot at all times

# Persistence Directory Reference

# File Format

All persistence files use a proprietary binary format which is subject to change.

# Directory Structure

# Command Logs

commandLog.<index>
  • <index>: Integer representing the first command index in the file
  • Files are ordered by index; each file's index immediately follows the index of the last command in the previous file
  • New file created with each snapshot generation

# Snapshot Directories

snapshot.<index>/
├── complete.meta
└── *.meta
  • <index>: Command index at which snapshot was taken
  • Contains complete system state at specified index

# File Specifications

| File Pattern | Format | Description |
|---|---|---|
| commandLog.0 | Binary | Initial command log. Created only once after engine reset. Contains first command processed. |
| commandLog.<n> | Binary | Sequential command logs. Created after each snapshot. Contains commands starting from index <n>. |
| snapshot.<n>/ | Directory | Snapshot directory for command index <n>. |
| snapshot.<n>/complete.meta | Binary | Snapshot validation marker. Must exist for snapshot to be considered valid. |
| snapshot.<n>/*.meta | Binary | Segment-specific metadata files describing snapshot contents. |

# Index Relationships

For a snapshot at index n:

  • Snapshot directory: snapshot.n/
  • Next command log: commandLog.(n+1)
  • Previous command log: commandLog.m where m is the index of the previous snapshot + 1

# Example Directory Layout

persistence/
├── commandLog.0
├── snapshot.287209819/
│   ├── complete.meta
│   ├── balances.meta
│   ├── balances
│   ├── orders.meta
│   ├── orders
│   ├── positions.meta
│   ├── positions
│   └── ...
└── commandLog.287209820
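
The index relationships can be expressed as a small helper that derives the expected command log names from a set of snapshot indices. This is a sketch under two assumptions: the directory started from an engine reset (so commandLog.0 exists) and nothing has been archived yet. The function name is hypothetical.

```python
from typing import Iterable, List


def expected_command_logs(snapshot_indices: Iterable[int]) -> List[str]:
    """Names of the command logs implied by a set of snapshot indices."""
    names = ["commandLog.0"]  # created once, after an engine reset
    for n in sorted(snapshot_indices):
        # A snapshot at index n is followed by a command log starting at n + 1.
        names.append(f"commandLog.{n + 1}")
    return names
```

For the example layout above, `expected_command_logs([287209819])` returns `['commandLog.0', 'commandLog.287209820']`.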

# Multiple Instance Protection and Recovery

# Command File Locking

The Engine implements strict single-writer semantics for the persistence directory:

  • Only one Engine instance can write to the persistence directory at a time
  • Each command file contains metadata including:
    • Completion status flag
    • Process ID (PID) of the writing process
  • Multiple concurrent writers would corrupt the command log sequence

# Startup Safety Checks

On startup, the Engine performs these verification steps:

  1. Checks the most recent command file's completion status
  2. Verifies if another process is actively writing to the file
  3. Takes action based on configuration:
    • Default behavior: Exits immediately if the most recent command file is incomplete
    • With ABANDONMENT_TIMEOUT_MS configured: Waits and monitors the file to determine whether it has been abandoned

# Abandonment Detection

The Engine can be configured to handle abandoned command files:

ABANDONMENT_TIMEOUT_MS=<milliseconds>

When configured, the startup process:

  1. Monitors the most recent command file for changes
  2. If file size is changing:
    • Waits indefinitely
    • Assumes active writer exists
  3. If file size remains static:
    • Starts abandonment timer
    • Takes over after ABANDONMENT_TIMEOUT_MS with no changes
  4. If file marked complete during wait:
    • Immediately proceeds with normal startup

An abandoned command file is one that is:

  • Not marked as complete
  • No longer being written (the PID of the writing process no longer exists)

Abandonment is only relevant for the most recent command file. Earlier incomplete files are assumed complete if later files exist.
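
The wait logic described above can be sketched as a polling loop. This is not the Engine's code: `get_size` and `is_complete` are hypothetical callbacks standing in for reading the file size and the completion flag, the polling interval is an assumption, and the PID check is omitted.

```python
import time
from typing import Callable


def wait_for_takeover(
    get_size: Callable[[], int],
    is_complete: Callable[[], bool],
    timeout_ms: int,
    poll_interval_ms: int = 100,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Block until the most recent command file is complete or abandoned.

    Returns "complete" if the file was marked complete during the wait,
    or "abandoned" if its size stayed unchanged for timeout_ms.
    """
    last_size = get_size()
    last_change = clock()
    while True:
        if is_complete():
            return "complete"  # proceed with normal startup
        size = get_size()
        now = clock()
        if size != last_size:
            # File is still growing: assume an active writer, reset the timer.
            last_size = size
            last_change = now
        elif (now - last_change) * 1000 >= timeout_ms:
            return "abandoned"  # no changes for the full timeout: take over
        sleep(poll_interval_ms / 1000)
```

Because the timer resets on every size change, a file that keeps growing is waited on indefinitely, which matches the behavior described for an active writer. The `clock` and `sleep` parameters exist only so the loop can be exercised without real delays.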

# Deployment Best Practices

For clustered environments:

  1. Configure cluster manager to enforce single-instance constraint
  2. For Kubernetes:
    • Set maxReplicas: 1
    • Set minReplicas: 0

# Recovery Scenarios

| Scenario | Default Behavior | With Timeout Configured |
|---|---|---|
| Clean shutdown | Proceeds normally | Proceeds normally |
| Crashed writer process | Exits with error | Waits for timeout, then takes over |
| Active writer process | Exits with error | Waits indefinitely |
| Incomplete old files | Assumes completed | Assumes completed |
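
The first three rows of the table, which concern the most recent command file, reduce to a small decision function. The enum and function names below are illustrative and do not correspond to any Engine API.

```python
from enum import Enum


class LastCommandFile(Enum):
    COMPLETE = "clean shutdown"
    INCOMPLETE_NO_WRITER = "crashed writer process"
    INCOMPLETE_ACTIVE_WRITER = "active writer process"


class StartupAction(Enum):
    PROCEED = "proceed normally"
    EXIT_WITH_ERROR = "exit with error"
    WAIT_THEN_TAKE_OVER = "wait for timeout, then take over"
    WAIT_INDEFINITELY = "wait indefinitely"


def startup_action(state: LastCommandFile, timeout_configured: bool) -> StartupAction:
    """Map the state of the most recent command file to a startup action."""
    if state is LastCommandFile.COMPLETE:
        return StartupAction.PROCEED
    if not timeout_configured:
        # Default behavior: any incomplete most-recent file is fatal.
        return StartupAction.EXIT_WITH_ERROR
    if state is LastCommandFile.INCOMPLETE_ACTIVE_WRITER:
        return StartupAction.WAIT_INDEFINITELY
    return StartupAction.WAIT_THEN_TAKE_OVER
```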

# Validation Rules

  1. Command Logs

    • Must be sequentially numbered by first command index
    • Each contains a sequential log of commands
    • The index of the next command file must be n+s, where n is the index of the current file and s is the number of commands it contains
  2. Snapshots

    • Must contain complete.meta
    • Missing complete.meta indicates invalid snapshot
    • Invalid snapshots are ignored and eligible for overwrite
  3. Index Integrity

    • Next command log index must be snapshot index + 1
    • The first command log (index 0):
      • Is created once, when the Engine starts from a clean state (after an engine reset)
      • Is the required starting point for a complete command replay "from the start of time"
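
The index rules above can be checked mechanically once the first index and command count of each file are known. Since the file format is proprietary, the sketch below takes those values as input rather than reading them from disk; the function name and input shape are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def validate_indices(
    command_logs: Sequence[Tuple[int, int]],
    snapshot_indices: Sequence[int] = (),
) -> List[str]:
    """Check index integrity and return a list of problems (empty if none).

    command_logs: (first_index, command_count) for each command log file.
    snapshot_indices: indices of snapshots that contain complete.meta.
    """
    problems: List[str] = []
    ordered = sorted(command_logs)
    # Rule: the next file's index must be n + s for the current file.
    for (n, s), (next_index, _) in zip(ordered, ordered[1:]):
        if next_index != n + s:
            problems.append(
                f"commandLog.{n} holds {s} commands, so the next file should be "
                f"commandLog.{n + s}, found commandLog.{next_index}"
            )
    # Rule: each snapshot at index k is followed by commandLog.(k + 1).
    starts = {first for first, _ in ordered}
    for k in sorted(snapshot_indices):
        if k + 1 not in starts:
            problems.append(f"snapshot.{k} has no matching commandLog.{k + 1}")
    return problems
```

The command count of the last file is never used, so a most recent file that is still being written does not affect the result.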