# Engine Persistence and Disk Usage

## Overview

The Engine persists its state through two mechanisms:

1. Command logs: Sequential records of all received commands
2. Snapshots: Point-in-time captures of complete engine state

## Command Logs

### Structure

- Every command received by the Engine is written to command-log files, in order
- Minimum activity: one clock-tick command every 10 ms, even with no orders
- Each order consumes approximately 500 bytes of command log

### Daily Disk Usage Scenarios

| Scenario                  | Command Log Usage |
|---------------------------|-------------------|
| Minimum Daily (No Orders) | ~300 MB           |
| +100,000 Orders           | +50 MB            |
| +1,000,000 Orders         | +500 MB           |
| +10,000,000 Orders        | +5 GB             |
| +100,000,000 Orders       | +50 GB            |
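The table follows from a fixed daily baseline plus a per-order cost. Below is a minimal Python sketch of that arithmetic. It uses only the figures above: decimal megabytes, a ~300 MB/day baseline, and ~500 bytes per order. The function name is hypothetical, and real usage will vary with the actual command mix.

```python
# Figures taken from the table above; decimal units (1 MB = 10**6 bytes).
BASELINE_BYTES_PER_DAY = 300 * 10**6  # clock ticks alone, no orders
BYTES_PER_ORDER = 500                 # approximate size of one order command

# One clock tick every 10 ms gives 8,640,000 ticks per day. If the baseline
# consists only of ticks, that implies roughly 35 bytes per tick.
TICKS_PER_DAY = 24 * 60 * 60 * 1000 // 10
IMPLIED_BYTES_PER_TICK = BASELINE_BYTES_PER_DAY / TICKS_PER_DAY


def estimate_command_log_bytes_per_day(orders_per_day: int) -> int:
    """Rough daily command-log growth: baseline plus ~500 bytes per order."""
    return BASELINE_BYTES_PER_DAY + orders_per_day * BYTES_PER_ORDER


if __name__ == "__main__":
    for orders in (0, 100_000, 1_000_000, 10_000_000, 100_000_000):
        gb = estimate_command_log_bytes_per_day(orders) / 10**9
        print(f"{orders:>11,} orders/day -> ~{gb:.2f} GB/day")
```

For example, 10,000,000 orders per day works out to roughly 5.3 GB of command log per day: 300 MB of baseline plus 5 GB of orders.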

## Snapshots

### Characteristics

- Contains complete engine state
- Generated every ~8 minutes by the Snapshotter process
- Each snapshot is indexed to a specific command in the command log
- Minimum size: 1.5 MB per snapshot
- Approximately 300 MB of snapshots per day at minimal activity
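The daily figure is consistent with the interval and the minimum size. One snapshot every ~8 minutes gives about 180 snapshots per day, which comes to roughly 270 MB at 1.5 MB each. The sketch below shows that calculation. It assumes a constant snapshot size, whereas real snapshots grow with the factors listed below. The function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def snapshots_per_day(interval_minutes: float = 8.0) -> float:
    """Number of snapshots produced per day at a fixed interval."""
    return MINUTES_PER_DAY / interval_minutes


def daily_snapshot_bytes(snapshot_bytes: float, interval_minutes: float = 8.0) -> float:
    """Snapshot bytes written per day, assuming every snapshot has the same size."""
    return snapshots_per_day(interval_minutes) * snapshot_bytes


if __name__ == "__main__":
    print(snapshots_per_day())                        # 180.0
    print(daily_snapshot_bytes(1.5 * 10**6) / 10**6)  # 270.0 (MB)
```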

### Size Factors

Snapshot size varies based on:

- Number of instruments
- Transaction volume
- Active orders
- Price tick frequency
- Overall system activity

Size range: 1.5 MB to several GB per snapshot

## State Recovery Process

```mermaid
flowchart TD
    A[Start Engine] --> B{Snapshot Available?}

    B -->|Yes| C[Load Most Recent Snapshot]
    B -->|No| D[Find commandLog.0]
    
    C --> E[Find Commands Since Snapshot]
    D --> F[Process All Command Files]
    
    E --> G[Replay Commands]
    F --> G
    
    G --> H[Engine Ready]
    
    style A fill:#f9f,stroke:#333
    style H fill:#9f9,stroke:#333
```
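The file-selection part of this flow can be sketched in Python. This is an illustration under stated assumptions, not the Engine's implementation:

- It works only from file names and the presence of `complete.meta`, because the binary contents are proprietary.
- It treats the snapshot at index `n` as already covering command `n`.
- It uses a hypothetical `plan_recovery` helper that returns the snapshot to load and the command logs to replay.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

SNAPSHOT_RE = re.compile(r"snapshot\.(\d+)")
COMMAND_LOG_RE = re.compile(r"commandLog\.(\d+)")


def plan_recovery(persistence_dir: str) -> Tuple[Optional[Path], List[Path]]:
    """Return (snapshot to load or None, command logs to replay in order)."""
    snapshots = []
    logs = []
    for entry in Path(persistence_dir).iterdir():
        snap = SNAPSHOT_RE.fullmatch(entry.name)
        if snap and entry.is_dir():
            # Only snapshots with complete.meta are valid; others are ignored.
            if (entry / "complete.meta").is_file():
                snapshots.append((int(snap.group(1)), entry))
            continue
        log = COMMAND_LOG_RE.fullmatch(entry.name)
        if log and entry.is_file():
            logs.append((int(log.group(1)), entry))
    logs.sort()

    if snapshots:
        snap_index, snap_path = max(snapshots)  # most recent valid snapshot
        # Commands up to and including snap_index are already in the snapshot;
        # the next log is expected to start at snap_index + 1.
        return snap_path, [path for index, path in logs if index > snap_index]

    # No valid snapshot: a full replay has to start from commandLog.0.
    if logs and logs[0][0] != 0:
        raise FileNotFoundError("no valid snapshot and commandLog.0 is missing")
    return None, [path for _, path in logs]
```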

## File Management

### Archival Process

The Snapshotter automatically:

- Moves older command logs to archive directory
- Moves older snapshots to archive directory
- Never modifies archived files

### Recommended Management Practices

1. File System Management:
    - Regularly compress archived files
    - Transfer archives to secondary storage
    - Remove intermediate snapshots as needed
    - Maintain a minimum snapshot frequency (e.g., daily)

2. Storage Planning:
    - Monitor growth rate of command logs
    - Track snapshot size trends
    - Ensure adequate primary storage capacity
    - Implement an archive retention policy
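One possible retention rule for "remove intermediate snapshots, keep a minimum frequency" is to keep the newest snapshot of each calendar day and list the rest for deletion, as the sketch below does. The snapshot dates are an assumed input, for example taken from directory modification times, because snapshot names carry only a command index. The helper name is hypothetical. Command logs are deliberately left alone so that replay from any kept snapshot remains possible.

```python
from datetime import date
from typing import Dict, List, Tuple


def snapshots_to_delete(snapshots: List[Tuple[int, date]]) -> List[int]:
    """Given (snapshot index, day taken) pairs, return indices that can be removed.

    Keeps the highest-index snapshot for each day, which also keeps the most
    recent snapshot overall.
    """
    newest_per_day: Dict[date, int] = {}
    for index, day in snapshots:
        if index > newest_per_day.get(day, -1):
            newest_per_day[day] = index
    keep = set(newest_per_day.values())
    return sorted(index for index, _ in snapshots if index not in keep)


if __name__ == "__main__":
    history = [(100, date(2024, 1, 1)), (200, date(2024, 1, 1)), (300, date(2024, 1, 2))]
    print(snapshots_to_delete(history))  # [100]
```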

## Technical Considerations

### Disk Space Planning

- Active systems may generate multiple GB daily
- Consider both command logs and snapshots in capacity planning
- Monitor growth rate during peak trading periods
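A simple way to track the growth rate is to sample the total size of the persistence directory periodically. Below is a minimal standard-library sketch. The helper names and the one-minute default sample window are illustrative choices. Files that the Snapshotter moves to the archive between samples will show up as negative growth.

```python
import os
import time


def directory_size_bytes(root: str) -> int:
    """Total size of all files under root, in bytes."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except FileNotFoundError:
                pass  # file was moved (e.g. archived) between listing and stat
    return total


def growth_rate_bytes_per_hour(root: str, sample_seconds: float = 60.0) -> float:
    """Sample the directory size twice and extrapolate to an hourly rate."""
    before = directory_size_bytes(root)
    time.sleep(sample_seconds)
    after = directory_size_bytes(root)
    return (after - before) * 3600.0 / sample_seconds
```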

### Performance Impact

- Startup time depends on:
    - Size of latest snapshot
    - Number of commands since snapshot
    - Storage system performance

### Recovery Capabilities

- Any valid snapshot (one containing `complete.meta`) provides the full system state
- Intermediate snapshots are safe to delete
- Maintain at least one recent snapshot at all times

## Persistence Directory Reference

### File Format
All persistence files use a proprietary binary format, which is subject to change.

### Directory Structure

#### Command Logs
```
commandLog.<index>
```
- `<index>`: Integer representing the first command index in the file
- Files are sequentially numbered
- A new file is created each time a snapshot is generated

#### Snapshot Directories
```
snapshot.<index>/
├── complete.meta
└── *.meta
```
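- `<index>`: Index of the last command reflected in the snapshot; the next command log starts at `<index> + 1`
- Contains the complete system state as of that index
- Holds segment data files alongside the `.meta` files that describe them (e.g. `balances`, `orders`, `positions` in the example layout below)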

### File Specifications

| File Pattern | Format | Description |
|--------------|---------|-------------|
| `commandLog.0` | Binary | Initial command log. Created only once, after an engine reset. Contains the first command processed. |
| `commandLog.<n>` | Binary | Sequential command logs. Created after each snapshot. Contains commands starting from index `<n>`. |
| `snapshot.<n>/` | Directory | Snapshot directory for command index `<n>`. |
| `snapshot.<n>/complete.meta` | Binary | Snapshot validation marker. Must exist for snapshot to be considered valid. |
| `snapshot.<n>/*.meta` | Binary | Segment-specific metadata files describing snapshot contents. |

### Index Relationships

For a snapshot at index `n`:
- Snapshot directory: `snapshot.n/`
- Next command log: `commandLog.(n+1)`
- Previous command log: `commandLog.m`, where `m` is the index of the previous snapshot + 1 (or `commandLog.0` if there is no earlier snapshot)
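The index arithmetic is simple enough to express directly. The sketch below uses hypothetical helper names and is checked against the example layout below, where `snapshot.287209819/` is preceded by `commandLog.0` and followed by `commandLog.287209820`. Treating the first snapshot's previous log as `commandLog.0` is an assumption drawn from that example.

```python
from typing import Optional


def next_command_log(snapshot_index: int) -> str:
    """Command log that starts right after the snapshot at snapshot_index."""
    return f"commandLog.{snapshot_index + 1}"


def previous_command_log(previous_snapshot_index: Optional[int]) -> str:
    """Command log leading up to a snapshot, given the snapshot before it.

    With no earlier snapshot, the log is assumed to be commandLog.0.
    """
    if previous_snapshot_index is None:
        return "commandLog.0"
    return next_command_log(previous_snapshot_index)


if __name__ == "__main__":
    print(next_command_log(287209819))  # commandLog.287209820
    print(previous_command_log(None))   # commandLog.0
```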

### Example Directory Layout
```
persistence/
├── commandLog.0
├── snapshot.287209819/
│   ├── complete.meta
│   ├── balances.meta
│   ├── balances
│   ├── orders.meta
│   ├── orders
│   ├── positions.meta
│   ├── positions
│   └── ...
└── commandLog.287209820
```

### Multiple Instance Protection and Recovery

#### Command File Locking

The Engine implements strict single-writer semantics for the persistence directory:

- Only one Engine instance can write to the persistence directory at a time
- Each command file contains metadata including:
   - Completion status flag
   - Process ID (PID) of the writing process
- Multiple concurrent writers would corrupt the command log sequence

#### Startup Safety Checks

On startup, the Engine performs these verification steps:

1. Checks the most recent command file's completion status
2. Checks whether another process, identified by the PID recorded in the file's metadata, is actively writing to it
3. Takes action based on configuration:
   - Default behavior: exits immediately if the most recent command file is incomplete
   - Optional wait period: monitors the file for abandonment (see `ABANDONMENT_TIMEOUT_MS`)

#### Abandonment Detection

The Engine can be configured to handle abandoned command files:

```
ABANDONMENT_TIMEOUT_MS=<milliseconds>
```

When configured, the startup process:

1. Monitors the most recent command file for changes
2. If the file size is changing:
   - Assumes an active writer exists
   - Keeps waiting, indefinitely if necessary
3. If the file size remains static:
   - Starts the abandonment timer
   - Takes over once `ABANDONMENT_TIMEOUT_MS` elapses with no changes
4. If the file is marked complete during the wait:
   - Proceeds immediately with normal startup
An abandoned command file is defined as:
- Not marked as complete
- No active writing process (its PID no longer exists)
- Applies only to the most recent command file
- Earlier incomplete files are assumed complete if later files exist
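The definition translates into a small predicate. This sketch makes two assumptions:

- The `CommandFileInfo` fields are hypothetical stand-ins for metadata the Engine stores in its proprietary header.
- The PID probe uses `os.kill(pid, 0)`, which only tests whether a process exists. This works on POSIX systems. On Windows, `os.kill` terminates the target process, so a different check is needed there.

```python
import os
from dataclasses import dataclass


@dataclass
class CommandFileInfo:
    """Hypothetical view of a command file's metadata."""
    index: int        # first command index, from the commandLog.<index> name
    complete: bool    # completion status flag
    writer_pid: int   # PID of the process that wrote the file


def pid_alive(pid: int) -> bool:
    """POSIX-only check: signal 0 performs error checking without sending a signal."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not a PID
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True   # the process exists but belongs to another user
    return True


def is_abandoned(info: CommandFileInfo, latest_index: int, alive=pid_alive) -> bool:
    """Apply the abandonment definition to one command file."""
    if info.index != latest_index:
        # Earlier files are assumed complete once a later file exists.
        return False
    # Caveat: if the dead writer's PID has been reused, it will look alive.
    return not info.complete and not alive(info.writer_pid)
```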

#### Deployment Best Practices

For clustered environments:

1. Configure cluster manager to enforce single-instance constraint
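2. For Kubernetes:
   - Set `maxReplicas: 1`
   - Set `minReplicas: 0`
   - Use the Deployment `Recreate` strategy so that a rollout stops the old pod before starting its replacement; the default `RollingUpdate` strategy can briefly run both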

#### Recovery Scenarios

| Scenario | Default Behavior | With Timeout Configured |
|----------|-----------------|------------------------|
| Clean shutdown | Proceeds normally | Proceeds normally |
| Crashed writer process | Exits with error | Waits for timeout, then takes over |
| Active writer process | Exits with error | Waits indefinitely |
| Incomplete old files | Assumes completed | Assumes completed |
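Read as a decision table, the scenarios reduce to three inputs about the most recent command file. Below is a hypothetical Python encoding, for illustration only. In the Engine, the distinction between an active writer and a crashed one is made by watching the file, as described under Abandonment Detection, rather than by a boolean passed in.

```python
from enum import Enum


class StartupAction(Enum):
    PROCEED = "proceeds normally"
    EXIT_WITH_ERROR = "exits with error"
    WAIT_THEN_TAKE_OVER = "waits for timeout, then takes over"
    WAIT_INDEFINITELY = "waits indefinitely"


def startup_action(
    latest_complete: bool, writer_alive: bool, timeout_configured: bool
) -> StartupAction:
    """Encode the Recovery Scenarios table for the most recent command file.

    Older incomplete files never reach this function: they are assumed complete.
    """
    if latest_complete:
        return StartupAction.PROCEED
    if not timeout_configured:
        return StartupAction.EXIT_WITH_ERROR
    if writer_alive:
        return StartupAction.WAIT_INDEFINITELY
    return StartupAction.WAIT_THEN_TAKE_OVER
```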

### Validation Rules

1. Command Logs
    - Must be sequentially numbered
    - Each contains a sequential log of commands
    - The index of the next command file must be `n + s`, where `n` is the index of the file and `s` is the number of commands in the file

2. Snapshots
    - Must contain `complete.meta`
    - Missing `complete.meta` indicates invalid snapshot
    - Invalid snapshots are ignored and eligible for overwrite

3. Index Integrity
    - Next command log index must be snapshot index + 1
    - The first command log (`commandLog.0`):
      - is created when the engine starts from a clean state
      - is the required starting point for a complete command replay "from the start of time"
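The index rules can be checked mechanically. The sketch below assumes the caller already knows each command log's command count `s`, which would have to come from the proprietary file contents. The function name and message wording are hypothetical. Run it over the full set of files, archives included, because the Snapshotter moves older files out of the persistence directory.

```python
from typing import Iterable, List, Tuple


def check_index_integrity(
    logs: Iterable[Tuple[int, int]], snapshot_indices: Iterable[int]
) -> List[str]:
    """logs holds (file index n, command count s) pairs; returns rule violations."""
    problems: List[str] = []
    ordered = sorted(logs)
    # Rule: the next command file's index must be n + s.
    for (n, s), (next_n, _) in zip(ordered, ordered[1:]):
        if next_n != n + s:
            problems.append(
                f"commandLog.{next_n}: expected commandLog.{n + s} after commandLog.{n}"
            )
    # Rule: the command log after a snapshot starts at snapshot index + 1.
    starts = {n for n, _ in ordered}
    for snap in sorted(snapshot_indices):
        if snap + 1 not in starts:
            problems.append(f"snapshot.{snap}: commandLog.{snap + 1} not found")
    return problems
```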
