# Collect-Replay Investigation Tool

# Overview

The collect-replay.py script is a forensic investigation tool shipped in the janitor's Docker image. It extracts engine state data for debugging and analysis. Given a command index or a range of indices, it locates the snapshot and command files needed to reconstruct engine state at that point.

# Purpose

The collect-replay tool gathers the data needed to investigate a live system issue or to run a post-mortem. It collects only the minimum required to reproduce the system conditions at the referenced command indices: one snapshot plus the command files that follow it.

# How It Works

The script gathers the necessary files in five steps:

  1. Snapshot Discovery: Locates the most recent snapshot preceding the target command index
  2. Multi-Location Search: Searches across all storage locations:
    • Active persistence directory (/app/persistence)
    • Archive directory (/app/persistence/archived)
    • Deep storage directory (/app/deep_storage)
  3. Command File Collection: Gathers all command log files from the snapshot through the target range
  4. Archive Creation: Packages everything into a compressed ZIP file
  5. Safe Storage: Places the archive in the investigations volume (/app/investigations/), or in /tmp/ if that volume is unavailable
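
The discovery and collection steps above can be sketched as follows. This is a minimal illustration, not the script's actual code. The `snapshot.<index>` and `commandLog.<index>` names are taken from the example archive layout in this document. The helper names `scan` and `select_files`, the reading of the `commandLog` suffix as the first command index in that file, and the "first location wins" rule for duplicates are all assumptions.

```python
import re
from pathlib import Path

# Search order from the list above: active, archived, then deep storage.
SEARCH_LOCATIONS = ["/app/persistence", "/app/persistence/archived", "/app/deep_storage"]

# Assumed naming, inferred from the example archive layout in this document.
NAME_RE = re.compile(r"^(snapshot|commandLog)\.(\d+)$")


def scan(locations=SEARCH_LOCATIONS):
    """Return ({snapshot_index: path}, {command_log_index: path}) for all locations."""
    snapshots, logs = {}, {}
    for location in locations:
        root = Path(location)
        if not root.is_dir():
            continue  # a storage tier may not be mounted
        for entry in root.iterdir():
            match = NAME_RE.match(entry.name)
            if match is None:
                continue
            kind, index = match.group(1), int(match.group(2))
            target = snapshots if kind == "snapshot" else logs
            target.setdefault(index, entry)  # assumption: first location wins
    return snapshots, logs


def select_files(snapshots, logs, start_index, end_index=None):
    """Pick the latest snapshot before start_index and the logs up to end_index."""
    end_index = start_index if end_index is None else end_index
    earlier = [index for index in snapshots if index < start_index]
    if not earlier:
        raise LookupError(f"no snapshot precedes command index {start_index}")
    snapshot_index = max(earlier)
    # Assumption: a log named commandLog.<N> begins at command index N.
    wanted = sorted(i for i in logs if snapshot_index < i <= end_index)
    return snapshots[snapshot_index], [logs[i] for i in wanted]
```

Calling `select_files(*scan(), 4280361412, 4281361512)` would then return the snapshot path and the ordered list of command log paths to package.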

# Usage

# Basic Syntax

docker exec <janitor_container> /app/collect-replay.py <start_index> [end_index]

# Single Command Index

Extract data for investigation of a specific command:

docker exec janitor /app/collect-replay.py 4280361412

This creates: investigate.4280361412.zip

# Command Range

Extract data covering a range of commands:

docker exec janitor /app/collect-replay.py 4280361412 4281361512

This creates: investigate.4280361412-4281361512.zip

# Parameters

| Parameter | Description | Required |
|-----------|-------------|----------|
| start_index | Command index to start investigation from | Yes |
| end_index | Command index to end investigation at | No |

If end_index is omitted, only the single start_index command is targeted.
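
For illustration, the two positional parameters could be parsed with `argparse` as shown here. This is a sketch under assumptions: the function name `parse_args` and the rejection of negative indices or an `end_index` smaller than `start_index` are not confirmed by the script itself.

```python
import argparse


def parse_args(argv):
    """Parse <start_index> [end_index] and return them as a tuple of ints."""
    parser = argparse.ArgumentParser(prog="collect-replay.py")
    parser.add_argument("start_index", type=int,
                        help="command index to start investigation from")
    parser.add_argument("end_index", type=int, nargs="?", default=None,
                        help="command index to end investigation at (optional)")
    args = parser.parse_args(argv)
    # Assumed checks: indices are non-negative and the range is not reversed.
    if args.start_index < 0:
        parser.error("start_index must be non-negative")
    if args.end_index is not None and args.end_index < args.start_index:
        parser.error("end_index must not be smaller than start_index")
    return args.start_index, args.end_index
```

With this sketch, `parse_args(["4280361412"])` yields a start index and `None` for the end index, which corresponds to the single-command case.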

# Output Location

The script writes the archive to the first available location, in this priority order:

  1. Primary: /app/investigations/ (mounted volume)
  2. Fallback: /tmp/ (if investigations volume unavailable)
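
The priority order above amounts to a simple fallback check. The sketch below assumes that "unavailable" means the directory is missing or not writable; the function name `choose_output_dir` is hypothetical.

```python
import os


def choose_output_dir(primary="/app/investigations", fallback="/tmp"):
    """Return the primary directory if it is usable, otherwise the fallback."""
    # Assumption: the volume counts as available when it exists and is writable.
    if os.path.isdir(primary) and os.access(primary, os.W_OK | os.X_OK):
        return primary
    return fallback
```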

# File Naming Convention

investigate.<start_index>[-<end_index>].zip

Examples:

  • Single index: investigate.4280361412.zip
  • Range: investigate.4280361412-4281361512.zip
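
The naming convention can be expressed as a small helper. The function name `archive_name` is hypothetical; the output format follows the pattern and examples above.

```python
def archive_name(start_index, end_index=None):
    """Build investigate.<start_index>[-<end_index>].zip."""
    if end_index is None:
        return f"investigate.{start_index}.zip"
    return f"investigate.{start_index}-{end_index}.zip"
```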

# Archive Contents

Each generated ZIP file contains:

  • Complete snapshot directory for the most recent snapshot before the target range
  • All command log files from the snapshot index through the target range
  • Command files in sequence, so that engine state can be reconstructed completely by replaying them on top of the snapshot

# Example Archive Structure

investigate.4280361412-4281361512.zip
├── snapshot.4280000000/
│   ├── complete.meta
│   ├── balances.meta
│   ├── balances
│   ├── orders.meta
│   ├── orders
│   └── ...
├── commandLog.4280000001
├── commandLog.4280300000
├── commandLog.4280361000
└── commandLog.4281000000
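
A layout like the one above could be produced with Python's standard `zipfile` module. This is a sketch, not the script's implementation: the function name `build_archive` and the choice of DEFLATE compression are assumptions.

```python
import zipfile
from pathlib import Path


def build_archive(zip_path, snapshot_dir, command_logs):
    """Write the snapshot directory and command logs into one ZIP file."""
    snapshot_dir = Path(snapshot_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Keep the snapshot directory name as the top-level folder in the archive.
        for path in sorted(snapshot_dir.rglob("*")):
            if path.is_file():
                relative = path.relative_to(snapshot_dir).as_posix()
                archive.write(path, arcname=f"{snapshot_dir.name}/{relative}")
        # Command logs sit at the archive root, next to the snapshot folder.
        for log in command_logs:
            log = Path(log)
            archive.write(log, arcname=log.name)
    return zip_path
```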

# Validation

The script validates:

  • Command index format and range
  • File accessibility across storage tiers
  • Output directory write permissions
  • Archive creation success
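
As one possible way to check the last item, a finished archive can be re-opened and its checksums tested with the standard library. How the script itself confirms archive creation is not documented here, so treat this as an assumption; `verify_archive` is a hypothetical name.

```python
import zipfile


def verify_archive(zip_path):
    """Return True if the file is a readable ZIP whose members pass CRC checks."""
    if not zipfile.is_zipfile(zip_path):
        return False
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        return archive.testzip() is None
```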

# Security Considerations

# Data Sensitivity

  • Investigation archives contain complete trading state
  • Implement appropriate access controls on investigations volume
  • Consider encryption for sensitive production data

# Container Security

  • Script runs within janitor container context
  • Inherits container's access permissions
  • Limited to mounted volumes and configured storage locations