#
Version Upgrades
#
Overview
When deploying an upgrade to the Nekuti Matching Engine, operators should follow the shutdown procedure below to ensure state consistency across the upgrade. The procedure creates a snapshot immediately before shutting down, which guarantees that both the old and new versions of the engine agree on the system state.
#
Upgrade Procedure
#
Step 1: Snapshot and Shutdown
Before upgrading the engine, call the SnapshotAndShutdown endpoint. You can optionally specify an exit code for the engine process with the exitCode query parameter:
curl -X 'PUT' \
'http://control-gateway:8181/ExchangeWideControls/SnapshotAndShutdown?exitCode=12' \
-H 'accept: application/json'
#
Step 2: Wait for Response
Wait for the endpoint to return its response before proceeding. The engine process will, in order:
- Create a complete snapshot of the current state
- Return the HTTP response
- Exit with the specified exit code (or zero if none provided)
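The wait in this step can be scripted so that later steps run only after a successful response. The following is a minimal sketch, not an official client: the function name `snapshot_and_shutdown` and the `GATEWAY_URL` variable are hypothetical, and it assumes that any 2xx status means the snapshot completed, since the response format is not described here.

```shell
# Sketch: call SnapshotAndShutdown and succeed only on a 2xx response.
# GATEWAY_URL and the function name are illustrative, not part of the engine.
GATEWAY_URL="${GATEWAY_URL:-http://control-gateway:8181}"

snapshot_and_shutdown() {
  local url="${GATEWAY_URL}/ExchangeWideControls/SnapshotAndShutdown"
  local status
  # Append exitCode only when the caller supplied one.
  if [ -n "${1:-}" ]; then
    url="${url}?exitCode=$1"
  fi
  # -s hides progress output, -o discards the body, -w prints the status code.
  status="$(curl -s -o /dev/null -w '%{http_code}' -X PUT \
    -H 'accept: application/json' "$url")" || {
    echo "request failed before a response was received" >&2
    return 1
  }
  case "$status" in
    2??) echo "snapshot acknowledged (HTTP ${status})" ;;
    *)   echo "unexpected HTTP status ${status}; do not proceed" >&2
         return 1 ;;
  esac
}
```

Calling `snapshot_and_shutdown 12` issues the same request as the curl command in Step 1 and returns non-zero if the request fails or the status is not 2xx, so a deployment script can stop before touching the engine binaries.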
#
Step 3: Upgrade and Restart
Once the endpoint has returned and the engine process has exited, it is safe to:
- Deploy the new version of the engine
- Start the engine immediately
The engine will automatically load the snapshot created in Step 1 and resume operations.
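Because deployment must not begin until the engine process has exited, a script may want to confirm that explicitly. The sketch below assumes the engine runs as a local process with a known PID; `wait_for_exit` is a hypothetical helper. Under a supervisor such as systemd or a container runtime, you would query that supervisor instead.

```shell
# Sketch: block until a process has exited, with a timeout in seconds.
# Assumes the engine is a local process whose PID is known to the caller.
wait_for_exit() {
  local pid="$1"
  local timeout_s="${2:-60}"
  local waited=0
  # kill -0 sends no signal; it only tests whether the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "process ${pid} still running after ${timeout_s}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

For example, `wait_for_exit "$ENGINE_PID" 120 && echo "safe to deploy"`, where `ENGINE_PID` is a placeholder for however you track the engine process. If the engine is a child of the calling shell, the built-in `wait "$pid"` is simpler and also returns the engine's exit code, which can be compared with the exitCode passed in Step 1.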
#
Snapshotting before non-upgrade restarts
The snapshot-and-shutdown procedure above is also safe to use for restarts that do not change the engine version.
#
Best Practices
- Schedule upgrades during low-activity periods to minimize the impact of the longer restart time
- Monitor the endpoint response to ensure the snapshot completes successfully
- Test the upgrade procedure in a staging environment before production deployment
#
The Butterfly Effect Problem
#
Why Snapshots Matter for Upgrades
The Nekuti Matching Engine is fully deterministic: given the same sequence of commands, it will always produce the same state. However, this determinism is only guaranteed within a single engine version. Code changes between versions—even seemingly minor ones—can cause different behavior when replaying the same commands.
When the engine starts, it:
- Loads the most recent snapshot
- Replays any commands from the command log that occurred after that snapshot
If the engine version changes without first creating a closing snapshot, the new version could replay commands differently than the old version originally processed them. This can cause a "butterfly effect" of state divergence: small differences in command processing compound over time into significant state inconsistencies. The engine's reconstructed state would then reflect a different sequence of events than the one that actually occurred.
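The divergence can be illustrated with a toy model. This is not engine code: the two fee functions below are hypothetical stand-ins for "the same logic in two versions", differing only in how a 0.25% fee is rounded. Replaying an identical command sequence through each yields different final balances.

```shell
# Toy model of replay divergence; none of this reflects the engine's logic.
fee_v1() { echo $(( $1 * 25 / 10000 )); }            # truncates the fee
fee_v2() { echo $(( ($1 * 25 + 5000) / 10000 )); }   # rounds the fee half up

# Replay a list of trade amounts, crediting each amount minus its fee.
replay() {
  local fee_fn="$1"
  shift
  local balance=0
  local amount
  for amount in "$@"; do
    balance=$(( balance + amount - $("$fee_fn" "$amount") ))
  done
  echo "$balance"
}

replay fee_v1 1000 250 799 4020   # prints 6056
replay fee_v2 1000 250 799 4020   # prints 6053
```

Four commands already leave the two replays three units apart, and every later command builds on the diverged balance. A closing snapshot avoids this because the new version starts from the state the old version actually computed instead of recomputing it.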
#
Version Mismatch Protection
To guard against this, the engine records its version in each command log file. On startup, the engine checks whether the command log was written by an engine of the same version.
By default, the engine refuses to start if it detects a version mismatch between itself and the command log. This prevents accidental state corruption from replaying commands across version boundaries.
If you attempt to start a new engine version against command logs from a different version without a closing snapshot, startup will fail with an error indicating the version mismatch.
#
Disaster Recovery Override
In certain exceptional situations—such as disaster recovery where snapshot-and-shutdown was not possible—you may need to upgrade the engine without a closing snapshot.
To allow the engine to start despite a version mismatch, set the following environment variable:
ALLOW_VERSION_MISMATCH="yes"
With this variable set, the engine will start regardless of whether the command log was written by an engine of the same version.
Warning
Use ALLOW_VERSION_MISMATCH only when absolutely necessary. Replaying commands across versions may produce different state than the original execution. This option should be reserved for disaster recovery scenarios where no other option exists.
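One way to keep the override from lingering in a shell profile or service definition is to apply it to a single launch only. The sketch below is a hedged example: `start_with_version_override` is a hypothetical wrapper, and the engine's start command is not specified here, so it is passed in as arguments.

```shell
# Sketch: set ALLOW_VERSION_MISMATCH for one launch only.
# The arguments are whatever command you normally use to start the engine.
start_with_version_override() {
  if [ "$#" -eq 0 ]; then
    echo "usage: start_with_version_override <engine start command...>" >&2
    return 2
  fi
  echo "WARNING: starting with ALLOW_VERSION_MISMATCH=yes (disaster recovery only)" >&2
  # env sets the variable for the child process only; the calling shell's
  # environment is left unchanged.
  env ALLOW_VERSION_MISMATCH="yes" "$@"
}
```

Because the variable exists only in the environment of that one process, a later routine restart runs with the default mismatch protection again.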