# Fault Injection

## Overview

The Fault Injection endpoint is a testing utility that allows operators to simulate various failure scenarios in test and integration environments. This feature enables stress-testing of recovery mechanisms, persistence behaviors, and system resilience without waiting for actual failures to occur.

!!!danger Critical Warning
**NEVER enable fault injection in production environments.** This feature is designed exclusively for testing and can cause system crashes and service disruptions.
!!!

## Enabling Fault Injection

### Environment Variable Configuration

Fault injection is disabled by default. To enable it, set the following environment variable at startup on the Engine and Control Gateways:

```bash
ENABLE_FAULT_INJECTION=yes
```

### Verification

When fault injection is **disabled** (default), the endpoint returns:
```json
{
  "value": "FeatureNotSupported"
}
```

When fault injection is **enabled**, successful injection returns:
```json
{
  "value": "FaultInjected"
}
```

Note: This response is also returned when the fault type or the fault location is `None`, so a `None` request confirms that the feature is active without injecting a real fault.
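The check described above can be scripted. The following is a minimal sketch, not part of the product: `fault_injection_status` is a hypothetical helper that classifies a response body by the two `value` strings shown above. The usage comment assumes the gateway address used elsewhere on this page and that a request with both `fault=None` and `faultLocation=None` is accepted.

```bash
# Classify an InjectFault response body read from stdin.
# Prints "enabled", "disabled", or "unknown".
# Hypothetical helper: matches only the two response values documented above.
fault_injection_status() {
  body=$(cat)
  case "$body" in
    *'"FaultInjected"'*)       echo "enabled" ;;
    *'"FeatureNotSupported"'*) echo "disabled" ;;
    *)                         echo "unknown" ;;
  esac
}

# Usage (assumed gateway address; sends a no-op probe):
# curl -s -X PUT \
#   'http://control-gateway:8181/TestingControls/InjectFault?fault=None&faultLocation=None' \
#   -H 'accept: application/json' | fault_injection_status
```

Per the Fault Types table, a `None` fault still logs an error message, so expect a log entry for each probe.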

## Reference

### API Endpoint

```http
PUT /TestingControls/InjectFault
```

**Parameters:**
- `fault` (required): Type of fault to inject
- `faultLocation` (required): Where in the system to inject the fault
- `parameter` (optional): Numeric parameter for certain fault types (e.g., delay duration in milliseconds)

**Example Request:**
```bash
curl -X 'PUT' \
  'http://control-gateway:8181/TestingControls/InjectFault?fault=Delay&faultLocation=MainLoopBeforeResponse&parameter=10000' \
  -H 'accept: application/json'
```
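When a test suite issues many injections, assembling the query string by hand becomes error-prone. The sketch below wraps the request in two hypothetical shell functions, `inject_fault_url` and `inject_fault`. Neither is shipped with the product, and the base URL is whatever your environment uses. Fault and location names are inserted as-is, which is safe for the identifiers listed on this page because they contain no characters that need URL encoding.

```bash
# Build the InjectFault URL.
# Arguments: base URL, fault type, fault location, optional numeric parameter.
inject_fault_url() {
  base=$1 fault=$2 location=$3 parameter=${4:-}
  url="$base/TestingControls/InjectFault?fault=$fault&faultLocation=$location"
  if [ -n "$parameter" ]; then
    url="$url&parameter=$parameter"
  fi
  printf '%s\n' "$url"
}

# Send the request and print the JSON response body.
# Takes the same arguments as inject_fault_url.
inject_fault() {
  curl -s -X PUT "$(inject_fault_url "$@")" -H 'accept: application/json'
}
```

With these defined, the request above becomes `inject_fault http://control-gateway:8181 Delay MainLoopBeforeResponse 10000`.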

### Fault Types

| Fault Type | Description | Parameter |
|------------|-------------|-----------|
| `None` | Logs an error message but performs no actual fault injection | Not used |
| `ThrowException` | Throws an exception at the specified location | Not used |
| `StackOverflow` | Triggers a stack overflow at the specified location | Not used |
| `OutOfMemory` | Exhausts available memory at the specified location | Not used |
| `Delay` | Sleeps for the specified duration | Milliseconds (long) |
| `GcPressure` | Allocates memory leaving approximately the specified amount free | Bytes (long, minimum ~8MB) |

#### GcPressure Details

The `GcPressure` fault allocates memory to create garbage collection pressure:

- Minimum free memory: ~8MB (enforced automatically)
- If no parameter is provided, or the value is below 8MB, the fault leaves 8MB free
- Re-injecting with a different value frees the previous allocation and allocates to the new level
- Useful for testing low-memory scenarios and GC behavior

**Example:**
```bash
# Leave only 8MB free (minimum)
curl -X 'PUT' \
  'http://control-gateway:8181/TestingControls/InjectFault?fault=GcPressure&faultLocation=EndPointRequest'

# Leave 100MB free
curl -X 'PUT' \
  'http://control-gateway:8181/TestingControls/InjectFault?fault=GcPressure&faultLocation=EndPointRequest&parameter=104857600'
```
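The `parameter` value is given in bytes, and the examples above use 1MB = 1024 × 1024 bytes (100MB = 104857600). A small conversion helper avoids arithmetic slips. This is a sketch only: `gc_pressure_bytes` is a hypothetical function, and it treats the approximate 8MB floor as exactly 8 × 1024 × 1024 bytes, which is an assumption. The server enforces its own minimum regardless.

```bash
# Convert a free-memory target in MB (1MB = 1024 * 1024 bytes) to the byte
# value expected by `parameter`. Targets below the floor are raised to it.
gc_pressure_bytes() {
  mb=$1
  floor_mb=8   # assumption: the documented "~8MB" minimum taken as exactly 8MB
  if [ "$mb" -lt "$floor_mb" ]; then
    mb=$floor_mb
  fi
  echo $((mb * 1024 * 1024))
}

# Usage:
# gc_pressure_bytes 100   # prints 104857600
```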

### Fault Locations

| Location | Description | Injection Point |
|----------|-------------|-----------------|
| `None` | No-op location; returns success without injecting a fault | N/A |
| `EndPointRequest` | Gateway process, before command submission | Before the engine receives the command |
| `EndPointResponse` | Gateway process, after a successful engine response | After the engine processes the command |
| `MainLoopBeforeResponse` | Engine's main loop, during command processing | Before the response is sent to the gateway |
| `MainLoopAfterResponse` | Engine's main loop, during command processing | After the response is sent to the gateway |

#### Location Behavior Notes

- Faults are **only** injected during live command processing
- Faults are **not** injected when:
    - Commands are loaded from command files during recovery
    - Commands are processed by the snapshotter
- This restriction ensures that fault injection exercises live system behavior without corrupting persistence or permanently disabling the instance

## Explanation

### Examples

#### Testing Crash Recovery
```bash
# Simulate engine crash after processing, but before starting the next command
curl -X 'PUT' \
  'http://control-gateway:8181/TestingControls/InjectFault?fault=ThrowException&faultLocation=MainLoopAfterResponse'
```

#### Testing Gateway Resilience
```bash
# Simulate gateway delay before command submission
curl -X 'PUT' \
  'http://control-gateway:8181/TestingControls/InjectFault?fault=Delay&faultLocation=EndPointRequest&parameter=30000'
```

#### Testing Memory Pressure
```bash
# Create severe memory pressure
curl -X 'PUT' \
  'http://control-gateway:8181/TestingControls/InjectFault?fault=GcPressure&faultLocation=MainLoopBeforeResponse&parameter=10485760'
```

### Heap Dumps

Fatal exits triggered by fault injection may produce heap dump files if a heap dump volume is configured.

**Characteristics:**
- Heap dumps can be very large (multiple gigabytes)
- Generated automatically on fatal errors
- Require periodic cleanup in test environments
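
Periodic cleanup can be automated with `find`. The sketch below is illustrative only: `clean_heap_dumps` is a hypothetical helper, and the directory, file name pattern, and retention period are all supplied by the caller because this page does not specify where heap dumps are written or how they are named.

```bash
# Delete files in a directory tree that match a name pattern and were last
# modified more than N days ago. Prints each path before removing it.
# Arguments: directory, file name pattern, age in days (all caller-supplied).
clean_heap_dumps() {
  dir=$1 pattern=$2 days=$3
  find "$dir" -type f -name "$pattern" -mtime +"$days" -print -exec rm -f {} +
}

# Usage (hypothetical path and extension; substitute your own):
# clean_heap_dumps /var/heapdumps '*.hprof' 7
```

Run the command with `-print` only (dropping `-exec rm -f {} +`) first to preview which files would be deleted.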

### Best Practices

1. **Isolation**: Only enable in dedicated test environments
2. **Documentation**: Log which faults are injected during test runs
3. **Monitoring**: Watch for heap dumps and clean up regularly
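
For the documentation practice above, one lightweight approach is to append a timestamped record each time a test injects a fault. The following is a sketch under stated assumptions: `log_fault` is a hypothetical helper, and the log file path and line format are arbitrary choices, not a format the product defines.

```bash
# Append a UTC-timestamped record of an injected fault to a log file.
# Arguments: log file path, fault type, fault location, optional parameter.
log_fault() {
  log_file=$1 fault=$2 location=$3 parameter=${4:-none}
  printf '%s fault=%s location=%s parameter=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$fault" "$location" "$parameter" >> "$log_file"
}

# Usage (hypothetical log path):
# log_fault ./fault-injection.log Delay EndPointRequest 30000
```

Calling `log_fault` immediately before each injection request keeps the record in step with what was actually sent, which helps when correlating faults with heap dumps or crashes afterwards.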
