We present cyber-redteam-sim, a high-fidelity network attack and defense simulation engine that models over 45 MITRE ATT&CK techniques across the full cyber kill chain. The engine provides a discrete-event simulation environment where red team operators execute realistic attack sequences against modeled enterprise networks, while blue team defenders detect, correlate, and respond to adversarial activity based on configurable maturity levels. We introduce three key innovations: (1) an attack graph pathfinding system using A* search with state-dependent pruning to identify optimal compromise paths through network topologies, producing risk-scored paths that balance success probability against stealth; (2) a cumulative stealth model where detection probability grows with successive actions and decays over time, creating realistic operational tempo tradeoffs; and (3) adversarial machine learning agents using tabular Q-learning and REINFORCE policy gradients that learn attack strategies through simulated episodes. The simulation includes probabilistic cloud attack modeling, C2 infrastructure simulation with redirectors and dead drops, IEEE 1278.1 DIS federation for multi-exercise interoperability, and comprehensive after-action review with automated scoring. We demonstrate that the engine can simulate complex multi-stage attack campaigns against enterprise Active Directory environments, producing actionable detection gap analysis and remediation recommendations.
Modern enterprise networks face sophisticated, multi-stage cyber attacks that traverse the full kill chain from initial reconnaissance through data exfiltration. Traditional penetration testing and red team exercises, while valuable, are limited by cost, scope, and the difficulty of safely testing defensive capabilities against realistic adversarial behavior. Organizations need the ability to model attack paths, evaluate detection coverage, and train both offensive and defensive teams in a safe, repeatable environment.
cyber-redteam-sim addresses this gap by providing a high-fidelity simulation engine that models the complete adversarial lifecycle. Unlike network vulnerability scanners that identify weaknesses in isolation, our engine simulates attack campaigns—sequences of techniques that build upon each other, where successful exploitation creates new opportunities for lateral movement and privilege escalation. The engine models 45+ techniques mapped to the MITRE ATT&CK framework [1], spanning 11 tactics from reconnaissance through command and control.
The key contributions of this work are:
The remainder of this paper describes the architecture (Section 2), attack modeling (Section 3), network topology (Section 4), defense modeling (Section 5), attack graph pathfinding (Section 6), ML agents (Section 7), stealth mechanics (Section 8), cloud attack modeling (Section 9), C2 infrastructure (Section 10), purple team operations (Section 11), after-action review (Section 12), federation (Section 13), and performance (Section 14).
The simulation engine is implemented as a Go module organized into 18 packages, producing a single ~2.8 MB static binary. This monolithic design prioritizes deployment simplicity and deterministic reproducibility while maintaining clear internal package boundaries.
The engine operates as a tick-based discrete event simulation. Each tick represents a configurable time interval (default: 1 second). The core loop executes the following phases in order:
Scenarios are defined in YAML files specifying network topology, host configurations, security controls, and red team objectives. The engine provides an interactive configuration wizard with four built-in templates:
| Template | Description | Hosts | Objective |
|---|---|---|---|
| enterprise-ad | Corporate AD environment | 5+ | Domain admin |
| dmz-breach | Multi-zone DMZ to internal | 3+ | Data exfiltration |
| cloud-kill-chain | AWS/Azure multi-cloud | 4+ | Cloud admin |
| apt-persistence | Long-duration APT | 5+ | Persistence + C2 |
The attack package implements 45+ MITRE ATT&CK techniques spanning the full kill chain. Each technique is modeled with probabilistic success rates, detection probabilities, and realistic state transitions.
Techniques are organized by ATT&CK tactic. Each technique in the catalog specifies:
For example, technique T1190 (Exploit Public-Facing Application) has a base success probability of 0.6, detection base rate of 0.2, no prerequisites, cost of 0.3, and noise level of 0.4. By contrast, T1558.003 (Kerberoasting) has success 0.75, detection 0.35, requires the ad-access and ldap-query prerequisites, cost 0.15, and noise 0.3, reflecting its higher sophistication and lower noise.
The engine's red team planner follows a 12-phase progression that mirrors the ATT&CK kill chain:
Success probabilities are not static; they are modulated by target host properties:
$$P_{\text{success}} = P_{\text{base}} \times (1 - \text{patchLevel} \times 0.5) \times (1 - \text{hardeningLevel} \times 0.2)$$

For lateral movement, the formula additionally incorporates credential strength and target hardening:

$$P_{\text{lateral}} = 0.6 \times (1 - \text{hardeningLevel}_{\text{target}} \times 0.3) + \text{credentialBonus}$$

This creates realistic scenarios where a fully patched, hardened host with EDR level 3 is significantly harder to compromise than an unpatched workstation with no endpoint protection.
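The two formulas above can be checked with a few lines of Go. This is a minimal sketch, not the engine's code: the `Host` struct, the function names, and the clamping to [0, 1] are assumptions made for illustration.

```go
package main

import "fmt"

// Host holds the two target properties the formulas depend on.
// The struct and field names are illustrative, not the engine's own types.
type Host struct {
	PatchLevel     float64 // 0..1
	HardeningLevel float64 // 0..1
}

// clamp01 keeps a probability inside [0, 1].
func clamp01(p float64) float64 {
	if p < 0 {
		return 0
	}
	if p > 1 {
		return 1
	}
	return p
}

// SuccessProbability applies P_base * (1 - patch*0.5) * (1 - hardening*0.2).
func SuccessProbability(base float64, h Host) float64 {
	return clamp01(base * (1 - h.PatchLevel*0.5) * (1 - h.HardeningLevel*0.2))
}

// LateralProbability applies 0.6 * (1 - hardening*0.3) + credentialBonus.
// Clamping is an assumption: the text does not say how overflow is handled.
func LateralProbability(target Host, credentialBonus float64) float64 {
	return clamp01(0.6*(1-target.HardeningLevel*0.3) + credentialBonus)
}

func main() {
	unpatched := Host{PatchLevel: 0, HardeningLevel: 0}
	hardened := Host{PatchLevel: 1, HardeningLevel: 1}
	// 0.6 is the base success probability quoted for T1190.
	fmt.Printf("unpatched: %.2f\n", SuccessProbability(0.6, unpatched))
	fmt.Printf("hardened:  %.2f\n", SuccessProbability(0.6, hardened))
	fmt.Printf("lateral:   %.2f\n", LateralProbability(hardened, 0.1))
}
```

With a base of 0.6, full patching and hardening reduce the success probability from 0.60 to 0.24, which shows that patch level carries most of the weight in this model.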
The network package models enterprise networks as directed graphs of hosts connected by links with firewall rules. This is the foundation upon which all attack and defense operations are built.
Each host is modeled with rich properties that directly affect attack and defense simulation:
| Property | Type | Description |
|---|---|---|
| patch_level | 0–1 | Fraction of vulnerabilities patched; reduces exploit success |
| hardening_level | 0–1 | CIS hardening compliance; reduces all attack success rates |
| edr_level | 0–3 | Endpoint detection capability (0=none, 3=advanced) |
| av_level | 0–2 | Anti-virus capability (0=none, 2=heuristic) |
| log_level | 0–3 | Logging verbosity; affects SIEM detection probability |
| zone | string | Network zone: dmz, internal, management, cloud |
| role | string | Host role: workstation, server, dc, firewall, router |
| services | [] | Running services with port, protocol, version, banner |
| vulnerabilities | [] | CVE entries with CVSS, exploit availability, remediation status |
The network is segmented into zones (DMZ, internal, management, cloud) with links that specify bandwidth, latency, and firewall rules between subnets. Reachability is computed on demand:
`func (n *Network) IsReachable(srcIP, dstIP string, port int, proto string) (bool, *FirewallRule)`
This function traverses the link graph between source and destination subnets, evaluating firewall rules at each hop. It enables realistic scenarios where lateral movement is constrained by network segmentation, and attackers must find paths through allowed ports and protocols.
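The traversal described above might look roughly like the following sketch. It is a simplification under stated assumptions: it works on subnet names rather than IP addresses, the `FirewallRule` and `Link` shapes are hypothetical, rules are evaluated first-match with default deny, and the returned rule is the one that admitted the final hop. None of these details are confirmed by the engine's source.

```go
package main

import "fmt"

// FirewallRule is a hypothetical rule shape; the engine's real fields may differ.
type FirewallRule struct {
	Port  int    // 0 matches any port
	Proto string // "" matches any protocol
	Allow bool
}

// Link is a directed connection between two subnets with ordered rules.
type Link struct {
	From, To string
	Rules    []FirewallRule
}

// evaluate returns the first rule matching port/proto, or nil (treated as deny).
func (l *Link) evaluate(port int, proto string) *FirewallRule {
	for i := range l.Rules {
		r := &l.Rules[i]
		if (r.Port == 0 || r.Port == port) && (r.Proto == "" || r.Proto == proto) {
			return r
		}
	}
	return nil
}

// Reachable does a breadth-first search over subnets, following only links
// whose first matching rule allows the traffic. It returns the rule that
// admitted the final hop, or nil if src == dst or no path exists.
func Reachable(links []Link, src, dst string, port int, proto string) (bool, *FirewallRule) {
	if src == dst {
		return true, nil
	}
	visited := map[string]bool{src: true}
	queue := []string{src}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for i := range links {
			l := &links[i]
			if l.From != cur || visited[l.To] {
				continue
			}
			rule := l.evaluate(port, proto)
			if rule == nil || !rule.Allow {
				continue // default deny, or explicitly blocked
			}
			if l.To == dst {
				return true, rule
			}
			visited[l.To] = true
			queue = append(queue, l.To)
		}
	}
	return false, nil
}

func main() {
	links := []Link{
		{From: "dmz", To: "internal", Rules: []FirewallRule{
			{Port: 445, Proto: "tcp", Allow: false},
			{Port: 0, Proto: "tcp", Allow: true},
		}},
		{From: "internal", To: "management", Rules: []FirewallRule{
			{Port: 22, Proto: "tcp", Allow: true},
		}},
	}
	ok, _ := Reachable(links, "dmz", "management", 22, "tcp")
	fmt.Println("dmz -> management tcp/22:", ok)
	ok, _ = Reachable(links, "dmz", "internal", 445, "tcp")
	fmt.Println("dmz -> internal tcp/445:", ok)
}
```

In this example SSH can cross two zones, while SMB is blocked at the first hop, which is the kind of segmentation constraint the lateral movement model relies on.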
The network model includes Active Directory domains with:
This enables realistic Kerberoasting, AS-REP roasting, pass-the-ticket, and DCSync attack simulation with protocol-level fidelity.
The defense package implements a multi-layered detection and response system that scales with organizational maturity. The model is designed to produce realistic detection gaps that mirror real-world blue team limitations.
The engine defines five security maturity levels, each unlocking progressively more capable detection and response:
| Level | Name | Capabilities | MTTD | MTTC |
|---|---|---|---|---|
| 1 | Basic | AV only, no SIEM, no IR plan | Days | Weeks |
| 2 | Intermediate | SIEM + basic EDR, IR playbook | Hours | Days |
| 3 | Advanced | NDR + threat hunting, 24/7 SOC | Minutes | Hours |
| 4 | Expert | ML detection, purple team, automated response | Seconds | Minutes |
| 5 | Elite | Full telemetry, proactive defense, zero-trust | Real-time | Automated |
The correlation engine aggregates log events within time windows and triggers alerts when correlation rules fire. It implements threshold-based detection (N events of type X within window Y) and cross-source correlation (auth log + network log + process log matching pattern). Each action in the simulation emits structured log events that are ingested by the correlation engine for pattern matching.
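As an illustration of the threshold-based rule (N events of type X within window Y), here is a minimal sliding-window sketch. The `Event` and `ThresholdRule` types are hypothetical, ticks stand in for timestamps, and cross-source correlation is not covered.

```go
package main

import "fmt"

// Event is a simplified log event; Tick stands in for a timestamp.
type Event struct {
	Tick int
	Type string
}

// ThresholdRule fires when at least N events of Type occur within Window ticks.
type ThresholdRule struct {
	Type   string
	N      int
	Window int
}

// Fires reports whether the rule triggers on events, which must be sorted by Tick.
// It keeps a sliding window of the ticks of matching events.
func (r ThresholdRule) Fires(events []Event) bool {
	var ticks []int
	for _, e := range events {
		if e.Type != r.Type {
			continue
		}
		ticks = append(ticks, e.Tick)
		// Drop matches that fell out of the window ending at the current event.
		for len(ticks) > 0 && e.Tick-ticks[0] >= r.Window {
			ticks = ticks[1:]
		}
		if len(ticks) >= r.N {
			return true
		}
	}
	return false
}

func main() {
	rule := ThresholdRule{Type: "auth-failure", N: 3, Window: 10}
	burst := []Event{{1, "auth-failure"}, {5, "auth-failure"}, {9, "auth-failure"}}
	slow := []Event{{1, "auth-failure"}, {12, "auth-failure"}, {23, "auth-failure"}}
	fmt.Println("burst fires:", rule.Fires(burst))
	fmt.Println("slow fires: ", rule.Fires(slow))
}
```

The same three failures trigger the rule when packed into nine ticks but not when spread over twenty-three, which is how a rule like this rewards a slower operational tempo.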
The EDR monitor implements process-level detection with capability levels that scale with maturity:
The IR engine models the full incident response lifecycle: detect → analyze → contain → eradicate → recover. Cases advance through phases with realistic time delays based on maturity level. Containment actions include host isolation, account disabling, and firewall blocking. The engine tracks case histories, response times, and outcome effectiveness.
The attack graph module is the engine's strategic reasoning layer. It generates a directed graph from network topology and computes optimal attack paths using A* search with state-dependent pruning.
An attack graph consists of nodes representing (host, privilege-level) pairs and edges representing technique-executable transitions. For example, node dc01:3 represents domain admin privilege on the domain controller, while web01:1 represents user-level access on a web server.
Edges are annotated with:
The attack state machine defines 17 state keys that gate technique availability:
| State Key | Description | Granted By |
|---|---|---|
| foothold | Initial access achieved | Any user-level compromise |
| admin-access | Local admin on any host | Privilege escalation to admin |
| admin-creds | Administrator credentials obtained | Credential dumping, Kerberoasting |
| ntlm-hash | NTLM hash available | Credential dumping (T1003) |
| kerberos-ticket | Kerberos TGT/TGS available | Kerberoasting, AS-REP roasting |
| ad-access | Authenticated AD access | Domain credentials or Kerberos tickets |
| c2-established | Active C2 channel | C2 setup (T1071) |
| data-collected | Data staged for exfiltration | Data discovery (T1005) |
| domain-admin | Domain admin achieved | DC compromise + admin escalation |
This prerequisite system ensures that the attack planner cannot, for example, execute Kerberoasting without first achieving AD access, or exfiltrate data without collecting it. It creates a realistic dependency chain that mirrors actual adversarial campaigns.
The FindPaths function searches for attack paths from an initial state to a target (host, privilege) node. It uses a priority queue ordered by risk score:
The algorithm maintains an attack state that tracks achieved prerequisites. At each expansion step, edges whose prerequisites are not satisfied are pruned. This produces paths that are not only optimal in terms of risk score, but also feasible—every technique in the path has its prerequisites met by prior successful techniques.
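The prerequisite pruning can be sketched as a best-first search over (node, achieved-state) pairs. This is an illustrative sketch only: it uses a zero heuristic, which makes it uniform-cost search (A* with h = 0), and a plain additive `Cost` stands in for the risk score. The `Edge` shape, the `FindPath` name, and the example graph are assumptions, not the engine's API.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Edge is a technique-executable transition between (host:privilege) nodes.
// Requires and Grants are state keys such as "foothold" or "ad-access".
type Edge struct {
	From, To  string
	Technique string
	Requires  []string
	Grants    []string
	Cost      float64
}

// item is one partial path on the frontier.
type item struct {
	node  string
	cost  float64
	state map[string]bool
	path  []Edge
}

// pq is a min-heap of partial paths ordered by accumulated cost.
type pq []*item

func (q pq) Len() int            { return len(q) }
func (q pq) Less(i, j int) bool  { return q[i].cost < q[j].cost }
func (q pq) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *pq) Push(x interface{}) { *q = append(*q, x.(*item)) }
func (q *pq) Pop() interface{} {
	old := *q
	it := old[len(old)-1]
	*q = old[:len(old)-1]
	return it
}

// FindPath returns the cheapest path from start to target in which every
// edge's prerequisites were granted by earlier edges on the same path.
func FindPath(edges []Edge, start, target string) ([]Edge, float64, bool) {
	q := &pq{{node: start, state: map[string]bool{}}}
	seen := map[string]bool{}
	for q.Len() > 0 {
		cur := heap.Pop(q).(*item)
		if cur.node == target {
			return cur.path, cur.cost, true
		}
		// fmt prints maps in sorted key order, so this key is deterministic.
		key := fmt.Sprint(cur.node, cur.state)
		if seen[key] {
			continue
		}
		seen[key] = true
	next:
		for _, e := range edges {
			if e.From != cur.node {
				continue
			}
			for _, r := range e.Requires {
				if !cur.state[r] {
					continue next // prune: prerequisite not yet achieved
				}
			}
			st := make(map[string]bool, len(cur.state)+len(e.Grants))
			for k := range cur.state {
				st[k] = true
			}
			for _, g := range e.Grants {
				st[g] = true
			}
			path := append(append([]Edge{}, cur.path...), e)
			heap.Push(q, &item{node: e.To, cost: cur.cost + e.Cost, state: st, path: path})
		}
	}
	return nil, 0, false
}

func main() {
	edges := []Edge{
		{From: "ext:0", To: "web01:1", Technique: "exploit-public-app", Grants: []string{"foothold"}, Cost: 0.3},
		{From: "web01:1", To: "dc01:3", Technique: "kerberoasting", Requires: []string{"ad-access"}, Cost: 0.05},
		{From: "web01:1", To: "ws01:1", Technique: "remote-services", Requires: []string{"foothold"}, Grants: []string{"ad-access"}, Cost: 0.2},
		{From: "ws01:1", To: "dc01:3", Technique: "kerberoasting", Requires: []string{"ad-access"}, Cost: 0.15},
	}
	path, cost, ok := FindPath(edges, "ext:0", "dc01:3")
	fmt.Printf("found=%v cost=%.2f hops=%d\n", ok, cost, len(path))
}
```

In the example graph, the cheap direct edge from web01:1 to dc01:3 is pruned because ad-access has not been granted at that point, so the search returns the longer three-hop path through ws01:1.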
The FindFeasiblePaths variant starts from currently achieved state nodes (rather than a single start node), enabling real-time path recommendations as the attack progresses. The NextBestAction function returns the first edge of the highest-risk feasible path, providing an AI agent's action recommendation.
Multiple paths to the same objective can be compared along different dimensions:
The stealth scorer's AnalyzeTradeoff function explicitly computes both approaches and recommends based on the cumulative detection risk threshold. If the fast approach has detection risk below 0.5, it is recommended; if the quiet approach has significantly lower risk (<50% of fast), stealth is prioritized; otherwise, a balanced approach is suggested.
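The three-way rule reads naturally as a switch. The sketch below follows the thresholds stated above; the `Recommend` name and the string return values are hypothetical, and the real AnalyzeTradeoff signature is not shown in the text.

```go
package main

import "fmt"

// Recommend applies the rule in order: accept the fast approach if its
// detection risk is below 0.5; otherwise prefer the quiet approach if its
// risk is under half of the fast approach's; otherwise suggest a balance.
func Recommend(fastRisk, quietRisk float64) string {
	switch {
	case fastRisk < 0.5:
		return "fast"
	case quietRisk < 0.5*fastRisk:
		return "quiet"
	default:
		return "balanced"
	}
}

func main() {
	fmt.Println(Recommend(0.3, 0.1)) // fast risk already acceptable
	fmt.Println(Recommend(0.8, 0.3)) // quiet is less than half as risky
	fmt.Println(Recommend(0.8, 0.6)) // neither condition holds
}
```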
The ML package implements two adversarial learning agents: tabular Q-learning and REINFORCE policy gradients. Both agents learn to select attack actions that maximize a composite reward function balancing objective achievement, stealth, and detection avoidance.
The simulation state is encoded as a feature vector with per-host features and global features:
Per-host features (7 dimensions): Compromised (binary), privilege level (0–3), EDR level (0–3 normalized), patch level (0–1), hardening level (0–1), service count (normalized), vulnerability count (normalized).
Global features (5 dimensions): Normalized tick count, fraction of hosts compromised, detection rate, stealth score, credential pool size.
A deterministic hash of the state vector enables Q-table lookup while maintaining sufficient state discrimination.
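One plausible way to build such a hash is to quantize each normalized feature into a small number of buckets and feed the bucket indices to FNV-1a. The bucket count, the quantization step, and the `HashState` name are all assumptions; the text does not specify which hash or discretization the engine uses.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// HashState quantizes each feature (expected in [0, 1]) into `buckets`
// levels and hashes the bucket indices with 64-bit FNV-1a. Coarser buckets
// merge more states into one Q-table row; finer buckets discriminate more.
// buckets must be between 2 and 256 so each index fits in one byte.
func HashState(features []float64, buckets int) uint64 {
	h := fnv.New64a()
	buf := make([]byte, 1)
	for _, f := range features {
		if f < 0 {
			f = 0
		}
		if f > 1 {
			f = 1
		}
		buf[0] = byte(math.Round(f * float64(buckets-1)))
		h.Write(buf)
	}
	return h.Sum64()
}

func main() {
	a := HashState([]float64{0.50, 1, 0}, 11)
	b := HashState([]float64{0.51, 1, 0}, 11) // same bucket as 0.50
	c := HashState([]float64{0.90, 1, 0}, 11) // different bucket
	fmt.Println(a == b, a == c)
}
```

Quantizing before hashing is a design choice that trades state discrimination for table size: without it, nearly every floating-point state would get its own row and the tabular agent would rarely revisit a state.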
The action space is the Cartesian product of technique IDs × source hosts × target hosts. Valid actions at any state are those where the source host is compromised (or external) and the target is not yet compromised at the same or higher privilege level.
The composite reward function is:
$$R = R_{\text{tick}} + N_{\text{new}} \times R_{\text{compromise}} + R_{\text{domain}} + R_{\text{objective}} + R_{\text{detect}} + S_{\text{stealth}} \times \beta - R_{\text{cumulative}} \times \gamma$$

Where:

- $R_{\text{tick}} = -2$: time penalty encouraging faster solutions
- $R_{\text{compromise}} = +5$: reward for each new host compromised
- $R_{\text{domain}} = +10$: large reward for achieving domain admin
- $R_{\text{objective}} = +3$: reward for completing the objective
- $R_{\text{detect}} = -5$: penalty for being detected
- $S_{\text{stealth}} \times \beta$, with $\beta = 2$: bonus for high-stealth actions
- $R_{\text{cumulative}} \times \gamma$, with $\gamma = 3$: penalty for accumulated detection risk

The Q-learning agent uses ε-greedy exploration with decaying ε and experience replay. The Q-table is indexed by state hash and action index. The update rule is:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

Default hyperparameters: learning rate α=0.1, discount factor γ=0.95 (distinct from the reward weight γ in the reward function), initial ε=1.0 with decay 0.995 and minimum 0.01. The experience replay buffer holds up to 10,000 transitions with batch sampling.
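A single application of the update rule can be sketched as follows. The `Agent` and `QTable` types are hypothetical, unseen states are assumed to start at zero, and exploration and experience replay are left out to keep the example focused on the update itself.

```go
package main

import "fmt"

// QTable maps a state hash to one Q-value per action index.
type QTable map[uint64][]float64

// Agent holds the table and the hyperparameters named in the text.
type Agent struct {
	Q        QTable
	Alpha    float64 // learning rate
	Gamma    float64 // discount factor
	NActions int
}

// row returns the Q-values for a state, creating a zero row on first visit.
func (a *Agent) row(s uint64) []float64 {
	if r, ok := a.Q[s]; ok {
		return r
	}
	r := make([]float64, a.NActions)
	a.Q[s] = r
	return r
}

// Update applies Q(s,a) += alpha * (r + gamma*max_a' Q(s',a') - Q(s,a)).
// For terminal transitions the bootstrap term is dropped.
func (a *Agent) Update(s uint64, action int, reward float64, next uint64, terminal bool) {
	target := reward
	if !terminal {
		nextRow := a.row(next)
		best := nextRow[0]
		for _, v := range nextRow[1:] {
			if v > best {
				best = v
			}
		}
		target += a.Gamma * best
	}
	q := a.row(s)
	q[action] += a.Alpha * (target - q[action])
}

func main() {
	ag := &Agent{Q: QTable{}, Alpha: 0.1, Gamma: 0.95, NActions: 2}
	ag.Update(1, 0, 5, 2, false)  // reward +5 for a new compromise
	ag.Update(0, 1, -2, 1, false) // tick penalty, bootstraps from state 1
	fmt.Printf("Q[1][0]=%.4f Q[0][1]=%.4f\n", ag.Q[1][0], ag.Q[0][1])
}
```

The second update shows value propagating backward: state 0 receives only the tick penalty, but its estimate is pulled up by the discounted value already learned for state 1.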
The policy gradient agent uses a linear policy with softmax action selection and a learned baseline:
$$\pi(a|s) = \text{softmax}(W_a \cdot s + b_a)$$

The update uses advantage estimation with a running average baseline $b$ (distinct from the bias $b_a$):

$$\Delta W_a = \alpha \times (G_t - b) \times (\delta_{a,a_t} - \pi(a|s_t)) \times s_t$$

Weights are initialized with Xavier initialization (scale = $\sqrt{2/(\text{dim}_{\text{state}} + \text{dim}_{\text{action}})}$). Discount factor γ=0.99, baseline learning rate 0.01.
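The following sketch applies one REINFORCE step for a linear softmax policy, using the standard log-likelihood gradient in which row a of the weights moves by (δ(a, a_t) − π(a|s_t)) · s_t scaled by the advantage. The `Policy` type and `Step` method are illustrative names, and episode rollout, return computation, and baseline tracking are assumed to happen elsewhere.

```go
package main

import (
	"fmt"
	"math"
)

// Softmax converts logits to probabilities, subtracting the maximum
// first for numerical stability.
func Softmax(logits []float64) []float64 {
	m := logits[0]
	for _, v := range logits {
		if v > m {
			m = v
		}
	}
	out := make([]float64, len(logits))
	sum := 0.0
	for i, v := range logits {
		out[i] = math.Exp(v - m)
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

// Policy is a linear softmax policy: W[action][feature] plus a bias per action.
type Policy struct {
	W [][]float64
	B []float64
}

// Probs returns pi(a|s) for every action.
func (p *Policy) Probs(s []float64) []float64 {
	logits := make([]float64, len(p.W))
	for a := range p.W {
		logits[a] = p.B[a]
		for i, x := range s {
			logits[a] += p.W[a][i] * x
		}
	}
	return Softmax(logits)
}

// Step applies one policy gradient update for a single sample:
// the state s, the action taken, the return G, and the current baseline.
func (p *Policy) Step(s []float64, taken int, G, baseline, alpha float64) {
	probs := p.Probs(s)
	adv := G - baseline
	for a := range p.W {
		ind := 0.0
		if a == taken {
			ind = 1
		}
		g := alpha * adv * (ind - probs[a])
		for i, x := range s {
			p.W[a][i] += g * x
		}
		p.B[a] += g
	}
}

func main() {
	p := &Policy{W: [][]float64{{0, 0}, {0, 0}}, B: []float64{0, 0}}
	s := []float64{1, 0}
	fmt.Printf("before: %.3f\n", p.Probs(s)[0])
	p.Step(s, 0, 1.0, 0.0, 0.1) // positive advantage for action 0
	fmt.Printf("after:  %.3f\n", p.Probs(s)[0])
}
```

A positive advantage raises the probability of the action taken and lowers the others by the same total amount, so the weight changes across actions sum to zero for each feature.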
The package includes random and greedy baselines for evaluation. The random agent selects uniformly from valid actions, while the greedy agent always picks the last valid action (typically the most aggressive). Training runs generate EpisodeResult records tracking total reward, ticks, win condition, stealth average, detections, and unique attack paths discovered.
The stealth package implements a cumulative detection probability model that creates realistic operational tempo tradeoffs. Unlike simpler models that treat each action's detection independently, our model accounts for the accumulated noise of prior operations.
Each technique is characterized by a noise profile with six dimensions:
| Dimension | Description | Range |
|---|---|---|
| BaseNoise | Inherent detection probability | 0–1 |
| EDRBypass | How well it evades EDR | 0–1 (higher = better evasion) |
| SIEMBypass | How well it evades SIEM correlation | 0–1 |
| LogFootprint | Log evidence left behind | 0–1 |
| NetworkNoise | Network visibility of the action | 0–1 |
| DiskFootprint | Disk artifacts left behind | 0–1 |
For example, T1071.004 (C2 via DNS) has BaseNoise=0.15, EDRBypass=0.7, SIEMBypass=0.6, LogFootprint=0.2, NetworkNoise=0.3, DiskFootprint=0.02—reflecting its low visibility but limited bandwidth. By contrast, T1110 (Brute Force) has BaseNoise=0.8, EDRBypass=0.1, NetworkNoise=0.7—reflecting its inherently noisy nature.
The stealth score for each action is computed as:
$$\text{stealth} = (1 - \text{BaseNoise}) \times (1 - \text{EDR}_{\text{level}} \times 0.1 \times \text{EDRBypass}) \times \text{techniqueModifier} \times \text{opsecSafeBonus}$$

The detection risk for each action incorporates cumulative noise:

$$P_{\text{detect}} = (1 - \text{stealth}) \times (1 + \text{cumulativeRisk} \times 0.5)$$

After each action, cumulative risk increases:

$$\text{cumulativeRisk} \leftarrow \text{cumulativeRisk} + (1 - \text{stealth}) \times 0.3$$

And decays by 5% per tick:

$$\text{cumulativeRisk} \leftarrow \text{cumulativeRisk} \times (1 - 0.05)$$

This creates a natural tension: rapid successive actions increase detection risk, while pauses allow it to decay. Operators must balance speed against stealth, just as real adversaries do.
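The tempo tradeoff is easy to see by stepping the three update rules by hand. In this sketch the `Tracker` type is hypothetical, the per-action stealth score is taken as an input rather than computed from a noise profile, and capping the detection probability at 1 is an assumption the text does not state.

```go
package main

import "fmt"

// Tracker carries the cumulative risk between actions.
type Tracker struct {
	CumulativeRisk float64
}

// Act returns the detection probability for an action with the given
// stealth score, then adds the action's noise to the cumulative risk.
func (t *Tracker) Act(stealth float64) float64 {
	p := (1 - stealth) * (1 + t.CumulativeRisk*0.5)
	if p > 1 {
		p = 1 // assumed cap; the formula alone can exceed 1
	}
	t.CumulativeRisk += (1 - stealth) * 0.3
	return p
}

// Tick decays the cumulative risk by 5%.
func (t *Tracker) Tick() {
	t.CumulativeRisk *= 1 - 0.05
}

func main() {
	// Two actions back to back.
	rushed := &Tracker{}
	rushed.Act(0.5)
	fmt.Printf("second action, no pause:    %.4f\n", rushed.Act(0.5))

	// The same two actions separated by ten idle ticks.
	patient := &Tracker{}
	patient.Act(0.5)
	for i := 0; i < 10; i++ {
		patient.Tick()
	}
	fmt.Printf("second action, 10-tick gap: %.4f\n", patient.Act(0.5))
}
```

With a stealth score of 0.5, the second action is detected with probability 0.5375 when it follows immediately, and with a lower probability after a ten-tick pause because roughly 40% of the accumulated risk has decayed.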
The overall probability of detection across all actions is computed using Bayesian combination:
$$P(\text{detected}) = 1 - \prod_{i} \left( 1 - P_i \times e^{-0.05 \times \text{age}_i} \right)$$
Where $\text{age}_i$ is the number of ticks since action $i$. This weighting ensures recent actions contribute more to detection risk than older ones, modeling the defender's recency bias and log retention.
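The combined probability treats each past action as an independent chance of detection, discounted by its age. A minimal sketch, with a hypothetical `PastAction` type:

```go
package main

import (
	"fmt"
	"math"
)

// PastAction records an action's own detection probability and its age in ticks.
type PastAction struct {
	P   float64
	Age int
}

// OverallDetection computes 1 - prod(1 - P_i * exp(-0.05 * age_i)).
func OverallDetection(actions []PastAction) float64 {
	undetected := 1.0
	for _, a := range actions {
		undetected *= 1 - a.P*math.Exp(-0.05*float64(a.Age))
	}
	return 1 - undetected
}

func main() {
	fresh := []PastAction{{P: 0.5, Age: 0}, {P: 0.5, Age: 0}}
	stale := []PastAction{{P: 0.5, Age: 100}, {P: 0.5, Age: 100}}
	fmt.Printf("two fresh actions: %.4f\n", OverallDetection(fresh))
	fmt.Printf("two stale actions: %.4f\n", OverallDetection(stale))
}
```

Two fresh actions at 0.5 each combine to 0.75, while the same actions a hundred ticks old contribute well under 1% in total, since each is weighted by e^(-5).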
The cloud package models probabilistic attacks against AWS and Azure environments with defender maturity scaling. Unlike on-premises attacks, cloud attacks are modeled as probabilistic outcomes rather than simulation steps, with success and detection probabilities that scale with the target's cloud security posture.
A cloud environment includes:
Three primary attack models are implemented:
The enumeration model discovers resources, public storage, managed identities, and MFA-less users. Detection probability uses a sigmoid model:
$$P_{\text{detect}} = \text{maturity} \times 0.03 + \sigma\!\left(\frac{\text{volume}}{20}\right) \times 0.3 \times (1 + \text{maturity} \times 0.3)$$

Where σ is the logistic function. This models the reality that low-volume enumeration is nearly invisible, while mass enumeration triggers anomaly detection.
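Evaluating the formula at a few points helps calibrate it. The function name below is hypothetical, and the maturity values used in the example are assumptions, since the text does not state the scale used by the cloud package.

```go
package main

import (
	"fmt"
	"math"
)

// logistic is the standard sigmoid 1 / (1 + e^-x).
func logistic(x float64) float64 {
	return 1 / (1 + math.Exp(-x))
}

// EnumDetectProb evaluates
// maturity*0.03 + logistic(volume/20) * 0.3 * (1 + maturity*0.3).
func EnumDetectProb(maturity, volume float64) float64 {
	return maturity*0.03 + logistic(volume/20)*0.3*(1+maturity*0.3)
}

func main() {
	for _, volume := range []float64{0, 20, 200} {
		fmt.Printf("maturity=3 volume=%3.0f -> %.3f\n", volume, EnumDetectProb(3, volume))
	}
}
```

Because the logistic function equals 0.5 at zero, the volume term never drops below 0.15 × (1 + maturity × 0.3) for non-negative volumes, so the curve has a nonzero floor and saturates once volume reaches a few multiples of 20.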
The privilege escalation model covers five escalation paths: managed identity role abuse, password spray against non-MFA users, over-privileged app consent, credential expiry exploitation, and public storage access. Each path has base success and detection rates that are modified by defender maturity and specific controls (PIM, conditional access, Azure Policy).
The exfiltration model covers seven exfiltration channels (S3 presigned URLs, Azure Blob SAS, cloud functions, API gateways, DNS tunneling, s3 sync, azcopy) with method-specific detection curves:
The C2 module models adversary command and control infrastructure including redirectors, implants, dead drops, and beacon behavior. This models the full C2 lifecycle from deployment through detection.
Each compromised host can establish a C2 channel from available profiles, which are selected based on the host's network position and available services. Profiles specify:
The C2 infrastructure model includes:
The CheckInfrastructureResilience function evaluates the C2 infrastructure's resilience by testing redirector availability, implant health, and detection surface area.
C2 beacons are simulated every 5 ticks. Each beacon has a detection probability determined by the channel's stealth characteristics and defender maturity. DNS tunneling beacons are additionally checked for pattern anomalies using the protocol-level DNS simulation.
The team package implements multi-team exercise support with four team roles (red, blue, purple, white) and three interaction modes (competitive, cooperative, purple).
An exercise is configured with rules that govern engagement:
In purple team mode, red and blue teams share intelligence to maximize learning:
White team observers have full visibility into all team actions, detections, and communications. This enables exercise controllers to:
The AAR package generates comprehensive reports from simulation data, and the replay package provides time-travel debugging capabilities.
Each simulation produces a report containing:
The replay system captures periodic state snapshots (configurable interval) and event logs. It supports:
The federation package implements an IEEE 1278.1 DIS (Distributed Interactive Simulation) bridge that enables cyber-redteam-sim to interoperate with other simulation platforms, including military simulation systems and physical training environments.
Cyber events are mapped to DIS PDU types as follows:
| Cyber Event | DIS PDU Type | Notes |
|---|---|---|
| Exploit attempt | Fire PDU | Munition type = cyber exploit category |
| Host compromise | Detonation PDU | Result: 2=partial, 3=full compromise |
| C2 communication | Signal PDU | Payload contains C2 details as JSON |
| Detection/alert | Event Report PDU | Event type = cyber detection/alert |
| Host state change | Entity State PDU | Regular heartbeat (default: 5s) |
| Scanning | Event Report PDU | Cyber scan event type |
Network hosts are registered as DIS entities with:
The federation manager handles lifecycle (join/leave), heartbeat publication, event translation (bidirectional), and remote entity tracking. It supports both async (heartbeat-based) and synchronous (tick-based) PDU publication.
The engine compiles to a single static binary of approximately 2.8 MB (Linux amd64), with no external runtime dependencies. This small footprint enables deployment in constrained environments including embedded systems and air-gapped networks.
| Platform | Format | Size |
|---|---|---|
| Linux amd64 | .tar.gz / .deb | ~2.8 MB |
| Linux arm64 | .tar.gz | ~2.5 MB |
| macOS Intel | .tar.gz | ~2.8 MB |
| macOS ARM | .tar.gz | ~2.6 MB |
| Windows amd64 | .zip | ~2.8 MB |
| Docker | stsgym/cyber-redteam-sim:latest | Alpine-based |
The engine provides three API interfaces:
The engine is distributed through multiple channels:
- Homebrew (brew install wezzels/tap/cyber-redteam-sim)

Cross-compilation and release automation are handled by GoReleaser with GitHub Actions CI/CD for testing and GitLab CI for internal distribution.
We have presented cyber-redteam-sim, a high-fidelity network attack and defense simulation engine that models the complete adversarial lifecycle from initial reconnaissance through data exfiltration. The engine's three key innovations—state-dependent attack graph pathfinding, cumulative stealth with temporal decay, and adversarial ML agents—address fundamental gaps in existing simulation tools.
The attack graph module enables strategic reasoning about multi-step compromise paths, producing risk-scored paths that balance success probability against detection risk. The prerequisite state machine ensures that only feasible paths are considered, while the A* search efficiently finds optimal routes through the exponentially large space of possible attack sequences.
The cumulative stealth model creates realistic operational tempo constraints where attackers must balance speed against detection risk. The Bayesian cumulative detection probability, weighted by temporal decay, models how defenders' awareness builds with repeated suspicious activity—a phenomenon well-documented in real incident response but absent from simpler simulation models.
The ML agents demonstrate that both tabular Q-learning and REINFORCE policy gradient methods can learn effective attack strategies through episodic simulation. The Q-learning agent's experience replay and the policy gradient's advantage estimation provide complementary learning dynamics: Q-learning excels at learning precise state-action values for frequently visited states, while REINFORCE generalizes better across similar states.
The engine's protocol-level fidelity (TCP state machines, SMB negotiation, Kerberos exchanges, LDAP queries), probabilistic cloud attack modeling, C2 infrastructure simulation, and comprehensive defense modeling (SIEM correlation, EDR monitoring, incident response playbooks) are designed to produce simulation outputs that resemble real-world red team engagement results.
Future work includes extending the cloud attack models to GCP, adding ICS/SCADA protocol simulation (Modbus, DNP3), implementing Active Directory schema modeling with full ACL/ACE support, and exploring Monte Carlo Tree Search as an alternative to A* for attack graph exploration in environments with stochastic outcomes.
[1] MITRE ATT&CK. https://attack.mitre.org/

[2] MITRE Shield. https://shield.mitre.org/