What is Sharely.ai Response?
An incident response simulator that models the full complexity of how your organization responds to incidents. AI actors (Claude Haiku) role-play human responders, communication tool mechanics are simulated, your org's policies are applied, and Time to Notify (TTN) is measured across every stakeholder group.
Core Concepts
- Time to Notify (TTN): elapsed time from incident start to when each stakeholder group is notified. Measured as a vector across targets (sketched below).
- Org Profile: your org's baseline of teams, roles, comm tools, policies, service dependencies, signal sources, and resource constraints.
- Scenario: a specific incident, defined by a narrative, severity, affected services, actors, active signals, and chaos cards.
- Run: N iterations of a scenario against an org profile. Each iteration produces a different outcome due to stochastic elements.
- Actor: a participant in the response, with a role, team, comm tool, situation state, and phase. Claude Haiku role-plays each one.
- Signal Source: a detection mechanism (threshold alert, anomaly detection, SLO burn rate, etc.) that fires at incident start.
- Event Injector: a "chaos card" that fires during simulation with a configured probability, adding delays or forcing escalations.
- Iteration: a single pass through the detection, triage, escalation, and notification phases. Produces a TTN vector and delay attribution.
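For illustration, a minimal sketch of a TTN vector's shape, using the notification targets listed later under Actors; the type name and minute units are assumptions:

```typescript
// Hypothetical TTN vector shape; units assumed to be minutes.
interface TTNVector {
  internal_eng: number;         // first engineering responder notified
  internal_leadership: number;  // leadership notified
  customers: number;            // customer-facing notification (e.g. statuspage)
  partners: number;             // partner notification
  regulatory: number;           // regulatory notification
}
```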
Quick Start Guide
Org Profile
Your organization's baseline configuration. Everything here applies to all scenarios unless overridden. Think of it as “how our org works on any given day.”
Access from the sidebar under Org Profile. The overview page shows all sections and current resource constraints.
Teams & Roles
Teams
Each team has a name, timezone(s), on-call rotation style, and a maturity profile that affects simulation behavior.
Roles
Job functions with phase assignments and concurrency limits. A sketch of both a team and a role follows.
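A minimal sketch of a team and a role, assuming field names for illustration; only the listed attributes come from this page:

```typescript
// Hypothetical team and role definitions; all field names are assumptions.
const team = {
  name: "Payments Platform",
  timezones: ["America/New_York", "Europe/Berlin"],
  onCallRotation: "follow-the-sun",   // assumed rotation-style value
  maturity: "high",                   // maturity profile affects simulated behavior
};

const role = {
  name: "Incident Commander",
  phases: ["triage", "escalation", "notification"], // when this role is active
  concurrencyLimit: 1, // how many incidents one person in this role can work at once
};
```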
Communication Tools
Comm tools model real delivery mechanics, not just “notification sent.” Simulated mechanics include (a config sketch follows this list):
- Ack timeout + escalation policy
- Multi-step escalation chains
- Channel noise probability
- Rate limiting at high message volumes
- Draft time + approval chain
- Template availability
- Legal review for SEV1
- Outage correlation tags
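One possible way these mechanics could map onto config, shown as a sketch; the grouping into a paging tool and a statuspage tool, and all field names, are assumptions:

```typescript
// Hypothetical comm-tool configs mirroring the mechanics above; names assumed.
const pager = {
  ackTimeoutMinutes: 5,            // unacked pages escalate after this
  escalationChain: ["primary", "secondary", "team_manager"], // multi-step chain
  channelNoiseProbability: 0.05,   // chance a message is lost in channel noise
  rateLimitPerMinute: 30,          // throttling at high message volumes
};

const statuspage = {
  draftTimeMinutes: 10,                    // time to draft a post
  approvalChain: ["comms_lead", "legal"],  // legal review required for SEV1
  templatesAvailable: true,                // templates cut draft time
  outageCorrelationTags: ["cloud_outage"], // tool may be down in correlated outages
};
```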
Policies
Org policies govern who can do what during an incident. Each policy is a configurable knob that the simulator enforces (a config sketch follows this list):
- Declaration: who can declare, strictness level, auto-declare conditions
- Notification thresholds: which SEV levels trigger leadership, customer, and partner notifications
- Approvals: who approves customer comms, statuspage posts, partner notifications, and legal review
- Escalation: whether the IC can page any team, whether manager approval is required, and any other restrictions
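A hedged sketch of what these policy knobs might look like as configuration; all field names and values are assumptions:

```typescript
// Hypothetical policy configuration; field names are assumptions.
const policies = {
  declaration: {
    whoCanDeclare: ["on_call_engineer", "incident_commander"],
    strictness: "moderate",
    autoDeclare: { signalConfidenceAbove: 0.9 }, // assumed auto-declare condition
  },
  notificationThresholds: {
    leadership: "SEV2", // SEV2 and above notifies leadership
    customers: "SEV1",
    partners: "SEV1",
  },
  approvals: {
    customerComms: "comms_lead",
    statuspage: "incident_commander",
    partnerNotifications: "account_manager",
    legal: "SEV1_only",
  },
  escalation: { icCanPageAnyTeam: true, managerApprovalRequired: false },
};
```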
Services & Dependencies
The service dependency graph models how failures propagate through your architecture.
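As a sketch, a dependency graph and a naive propagation walk; the shape, names, and propagation rule are assumptions (the engine's actual propagation is not documented here):

```typescript
// Hypothetical dependency graph: each service lists what it depends on,
// so a failure propagates to everything that (transitively) depends on it.
const services: Record<string, { dependsOn: string[] }> = {
  "checkout-api": { dependsOn: ["payments-db", "auth-service"] },
  "auth-service": { dependsOn: ["user-db"] },
  "payments-db": { dependsOn: [] },
  "user-db": { dependsOn: [] },
};

// Naive propagation sketch (assumes an acyclic graph).
function affectedBy(failed: string): string[] {
  return Object.entries(services)
    .filter(([, s]) => s.dependsOn.includes(failed))
    .flatMap(([name]) => [name, ...affectedBy(name)]);
}

affectedBy("user-db"); // => ["auth-service", "checkout-api"]
```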
Signal Sources
Signal sources define how incidents are detected. The simulator races all active signals — the first confident signal determines detection time.
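A minimal sketch of that race, with assumed shapes and illustrative timings:

```typescript
// Sketch of the signal race described above; shapes are assumptions.
interface Signal { name: string; firesAtMinute: number; confident: boolean; }

// Detection time = earliest confident signal.
function detectionTime(signals: Signal[]): number {
  return Math.min(...signals.filter(s => s.confident).map(s => s.firesAtMinute));
}

detectionTime([
  { name: "threshold_alert", firesAtMinute: 4, confident: true },
  { name: "anomaly_detection", firesAtMinute: 2, confident: false }, // too noisy
  { name: "slo_burn_rate", firesAtMinute: 9, confident: true },
]); // => 4
```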
Situations
Situations model the real-world context of each actor as a state machine, not a static sample. State weights depend on time of day (asleep is far more likely at 2am), and events trigger transitions (cloud_outage triggers vpn_down).
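A sketch of how those weights and transitions might be expressed; asleep, at_desk, cloud_outage, and vpn_down come from this page, everything else is assumed:

```typescript
// Hypothetical situation weights and transitions; all shapes are assumptions.
function situationWeights(hourOfDay: number): Record<string, number> {
  return {
    asleep: hourOfDay < 6 ? 0.8 : 0.05,                    // asleep far likelier at 2am
    at_desk: hourOfDay >= 9 && hourOfDay < 18 ? 0.7 : 0.1, // working hours
    commuting: 0.15,
  };
}

// Event-triggered transition: a cloud outage knocks an at-desk actor off VPN.
const eventTransitions = { cloud_outage: { from: "at_desk", to: "vpn_down" } };
```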
Event Injectors
“Chaos cards” — controlled disruptions that fire during simulation based on probability and trigger conditions.
Each injector has probability (0-1), trigger conditions (phase, severity, tags), and an effect (delay, escalation, severity change, tool outage).
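A hypothetical injector definition using the fields just listed; names and values are illustrative:

```typescript
// Hypothetical event injector ("chaos card"); field names are assumptions.
const injector = {
  name: "exec_requests_status_update",
  probability: 0.3, // 0-1 chance of firing when the trigger matches
  trigger: { phase: "escalation", severity: ["SEV1", "SEV2"], tags: ["customer_facing"] },
  effect: { type: "delay", target: "incident_commander", minutes: 10 },
};
```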
Calibration
Two systems ensure realistic outputs (a clamping sketch follows the lists).
Plausibility Bounds
- "asleep" responses must be ≥ 2 min
- "at_desk" responses must be ≤ 15 min
- LLM responses are clamped to these bounds
Monotonicity Rules
- The VP cannot be notified before the IC is paged
- The statuspage cannot post before a SEV is declared
- Violations reduce the plausibility score
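A sketch of the clamping step; the bounds come from the list above, while the function shape is an assumption:

```typescript
// Sketch of bound clamping; bounds are from the docs, the shape is assumed.
const bounds: Record<string, { min: number; max: number }> = {
  asleep: { min: 2, max: Infinity }, // asleep: at least 2 min to respond
  at_desk: { min: 0, max: 15 },      // at_desk: at most 15 min to respond
};

function clampResponse(situation: string, llmMinutes: number): number {
  const b = bounds[situation];
  return Math.min(b.max, Math.max(b.min, llmMinutes));
}

clampResponse("asleep", 0.5); // => 2 (LLM said 30s; clamped to the 2 min floor)
```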
Creating Scenarios
A scenario defines a specific incident to simulate: its narrative, severity, affected services, actors, active signals, and chaos cards.
Actors
Each actor is assigned to a phase (detection, triage, escalation, or notification) that determines when they become active.
The notification target field determines which TTN vector component this actor contributes to: internal_eng, internal_leadership, customers, partners, regulatory.
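A hypothetical actor entry showing those fields together; names and values are illustrative:

```typescript
// Hypothetical actor definition in a scenario; field names are assumptions.
const actor = {
  role: "Incident Commander",
  team: "Payments Platform",
  commTool: "pager",
  phase: "triage",                    // becomes active in the triage phase
  notificationTarget: "internal_eng", // which TTN component this actor feeds
};
```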
Running Simulations
Click “Run Simulation” from any scenario detail page. Results stream via SSE to a live swim lane view.
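A minimal sketch of consuming that stream in a browser; the endpoint path and payload are assumptions:

```typescript
// Hypothetical SSE consumer; the endpoint and payload shape are assumptions.
const runId = "run_123"; // hypothetical run identifier
const stream = new EventSource(`/api/runs/${runId}/events`);
stream.onmessage = (e) => {
  const event = JSON.parse(e.data); // e.g. actor step, phase change, injector fired
  console.log(event);               // a real UI would update the swim lanes here
};
```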
Run Overrides
Experiment knobs. Modify any configuration for a single run without changing the org profile or scenario. Enables apples-to-apples comparisons.
TTN Vector
The primary output. Notification time per target, with distributions (p50, p90, p95, p99) across N iterations.
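For concreteness, a nearest-rank percentile sketch over one TTN component; the numbers are illustrative, not simulator output:

```typescript
// Nearest-rank percentile over one TTN component across N iterations.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.max(0, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[idx];
}

const customersTTN = [34, 41, 38, 95, 36]; // minutes, one per iteration
percentile(customersTTN, 50); // p50 => 38
percentile(customersTTN, 90); // p90 => 95
```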
Secondary Metrics
Delay Attribution
Every delay on the critical path is tagged with a cause category.
Critical Path
The longest chain of dependent events from incident start to each notification target.
Nodes marked in purple are delay bottlenecks. Only delays on the critical path directly extend TTN.
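A sketch of walking one critical path backwards from a notification event to incident start; the event shape and linkage are assumptions:

```typescript
// Sketch of critical-path extraction; event shape and linkage are assumed.
interface SimEvent { id: string; label: string; predecessor?: string; }

function criticalPath(events: Map<string, SimEvent>, targetId: string): SimEvent[] {
  const path: SimEvent[] = [];
  let current = events.get(targetId);
  while (current) {
    path.unshift(current); // prepend: path reads incident start -> notification
    current = current.predecessor ? events.get(current.predecessor) : undefined;
  }
  return path;
}
```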
Interventions
After a run completes, the engine generates ranked intervention recommendations, each backed by evidence from simulation data and a one-click re-run override.
Each intervention includes: the specific change, category (tooling, policy, staffing, process, monitoring), expected TTN p90 reduction, confidence level, evidence, and a run override config for one-click testing.
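A hypothetical intervention object using those fields; every value is illustrative, not real simulator output:

```typescript
// Hypothetical intervention recommendation; all values are illustrative.
const intervention = {
  change: "Reduce pager ack timeout from 5 to 2 minutes",
  category: "tooling", // tooling | policy | staffing | process | monitoring
  expectedTTNP90ReductionMinutes: 7,
  confidence: "high",
  evidence: "ack-timeout delays appeared on the customers critical path",
  runOverride: { commTools: { pager: { ackTimeoutMinutes: 2 } } }, // one-click re-run
};
```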
Comparisons
View runs side-by-side with overlaid metrics, radar charts, and grouped delay attribution.
Simulation Engine
The engine orchestrates each iteration through four phases: detection, triage, escalation, and notification.
Also manages situation state machines, resource constraints, dependency graph propagation, and event injector evaluation. All events stream via SSE.
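Putting the pieces together, a minimal sketch of one iteration loop; every name here is an assumption, not the engine's real API:

```typescript
// Minimal sketch of one iteration loop; all names are assumptions.
type Phase = "detection" | "triage" | "escalation" | "notification";
const PHASES: Phase[] = ["detection", "triage", "escalation", "notification"];

interface SimContext { clockMinutes: number; ttn: Record<string, number>; }

function advanceSituations(ctx: SimContext): void { /* situation state machines */ }
function evaluateInjectors(ctx: SimContext, phase: Phase): void { /* chaos cards */ }
function stepActors(ctx: SimContext, phase: Phase): void { /* LLM steps, clamped */ }

function runIteration(): SimContext {
  const ctx: SimContext = { clockMinutes: 0, ttn: {} };
  for (const phase of PHASES) {
    advanceSituations(ctx);
    evaluateInjectors(ctx, phase);
    stepActors(ctx, phase); // each step would also emit an SSE event
  }
  return ctx; // ctx.ttn holds the per-target TTN vector
}
```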
LLM Actors
Each actor step calls Claude Haiku with a structured prompt.
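The prompt's contents are not documented here; a hypothetical assembly, inferred from the actor fields above, might look like:

```typescript
// Hypothetical prompt assembly for one actor step; inferred, not the real prompt.
function buildPrompt(
  actor: { role: string; team: string; situation: string; phase: string },
  transcript: string,
): string {
  return [
    `You are the ${actor.role} on the ${actor.team} team.`,
    `Your current situation is "${actor.situation}" during the ${actor.phase} phase.`,
    `Incident events so far:\n${transcript}`,
    `Reply with your next action and how many minutes it takes.`,
  ].join("\n");
}
```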
Plausibility Guards
Two guard systems ensure realistic outputs:
Plausibility Bounds
Response times are clamped to a min/max per situation, as listed under Calibration.
Monotonicity Rules
Event ordering is validated per iteration, as listed under Calibration.
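A sketch of one such ordering check; the rules come from the Calibration section, while the timeline keys are assumed names:

```typescript
// Sketch of a monotonicity check; rules from Calibration, key names assumed.
function violatesMonotonicity(t: Record<string, number>): boolean {
  return t["vp_notified"] < t["ic_paged"]            // VP before IC paged
      || t["statuspage_posted"] < t["sev_declared"]; // statuspage before SEV declared
}

violatesMonotonicity({
  ic_paged: 6, vp_notified: 4, sev_declared: 10, statuspage_posted: 25,
}); // => true (VP was notified 2 minutes before the IC was paged)
```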