Incident Analysis Intelligence
● System Healthy
Total Agents: 6 (3 extraction · 3 decision)
Tests Passing: 12 / 12 (100% pass rate)
Incidents Analyzed: 8 (last 30 days)
Avg MTTR: 47 min (across SEV1 incidents)
Agent Confidence Scores
[Chart: per-agent confidence for IncidentParser, TimelineExtractor, ActionExtractor, TaskPrioritizer, OwnerAssigner, and EscalationDecider]
Recent Incident Analysis

| Incident | Severity | Root Cause | Status |
| --- | --- | --- | --- |
| INC-2024-087 | SEV1 | DB connection pool exhaustion | Resolved |
| INC-2024-083 | SEV2 | Cache invalidation race condition | Resolved |
| INC-2024-079 | SEV1 | Certificate expiration | Resolved |
| INC-2024-076 | SEV3 | Memory leak in worker | Monitoring |
Action Items from Postmortems
IDAction ItemPriorityOwnerStatus
ACT-001Implement connection pool monitoring and alertingP0SRE TeamIn Progress
ACT-002Add circuit breakers to all database connectionsP0Platform TeamComplete
ACT-003Create automated certificate rotation pipelineP1DevOpsIn Progress
ACT-004Update runbook for DB connection exhaustionP1On-callComplete

🔥 What is POSTMORTEM?

POSTMORTEM (Incident Analysis Intelligence) is a multi-agent system that processes post-incident reports and outage documentation. It automatically reconstructs incident timelines, identifies root causes, extracts remediation action items, assigns follow-up owners, and escalates unresolved risks — turning chaotic incident data into structured, trackable improvements.
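
As a mental model, the structured record each incident is distilled into might look like the Python sketch below. These dataclasses are illustrative assumptions, not POSTMORTEM's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class TimelineEvent:
    timestamp: datetime        # when the event occurred (UTC)
    description: str           # e.g. "On-call engineer paged"

@dataclass
class ActionItem:
    action_id: str             # e.g. "ACT-001"
    description: str
    priority: str              # "P0" | "P1" | "P2"
    owner: str | None = None   # filled in later by OwnerAssigner

@dataclass
class Incident:
    incident_id: str           # e.g. "INC-2024-087"
    severity: str              # "SEV1" | "SEV2" | "SEV3"
    root_cause: str | None = None
    timeline: list[TimelineEvent] = field(default_factory=list)
    actions: list[ActionItem] = field(default_factory=list)
    needs_escalation: bool = False
```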

⚡ How It Works

Submit an incident report or post-mortem document. POSTMORTEM runs 6 agents in sequence:

Step 1 — Extraction
📄 IncidentParser
Parses the incident report structure — identifies severity, affected services, detection method, impact scope, and resolution status from the raw documentation.
Step 2 — Extraction
⏰ TimelineExtractor
Reconstructs the full incident timeline — detection, alerting, response, mitigation, and resolution timestamps. Identifies gaps where response was delayed.
Step 3 — Extraction
✅ ActionExtractor
Extracts all remediation actions, follow-up tasks, and process improvements recommended in the post-mortem. Captures deadlines and acceptance criteria.
Step 4 — Decision
📈 TaskPrioritizer
Prioritizes remediation tasks based on recurrence risk, blast radius, and customer impact. Systemic infrastructure fixes get highest priority.
Step 5 — Decision
👤 OwnerAssigner
Assigns each remediation action to the responsible team — SRE for infrastructure, backend for code fixes, platform for tooling improvements.
Step 6 — Decision
🚨 EscalationDecider
Flags incidents requiring executive attention — recurring outages, data loss events, SLA breaches with financial impact, or unresolved systemic risks.
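
A minimal sketch of how such a strictly sequential pipeline can be wired: each agent is a function that reads and enriches a shared state dict. Only three of the six agents are shown, and every name and heuristic here is an illustrative assumption, not POSTMORTEM's internals:

```python
# Each "agent" enriches the shared analysis state; real agents would call
# an LLM or a rule engine rather than the toy heuristics used here.

def incident_parser(state: dict) -> dict:
    # Toy heuristic: read severity straight out of the report text.
    state["severity"] = "SEV1" if "SEV1" in state["report"] else "SEV3"
    return state

def task_prioritizer(state: dict) -> dict:
    # Toy rule: systemic infrastructure fixes get P0, everything else P1.
    state["priorities"] = [
        ("P0" if "index" in action or "pool" in action else "P1", action)
        for action in state.get("actions", [])
    ]
    return state

def escalation_decider(state: dict) -> dict:
    # SEV1 incidents get flagged for executive review.
    state["escalate"] = state["severity"] == "SEV1"
    return state

PIPELINE = [incident_parser, task_prioritizer, escalation_decider]

def analyze(report: str) -> dict:
    state = {"report": report,
             "actions": ["add missing index", "query cost guard in CI"]}
    for agent in PIPELINE:
        state = agent(state)  # strictly sequential, as in steps 1-6 above
    return state

print(analyze("SEV1: database cluster failover at 03:42 UTC"))
```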

🎬 Live Agent Pipeline Demo

Watch POSTMORTEM process an incident report through all 6 agents in real time.

INCIDENT: Database cluster failover at 03:42 UTC
Severity: SEV1   Duration: 2h 17m   Impact: 100% of write operations failed
Root Cause: Disk I/O saturation triggered by unindexed query in batch job deployed at 03:30 UTC
Remediation: Rollback batch job, add missing index, implement query cost guards in CI pipeline
Step 1 · 📄 IncidentParser → SEV1 · 2h17m · Database cluster
Step 2 · 📅 TimelineExtractor → 4 events: deploy → saturation → failover → fix
Step 3 · ✅ ActionExtractor → 3 actions: rollback, index, CI guard
Step 4 · 📈 TaskPrioritizer → P0: Index fix · P1: CI guard
Step 5 · 👤 OwnerAssigner → DBA → Index · Platform → CI guard
Step 6 · 🚨 EscalationDecider → ⚠ SEV1 → VP Eng review required
✅ Pipeline Complete — 6 agents processed in 3.9s
Timeline Events: 4 · Actions: 3 · Assigned: 2 · Escalations: 1

🎯 Real-World Use Cases

SEV1/SEV2 Post-Incident Reviews
Automatically structure post-mortem documents into actionable remediation plans with clear owners, deadlines, and priority levels.
Root Cause Analysis
Extract and categorize root causes across hundreds of incidents to identify systemic patterns — infrastructure, deployment, or human process failures.
Remediation Tracking
Track follow-up action item completion across SRE, engineering, and platform teams. Ensure post-mortem commitments are actually delivered.
Incident Pattern Detection
Aggregate data across all incidents to detect recurring failure modes, measure MTTR improvements, and track reliability trends over quarters.
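
Avg MTTR here is just the mean detection-to-recovery interval across incidents; a minimal sketch with made-up timestamps:

```python
from datetime import datetime

# Illustrative (detected, recovered) pairs; not real incident data.
incidents = [
    (datetime(2024, 6, 1, 3, 14), datetime(2024, 6, 1, 4, 1)),    # 47 min
    (datetime(2024, 6, 9, 12, 0), datetime(2024, 6, 9, 12, 47)),  # 47 min
]

# MTTR = mean minutes from first detection to confirmed recovery.
mttr = sum(
    (recovered - detected).total_seconds() / 60
    for detected, recovered in incidents
) / len(incidents)

print(f"Avg MTTR: {mttr:.0f} min")  # -> Avg MTTR: 47 min
```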

🏗 Architecture

🔄
Orchestration Engine
Coordinates all 6 agents sequentially from parsing through escalation, managing incident state transitions.
🛡
Circuit Breaker Recovery
Per-agent fault isolation ensures timeline extraction failures don't block action item processing or escalation.
📋
Full Audit Trail
Every extraction and decision is logged with timestamps and confidence scores for post-incident accountability.
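
A minimal sketch of the per-agent circuit breaker pattern described above, assuming illustrative thresholds (three consecutive failures trip the breaker, 30 s cool-down). This is the generic pattern, not POSTMORTEM's actual implementation:

```python
import time

class CircuitBreaker:
    """Per-agent breaker: after max_failures consecutive errors the circuit
    OPENs and calls fail fast until reset_after seconds have elapsed."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "CLOSED"
        if time.monotonic() - self.opened_at >= self.reset_after:
            return "HALF_OPEN"  # allow one trial call through
        return "OPEN"

    def call(self, fn, *args, **kwargs):
        if self.state == "OPEN":
            # Fail fast so the orchestrator can skip this agent and move on.
            raise RuntimeError("circuit open: agent temporarily disabled")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        else:
            self.failures = 0  # a success fully closes the circuit
            self.opened_at = None
            return result
```

Giving each agent its own breaker is what lets a TimelineExtractor failure trip only its own circuit while ActionExtractor and EscalationDecider keep running.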
🔥 Submit Incident Report for Analysis

Drag & drop an incident report, or click to browse (.txt and .md supported). Live agent results and extracted action items (action · priority · owner) appear as the pipeline runs. Alternatively, analyze a sample incident.
Total Tests: 12 · Passed: 12 · Failed: 0 · Duration: 2.01s
Agent Tests

| Suite | Test | Duration | Result |
| --- | --- | --- | --- |
| TestIncidentParser | test_incident_parsing | 0.24s | PASS |
| TestIncidentParser | test_empty_report | 0.08s | PASS |
| TestTimelineExtractor | test_timeline | 0.19s | PASS |
| TestActionExtractor | test_action_items | 0.16s | PASS |
| TestTaskPrioritizer | test_prioritization | 0.11s | PASS |
| TestOwnerAssigner | test_assignment | 0.14s | PASS |

Infrastructure Tests

| Suite | Test | Duration | Result |
| --- | --- | --- | --- |
| TestRecoveryStrategies | test_retry_strategy | 0.05s | PASS |
| TestRecoveryStrategies | test_recovery_manager | 0.07s | PASS |
| TestCircuitBreaker | test_circuit_closed | 0.04s | PASS |
| TestCircuitBreaker | test_circuit_opens | 0.03s | PASS |
| TestAuditLogger | test_log_event | 0.09s | PASS |
| TestAuditLogger | test_audit_export | 0.10s | PASS |
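
For flavor, a hypothetical pytest-style version of test_incident_parsing and test_empty_report above. The toy IncidentParser stub exists only so the sketch runs; the real parser is richer:

```python
import re
from dataclasses import dataclass

@dataclass
class ParsedIncident:
    severity: str | None

class IncidentParser:
    # Toy stand-in: extract severity with a regex; the real agent also
    # pulls affected services, detection method, and impact scope.
    def parse(self, report: str) -> ParsedIncident:
        match = re.search(r"SEV[1-3]", report)
        return ParsedIncident(severity=match.group(0) if match else None)

def test_incident_parsing():
    incident = IncidentParser().parse("SEV1: API Gateway outage, 47 min")
    assert incident.severity == "SEV1"

def test_empty_report():
    # Empty input must degrade gracefully rather than raise.
    assert IncidentParser().parse("").severity is None
```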
Total Events: 36 · Incidents Analyzed: 8 · Avg Confidence: 84% · Escalations: 3
Analysis Audit Timeline
03:14:00 · alert
SEV1 Declared: API Gateway Outage
PagerDuty alert triggered · On-call engineer paged
03:14:22 · agent
IncidentParser → Complete
DB connection pool exhaustion identified as root cause
03:14:45 · agent
TimelineExtractor → 8 events mapped
Impact window: 03:14 – 04:01 UTC (47 min)
03:15:01 · agent
ActionExtractor → 5 action items
2 P0 (immediate), 2 P1 (this sprint), 1 P2 (next quarter)
03:15:20 · decision
Priority: 2 items escalated to VP Engineering
Connection pool limits + monitoring gaps flagged
03:15:32 · workflow
Analysis Complete
Full timeline, root cause, and remediation plan generated
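
An append-only logger backing a timeline like this could be as simple as the sketch below; the event shape and method names (which echo test_log_event and test_audit_export above) are assumptions:

```python
import json
from datetime import datetime, timezone

class AuditLogger:
    """Append-only audit trail: every agent event carries a UTC timestamp
    and a confidence score, and the whole trail exports as JSON."""

    def __init__(self):
        self.events: list[dict] = []

    def log_event(self, agent: str, message: str, confidence: float) -> None:
        self.events.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "message": message,
            "confidence": confidence,
        })

    def export(self) -> str:
        return json.dumps(self.events, indent=2)

audit = AuditLogger()
audit.log_event("IncidentParser", "root cause: DB connection pool exhaustion", 0.92)
print(audit.export())
```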
Root Cause Classification
🔥 Root Cause: DB Connection Pool Exhaustion
• Connection pool max: 100, active: 100/100
• Triggered by: Traffic spike + connection leak in auth-service
• Contributing: No pool monitoring alerts configured
• Impact: 100% API failure for 47 minutes
✅ Remediation Steps
• Increased pool max to 200
• Fixed connection leak in auth-service v2.4.1
• Added connection pool utilization alerts
• Implemented circuit breaker on DB connections
🕑 Impact Timeline
• 03:14 UTC — First errors detected
• 03:18 UTC — On-call paged
• 03:28 UTC — Root cause identified
• 03:45 UTC — Fix deployed
• 04:01 UTC — Full recovery confirmed
Circuit Breakers: 6 (all closed) · Recoveries: 4 · Retry Rate: 100% · Uptime: 99.9%
Circuit Breaker Status

| Agent | State | Failures |
| --- | --- | --- |
| IncidentParser | CLOSED | 0 |
| TimelineExtractor | CLOSED | 0 |
| ActionExtractor | CLOSED | 0 |
| TaskPrioritizer | CLOSED | 0 |
| OwnerAssigner | CLOSED | 0 |
| EscalationDecider | CLOSED | 0 |