Wolf Incident Postmortem
Incident #210
Status
Complete, one action item outstanding.
Summary
Sentinel consumed by wolf after repeated false alarms.
Impact
Loss of sentinel. No flock impact.
Root causes
Sentinel generated noisy alerts due to premature deployment, incomplete training, and overly monotonous task. Oncalls failed to respond to true positive due to alert fatigue.
Trigger
Wolf.
Resolution
Gathered flock. Deployed replacement sentinel.
Detection
Sentinel did not report at end of shift.
Action Items
| Priority | Action Item | Type | Status |
|---|---|---|---|
| P0 | Gather flock | mitigate | complete |
| P0 | Deploy replacement sentinel | mitigate | complete |
| P1 | Update playbook for wolf alerts | prevent | complete |
| P2 | Update remaining sentinels | prevent | complete |
| P2 | Revise sentinel training program | prevent | complete |
| P2 | Investigate equipping sentinels with flutes or slings | prevent | in progress |
Lessons Learned
What went well
Flock gathering proceeded without issues.
No flock injuries or losses.
Replacement sentinel did not exhibit false positive alerts.
What went wrong
Noisy alerts not addressed.
Alerts silenced contrary to playbook.
Loss of sentinel.
Where we got lucky
Only one wolf.
Wolf sated after sentinel consumption.
Replacement sentinel available.
Timeline
All times local
March 3rd:
16:32 Oncalls paged “wolf”.
16:34 First oncall arrives at sentinel location.
16:34 Alert diagnosed as false positive. No corrective action performed.
March 4th:
14:15 Oncalls paged “wolf”.
14:19 First oncall arrives at sentinel location.
14:19 Alert diagnosed as false positive. No corrective action performed.
March 5th:
March 6th:
17:03 (Reconstructed) Outage begins, sentinel notices wolf.
17:03 Oncalls paged “wolf”.
17:04 Oncalls paged “wolf”.
17:04 Oncalls paged “real wolf”.
17:05 (Reconstructed) Wolf consumes sentinel.
18:45 Sentinel does not report at end of shift.
19:05 Primary oncall dispatched to field.
19:10 Oncall diagnoses issue.
19:10 Incident begins, secondary and tertiary oncalls paged.
19:15 First sheep located.
19:52 Last sheep located.
20:05 Flock safe in pens.
20:05 Outage ends, flock protection fully restored.
20:45 Replacement sentinel identified.
07:38 Replacement sentinel deployed.
18:45 Replacement sentinel reports at end of shift.
18:45 Incident ends, 24hr without wolf alerts or activity (exit criterion).
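The March 6th entries above are enough to derive the usual response intervals. A minimal sketch, assuming all four events fall on the same day; the metric names (time to detect, time to mitigate, outage duration) are labels chosen here for illustration and are not used in the postmortem itself.

```python
from datetime import datetime, timedelta


def at(hhmm: str) -> datetime:
    """Parse a same-day HH:MM timestamp copied from the timeline."""
    return datetime.strptime(hhmm, "%H:%M")


# March 6th events, taken verbatim from the timeline.
# The dictionary keys are hypothetical names, not postmortem terminology.
MARCH_6 = {
    "outage_begins": at("17:03"),           # sentinel notices wolf (reconstructed)
    "sentinel_missed_report": at("18:45"),  # detection: no end-of-shift report
    "incident_begins": at("19:10"),         # oncall diagnoses issue
    "outage_ends": at("20:05"),             # flock safe in pens
}


def interval(events: dict, start: str, end: str) -> timedelta:
    """Elapsed time between two named events."""
    return events[end] - events[start]


time_to_detect = interval(MARCH_6, "outage_begins", "sentinel_missed_report")
time_to_mitigate = interval(MARCH_6, "sentinel_missed_report", "outage_ends")
outage_duration = interval(MARCH_6, "outage_begins", "outage_ends")

print("time to detect:  ", time_to_detect)    # 1:42:00
print("time to mitigate:", time_to_mitigate)  # 1:20:00
print("outage duration: ", outage_duration)   # 3:02:00
```

Under these assumptions, detection took longer than mitigation: 1h42m of the 3h02m outage passed before the missed end-of-shift report, while the three pages at 17:03 and 17:04 went unanswered.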