# Incident Response

| Level | Language Used | Behavior Displayed |
| --- | --- | --- |
| Novice | <ol><li>Generalized statements regarding events/activities in progress, e.g. "Something is wrong with the network!"</li><li>"I have no idea what to do."</li><li>"Have you tried turning it off and turning it on again?"</li><li>No standard language.</li></ol> | <ol><li>The team is event-focused; some (or all) of the team are "alarmed" by the occurrence of incidents and the actions taken in response</li><li>The commencement of an incident is "fuzzy" and occurs primarily external to the team (manual NOC notification via "call-out" or email)</li><li>Response is inconsistent once an incident has commenced</li></ol> |
| Beginner | <ol><li>"I think there's a problem with the database, network, etc."</li><li>"I think [specific person] is familiar with how that works; we need to find them."</li><li>Standardized language among the teams</li></ol> | <ol><li>The team is area-focused, i.e. incidents involve specific system components; some (or all) of the team "fear" the occurrence of incidents</li><li>Incident response is based on "tribal knowledge" and usually requires specific people to respond effectively</li><li>Incident remediation is inconsistent and depends on the specific actors involved in the response</li></ol> |
| Competent | <ol><li>"The deployment caused the database to hang."</li><li>"We've got a knowledge base article on how to handle this."</li><li>Some team members are familiar with standardized Incident Management System terminology and may use it during incident responses.</li></ol> | <ol><li>The team is action-focused; they are "aware" that incident occurrence is a normal side effect of system operations</li><li>The commencement of incidents is well-defined and well-understood by the team; most incident responses are triggered via automated alerting/monitoring</li><li>The team has identified incident "responders" and those incident responders know what is expected of them</li></ol> |
| Proficient | <ol><li>"Have the database, network and search on-calls perform a systems status and report back to the incident commander."</li><li>The team is familiar with standardized Incident Management System terminology</li></ol> | <ol><li>The team is technology-focused; they "accept" that incident occurrence is a normal side effect of system operations</li><li>Incident response becomes an aspect of organizational and team "culture"; incident response expectations and "how tos" are part of on-boarding/recurrent training</li><li>Teams collaborate cross-functionally to determine the overall plan for large-scale incident response coordination and resolution</li></ol> |
| Advanced | <ol><li>"What parts of the service did not 'self-heal' and need manual intervention?"</li><li>"System A on-call rep and System B on-call rep both reported issues which may be related. Are we sure those two groups are talking to each other? If not, let's put them in touch immediately."</li><li>The team uses/values standardized Incident Management System terminology</li></ol> | <ol><li>The team is systems-focused; they "embrace" incidents as learning experiences and improvement opportunities</li><li>Incident conclusion criteria and processes are documented and understood by incident responders, and include crew dissolution steps and inputs into the ongoing incident remediation process</li><li>The organization considers outside-normal-business-hours incident response or repeated business-hours incident response to be inhumane to system operators</li></ol> |