Managing major cyber incidents

Hi, my name is Will Oram. I’m a cyber security consultant living in London. You can follow me on Twitter. I help companies respond to cyber security breaches and prevent cyber attacks. I also tweet and blog about cyber security, and maintain this collection of resources for managing and recovering from cyber security breaches.

Quick-reference notes I use when responding to major cyber incidents. Loosely organised into four sections.

Arriving on scene

The basics of incident management
Top priorities when arriving at an incident
Gaining situational awareness
Immediate priority checklist
Common platform for secure communication and collaboration

Understanding capabilities

Understanding the environment
Key logs to consider
Detection capabilities

Building a response strategy

The basics of a response strategy
Building workstreams
Focusing on key investigative questions
Remediation objectives
Focusing on the attacker
Data breach impacts
Key issues managing and coordinating response efforts

Delivering the response

Building an action plan
Driving forward delivery
Providing status updates
Defining watch out criteria
Managing an interrupted remediation
Delivering the response to an incident
Key response documents
Communicating on data breaches

Arriving on scene

The basics of incident management

What has happened?
What is the plan?
Who is in charge?

Time and time again, organisations struggle to answer these questions clearly during major cyber incidents.

Top priorities when arriving at an incident

Understand organisational priorities
Ensure immediate priorities are being actioned
Gain situational awareness
Assess risks and issues
Stand up / integrate with the business-wide strategic response
Define ways of working and build response tempo
Stand up common platforms for secure communication and collaboration
Establish measurable incident objectives
Select appropriate strategies to achieve objectives
Define and mobilise workstreams to deliver on strategies / tasks (define and document workstream processes)
Identify and resolve resourcing gaps
Provide tactical direction and the necessary follow-up

(Document everything)

Gaining situational awareness

What has happened?
How have you responded?
What is unknown at this point?
Have you considered the legal and regulatory implications?
Has senior leadership been briefed?
Who has been notified?
What are you concerned about? (risks / issues)
What are your priorities?
What is your plan?
Who is in charge? (of the response and individual workstreams)
Have you thought about how the incident could escalate?
What is your plan if the incident escalates? (and how will you identify this?)
Do you have a reactive press statement prepared?
Are you in a crisis?

Immediate priority checklist

Senior management are being briefed on the incident, risks and issues
Action is being taken to mitigate any unacceptable risks to the business
Evidence is being collected and preserved
Legal & regulatory obligations are being assessed
Gaps in technical visibility have been identified and are being resolved
Incident response and crisis management plans have been initiated

Common platform for secure communication and collaboration

Examples: Teams, JIRA, Slack, Google Drive

Need to have a way to:

Share documents / collaborate on document writing
Communicate with all teams working on the incident (often from multiple companies)
Track issues and projects with workflows
Centrally store key information

Understanding capabilities

Understanding the environment

Workstations
Email and Web
Servers
Cloud
Networks / Data centers
Applications
Identity
Data

For each:

What do you have?
What capabilities do you have to prevent attacks? (coverage, features, constraints, limitations)
What capabilities do you have to detect attackers? (coverage, features, constraints, limitations)
What people and processes do you have to support this?

Key logs to consider

Server and workstation operating system logs
Authentication logs (e.g. login, remote access, VPN)
Application logs (e.g. web logs, database logs)
Network logs (e.g. web proxy logs, firewall logs, DNS, NetFlow)
Security tool logs (e.g. EDR, AV, mail filtering logs)

Detection capabilities

What are your roll-out plans and deployment statistics for endpoint agents?
What are your roll-out plans and deployment statistics for network appliances?
What logging is available from other sources of visibility?
What visibility gaps do you have of the environment?
What monitoring and detection tools are built on top of these sources of visibility?
How are these tools configured to detect attacker activity?
How are detection alerts tracked?
What processes are there to triage, investigate and respond to detection alerts?
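As a rough illustration of turning deployment statistics into a list of visibility gaps, the sketch below compares an asset inventory with the hosts reporting to an endpoint agent console. It assumes both can be exported as plain sets of host names; the `agent_coverage` function and the host names are hypothetical, and real exports usually need normalising (short names vs FQDNs, duplicates) before they can be compared.

```python
from dataclasses import dataclass


@dataclass
class CoverageReport:
    coverage_pct: float      # share of inventoried hosts reporting an agent
    missing_agent: set[str]  # inventoried hosts with no agent (visibility gaps)
    unknown_hosts: set[str]  # hosts reporting an agent but absent from the inventory


def agent_coverage(inventory: set[str], reporting: set[str]) -> CoverageReport:
    """Compare a (hypothetical) asset inventory export with agent check-ins."""
    covered = inventory & reporting
    pct = 100.0 * len(covered) / len(inventory) if inventory else 0.0
    return CoverageReport(
        coverage_pct=round(pct, 1),
        missing_agent=inventory - reporting,
        unknown_hosts=reporting - inventory,
    )


if __name__ == "__main__":
    # Hypothetical host names for illustration only.
    inventory = {"dc01", "fs01", "ws001", "ws002"}
    reporting = {"dc01", "ws001", "ws002", "ws099"}
    report = agent_coverage(inventory, reporting)
    print(report.coverage_pct)           # 75.0
    print(sorted(report.missing_agent))  # ['fs01']
    print(sorted(report.unknown_hosts))  # ['ws099']
```

Hosts that report an agent but are missing from the inventory are worth surfacing too: they point to gaps in the inventory itself rather than in agent deployment.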

Do you need to stand up a tool (e.g. JIRA) to track and manage new detection alerts?

It is key that all detection alerts are tracked centrally and moved through a single process.
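One way to enforce a single process is to give every alert, whatever tool raised it, the same small set of states and to reject transitions that skip a step. The states and allowed transitions below are assumptions for illustration (a hypothetical lifecycle loosely following the triage, investigate and respond steps above), not the workflow of JIRA or any other tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class AlertState(Enum):
    NEW = "new"
    TRIAGED = "triaged"
    INVESTIGATING = "investigating"
    ESCALATED = "escalated"
    CLOSED = "closed"


# Hypothetical allowed transitions: every alert follows the same path.
TRANSITIONS = {
    AlertState.NEW: {AlertState.TRIAGED},
    AlertState.TRIAGED: {AlertState.INVESTIGATING, AlertState.CLOSED},
    AlertState.INVESTIGATING: {AlertState.ESCALATED, AlertState.CLOSED},
    AlertState.ESCALATED: {AlertState.CLOSED},
    AlertState.CLOSED: set(),
}


@dataclass
class Alert:
    alert_id: str
    source: str  # e.g. "EDR", "proxy", "mail filter"
    state: AlertState = AlertState.NEW
    history: list[str] = field(default_factory=list)

    def move_to(self, new_state: AlertState, note: str) -> None:
        """Advance the alert, refusing transitions that skip the agreed process."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.alert_id}: {self.state.value} -> {new_state.value} not allowed")
        self.history.append(f"{self.state.value} -> {new_state.value}: {note}")
        self.state = new_state


def open_alerts(alerts: list[Alert]) -> list[str]:
    """IDs of alerts still moving through the process."""
    return [a.alert_id for a in alerts if a.state is not AlertState.CLOSED]


if __name__ == "__main__":
    alert = Alert("ALR-001", source="EDR")
    alert.move_to(AlertState.TRIAGED, "suspicious PowerShell on ws001")
    alert.move_to(AlertState.INVESTIGATING, "assigned to hunting")
    try:
        alert.move_to(AlertState.NEW, "reopen")
    except ValueError as err:
        print(err)  # ALR-001: investigating -> new not allowed
```

The history list doubles as an audit trail of who moved each alert and why, which helps when reconstructing the incident timeline later.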

Building a response strategy

The basics of a response strategy

A response strategy needs to be proportionate to the sophistication of the threat actor and the scale/complexity of the incident.

The strategy encompasses "how" we are going to respond. Activities should then be grouped and organised into workstreams.

Considering:

Priorities and objectives
Risks and issues
Understanding of the environment
Visibility of the environment
Organisational and technical capability / capacity to respond
Investigative findings so far, including knowledge of the adversary

Building workstreams

Workstreams should map to objectives / strategies rather than to any pre-existing business units or organisational hierarchies.

Each workstream should have a lead responsible and accountable for the workstream's activities.

Where possible, the team working on a workstream should sit and work together.

Processes used by each workstream should be mapped out and communicated.

Response efforts need the capacity and speed to scale with the size of the incident.

Example workstreams for the strategic organization-wide response

Communications
Legal
IT Operations
Technical Incident Management
Business Operations
Strategic Improvements
Finance / Administration

Focusing on key investigative questions

What was the window of compromise?
How did the attacker initially gain access to the environment?
What systems did the attacker access and/or compromise?
How did the attacker access and/or compromise these systems?
What accounts did the attacker compromise?
What activity was carried out by the attacker within the environment?
What data did the attacker access and how did the attacker do this?
What evidence is there of data exfiltration?
Does the attacker still have access to the environment?
Has the attack concluded?
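The first of these questions, and part of the question about which systems were accessed, can be answered mechanically once attacker activity is recorded in the incident timeline. A minimal sketch, assuming each timeline entry has been reduced to a timestamp, a host and a description (the `TimelineEvent` type and the sample events are hypothetical). The window is bounded only by the evidence identified so far; activity from before log retention began will not appear.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TimelineEvent:
    timestamp: datetime
    host: str
    description: str


def window_of_compromise(events: list[TimelineEvent]) -> tuple[datetime, datetime, timedelta]:
    """Return first evidence, last evidence and the span between them."""
    if not events:
        raise ValueError("no attacker activity recorded in the timeline")
    first = min(e.timestamp for e in events)
    last = max(e.timestamp for e in events)
    return first, last, last - first


def hosts_by_first_seen(events: list[TimelineEvent]) -> dict[str, datetime]:
    """Map each host the attacker touched to the earliest evidence on it."""
    first_seen: dict[str, datetime] = {}
    for e in sorted(events, key=lambda ev: ev.timestamp):
        first_seen.setdefault(e.host, e.timestamp)
    return first_seen


if __name__ == "__main__":
    # Hypothetical timeline entries for illustration only.
    events = [
        TimelineEvent(datetime(2024, 3, 2, 9, 15), "vpn01", "Login with stolen credentials"),
        TimelineEvent(datetime(2024, 3, 4, 22, 40), "dc01", "New domain admin account created"),
        TimelineEvent(datetime(2024, 3, 3, 13, 5), "fs01", "Archive staged in C:\\Temp"),
    ]
    first, last, span = window_of_compromise(events)
    print(first, last, span)  # 2024-03-02 09:15:00 2024-03-04 22:40:00 2 days, 13:25:00
```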

Remediation objectives

Remediation has four key objectives:

Remove attacker access to the environment.
Prevent the attacker from re-gaining access to the environment.
Detect the attacker if they re-gain access to the environment.
Limit the attacker’s ability to achieve any objectives if access to the environment is reacquired.

These four objectives are achieved by carrying out posturing, eradication and hardening.

Against a motivated and targeted attacker, failing to identify all attacker access, improve detection capabilities and make the improvements needed to stop the attacker immediately regaining access will likely cause the eradication to fail (with the attacker maintaining access and embedding deeper in the network).

See my other GitHub repo here for more information.

Focusing on the attacker

What activity was carried out by the attacker within the environment?
What access does the attacker have into the environment?
Has the attacker gained access to any data that will make it easier for them to re-compromise the environment?
What are the likely motivations of the attacker?
What are the assessed capabilities of the attacker?
Has the attacker adapted their behaviour as a result of remediation activities undertaken?

Data breach impacts

Reputational
Legal
Technical
Operational
Financial

Key issues managing and coordinating response efforts

No clear or suitable incident management structures
Structures are formed ad hoc, teams fail to interoperate, existing business structures are used

Lack of "operational rhythm" and programme management
Response is not as quick as leadership desires, delays in recognising a crisis, lack of accountability and action tracking

No clear strategy and objectives driving response efforts
Response is tactical rather than strategic, conflicting priorities/strategies, reactive decision making

Poor communication and collaboration
Disjointed use of tooling, conflicting terminology, poor interoperability and missed/delayed escalations

Lack of leadership and accountability
Unclear chains of command, blurred lines of responsibility, fragmented teams, lack of trust reduces collaboration

No clear understanding of the facts which matter
Risks and issues are missed, leadership inundated with noise, remediation efforts repeatedly fail

Delivering the response

Building an action plan

What do we want to do? Priorities and objectives
How are we going to do it? Strategy, workstreams
Who is responsible for doing it? Roles and responsibilities
How do we communicate with each other? Daily rhythm and tempo
What is the procedure if the incident escalates?
What are the expectations of team members working on the response?
How will decisions be made?

Driving forward delivery

Break the organisation out of a business-as-usual mindset by removing pre-existing structures, expectations and assumptions
Build and follow a planning process
Group tactics and tasks into workstreams with leads
Communicate strategy and plans, roles and responsibilities to all involved
Track and hold teams to account to deliver on actions
Get teams to report on the delivery and effectiveness of plans in measurable ways
Run effective meetings with defined outcomes
Maintain situational awareness and document it
Track risks and issues

Providing status updates

When was the first identified evidence of compromise? + delta
When was the last identified evidence of compromise? + delta
How many systems have been assessed as compromised? + delta
How many systems have been assessed as accessed? + delta
How many accounts have been assessed as compromised? + delta
How many privileged accounts have been assessed as compromised? + delta
Endpoint agent coverage + delta
What has been done over the status update period?
What is planned for the next status update period?
Risks and issues being tracked + delta
Update against key investigation questions
Update against "Eradication event criteria"
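The "+ delta" figures are simple differences between consecutive updates. A sketch assuming each update's numeric metrics are captured in a snapshot; the `StatusSnapshot` field names and figures are hypothetical, and the first/last evidence dates and narrative items are left out.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StatusSnapshot:
    compromised_systems: int
    accessed_systems: int
    compromised_accounts: int
    compromised_privileged_accounts: int
    agent_coverage_pct: float
    open_risks_and_issues: int


def with_deltas(current: StatusSnapshot, previous: StatusSnapshot) -> dict[str, str]:
    """Render each metric as 'value (+delta)' relative to the previous update."""
    report: dict[str, str] = {}
    for f in fields(StatusSnapshot):
        now = getattr(current, f.name)
        # round() hides floating-point noise in percentage deltas.
        delta = round(now - getattr(previous, f.name), 1)
        report[f.name] = f"{now} ({delta:+})"
    return report


if __name__ == "__main__":
    # Hypothetical figures from two consecutive status updates.
    previous = StatusSnapshot(12, 30, 8, 2, 74.0, 9)
    current = StatusSnapshot(15, 41, 8, 3, 82.5, 7)
    for name, value in with_deltas(current, previous).items():
        print(f"{name}: {value}")
```

Showing "+0" explicitly, rather than omitting unchanged metrics, makes it clear the figure was checked rather than forgotten.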

Defining watch out criteria

Attacker finds a previously unidentified route into the environment
Attacker moves towards sensitive or personal data
Attacker compresses or stages files
Attacker accesses internet facing servers
Attacker gains domain administrator privileges
Attacker gains access to a domain controller
Attacker adds or edits users in Active Directory
Attacker carries out activity indicating potential destructive intentions

If triggered:

Who should this be communicated to?
How should this be communicated to them?
How should this first be verified?
Should this communication be written or verbal in the first instance?
What technical response playbooks have been written to ensure a rapid and effective response?
What playbooks have been written for carrying out common response tasks such as blocking IPs, sinkholing DNS, resetting accounts and isolating systems?
How is the organisation building an increased state of readiness?
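Answering the questions above in advance lets the escalation path be written down as data rather than decided under pressure. The sketch below encodes the watch out criteria, requires verification before anyone is notified, and then looks up who to notify, whether to call before writing, and which playbook to run. The recipients, playbook names and the three criteria given special handling are placeholders chosen for illustration, not recommendations.

```python
from dataclasses import dataclass
from enum import Enum


class Criterion(Enum):
    NEW_ENTRY_ROUTE = "Attacker finds a previously unidentified route into the environment"
    SENSITIVE_DATA = "Attacker moves towards sensitive or personal data"
    STAGING = "Attacker compresses or stages files"
    INTERNET_FACING = "Attacker accesses internet facing servers"
    DOMAIN_ADMIN = "Attacker gains domain administrator privileges"
    DOMAIN_CONTROLLER = "Attacker gains access to a domain controller"
    AD_CHANGES = "Attacker adds or edits users in Active Directory"
    DESTRUCTIVE = "Attacker carries out activity indicating potential destructive intentions"


@dataclass(frozen=True)
class EscalationPlan:
    notify: tuple[str, ...]  # who is told, in order
    verbal_first: bool       # call before the written follow-up
    playbook: str            # hypothetical pre-written playbook name


# Placeholder routing, agreed in advance with the incident commander.
DEFAULT = EscalationPlan(("Incident commander",), verbal_first=False, playbook="triage-and-scope")
PLANS = {
    Criterion.DOMAIN_ADMIN: EscalationPlan(("Incident commander", "CMT chair"), True, "reset-privileged-accounts"),
    Criterion.DOMAIN_CONTROLLER: EscalationPlan(("Incident commander", "CMT chair"), True, "isolate-systems"),
    Criterion.DESTRUCTIVE: EscalationPlan(("Incident commander", "CMT chair", "IT operations lead"), True, "pre-emptive-containment"),
}


def on_trigger(criterion: Criterion, verified: bool) -> list[str]:
    """Return the ordered steps to take when a watch out criterion fires."""
    if not verified:
        return [f"Verify: {criterion.value}"]
    plan = PLANS.get(criterion, DEFAULT)
    channel = "Call" if plan.verbal_first else "Message"
    steps = [f"{channel} {who}" for who in plan.notify]
    steps.append(f"Run playbook: {plan.playbook}")
    if plan.verbal_first:
        steps.append("Follow up in writing")
    return steps


if __name__ == "__main__":
    print(on_trigger(Criterion.DOMAIN_ADMIN, verified=True))
    # ['Call Incident commander', 'Call CMT chair', 'Run playbook: reset-privileged-accounts', 'Follow up in writing']
```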

Managing an interrupted remediation

If remediation activities are interrupted by an alert, these are the key questions to ask:

When did the first alert occur?
What is the first evidence of compromise on this system? (e.g. before or after eradication; key to deciding whether to remediate rapidly)
Should any of this activity have been blocked?
Are we seeing any of the same indicators as used previously? (e.g. IP or domain names)
Are we seeing similar TTPs to previous activity?
Are we confident this is the same attacker?
Are we seeing attacker hands-on keyboard activity?
What activity has the attacker performed?
What level of access has the attacker gained?
Is there any other related activity on other systems?

Key decision-making factors for the response

Are we confident all new activity has been identified?
Will we alert on all instances of this activity going forward?
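The first-evidence question and the two decision factors can be supported by splitting new findings around the eradication time. A sketch assuming each finding records its host, earliest evidence timestamp, whether it matches previously seen indicators, and whether existing detections alerted on it (all hypothetical fields). It only groups the facts; whether pre- or post-eradication activity calls for rapid remediation remains a judgement for the incident commander.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Finding:
    host: str
    first_evidence: datetime    # earliest attacker activity seen on this host
    matches_known_iocs: bool    # same IPs / domains as previous activity?
    alerted_by_detection: bool  # did existing detections fire on it?


def assess_interruption(findings: list[Finding], eradication_time: datetime) -> dict[str, object]:
    """Group new findings around the eradication event; the decision stays with people."""
    return {
        # Activity starting after eradication suggests access was regained.
        "post_eradication_hosts": sorted(f.host for f in findings if f.first_evidence >= eradication_time),
        # Activity predating eradication suggests access that was missed.
        "pre_eradication_hosts": sorted(f.host for f in findings if f.first_evidence < eradication_time),
        "known_indicators_reused": any(f.matches_known_iocs for f in findings),
        # Feeds "Will we alert on all instances of this activity going forward?"
        "detection_gaps": sorted(f.host for f in findings if not f.alerted_by_detection),
    }


if __name__ == "__main__":
    eradication = datetime(2024, 5, 1, 12, 0)  # hypothetical eradication event time
    findings = [
        Finding("ws014", datetime(2024, 5, 2, 8, 30), matches_known_iocs=True, alerted_by_detection=True),
        Finding("app03", datetime(2024, 4, 27, 16, 0), matches_known_iocs=False, alerted_by_detection=False),
    ]
    print(assess_interruption(findings, eradication))
```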

Delivering the response to an incident

Responding to a significant cyber security incident requires not only a technical response but also a highly integrated, strategic, organization-wide response.

Crisis Management Team (CMT)

Manages and coordinates the organization-wide response to the incident
Sets the response objectives, priorities and strategies
Has overall responsibility for all response activities
Secures support from the wider organization including from senior management
Leads with an example of the culture required to successfully navigate through the crisis

Needs to be tightly integrated with the technical response.

Strategy Advisory Group (SAG)

Proposes priorities and strategies to resolve the incident
Consists of cyber security leadership, external advisors and legal
Considers technical risks and issues

Incident Management Team (IMT)

Delivers the technical response to the incident
Uses the interoperable, modular FEMA Incident Command System (ICS)
The incident commander is in charge of the technical response to the incident

Needs to be tightly integrated with the strategic organization-wide response.

Example IMT workstreams and their functions:

Investigation: Situational Awareness, Forensic Analysis, Threat Intelligence, Impact Assessment
Threat Hunting: Threat Detection, Hunting, Tuning
Remediation: Analysis and Planning, Triage, Delivery
Monitoring: Alert Triage, Continuous Monitoring
Operations: Evidence Collection, Tech Deployment, Agent Deployment, Recovery
Logistics / PMO: Action Tracking, Resourcing, Finance and Admin

Other ideas for workstreams:

Agent Deployment
Threat Intelligence
Pre-emptive containment (limit the impact of ransomware attacks before they detonate)
Recovery
Strategic Improvement

Key response documents

Incident action plan
Ways of working
Red line / watch out criteria
Immediate priorities checklist
Incident timeline
Remediation plan
Risks, actions, issues, decisions tracker
Investigation tracker

What systems are compromised / suspected compromised?
What systems has the attacker accessed?
What systems has the attacker performed recon against?
What accounts are compromised / suspected compromised?
What systems have agents deployed?

Stakeholder mapping
Evidence tracker
Media handling FAQ
Comms tracker
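The investigation tracker's questions map naturally onto a small per-system and per-account record. A minimal sketch with hypothetical names (`InvestigationTracker`, `Status`, `SystemRecord`); in practice this often lives in a spreadsheet or ticketing tool, and the point is only that each question becomes a query over one shared record.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    COMPROMISED = "compromised"
    SUSPECTED = "suspected compromised"
    ACCESSED = "accessed"
    RECON = "recon performed against"


@dataclass
class SystemRecord:
    name: str
    status: Status
    agent_deployed: bool
    evidence_ref: str = ""  # hypothetical pointer into the evidence tracker


class InvestigationTracker:
    """Hypothetical single record answering the investigation tracker questions."""

    def __init__(self) -> None:
        self.systems: dict[str, SystemRecord] = {}
        self.accounts: dict[str, bool] = {}  # account -> True if confirmed, False if suspected

    def record_system(self, record: SystemRecord) -> None:
        # The latest assessment for a system replaces the previous one.
        self.systems[record.name] = record

    def record_account(self, account: str, confirmed: bool) -> None:
        self.accounts[account] = confirmed

    def systems_by_status(self) -> dict[Status, list[str]]:
        grouped: dict[Status, list[str]] = defaultdict(list)
        for record in self.systems.values():
            grouped[record.status].append(record.name)
        return {status: sorted(names) for status, names in grouped.items()}

    def accounts_of_concern(self) -> list[str]:
        return sorted(
            f"{name} ({'confirmed' if confirmed else 'suspected'})"
            for name, confirmed in self.accounts.items()
        )

    def compromised_without_agent(self) -> list[str]:
        # Compromised or suspected systems with no agent are visibility gaps worth prioritising.
        return sorted(
            r.name for r in self.systems.values()
            if r.status in (Status.COMPROMISED, Status.SUSPECTED) and not r.agent_deployed
        )


if __name__ == "__main__":
    tracker = InvestigationTracker()
    tracker.record_system(SystemRecord("dc01", Status.COMPROMISED, agent_deployed=True))
    tracker.record_system(SystemRecord("fs01", Status.SUSPECTED, agent_deployed=False))
    tracker.record_account("svc_backup", confirmed=True)
    print(tracker.compromised_without_agent())  # ['fs01']
```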

Communicating on data breaches

Key messages to deliver:

Care and concern (about those affected)
Control (of the situation)
Commitment (to resolving the problem)

Key considerations:

Mapping stakeholders
Coordinating / sequencing communications based on priority
Anticipating stakeholder issues and preparing to respond
Incremental reassurance
Media trackers
External comms trackers (e.g. vendors)

Key questions to answer:

What happened?
How did this happen?
What will the impact be on customers?
How do you feel about it?
What are you going to do to fix it?
How are you committed to making this right?
How are you going to be transparent and maintain customer trust?
How are you staying true to your values?
What steps can customers take to protect themselves? (what are you doing to help customers?)
When are you going to provide your next update?
FAQs (see my other GitHub repo here for examples)
