Cloud Incident Readiness
A Practical Playbook for CISOs and Teams
Introduction
For a CISO, mastering cloud incident readiness means more than adopting new technology: it requires a fundamental shift in how you approach risk, visibility, and response. By recognizing and addressing the common shortcomings described below, you can build a robust, proactive defense that reduces risk and strengthens your organization's resilience against evolving threats.
Call to Action: Review your current cloud incident readiness strategy in light of these insights. Identify gaps, update your IR plans, and schedule regular training sessions so your team stays vigilant and prepared.
Ensuring access when it matters most
Every incident starts with an intake call, where we go over common questions such as "What happened?" and "What have you done so far?". At some point the conversation pivots to "Which logs are available, and can we get access to them?". This is often where it gets interesting, especially for larger organizations, because most onboarding processes are not flexible enough for emergency access. The result is avoidable delays in your incident response. So let's work out what access your incident response team or external provider needs.
Azure & Entra ID
Let's start with the Microsoft cloud, covering Azure and Entra ID permissions. Here are some common challenges and pitfalls we see in real-life cases:
Access limited to specific subscriptions: we have had cases where the security team only had access to specific subscriptions. The problem is that you don't know what you can't see; effectively, you're trying to protect a border without knowing where it is or where it ends.
Insufficient permissions: in many cases, security teams only have access to logging and security tools in the cloud, not to the resources themselves or to Entra ID details. As a result, your team can see potentially malicious sign-ins, but not what actions were then taken with that account against Azure resources.
To make sure the right access is available for the right situation, we differentiate between Standard and Emergency access: Standard access is for day-to-day use, while Emergency access should only be used during an active breach to "stop the bleeding". The table below outlines the access we recommend setting up beforehand:
Access type | Role | Purpose
Standard | Reader role on the Root Management Group | Read-only access to all resources in every Azure subscription.
Standard | Global Reader role | Read-only access to Entra ID logs and settings.
Emergency | Contributor role (PIM) | Modify Azure resources during critical remediation efforts.
Emergency | User Administrator role (PIM) | Manage Entra ID user accounts, such as blocking sign-in or resetting passwords.
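To catch the first pitfall above (access limited to specific subscriptions) before an incident, you can check where your IR team's Reader role is actually assigned. The Python sketch below assumes the role assignments have already been exported (for example from `az role assignment list`) into simple (role name, scope) pairs; the tenant ID, subscription IDs and helper names are hypothetical. It relies on the tenant root management group being named after the tenant ID.

```python
# Sketch: is the IR team's Azure Reader role assigned at the tenant root
# management group, or only on a handful of subscriptions?
# Assumption: assignments were exported into (role_name, scope) tuples;
# the tenant ID, subscription IDs and helper names are hypothetical.


def root_mg_scope(tenant_id: str) -> str:
    """The tenant root management group is named after the tenant ID."""
    return f"/providers/Microsoft.Management/managementGroups/{tenant_id}"


def _normalise(scope: str) -> str:
    # Azure resource IDs are case-insensitive; ignore a trailing slash.
    return scope.rstrip("/").lower()


def covers_whole_tenant(assignments, tenant_id, role="Reader"):
    """True if `role` is assigned at the root management group."""
    root = _normalise(root_mg_scope(tenant_id))
    return any(r == role and _normalise(s) == root for r, s in assignments)


def narrower_scopes(assignments, tenant_id, role="Reader"):
    """Scopes other than the root management group where `role` is assigned."""
    root = _normalise(root_mg_scope(tenant_id))
    return [s for r, s in assignments if r == role and _normalise(s) != root]


if __name__ == "__main__":
    tenant = "00000000-0000-0000-0000-000000000000"
    exported = [
        ("Reader", "/subscriptions/11111111-1111-1111-1111-111111111111"),
        ("Reader", "/subscriptions/22222222-2222-2222-2222-222222222222"),
    ]
    if not covers_whole_tenant(exported, tenant):
        print("Gap: Reader only on", narrower_scopes(exported, tenant))
```

The check is deliberately narrow: it only asks whether the Standard access from the table exists at the recommended scope, which is exactly the "you don't know what you can't see" problem.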
AWS
Permissions in AWS work very differently from Microsoft. To start off, there are different types of policies: you can set permissions directly on a user, use an IAM group with attached policies, or attach permissions to an IAM role. Here are some challenges we have faced while doing incident response in AWS environments:
No point of contact for an AWS account: this is especially a problem in larger organizations where teams control their own AWS accounts. A list or overview of accounts and who is responsible for each helps, and it's even better if it records which services each account uses. This helps in scenarios where you see a frequently abused service such as Amazon SES being used by an account that should, for example, only be running workloads in ECS.
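A minimal sketch of the account inventory described above, in Python. The record layout, owner address and helper names are hypothetical; the service identifiers are CloudTrail `eventSource` values (such as `ses.amazonaws.com`), on the assumption that you compare the inventory against the account's CloudTrail logs.

```python
from dataclasses import dataclass, field


@dataclass
class AccountRecord:
    """One entry in a (hypothetical) AWS account inventory."""
    account_id: str
    owner: str  # point of contact to call during an incident
    # Services the owning team declared, as CloudTrail eventSource values.
    expected_services: set = field(default_factory=set)


def point_of_contact(inventory: dict, account_id: str) -> str:
    """Who to call for an account, or a clear marker if nobody is registered."""
    record = inventory.get(account_id)
    return record.owner if record else "UNKNOWN: no registered owner"


def unexpected_services(record: AccountRecord, observed_event_sources) -> list:
    """Event sources seen in the account's logs that the owner did not declare."""
    return sorted(set(observed_event_sources) - record.expected_services)


if __name__ == "__main__":
    inventory = {
        "111122223333": AccountRecord(
            "111122223333",
            "payments-platform@example.com",
            {"ecs.amazonaws.com", "ecr.amazonaws.com", "logs.amazonaws.com"},
        )
    }
    seen = ["ecs.amazonaws.com", "ses.amazonaws.com", "logs.amazonaws.com"]
    record = inventory["111122223333"]
    print(point_of_contact(inventory, "111122223333"), unexpected_services(record, seen))
    # payments-platform@example.com ['ses.amazonaws.com']
```

Keeping the declared services next to the owner means the same lookup answers both "who do we call?" and "is SES activity in this ECS-only account suspicious?".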
No AWS organization: things get worse when you respond to incidents involving stand-alone accounts. Without an organization you cannot use Service Control Policies (SCPs) and, most importantly, you cannot set up a central CloudTrail trail for all accounts, which is invaluable in an incident response scenario.
In AWS we can use built-in (AWS managed) policies attached to roles. Those roles should only be assumable by the security team, ideally from a separate Security/IR account. The table below shows the policies you should configure:
Access type | Policy | Purpose
Standard | ReadOnlyAccess | Read-only access to AWS resources for monitoring and auditing, without making any changes.
Emergency | AdministratorAccess | Full access to all AWS resources, enabling modifications during critical incident remediation.
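As a sketch of how these roles could be wired up, the snippet below builds the trust policy that lets only principals from a dedicated Security/IR account assume them, and pairs each role with the AWS managed policy from the table. The account ID is the AWS documentation example ID and the role names are hypothetical.

```python
import json

# Hypothetical ID of the dedicated Security/IR account (AWS docs example ID).
SECURITY_ACCOUNT_ID = "111122223333"

# AWS managed policies from the table, keyed by hypothetical role names.
IR_ROLES = {
    "ir-standard-readonly": "arn:aws:iam::aws:policy/ReadOnlyAccess",
    "ir-emergency-admin": "arn:aws:iam::aws:policy/AdministratorAccess",
}


def ir_trust_policy(security_account_id: str) -> dict:
    """Trust policy allowing only principals in the Security/IR account to assume the role.

    Trusting the account root delegates the decision to IAM policies in the
    security account, where sts:AssumeRole is granted to the IR team only.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{security_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    for role_name, policy_arn in IR_ROLES.items():
        print(role_name, policy_arn)
    print(json.dumps(ir_trust_policy(SECURITY_ACCOUNT_ID), indent=2))
```

You could then create each role in every member account with boto3's `iam.create_role(RoleName=..., AssumeRolePolicyDocument=json.dumps(...))` and attach the managed policy with `iam.attach_role_policy(RoleName=..., PolicyArn=...)`. Trusting the account root rather than individual users keeps the trust policy stable; who may actually assume the Emergency role is then controlled centrally in the Security/IR account.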
Google Cloud
Google Cloud also offers several ways to provide access and manage IAM. For the purpose of this blog we assume the built-in Google Cloud IAM is used, which lets us grant access to Google Cloud resources with the Basic roles. Let's look at some examples of common challenges we faced when doing incident response in Google Cloud:
Access set up at the 'wrong' level, for example the project level: a project in Google Cloud is quite similar to an AWS account, which is why you will sometimes only get access at the Project or Folder level. This causes the same issue described earlier: you can only see what you have access to, not the rest of the organization.
Insufficient expertise in IAM: among the major cloud providers, Google Cloud skills seem to be the least common. IAM in particular can be quite a challenge, and during an IR engagement you don't want to start reading the docs.
The table below shows the Basic roles we can use for Google Cloud:
Access type | Role | Purpose
Standard | Viewer | Read-only access to all Google Cloud resources for monitoring and auditing, without making any changes.
Emergency | Owner | Full access to all Google Cloud resources, enabling modifications and management during critical incident remediation.
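To avoid the 'wrong level' pitfall, the Viewer binding should sit on the organization resource rather than on individual projects or folders. The Python sketch below works on the IAM policy format Google Cloud uses (a list of `bindings`, each with a `role` and `members`); the organization ID, group address and helper names are hypothetical.

```python
# Sketch: keep the IR team's Google Cloud access at the organization level.
# ORG_RESOURCE, IR_GROUP and the helper names are hypothetical.

ORG_RESOURCE = "organizations/123456789012"
IR_GROUP = "group:ir-team@example.com"


def add_binding(policy: dict, role: str, member: str) -> dict:
    """Add `member` to the unconditional binding for `role` (idempotent)."""
    bindings = policy.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role and "condition" not in binding:
            if member not in binding["members"]:
                binding["members"].append(member)
            return policy
    bindings.append({"role": role, "members": [member]})
    return policy


def has_org_level_grant(grants, role: str, org_resource: str = ORG_RESOURCE) -> bool:
    """grants: (resource_name, role) pairs, e.g. ("projects/app-prod", "roles/viewer")."""
    return any(res == org_resource and r == role for res, r in grants)


if __name__ == "__main__":
    # Standard access: Viewer on the organization for the IR group.
    org_policy = add_binding({"bindings": []}, "roles/viewer", IR_GROUP)
    print(org_policy)

    # A grant that only exists on one project does not cover the organization.
    current = [("projects/app-prod", "roles/viewer")]
    print(has_org_level_grant(current, "roles/viewer"))  # False
```

The same Viewer binding can be applied with `gcloud organizations add-iam-policy-binding ORGANIZATION_ID --member=group:ir-team@example.com --role=roles/viewer`; the Owner binding would only be added when an incident is declared and removed once it is contained.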
Conclusion
In this post we went over the access your teams and consultants should have in case of an incident. We consider this a starting point: for most of our clients we tailor the access to their environment and services, and we recommend you do the same. For example, one of our clients runs Kubernetes in AWS and sends its logs to CloudWatch, so we need access to those logs too in case of an incident. Generally speaking, make sure your teams can read all relevant information and, for specific scenarios, can elevate to write permissions to contain an incident.