Title: Containing a Cryptojacking Incident in GCP
Date: 2024-10-27
Tags: Incident Response, Google Cloud Platform, Forensics, Monitoring, Google Cloud CLI
I. Introduction
1.1 Context & Purpose
- Unusual Compute Engine CPU alerts triggered an investigation, leading to the discovery of a cryptomining container deployed via a compromised service account.
- This log details the IR process using Google Cloud's native tools to contain and analyze the incident.
1.2 What This Covers
- Triage using Cloud Monitoring and Cloud Logging.
- Forensic analysis of the impacted VM and IAM resources.
- Steps taken to eradicate the threat and harden the environment.
II. Setup & Environment
2.1 Network & Tools Overview
- Environment: GCP project web-app-prod.
- Impacted Resource: A single n2-standard-4 VM instance in us-central1-a.
- Tools Used: Cloud Console, Cloud Shell, gcloud CLI, VPC Flow Logs, OSQuery (deployed post-incident).
2.2 Prerequisites / Preparations
- Cloud Monitoring alert for "CPU utilization > 95% for 10 minutes."
- Pre-existing IR playbook for cloud incidents.
- Permissions: roles/compute.admin, roles/iam.securityReviewer, roles/logging.viewer.
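For reference, the CPU alert above could be expressed as a Cloud Monitoring alert policy roughly like the sketch below. The policy file path, the display names, and the 60-second alignment settings are assumptions, not the configuration actually in use, and no notification channel is attached. The gcloud call is only printed unless DRY_RUN=0 is set.

```shell
# Sketch only: file path, display names, and aligner settings are assumptions.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

POLICY_FILE="${POLICY_FILE:-./cpu-alert-policy.json}"

# thresholdValue 0.95 = 95% utilization; duration 600s = the 10-minute window.
cat > "$POLICY_FILE" <<'EOF'
{
  "displayName": "GCE CPU utilization > 95% for 10 minutes",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "CPU utilization above 95%",
      "conditionThreshold": {
        "filter": "resource.type = \"gce_instance\" AND metric.type = \"compute.googleapis.com/instance/cpu/utilization\"",
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0.95,
        "duration": "600s",
        "aggregations": [
          { "alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN" }
        ]
      }
    }
  ]
}
EOF

# Uses the alpha command group; prints the command in dry-run mode.
run gcloud alpha monitoring policies create \
  --project web-app-prod --policy-from-file "$POLICY_FILE"
```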
III. Execution & Findings
3.1 Steps Taken
- Triage: Confirmed the alert. The VM (victim-instance) showed 100% CPU on all cores. Isolated the instance by removing its external IP and firewall rules.
gcloud compute instances delete-access-config victim-instance \
--access-config-name "external-nat" --zone us-central1-a
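An alternative way to isolate the VM, not what was run during this incident, is to tag it and attach top-priority deny rules to that tag, which avoids touching the existing rule set. In the sketch below the network name and the ir-quarantine tag are hypothetical. Commands are only printed unless DRY_RUN=0 is set.

```shell
# Sketch only: NETWORK and TAG are assumptions, not values from the incident.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

INSTANCE="victim-instance"
ZONE="us-central1-a"
NETWORK="${NETWORK:-default}"   # assumption: the VPC network the VM sits in
TAG="ir-quarantine"             # hypothetical network tag

quarantine_instance() {
  # Tag the VM so the deny rules below apply to it alone.
  run gcloud compute instances add-tags "$INSTANCE" --zone "$ZONE" --tags "$TAG"
  # Priority 0 is the highest priority, so these win over existing allow rules.
  run gcloud compute firewall-rules create "${TAG}-deny-ingress" \
    --network "$NETWORK" --direction INGRESS --action DENY --rules all \
    --source-ranges 0.0.0.0/0 --target-tags "$TAG" --priority 0
  run gcloud compute firewall-rules create "${TAG}-deny-egress" \
    --network "$NETWORK" --direction EGRESS --action DENY --rules all \
    --destination-ranges 0.0.0.0/0 --target-tags "$TAG" --priority 0
}

quarantine_instance
```

Note that a deny-all ingress rule also cuts off responder SSH access, so any live collection on the host has to be planned around it.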
- Analysis: Reviewed Cloud Logging for the service account used by the VM.
-- Logs Explorer Query
resource.type="gce_instance"
log_id("cloudaudit.googleapis.com/activity")
protoPayload.authenticationInfo.principalEmail="compromised-sa@web-app-prod.iam.gserviceaccount.com"
Found: A compute.instances.startWithEncryptionKey API call from an unfamiliar IP (45.xx.xx.xx).
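The same query can be run from Cloud Shell with gcloud logging read, which makes the result easier to save alongside the IR notes. This is a sketch: the 7-day lookback and the output columns are assumptions. The command is only printed unless DRY_RUN=0 is set.

```shell
# Sketch only: lookback window and output columns are assumptions.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

SA="compromised-sa@web-app-prod.iam.gserviceaccount.com"

# Same three conditions as the Logs Explorer query, joined with explicit ANDs.
FILTER="resource.type=\"gce_instance\" AND log_id(\"cloudaudit.googleapis.com/activity\") AND protoPayload.authenticationInfo.principalEmail=\"${SA}\""

# Show when each call happened, which method was called, and from which IP.
run gcloud logging read "$FILTER" --project web-app-prod --freshness 7d \
  --format 'table(timestamp, protoPayload.methodName, protoPayload.requestMetadata.callerIp)'
```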
3.2 Challenges & Fixes
- Challenge: The instance was running an unknown container. We needed to capture memory/disk state without powering it down (and losing volatile data).
- Fix: Created a snapshot of the boot disk and a memory dump using gcloud's create-diagnosis-report command.
gcloud compute instances create-diagnosis-report victim-instance \
--zone us-central1-a --destination=gs://ir-forensics-bucket/
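The boot disk snapshot command is not recorded in this log. A minimal sketch of that step follows, assuming the boot disk carries the same name as the instance (the gcloud default at creation time). The snapshot name and description are hypothetical. A disk snapshot can be taken while the VM is running, but it does not capture volatile memory. The command is only printed unless DRY_RUN=0 is set.

```shell
# Sketch only: BOOT_DISK, SNAPSHOT, and the description are assumptions.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

ZONE="us-central1-a"
BOOT_DISK="${BOOT_DISK:-victim-instance}"   # assumption: disk named after the VM

# A UTC timestamp keeps evidence snapshots unique and sortable.
SNAPSHOT="ir-${BOOT_DISK}-$(date -u +%Y%m%d-%H%M%S)"

run gcloud compute disks snapshot "$BOOT_DISK" --zone "$ZONE" \
  --snapshot-names "$SNAPSHOT" \
  --description "IR evidence: boot disk of victim-instance"
```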
- Challenge: Determining the initial access vector.
- Fix: Cross-referenced VPC Flow Logs with the IAM audit log timestamp. Found an inbound SSH connection from the same malicious IP to a different, poorly secured bastion host two days prior.
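The flow-log side of that cross-reference can be sketched as a search for inbound SSH from the suspicious address. The address below is a documentation-range placeholder, not the real attacker IP, and the lookback window and output columns are assumptions. VPC Flow Logs are sampled and must already be enabled on the subnet, so an empty result does not prove there was no connection. The command is only printed unless DRY_RUN=0 is set.

```shell
# Sketch only: ATTACKER_IP is a placeholder; set it to the callerIp from the audit log.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

ATTACKER_IP="${ATTACKER_IP:-203.0.113.10}"   # placeholder from a documentation range

# Flows from the suspicious source to port 22 on any instance in the project.
FLOW_FILTER="log_id(\"compute.googleapis.com/vpc_flows\") AND jsonPayload.connection.src_ip=\"${ATTACKER_IP}\" AND jsonPayload.connection.dest_port=22"

run gcloud logging read "$FLOW_FILTER" --project web-app-prod --freshness 7d \
  --format 'table(timestamp, jsonPayload.connection.dest_ip, jsonPayload.dest_instance.vm_name, jsonPayload.bytes_sent)'
```

Comparing the timestamps returned here with those from the audit-log query shows which host was touched first.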
IV. Observations & Insights
- The Attacker's Path: SSH to Bastion (weak password) → Stolen Service Account Key (stored in /home/user/.config/) → Deployed mining container via the Compute Engine API on a new, powerful VM.
- Detection Gap: No alerts were configured for the creation of new VMs or for the use of the specific Compute Engine API method. Our detection was purely resource-based, which is late-stage.
- The container was a publicly known XMRig miner image. It was launched with a --max-cpu-usage=75 flag to avoid immediate suspicion, but a flawed deployment script kept the limit from taking effect.
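One way to close the detection gap is a log-based metric that counts instance create and start calls made by service accounts. An alerting policy would then be attached to that metric. The metric name, description, and filter below are assumptions rather than an agreed detection rule. The substring match on compute.instances.start also covers startWithEncryptionKey. The command is only printed unless DRY_RUN=0 is set.

```shell
# Sketch only: METRIC name and filter scope are assumptions.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

METRIC="sa-compute-instance-lifecycle"   # hypothetical metric name

# ":" is a substring match, so versioned method names such as "v1.compute.instances.insert" match too.
FILTER='resource.type="gce_instance" AND log_id("cloudaudit.googleapis.com/activity") AND protoPayload.methodName:("compute.instances.insert" OR "compute.instances.start") AND protoPayload.authenticationInfo.principalEmail:"gserviceaccount.com"'

run gcloud logging metrics create "$METRIC" --project web-app-prod \
  --description "Instance create/start calls made by service accounts" \
  --log-filter "$FILTER"
```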
V. Considerations & Next Steps
- Immediate:
- Rotate all service account keys in the project.
- Enforce VPC Service Controls to limit where APIs can be called from.
- Apply IAM Recommender findings to remove excess permissions from the compromised SA.
- Long-term: Deploy Security Command Center (SCC) Premium for continuous asset inventory and threat detection. Mandate OSQuery or a similar EDR agent on all compute instances.
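The key rotation step can be sketched for a single service account as follows. Covering the whole project would mean repeating it for each account, and issuing replacement keys is left out. The helper names are hypothetical. Deletions are only printed unless DRY_RUN=0 is set. The list call is read-only and needs an authenticated gcloud session.

```shell
# Sketch only: helper names are hypothetical; scope is one service account.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

SA="compromised-sa@web-app-prod.iam.gserviceaccount.com"

# Prints the IDs of user-managed keys, one per line.
list_user_keys() {
  gcloud iam service-accounts keys list --iam-account "$SA" \
    --managed-by user --format 'value(name.basename())'
}

# Reads key IDs from stdin and deletes each one, skipping blank lines.
revoke_keys() {
  while IFS= read -r key_id; do
    [ -n "$key_id" ] || continue
    run gcloud iam service-accounts keys delete "$key_id" \
      --iam-account "$SA" --quiet
  done
}

# Usage: list_user_keys | revoke_keys
```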
VI. Conclusion
- The incident was contained because of a resource consumption alert, which is a last line of defense. The focus must shift left to initial access and privilege escalation detection.
- The key lesson is the critical importance of credential hygiene (especially on bastion hosts) and the principle of least privilege for service accounts. A key stored on a filesystem is a single point of failure for cloud security.
