Containing a Cryptojacking Incident in GCP

Title: Containing a Cryptojacking Incident in GCP
Date: 2024-10-27
Tags: Incident Response, Google Cloud Platform, Forensics, Monitoring, Google Cloud CLI


I. Introduction

1.1 Context & Purpose

Unusual Compute Engine CPU alerts triggered an investigation, leading to the discovery of a cryptomining container deployed via a compromised service account.
This log details the IR process using Google Cloud's native tools to contain and analyze the incident.

1.2 What This Covers

Triage using Cloud Monitoring and Cloud Logging.
Forensic analysis of the impacted VM and IAM resources.
Steps taken to eradicate the threat and harden the environment.

II. Setup & Environment

2.1 Network & Tools Overview

Environment: GCP Project web-app-prod.
Impacted Resource: A single n2-standard-4 VM instance in us-central1-a.
Tools Used: Cloud Console, Cloud Shell, gcloud CLI, VPC Flow Logs, OSQuery (deployed post-incident).

2.2 Prerequisites / Preparations

Cloud Monitoring alert for "CPU utilization > 95% for 10 minutes."
Pre-existing IR playbook for cloud incidents.
Permissions: roles/compute.admin, roles/iam.securityReviewer, roles/logging.viewer.

III. Execution & Findings

3.1 Steps Taken

Triage: Confirmed the alert. The VM (victim-instance) showed 100% CPU on all cores. Isolated the instance by removing its external IP and the firewall rules that allowed traffic to it.

gcloud compute instances delete-access-config victim-instance \
    --access-config-name "external-nat" --zone us-central1-a
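
Removing the external IP leaves internal VPC traffic (and any egress through Cloud NAT) untouched. One way to close that gap, sketched here under assumptions, is a pair of deny-all firewall rules bound to a network tag. The `default` network, the `quarantine` tag, and the rule names are illustrative and were not part of the original response.

```shell
#!/usr/bin/env bash
# Hypothetical values: NETWORK, TAG, and the rule names are assumptions.
ZONE="us-central1-a"
INSTANCE="victim-instance"
NETWORK="default"
TAG="quarantine"

quarantine_instance() {
  # Deny all inbound traffic to anything carrying the quarantine tag.
  gcloud compute firewall-rules create "${TAG}-deny-ingress" \
    --network "$NETWORK" --direction INGRESS --action DENY \
    --rules all --source-ranges 0.0.0.0/0 \
    --target-tags "$TAG" --priority 0
  # Deny all outbound traffic so the miner cannot reach its pool.
  gcloud compute firewall-rules create "${TAG}-deny-egress" \
    --network "$NETWORK" --direction EGRESS --action DENY \
    --rules all --destination-ranges 0.0.0.0/0 \
    --target-tags "$TAG" --priority 0
  # Apply the tag so both rules take effect on the instance.
  gcloud compute instances add-tags "$INSTANCE" --zone "$ZONE" --tags "$TAG"
}
```

Calling `quarantine_instance` runs the three commands in order. Priority 0 is the highest firewall priority, so the deny rules win over any existing allow rules.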

Analysis: Reviewed Cloud Logging for the service account used by the VM.

-- Logs Explorer Query
resource.type="gce_instance"
log_id("cloudaudit.googleapis.com/activity")
protoPayload.authenticationInfo.principalEmail="compromised-sa@web-app-prod.iam.gserviceaccount.com"

Found: compute.instances.startWithEncryptionKey API call from an unfamiliar IP (45.xx.xx.xx).
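
The same query can be run from Cloud Shell so the output can be saved with the case notes. This is a sketch: the seven-day lookback and the chosen output columns are assumptions, not what was used during the incident.

```shell
#!/usr/bin/env bash
PROJECT_ID="web-app-prod"
SA_EMAIL="compromised-sa@web-app-prod.iam.gserviceaccount.com"

# Same conditions as the Logs Explorer query, joined with explicit ANDs.
FILTER='resource.type="gce_instance"
AND log_id("cloudaudit.googleapis.com/activity")
AND protoPayload.authenticationInfo.principalEmail="'"$SA_EMAIL"'"'

list_sa_activity() {
  # Show when each call happened, which method it was, and the caller's IP.
  gcloud logging read "$FILTER" \
    --project "$PROJECT_ID" --freshness 7d \
    --format 'table(timestamp, protoPayload.methodName, protoPayload.requestMetadata.callerIp)'
}
```

Running `list_sa_activity` prints one row per audited call, which makes an unfamiliar caller IP easy to spot.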

3.2 Challenges & Fixes

Challenge: The instance was running an unknown container. We needed to capture memory/disk state without powering it down (and losing volatile data).
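Fix: Created a snapshot of the boot disk while the instance was still running. gcloud has no built-in memory-dump command, so volatile memory has to be captured from inside the guest with a dedicated acquisition tool before the instance is stopped. Collected artifacts were stored in gs://ir-forensics-bucket/.

# Assumes the boot disk shares the instance name (the Compute Engine default).
# The snapshot name is illustrative.
gcloud compute disks snapshot victim-instance \
    --zone us-central1-a --snapshot-names victim-instance-ir-snapshot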
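
To examine the disk without touching the running instance, the boot disk snapshot can be restored to a new disk and attached read-only to a separate analysis VM. A minimal sketch follows. The snapshot, disk, and analysis VM names are hypothetical, and the analysis VM is assumed to live in the same zone.

```shell
#!/usr/bin/env bash
ZONE="us-central1-a"
SNAPSHOT="victim-instance-ir-snapshot"   # assumed snapshot name
EVIDENCE_DISK="victim-evidence-disk"     # hypothetical disk name
ANALYSIS_VM="forensics-workstation"      # hypothetical VM in the same zone

mount_evidence() {
  # Restore the snapshot to a fresh disk so the original stays untouched.
  gcloud compute disks create "$EVIDENCE_DISK" \
    --source-snapshot "$SNAPSHOT" --zone "$ZONE"
  # Attach read-only so the analysis cannot alter the evidence.
  gcloud compute instances attach-disk "$ANALYSIS_VM" \
    --disk "$EVIDENCE_DISK" --mode ro --zone "$ZONE"
}
```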

Challenge: Determining the initial access vector.
Fix: Cross-referenced VPC Flow Logs with the IAM audit log timestamp. Found an inbound SSH connection from the same malicious IP to a different, poorly secured bastion host two days prior.
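
The flow log side of that cross-reference can be sketched as a Cloud Logging query for SSH connections from the suspect address. It assumes VPC Flow Logs were already enabled on the bastion's subnet. The IP stays redacted as in this log, and the seven-day lookback is an assumption.

```shell
#!/usr/bin/env bash
PROJECT_ID="web-app-prod"
SUSPECT_IP="45.xx.xx.xx"   # redacted in this log; substitute the full address

# Inbound flows from the suspect IP to TCP port 22.
FLOW_FILTER='log_id("compute.googleapis.com/vpc_flows")
AND jsonPayload.connection.src_ip="'"$SUSPECT_IP"'"
AND jsonPayload.connection.dest_port=22'

find_ssh_flows() {
  # List matching flows with the destination VM, oldest context included.
  gcloud logging read "$FLOW_FILTER" \
    --project "$PROJECT_ID" --freshness 7d \
    --format 'table(timestamp, jsonPayload.connection.dest_ip, jsonPayload.dest_instance.vm_name)'
}
```

Flow logs are sampled, so a missing row does not prove that no connection occurred.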

IV. Observations & Insights

The Attacker's Path: SSH to Bastion (weak password) → Stolen Service Account Key (stored in /home/user/.config/) → Deployed mining container on a new, powerful VM via the Compute Engine API.
Detection Gap: No alerts were configured for the creation of new VMs or for the Compute Engine API methods the attacker used. Our detection was purely resource-based, which catches an attack only at a late stage.
The container was a publicly known XMRig miner image. It was launched with a --max-cpu-usage=75 flag to avoid immediate suspicion, but a flawed deployment script kept the limit from taking effect, so the miner drove every core to 100% and tripped the CPU alert.
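
One way to close the detection gap for new VMs is a log-based metric that counts instance creation calls, which an alerting policy can then watch. This is a sketch under assumptions: the metric name is hypothetical, and the `:` substring match is used because audit logs prefix the method with the API version.

```shell
#!/usr/bin/env bash
PROJECT_ID="web-app-prod"
METRIC_NAME="vm-created"   # hypothetical metric name

# Match Admin Activity audit entries for Compute Engine instance creation.
METRIC_FILTER='resource.type="gce_instance"
AND log_id("cloudaudit.googleapis.com/activity")
AND protoPayload.methodName:"compute.instances.insert"'

create_vm_creation_metric() {
  gcloud logging metrics create "$METRIC_NAME" \
    --project "$PROJECT_ID" \
    --description "Counts Compute Engine VM creation calls" \
    --log-filter "$METRIC_FILTER"
}
```

The metric alone does not notify anyone. It still needs a Cloud Monitoring alerting policy attached to it.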

V. Considerations & Next Steps

Immediate:

Rotate all service account keys in the project.
Enforce VPC Service Controls to limit where APIs can be called from.
Apply IAM Recommender suggestions to remove excess permissions from the compromised SA.
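
For the compromised account itself, key rotation can be sketched as disabling the account and then deleting every user-managed key. This assumes nothing legitimate still depends on the account. If workloads do, they will break until new credentials are issued.

```shell
#!/usr/bin/env bash
SA_EMAIL="compromised-sa@web-app-prod.iam.gserviceaccount.com"

revoke_user_keys() {
  # Disable the account first so stolen keys stop working right away.
  gcloud iam service-accounts disable "$SA_EMAIL"
  # User-managed keys are the downloadable ones an attacker could have copied.
  gcloud iam service-accounts keys list --iam-account "$SA_EMAIL" \
    --managed-by user --format 'value(name.basename())' |
  while read -r key_id; do
    gcloud iam service-accounts keys delete "$key_id" \
      --iam-account "$SA_EMAIL" --quiet
  done
}
```

Repeating this for the other service accounts in the project covers the "rotate all keys" item above.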

Long-term: Deploy Security Command Center (SCC) Premium for continuous asset inventory and threat detection. Mandate osquery or an EDR agent on all compute instances.

VI. Conclusion

The incident was caught only because of a resource consumption alert, which is a last line of defense. Detection must shift left to initial access and privilege escalation.
The key lesson is the critical importance of credential hygiene (especially on bastion hosts) and the principle of least privilege for service accounts. A key stored on a filesystem is a single point of failure for cloud security.