Analyzing Volatile Memory on a Google Kubernetes Engine Node

**Spotify’s New Method for Memory Analysis on GKE: An Open Source Alternative for Monitoring Containerized Workloads**

*June 22, 2023*

*Written by Marcus Hallberg, Security Engineer at Spotify*

**Introduction**

At Spotify, we run containerized workloads across the organization and rely on Google Kubernetes Engine (GKE) on Google Cloud Platform (GCP) for production. To keep those workloads secure, we need to be able to analyze suspicious behavior quickly and identify potentially malicious activity. We currently use commercial solutions for monitoring, but we also started a research project to explore alternatives. That project led us to a new method for memory analysis on GKE that uses only open source tools. In this blog post, I explain how memory analysis works and how this method can be used on any GKE node in production today.

**Spotify’s Usage of GKE on GCP**

As heavy users of GKE on GCP, Spotify operates across five GCP regions and runs hundreds of thousands of pods in production within over 3,000 GKE namespaces. With such extensive usage, it becomes crucial for us to scale and monitor our production workloads effectively.

**Understanding GKE Terminology**

Before diving into the memory analysis process, it’s important to familiarize ourselves with some general terms related to GKE:

**Control Plane**: This is the container orchestration layer that facilitates the management of containers by exposing APIs and interfaces for defining, deploying, and managing their lifecycle.

**Cluster**: A cluster consists of worker machines, known as nodes, which run containerized applications. Each cluster must have at least one worker node.

**Node**: A node refers to a worker machine within the Kubernetes framework.

**Namespace**: A namespace is an abstraction utilized by Kubernetes to provide isolation for groups of resources within a single cluster.

**Pod**: The smallest and simplest Kubernetes object, a Pod represents a set of running containers within a cluster.

**Container**: A container is a lightweight and portable executable image containing software and its dependencies.

**A High-Level Architecture of GKE Cluster**

The following diagram provides a high-level overview of the architecture of a GKE cluster on GCP:

**Figure 1: GKE-managed cluster overview**

**Accessing the Kernel on a GKE Node for Memory Analysis**

To analyze memory on a GKE node and examine the processes running on it, the kernel is the best place to retrieve this information. Many commercial solutions use the extended Berkeley Packet Filter (eBPF) to access the kernel. During our research, however, we found an alternative approach. Accessing the kernel and analyzing memory on a GKE node involves the following three steps:

**Step 1: Create a Kernel Memory Dump**

A kernel memory dump is a snapshot of all kernel activity at a specific point in time, which can then be analyzed offline. Because GKE nodes run Container-Optimized OS (COS), a hardened operating system, traditional methods such as loading kernel modules are not available. Instead, we add a temporary privileged container to the GKE node with enough permissions to read kernel space through the file path /proc/kcore. We then use the open source tool AVML to create the kernel memory dump. The Terraform configuration in Figure 4 illustrates the addition of a privileged container in GKE:

**Figure 4: Terraform config of GKE container**
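Since the Terraform figure is not reproduced here, the following is a minimal Python sketch of the same idea, not the configuration used in the research. It builds an equivalent Kubernetes Pod manifest with a privileged container pinned to one node, plus the AVML invocation that reads from /proc/kcore. The pod name, container name, image, and the `sleep` placeholder command are hypothetical; the `--source` option is taken from AVML's documented usage and should be checked against the installed version.

```python
import json


def privileged_pod_manifest(node_name: str, image: str, name: str = "memory-dump") -> dict:
    """Build a Pod manifest with one privileged container on a specific node.

    The names and image are illustrative assumptions, not values from the
    original Terraform configuration.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Pin the pod to the GKE node whose memory should be dumped.
            "nodeName": node_name,
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "avml",
                    "image": image,
                    # Placeholder: keep the container alive so AVML can be run in it.
                    "command": ["sleep", "3600"],
                    # Privileged mode is what allows reading /proc/kcore.
                    "securityContext": {"privileged": True},
                }
            ],
        },
    }


def avml_command(output_path: str) -> list[str]:
    """Return the AVML argument list that dumps memory from /proc/kcore."""
    return ["avml", "--source", "/proc/kcore", output_path]


def manifest_as_json(manifest: dict) -> str:
    """Serialize the manifest; kubectl accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```

The JSON produced by `manifest_as_json` could be passed to `kubectl apply -f -`, and the pod should be deleted once the dump has been copied off the node, since a privileged container is itself a security risk.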

**Step 2: Build a Symbol File of the Kernel**

To interpret the kernel memory dump, we need an Intermediate Symbol File (ISF) that matches the exact kernel version of the GKE node. The symbol file is generated from the vmlinux file, which is the uncompressed kernel image, using the open source tool dwarf2json. The challenge is locating the vmlinux file for the COS version of a GKE node hosted by Google Cloud. After extensive research and discussions with Google engineers, we discovered an undocumented API that serves the vmlinux file for a given build_id, which identifies the COS version running on the GKE node. The build_id can be found in the GKE image name, and the file is then available at: https://storage.googleapis.com/cos-tools/$build_id/vmlinux. The example below shows a GKE image configuration containing the build_id:

**Figure 5: GKE image configuration, including build_id**

With the build_id from this example, the vmlinux file is available at: https://storage.googleapis.com/cos-tools/16919.235.1/vmlinux. The symbol file can then be built with dwarf2json.
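As a rough illustration of Step 2, the Python sketch below derives the vmlinux URL from a build_id and assembles a dwarf2json invocation. The URL pattern comes from the text above. The image-name pattern in `build_id_from_image` is an assumption (a `cos-<milestone>-<a>-<b>-<c>` fragment inside the name), because the image configuration figure is not reproduced here; adjust it to the names seen in your own clusters.

```python
import re

# Base URL taken from the article; everything else here is illustrative.
COS_TOOLS_BASE = "https://storage.googleapis.com/cos-tools"


def build_id_from_image(image_name: str) -> str:
    """Extract a dotted build_id such as '16919.235.1' from an image name.

    Assumes the name embeds 'cos-<milestone>-<a>-<b>-<c>'; this pattern is a
    guess about the naming scheme, not something the article documents.
    """
    match = re.search(r"cos-\d+-(\d+)-(\d+)-(\d+)", image_name)
    if match is None:
        raise ValueError(f"no COS build_id found in image name: {image_name!r}")
    return ".".join(match.groups())


def vmlinux_url(build_id: str) -> str:
    """Return the cos-tools URL of the vmlinux file for a build_id."""
    if not re.fullmatch(r"\d+\.\d+\.\d+", build_id):
        raise ValueError(f"unexpected build_id format: {build_id!r}")
    return f"{COS_TOOLS_BASE}/{build_id}/vmlinux"


def dwarf2json_command(vmlinux_path: str) -> list[str]:
    """Return the dwarf2json argument list for a Linux kernel image.

    dwarf2json writes the symbol file to stdout, so the caller is expected
    to redirect the output to a .json file.
    """
    return ["dwarf2json", "linux", "--elf", vmlinux_path]
```

For the build_id used in this post, `vmlinux_url("16919.235.1")` returns the same link shown above.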

**Step 3: Analyze the Kernel Memory Dump**

With both the kernel memory dump and the symbol file available, we use Volatility 3 for the analysis. Volatility 3 lets us view all running processes in both the privileged pod and a test pod located on the same GKE node. The test pod runs several processes, such as a Netcat listener and a Python script, to provide examples for analysis. The following figure shows the complete process output from the kernel memory dump:

**Figure 6: Process output from Volatility 3**
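The analysis step could be scripted along the following lines. This is a hedged Python sketch rather than the tooling used in the research: it assembles a Volatility 3 command that lists processes with JSON output, and filters the resulting records by process name. The `COMM` column name is an assumption about the plugin's JSON output and may differ between Volatility 3 versions, and the file paths are placeholders supplied by the caller.

```python
def volatility_pslist_command(dump_path: str, symbol_dir: str) -> list[str]:
    """Return a Volatility 3 argument list that lists processes as JSON.

    -f selects the memory dump, -s adds a directory containing the symbol
    file built with dwarf2json, and -r selects the JSON renderer.
    """
    return [
        "vol",
        "-f", dump_path,
        "-s", symbol_dir,
        "-r", "json",
        "linux.pslist.PsList",
    ]


def find_processes(rows: list[dict], names: list[str]) -> list[dict]:
    """Keep only the process records whose command name is in `names`.

    `rows` is the parsed JSON output; the 'COMM' key is assumed here and
    should be checked against the actual output of the installed version.
    """
    wanted = {name.lower() for name in names}
    return [row for row in rows if str(row.get("COMM", "")).lower() in wanted]
```

Filtering for names such as `nc` or `python3` would surface processes like the Netcat listener and Python script started in the test pod, although a name match alone is only a starting point for triage.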

**Conclusion**

With AVML, dwarf2json, and Volatility 3, we found free and open source alternatives to commercial solutions for monitoring containerized workloads. This approach provides only a snapshot of process activity at one point in time, but it can serve as a valuable starting point for memory analysis on GKE or complement existing commercial monitoring solutions. The code used in this research project is available on GitHub and was presented at BSidesNYC 2023.

**Tags: backend**
