**Spotify’s New Method for Memory Analysis on GKE: An Open Source Alternative for Monitoring Containerized Workloads**
*June 22, 2023*
*Written by Marcus Hallberg, Security Engineer at Spotify*
**Introduction**
At Spotify, we utilize containerized workloads across our organization and rely on Google Kubernetes Engine (GKE) on Google Cloud Platform (GCP) for our production workloads. To ensure the security of our workloads, it is essential to quickly analyze any suspicious behavior and identify potential malicious activity. While we currently use commercial solutions for monitoring, we also embarked on a research project to explore alternative options. This led us to a new method for conducting memory analysis on GKE using open source tools. In this blog post, I will provide a detailed explanation of how memory analysis works and how this new method can be utilized on any GKE node in production today.
**Spotify’s Usage of GKE on GCP**
As heavy users of GKE on GCP, Spotify operates across five GCP regions and runs hundreds of thousands of pods in production within over 3,000 GKE namespaces. With such extensive usage, it becomes crucial for us to scale and monitor our production workloads effectively.
**Understanding GKE Terminology**
Before diving into the memory analysis process, it’s important to familiarize ourselves with some general terms related to GKE:
**Control Plane**: This is the container orchestration layer that facilitates the management of containers by exposing APIs and interfaces for defining, deploying, and managing their lifecycle.
**Cluster**: A cluster consists of worker machines, known as nodes, which run containerized applications. Each cluster must have at least one worker node.
**Node**: A node refers to a worker machine within the Kubernetes framework.
**Namespace**: A namespace is an abstraction utilized by Kubernetes to provide isolation for groups of resources within a single cluster.
**Pod**: The smallest and simplest Kubernetes object, a Pod represents a set of running containers within a cluster.
**Container**: A container is a lightweight and portable executable image containing software and its dependencies.
**A High-Level Architecture of GKE Cluster**
The following diagram provides a high-level overview of the architecture of a GKE cluster on GCP:
**Figure 1: GKE-managed cluster overview**
**Accessing the Kernel on a GKE Node for Memory Analysis**
To analyze memory on a GKE node and examine the processes running on it, the kernel is the best place to retrieve this information. Many commercial solutions use extended Berkeley Packet Filter (eBPF) to access the kernel. During our research, however, we found an alternative approach. Accessing the kernel and analyzing memory on a GKE node involves the following three steps:
**Step 1: Create a Kernel Memory Dump**
By creating a kernel memory dump, we obtain a snapshot of all kernel activity at a specific point in time for analysis. Because GKE nodes run the hardened Container-Optimized OS (COS), traditional methods such as loading kernel modules are not available. Instead, we add a temporary privileged container to the GKE node with sufficient permissions to access kernel space through the file path /proc/kcore. The open source tool AVML is then used to create the kernel memory dump. The Terraform configuration below illustrates the addition of a privileged container in GKE:
**Figure 4: Terraform config of GKE container**
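For readers who do not use Terraform, the same idea can be sketched as a plain Kubernetes Pod manifest built in Python. This is not the configuration from the research project. The pod name, labels, node name, image, and `sleep` command are all hypothetical placeholders, and the only parts taken from the text above are that the container is privileged and runs on the node under investigation.

```python
import json


def privileged_pod_manifest(node_name: str, image: str) -> dict:
    """Build a Pod manifest for a short-lived privileged container on one node.

    All names here are illustrative assumptions, not the project's real config.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": "memory-dump",  # hypothetical pod name
            "labels": {"purpose": "memory-analysis"},
        },
        "spec": {
            # Pin the pod to the node whose kernel memory we want to capture.
            "nodeName": node_name,
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "avml",
                    "image": image,  # assumed to contain the AVML binary
                    # Keep the container alive long enough to take the dump.
                    "command": ["sleep", "3600"],
                    # Privileged mode is what allows reading /proc/kcore.
                    "securityContext": {"privileged": True},
                }
            ],
        },
    }


manifest = privileged_pod_manifest("gke-example-node", "example.registry/avml:latest")
print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON manifests, the printed output could be piped to `kubectl apply -f -`. Inside the running container, AVML is invoked with an output file name (for example `avml /tmp/dump.lime`), and the pod should be deleted as soon as the dump has been copied off the node.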
**Step 2: Build a Symbol File of the Kernel**
In order to interpret the kernel memory dump, it is necessary to construct an Intermediate Symbol File (ISF) specific to the kernel version of the GKE node. This can be accomplished by accessing the vmlinux file, which represents the uncompressed kernel image, and using the open source tool dwarf2json to generate the symbol file. The challenge lies in locating the vmlinux file for the COS version of a GKE node hosted by Google Cloud. After extensive research and discussions with Google engineers, we discovered an undocumented API that grants access to the vmlinux file by utilizing the build_id of the COS version running on the GKE node. As the build_id can be found within the GKE image name, the API can be accessed using the following link: https://storage.googleapis.com/cos-tools/$build_id/vmlinux. The example below showcases the GKE image configuration containing the build_id:
**Figure 5: GKE image configuration, including build_id**
Armed with this knowledge, we can access the vmlinux file through the link: https://storage.googleapis.com/cos-tools/16919.235.1/vmlinux. Subsequently, the symbol file can be built using dwarf2json.
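The lookup above can be sketched in a few lines of Python. The image name below is a hypothetical example, and the regular expression assumes that the image name embeds the COS version as `cos-<milestone>-<a>-<b>-<c>`, with the last three numbers forming the build_id. The helper names are illustrative. The URL pattern and the `dwarf2json linux --elf` invocation are the only parts taken from the post and from the tool's own usage.

```python
import re

COS_TOOLS_BASE = "https://storage.googleapis.com/cos-tools"


def build_id_from_image(image_name: str) -> str:
    """Extract the COS build_id from a GKE image name.

    Assumption: the name contains 'cos-<milestone>-<a>-<b>-<c>', where
    '<a>.<b>.<c>' is the build_id.
    """
    match = re.search(r"cos-\d+-(\d+)-(\d+)-(\d+)", image_name)
    if match is None:
        raise ValueError(f"no COS build_id found in image name: {image_name!r}")
    return ".".join(match.groups())


def vmlinux_url(build_id: str) -> str:
    """Return the download URL of the vmlinux file for a COS build_id."""
    return f"{COS_TOOLS_BASE}/{build_id}/vmlinux"


def dwarf2json_command(vmlinux_path: str) -> list[str]:
    """Return the dwarf2json command that emits a Linux symbol file on stdout."""
    return ["dwarf2json", "linux", "--elf", vmlinux_path]


# Hypothetical image name, used only to exercise the parsing logic.
image = "gke-example-cos-97-16919-235-1-example"
build_id = build_id_from_image(image)
print(vmlinux_url(build_id))
print(" ".join(dwarf2json_command("vmlinux")))
```

dwarf2json writes the symbol file as JSON to standard output, so in practice the command is run with its output redirected to a file that Volatility 3 can later load.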
**Step 3: Analyze the Kernel Memory Dump**
With both the kernel memory dump and the symbol file at our disposal, we use Volatility 3 for the analysis. Volatility 3 lets us view all running processes on both the privileged pod and a test pod located on the same GKE node. The test pod runs several processes, such as a Netcat listener and a Python script, to provide examples for analysis. The following figure shows the complete output of the process analysis from the kernel memory dump:
**Figure 6: Process output from Volatility 3**
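As a rough sketch of how this step might be scripted, the snippet below assembles a Volatility 3 process-listing command. It assumes Volatility 3 is installed so that the `vol` entry point is on the PATH, that the dump is named `dump.lime`, and that the generated symbol file has been placed in a local `symbols` directory. The file names and helper functions are hypothetical, and the exact command used in the research project may differ.

```python
import shutil
import subprocess
from typing import Optional


def volatility_pslist_command(dump_path: str, symbols_dir: str) -> list[str]:
    """Build a Volatility 3 command that lists processes from a Linux memory dump."""
    return [
        "vol",
        "-f", dump_path,    # memory dump created in step 1
        "-s", symbols_dir,  # directory containing the symbol file from step 2
        "linux.pslist",     # plugin that walks the kernel's task list
    ]


def run_if_available(command: list[str]) -> Optional[str]:
    """Run the command and return its stdout, or None if the tool is not installed."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout


cmd = volatility_pslist_command("dump.lime", "symbols")
print(" ".join(cmd))
```

Calling `run_if_available(cmd)` on a machine with Volatility 3 installed would return the process table as text, which can then be searched for unexpected entries such as the Netcat listener from the test pod.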
**Conclusion**
By using AVML, dwarf2json, and Volatility 3, Spotify has found free and open source alternatives to commercial solutions for monitoring containerized workloads. Although this approach provides only a snapshot of process activity, it can serve as a valuable starting point for memory analysis on GKE or complement existing commercial monitoring solutions. The code used in this research project is available on GitHub and was presented at BSidesNYC 2023.
**Tags: backend**