Using Kubernetes Pod Security Standards

Background

As of Kubernetes 1.23, Pod Security Admission is enabled by default, so the Pod Security Standards can be enforced in your cluster without installing anything extra. This built-in admission controller lets you set a consistent baseline by simply adding a label to a Kubernetes namespace. It also replaces Pod Security Policies, which were deprecated in 1.21 and removed in 1.25.

Pod Security Standards

The built-in Pod Security admission controller supports three standards, which the Kubernetes docs describe as follows:

  • Privileged: Unrestricted policy, providing the widest possible level of permissions. This policy allows for known privilege escalations.
  • Baseline: Minimally restrictive policy which prevents known privilege escalations. Allows the default (minimally specified) Pod configuration.
  • Restricted: Heavily restricted policy, following current Pod hardening best practices.

The namespace labels give us some flexibility in how the standards are applied. For example, you could warn at restricted but enforce at baseline. This approach lets you enforce some level of protection while seeing what needs to improve before you enforce the restricted standard.

Example:

apiVersion: v1
kind: Namespace
metadata:
  name: myapp
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted

Additionally, you can specify the version to enforce:

apiVersion: v1
kind: Namespace
metadata:
  name: myapp
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: v1.23

    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.23
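
The same label scheme also covers a third mode, audit, which records violations in the cluster's audit log instead of returning them to the client. As a rough sketch (the choice of levels and the use of latest here are only illustrative assumptions, not a recommendation from the examples above), all three modes can be combined on one namespace:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: myapp
  labels:
    # Reject pods that violate baseline, as defined in v1.23.
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: v1.23
    # Return a warning to the client for restricted violations.
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
    # Annotate the audit log entry for restricted violations.
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: latest
```

Pinning enforce to a specific version should keep a cluster upgrade from suddenly rejecting existing workloads, while latest on warn and audit surfaces whatever the newest policy definition would flag.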

How it works

Pod Security Standards can give you immediate feedback on missing requirements in your deployment. For example, create this namespace to enforce and warn on the restricted standard:

apiVersion: v1
kind: Namespace
metadata:
  name: myapp
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted

Once the namespace is created, try creating the following deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    app: myapp
  name: myapp
  namespace: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - image: busybox
        imagePullPolicy: IfNotPresent
        name: myapp
        command: ['sh', '-c', 'echo "Testing pod security standards!" && sleep 3600']

The kubectl output should show something like this:

Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "myapp" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "myapp" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "myapp" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "myapp" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")

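The Deployment object itself is accepted, because enforcement applies only to pods, but its ReplicaSet is not allowed to create any. We can see that no pods were created in our myapp namespace: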

$ kubectl get po -n myapp
No resources found in myapp namespace.

Now, if we change the namespace labels to enforce baseline, but warn on restricted, the following should work:

# myapp.yml

apiVersion: v1
kind: Namespace
metadata:
  name: myapp
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    app: myapp
  name: myapp
  namespace: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - image: busybox
        imagePullPolicy: IfNotPresent
        name: myapp
        command: ['sh', '-c', 'echo "Testing pod security standards!" && sleep 3600']

The output will show the same warning, but the pod will start:

$ kubectl apply -f myapp.yml
namespace/myapp configured
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "myapp" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "myapp" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "myapp" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "myapp" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/myapp configured

$ kubectl get po -n myapp
NAME                     READY   STATUS    RESTARTS   AGE
myapp-587c5fb5c5-fd7nf   1/1     Running   0          10s

Hardening the application

Ideally, the previous example would run under the restricted standard. Here is one way to modify the deployment so that it is admitted into a namespace that enforces restricted:

apiVersion: v1
kind: Namespace
metadata:
  name: myapp
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    app: myapp
  name: myapp
  namespace: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      automountServiceAccountToken: false
      securityContext:
        runAsUser: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        seccompProfile:
          type: "RuntimeDefault"
      containers:
      - image: busybox
        imagePullPolicy: IfNotPresent
        name: myapp
        command: ['sh', '-c', 'echo "Testing pod security standards!" && sleep 3600']
        securityContext:
          allowPrivilegeEscalation: false
          privileged: false
          capabilities:
            drop: ["ALL"]

Notice that the deployment has securityContext defined at both the pod and container level. These fields disallow running as root, disallow privilege escalation, drop all kernel capabilities, and so on. The full list of fields allowed by the restricted standard is documented here: https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted
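
To make the mapping between the earlier warning and the hardened manifest easier to see, here is a minimal sketch of a bare Pod in which each field is annotated with the check it answers. The pod name myapp-minimal is hypothetical, and runAsUser is included only because the stock busybox image runs as root by default, which runAsNonRoot alone would refuse to start.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-minimal   # hypothetical name, for illustration only
  namespace: myapp
spec:
  securityContext:
    # Answers "runAsNonRoot != true"; may be set at pod or container level.
    runAsNonRoot: true
    # Not demanded by the warning itself; busybox defaults to UID 0,
    # so a non-root UID is needed for the container to start.
    runAsUser: 65534
    # Answers "seccompProfile"; RuntimeDefault or Localhost, pod or container level.
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: myapp
    image: busybox
    command: ['sh', '-c', 'sleep 3600']
    securityContext:
      # Answers "allowPrivilegeEscalation != false"; container level only.
      allowPrivilegeEscalation: false
      # Answers "unrestricted capabilities"; container level only.
      capabilities:
        drop: ["ALL"]
```

Fields such as privileged: false and automountServiceAccountToken: false in the deployment above are extra hardening; the restricted warning did not ask for them.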

Why does this matter?

Ideally, this is just a nice guard rail for your engineering team so no one accidentally deploys pods with insecure settings. However, it is also another layer of protection that would prevent a malicious actor with namespace permissions from deploying something like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    app: evilapp
  name: evilapp
  namespace: evilapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: evilapp
  template:
    metadata:
      labels:
        app: evilapp
    spec:
      containers:
      - image: evilapp:latest
        imagePullPolicy: IfNotPresent
        name: evilapp
        command: ['pwn-all-the-things.sh']
        securityContext:
          privileged: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      hostPID: true

A deployment like this could allow an attacker with namespace permissions to escalate to a full-blown node and cluster compromise.
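
As a sketch of the guard rail in action, assume the evilapp namespace had been created with at least the baseline standard enforced (the namespace manifest below is an assumption and not part of the original example). Both privileged: true and hostPID: true fall outside what baseline allows, so the pods from that deployment would be rejected at admission:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: evilapp
  labels:
    # baseline forbids privileged containers and sharing host namespaces
    # (hostPID, hostIPC, hostNetwork), so the pod template above is rejected.
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: baseline
```

This only holds as long as the attacker's permissions do not include editing the labels on the namespace itself.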

Final Thoughts

When you configure third-party applications to conform to your chosen pod security standard, check that you can set securityContext at both the pod level and the container level. I ran into this recently with the influx telegraf helm chart, which let you configure securityContext at the pod level but not at the container level. Luckily, the folks at influxdata are awesome and they approved my patch to their helm chart quickly.
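
When reviewing a chart, it can help to look for two separate settings in its values file. The key names below follow the scaffold generated by helm create and are an assumption; real charts, including the telegraf chart mentioned above, may name or nest them differently.

```yaml
# values.yaml (hypothetical chart; key names vary from chart to chart)

# Typically rendered into spec.template.spec.securityContext (pod level).
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 65534
  seccompProfile:
    type: RuntimeDefault

# Typically rendered into each container's securityContext (container level).
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
```

If a chart exposes only the pod-level setting, the restricted standard cannot be met through values alone, because allowPrivilegeEscalation and capabilities exist only at the container level.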

Pod Security Standards are certainly not the only defense you should implement in your clusters, but they significantly raise the bar for security and help your team avoid accidental misconfigurations.
