Background
As of Kubernetes 1.23, Pod Security admission is enabled by default and can enforce the Pod Security Standards in your cluster. This built-in admission controller lets you set a consistent baseline by adding a label to a Kubernetes namespace. It replaces Pod Security Policies, which were deprecated in 1.21 and removed in 1.25.
Pod Security Standards
The built-in Pod Security admission controller supports three standards, described in the docs as follows:
- Privileged: Unrestricted policy, providing the widest possible level of permissions. This policy allows for known privilege escalations.
- Baseline: Minimally restrictive policy which prevents known privilege escalations. Allows the default (minimally specified) Pod configuration.
- Restricted: Heavily restricted policy, following current Pod hardening best practices.
The namespace labels give us some flexibility in how the standards are applied. For example, you could warn at restricted but enforce at baseline. This approach enforces some level of protection while showing you what needs to be fixed before you enforce the restricted standard.
Example:
apiVersion: v1
kind: Namespace
metadata:
  name: myapp
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted
Additionally, you can pin each mode to the policy version of a specific Kubernetes release:
apiVersion: v1
kind: Namespace
metadata:
  name: myapp
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: v1.23
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.23
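To make the interaction between the enforce and warn labels concrete, here is a minimal Python sketch of the decision logic. It is a simplified model, not the admission controller's actual code: the `evaluate` function and its return shape are hypothetical, it ignores version pinning, and it assumes the cluster defaults, where a mode with no label behaves like privileged.

```python
# Simplified model of how namespace labels are evaluated.
# Not the real admission code; names here are illustrative only.
LEVELS = ["privileged", "baseline", "restricted"]  # least to most strict
PREFIX = "pod-security.kubernetes.io/"


def evaluate(namespace_labels, pod_level):
    """Return (admitted, violates_warn_level) for a pod.

    `pod_level` is the strictest standard the pod satisfies.
    """
    def violates(mode):
        # Assumption: an unset mode behaves like "privileged",
        # which no pod can violate.
        required = namespace_labels.get(PREFIX + mode, "privileged")
        return LEVELS.index(pod_level) < LEVELS.index(required)

    return (not violates("enforce"), violates("warn"))


labels = {
    PREFIX + "enforce": "baseline",
    PREFIX + "warn": "restricted",
}

print(evaluate(labels, "baseline"))    # (True, True): admitted, with a warning
print(evaluate(labels, "privileged"))  # (False, True): rejected
print(evaluate(labels, "restricted"))  # (True, False): admitted, no warning
```

With enforce at baseline and warn at restricted, a pod that only meets baseline is still admitted, and the warning tells you what it would take to reach restricted.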
How it works
Pod Security Standards can give you immediate feedback on missing requirements in your deployment. For example, create this namespace to enforce and warn on the restricted standard:
apiVersion: v1
kind: Namespace
metadata:
  name: myapp
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
Once the namespace is created, try creating the following deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    app: myapp
  name: myapp
  namespace: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - image: busybox
        imagePullPolicy: IfNotPresent
        name: myapp
        command: ['sh', '-c', 'echo "Testing pod security standards!" && sleep 3600']
The kubectl output should show something like this:
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "myapp" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "myapp" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "myapp" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "myapp" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
The Deployment object itself is still created, but the pods it tries to start are rejected, so we can see that no pods were created in our myapp namespace:
$ kubectl get po -n myapp
No resources found in myapp namespace.
Now, if we change the namespace labels to enforce baseline, but warn on restricted, the following should work:
# myapp.yml
apiVersion: v1
kind: Namespace
metadata:
  name: myapp
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    app: myapp
  name: myapp
  namespace: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - image: busybox
        imagePullPolicy: IfNotPresent
        name: myapp
        command: ['sh', '-c', 'echo "Testing pod security standards!" && sleep 3600']
The output will show the same warning, but the pod will start:
$ kubectl apply -f myapp.yml
namespace/myapp configured
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "myapp" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "myapp" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "myapp" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "myapp" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/myapp configured
$ kubectl get po -n myapp
NAME                     READY   STATUS    RESTARTS   AGE
myapp-587c5fb5c5-fd7nf   1/1     Running   0          10s
Hardening the application
Ideally, you would want the previous example to actually work with the restricted standard. The following is an example of how the deployment could be modified to work in the restricted standard namespace:
apiVersion: v1
kind: Namespace
metadata:
  name: myapp
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    app: myapp
  name: myapp
  namespace: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      automountServiceAccountToken: false
      securityContext:
        runAsUser: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        seccompProfile:
          type: "RuntimeDefault"
      containers:
      - image: busybox
        imagePullPolicy: IfNotPresent
        name: myapp
        command: ['sh', '-c', 'echo "Testing pod security standards!" && sleep 3600']
        securityContext:
          allowPrivilegeEscalation: false
          privileged: false
          capabilities:
            drop: ["ALL"]
Notice the deployment has securityContext defined at both the pod and container level. Together these settings prevent the pod from running as root, disallow privilege escalation, drop all kernel capabilities, and so on. The full list of fields checked by the restricted standard is documented here: https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted
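As a rough illustration of why both levels matter, the following Python sketch checks a pod spec (as a plain dict) against the four requirements named in the warning shown earlier. It is a deliberately incomplete approximation, not the real restricted policy: the function name `restricted_violations` is hypothetical, it looks only at regular containers, and it skips the other restricted checks such as volume types.

```python
# Simplified sketch: covers only the four checks named in the kubectl warning,
# and only regular containers (not initContainers or ephemeralContainers).

def restricted_violations(pod_spec):
    """Return human-readable violations for a pod spec given as a dict."""
    pod_sc = pod_spec.get("securityContext") or {}
    problems = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext") or {}

        # These two fields exist only on the container-level securityContext.
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        dropped = (sc.get("capabilities") or {}).get("drop") or []
        if "ALL" not in dropped:
            problems.append(f"{name}: capabilities.drop must include ALL")

        # These two may be set on the pod; a container-level value wins.
        if sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        seccomp = sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            problems.append(
                f"{name}: seccompProfile.type must be RuntimeDefault or Localhost"
            )
    return problems


# The original busybox pod spec and the hardened one, reduced to the
# fields this sketch inspects.
original = {"containers": [{"name": "myapp", "image": "busybox"}]}
hardened = {
    "securityContext": {
        "runAsNonRoot": True,
        "seccompProfile": {"type": "RuntimeDefault"},
    },
    "containers": [
        {
            "name": "myapp",
            "image": "busybox",
            "securityContext": {
                "allowPrivilegeEscalation": False,
                "capabilities": {"drop": ["ALL"]},
            },
        }
    ],
}

print(len(restricted_violations(original)))  # 4
print(restricted_violations(hardened))       # []
```

The split in the loop mirrors the Kubernetes API: allowPrivilegeEscalation and capabilities can only be set per container, while runAsNonRoot and seccompProfile can be set once on the pod and inherited.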
Why does this matter?
Ideally, this is just a nice guardrail for your engineering team so that no one accidentally deploys pods with insecure settings. However, it is also another layer of protection that would prevent a malicious actor with namespace permissions from deploying something like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    app: evilapp
  name: evilapp
  namespace: evilapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: evilapp
  template:
    metadata:
      labels:
        app: evilapp
    spec:
      containers:
      - image: evilapp:latest
        imagePullPolicy: IfNotPresent
        name: evilapp
        command: ['pwn-all-the-things.sh']
        securityContext:
          privileged: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      hostPID: true
A deployment like this could allow an attacker with namespace permissions to escalate to a full-blown compromise of the node and the cluster.
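If you want to know whether anything like this is already running before you turn on enforcement, one option is to scan the output of `kubectl get pods -A -o json`. The Python sketch below is a hypothetical helper, not part of any Kubernetes tooling: it flags only privileged containers and host namespace sharing (hostPID, hostNetwork, hostIPC), which are a subset of what the baseline standard forbids, and the pod name in the sample data is made up.

```python
import json

# Pod-level fields that share a host namespace with the pod.
HOST_NAMESPACE_FIELDS = ("hostPID", "hostNetwork", "hostIPC")


def find_risky_pods(pod_list_json):
    """Return (namespace/name, reason) pairs from a JSON list of pods."""
    findings = []
    for pod in json.loads(pod_list_json).get("items", []):
        meta = pod.get("metadata", {})
        spec = pod.get("spec", {})
        pod_id = f'{meta.get("namespace", "default")}/{meta.get("name", "<unnamed>")}'

        for field in HOST_NAMESPACE_FIELDS:
            if spec.get(field) is True:
                findings.append((pod_id, f"{field}: true"))

        for container in spec.get("containers", []):
            sc = container.get("securityContext") or {}
            if sc.get("privileged") is True:
                findings.append(
                    (pod_id, f'container {container.get("name")} is privileged')
                )
    return findings


# Sample input shaped like `kubectl get pods -A -o json` output.
# The pod name "evilapp-example" is invented for illustration.
sample = json.dumps({
    "items": [
        {
            "metadata": {"namespace": "evilapp", "name": "evilapp-example"},
            "spec": {
                "hostPID": True,
                "containers": [
                    {"name": "evilapp", "securityContext": {"privileged": True}}
                ],
            },
        },
        {
            "metadata": {"namespace": "myapp", "name": "myapp-example"},
            "spec": {"containers": [{"name": "myapp"}]},
        },
    ]
})

for pod_id, reason in find_risky_pods(sample):
    print(pod_id, "->", reason)
# evilapp/evilapp-example -> hostPID: true
# evilapp/evilapp-example -> container evilapp is privileged
```

Against a real cluster, you could save the kubectl output to a file and pass its contents to `find_risky_pods` to see which workloads would need changes.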
Final Thoughts
When you configure third-party applications to conform to your chosen Pod Security Standard, make sure you look for both a pod-level and a container-level securityContext. I ran into this recently with the influx telegraf helm chart, where you could configure securityContext at the pod level but not at the container level. Luckily, the folks at influxdata are awesome, and they approved my patch to their helm chart quickly.
Pod Security Standards are certainly not the only defense you should implement in your clusters, but they significantly raise the bar for security and help your team avoid accidental misconfigurations.