RBAC misconfigurations that break production (and how to fix them)
Most RBAC failures aren't from lack of permissions — they're from misconfigurations that silently over-grant access until something explodes. Here are the eight patterns I see repeatedly and exactly how to fix them.

I've done a lot of cluster audits. The pattern is almost always the same: RBAC was set up correctly at the start, then gradually degraded. A debugging session left a permissive binding that nobody removed. A Helm chart shipped with a ClusterRole that nobody read. The default service account quietly accumulated permissions over time.
Nobody notices until something breaks or, worse, until an auditor or attacker does.
This post covers the eight misconfigurations I see most often — not theoretical risks, but things that have caused real production incidents or failed real compliance reviews.
1. Wildcard verbs on sensitive resources
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["*"]

This grants create, delete, update, patch, list, watch, and get on secrets. The intention was usually "the app needs to read secrets." The result is a workload that can overwrite or exfiltrate every secret in the namespace.
Why it's dangerous: list + watch on secrets is particularly bad. A compromised pod with these permissions can dump every secret in the namespace in a single API call. list on secrets returns full secret values, not just metadata.
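To see why, run what a compromised pod would effectively run: a single list call returns every value in the namespace, base64-encoded but otherwise in the clear.

# One request, every secret value in the namespace
kubectl get secrets -o json \
  | jq '.items[] | {name: .metadata.name, data: .data}'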
Fix: Be explicit about verbs. If an app only reads secrets, say so:
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get"]

If it needs to read a specific secret, use resourceNames:
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get"]
  resourceNames: ["my-app-credentials"]

2. Using ClusterRoleBinding when you need RoleBinding
kind: ClusterRoleBinding
roleRef:
  kind: ClusterRole
  name: pod-reader
subjects:
- kind: ServiceAccount
  name: my-app
  namespace: production

This grants pod-reader permissions across every namespace in the cluster, not just production. If pod-reader includes list pods, your app can list pods in kube-system, monitoring, and every tenant namespace.
Why it happens: ClusterRoleBinding is easier to write — you don't have to think about which namespace. The docs don't shout loudly enough that a RoleBinding achieves the same effect scoped correctly.
The rule: A ClusterRole can be bound with either a ClusterRoleBinding (cluster-wide) or a RoleBinding (namespace-scoped). Default to RoleBinding unless you have an explicit reason to go cluster-wide.
kind: RoleBinding
metadata:
  namespace: production
roleRef:
  kind: ClusterRole
  name: pod-reader
subjects:
- kind: ServiceAccount
  name: my-app
  namespace: production

3. The default service account over-permission trap
When you don't specify a serviceAccountName in your Pod spec, Kubernetes assigns the default service account in that namespace. If someone bound a permissive role to default — even temporarily, even "just to test" — every pod in that namespace inherits those permissions.
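Prevention is cheap: give each workload its own service account and reference it explicitly in the Pod spec, so nothing rides on default. A minimal sketch (names and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  serviceAccountName: my-app        # dedicated SA, not the namespace default
  containers:
  - name: app
    image: registry.example.com/my-app:1.0

Existing clusters still need the cleanup below.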
# Find all bindings to the default SA
kubectl get rolebindings,clusterrolebindings -A -o json \
  | jq '.items[] | select(
      .subjects[]? |
      (.kind=="ServiceAccount" and .name=="default")
    ) | .metadata.name'

Fix: Audit and clean up bindings to default. Then opt out of automounting the token for workloads that don't call the API:
spec:
  automountServiceAccountToken: false

Or set this at the service account level to make it the default for all pods using that SA:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
automountServiceAccountToken: false

4. Aggregated ClusterRoles silently expanding permissions
Kubernetes ships with aggregated ClusterRoles (admin, edit, view) that collect permissions from other roles via label selectors. If you create a ClusterRole with the right label, its permissions get merged in automatically:
kind: ClusterRole
metadata:
  name: my-crd-permissions
  labels:
    rbac.authorization.k8s.io/aggregate-to-edit: "true"   # <-- merged into 'edit'
rules:
- apiGroups: ["mycompany.io"]
  resources: ["sensitiveresources"]
  verbs: ["*"]

Everyone bound to the edit ClusterRole now has full access to sensitiveresources, and nobody added an explicit binding. This is a silent expansion — the edit ClusterRole itself didn't change, but its effective permissions did.
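You can confirm the expansion directly: the aggregation controller copies matching rules into the parent role, so they show up when you inspect edit itself.

# The merged rules appear in the parent role's .rules
kubectl get clusterrole edit -o yaml | grep -B 2 -A 3 mycompany.io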
Fix: Audit every ClusterRole that has an aggregate-to-* label. Treat those labels as a trust boundary — they effectively make your role part of a shared, cluster-wide role.
kubectl get clusterroles -o json \
  | jq '.items[] | select((.metadata.labels // {}) |
      keys[] | startswith("rbac.authorization.k8s.io/aggregate-to"))
    | .metadata.name'

5. system:authenticated bindings
subjects:
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io

This grants permissions to every authenticated user and service account in the cluster. I've found this in the wild more times than I'd like to admit — usually left by a Helm chart or copied from a StackOverflow answer. The person who wrote it meant "our team members," not "every workload running in the cluster."
system:unauthenticated is even worse: it grants access to requests that present no credentials at all. Check for both:
kubectl get clusterrolebindings -o json \
  | jq '.items[] | select(
      .subjects[]?.name == "system:authenticated" or
      .subjects[]?.name == "system:unauthenticated"
    ) | .metadata.name'

Fix: Replace with explicit user, group, or service account subjects. There is almost no legitimate reason to bind to system:authenticated cluster-wide.
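What an explicit binding looks like instead; the group and service account names below are placeholders for whatever your identity provider and workloads actually use:

subjects:
- kind: Group
  name: platform-engineers            # a real IdP/OIDC group, not a system: group
  apiGroup: rbac.authorization.k8s.io
- kind: ServiceAccount
  name: my-app
  namespace: production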
6. Leaving debug bindings in place
This is the most common one. An incident happens at 2 AM. Someone adds a permissive ClusterRoleBinding to get debugging access fast. The incident is resolved. The binding is never removed.
kind: ClusterRoleBinding
metadata:
  name: debug-access-temp   # "temp" — LOL
roleRef:
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: my-broken-app
  namespace: production

Six months later, my-broken-app has been running with cluster-admin and nobody knew.
Fix — two layers:
First, clean up by auditing for high-privilege bindings that reference non-system subjects:
kubectl get clusterrolebindings -o json \
  | jq '.items[] | select(.roleRef.name == "cluster-admin") |
      {name: .metadata.name, subjects: .subjects}'

Second, add expiry to any binding you create during an incident. Kubernetes doesn't support TTLs on RBAC objects natively, but you can enforce the practice via a custom admission webhook or just a documented runbook item: "every debug binding gets a GitHub issue tracking its removal."
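A lightweight version of that practice, without a webhook: label every debug binding with an expiry date when you create it, and have a scheduled job flag anything past due. A sketch, assuming you adopt an expires-after label convention like this one:

# At incident time: record when the binding should be gone
kubectl label clusterrolebinding debug-access-temp expires-after=2024-07-01

# Scheduled check: report any labelled binding whose date has passed
kubectl get clusterrolebindings -o json \
  | jq -r --arg today "$(date +%F)" \
    '.items[]
     | select(.metadata.labels["expires-after"] != null
              and .metadata.labels["expires-after"] < $today)
     | .metadata.name'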
7. Helm charts shipping over-privileged service accounts
Open any popular Helm chart and look at the templates/rbac.yaml. Many ship with ClusterRoles that include far more than the app actually needs — either because the developer wasn't sure what was needed, or because they wanted to avoid support tickets.
A common offender is the list/watch on all resources in all API groups:
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["get", "list", "watch"]

Read-only across everything looks harmless. It isn't. An attacker who compromises that pod can enumerate your entire cluster topology — all services, all secret values (get and list return the data, not just metadata), all config maps, all running workloads.
Fix: Override RBAC rules at install time if the chart supports it. Some charts expose an rbac.rules value for exactly this. If yours doesn't, file an issue or fork the template:
# values.yaml override
rbac:
  rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list"]

And always review rbac.yaml before helm install. Make it a mandatory step in your chart adoption checklist.
8. No audit log review
RBAC controls what's allowed. Audit logs tell you what's actually happening. Running without reviewing audit logs means you can have a correctly-configured RBAC policy and still miss that a service account is being used in unexpected ways — or that a compromised pod is probing the API for permissions.
Enable audit logging and set up alerts for:
- Any request that results in 403 Forbidden from a service account (legitimate apps shouldn't be hitting permission boundaries constantly)
- Any use of the list or watch verbs on secrets
- Any request from a pod in a sensitive namespace calling the Kubernetes API at all (if that pod shouldn't be calling the API)
On EKS, audit logs land in CloudWatch. A basic CloudWatch Insights query:
fields @timestamp, user.username, objectRef.resource, verb, responseStatus.code
| filter responseStatus.code = 403
| filter ispresent(user.username)
| sort @timestamp desc
| limit 100
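A similar query covers the second alert above, flagging anyone enumerating secrets (field names assume the standard EKS audit-log shape):

fields @timestamp, user.username, objectRef.namespace, verb
| filter objectRef.resource = "secrets" and verb in ["list", "watch"]
| sort @timestamp desc
| limit 100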
On self-managed clusters, configure the audit policy to capture at minimum RequestResponse level for secrets and Metadata level for everything else.
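A minimal sketch of a policy implementing that split:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Full request and response bodies for anything touching secrets
- level: RequestResponse
  resources:
  - group: ""
    resources: ["secrets"]
# Everything else: record who did what, but not the payloads
- level: Metadata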
The audit you should run right now
# 1. Who has cluster-admin?
kubectl get clusterrolebindings -o json \
  | jq -r '.items[] | select(.roleRef.name=="cluster-admin") |
      .metadata.name + " → " + ([.subjects[]? | .kind + "/" + .name] | join(", "))'

# 2. Which service accounts can list/get secrets?
kubectl auth can-i list secrets --as=system:serviceaccount:default:default
kubectl auth can-i get secrets --as=system:serviceaccount:default:default

# 3. Find wildcard rules anywhere in the cluster
kubectl get clusterroles,roles -A -o json \
  | jq '.items[] | select(.rules[]? | .verbs[] == "*") |
      .metadata.name + " (" + (.metadata.namespace // "cluster") + ")"'

# 4. Find bindings to system:authenticated
kubectl get clusterrolebindings,rolebindings -A -o json \
  | jq '.items[] | select(.subjects[]?.name == "system:authenticated") |
      .metadata.name'

Run these on any cluster you're responsible for. The results are usually surprising.
The pattern behind all of them
Every misconfiguration above has the same root cause: RBAC was configured once and never revisited. Permissions accumulate. Bindings persist after their purpose is gone. Charts bring in roles nobody audited.
RBAC isn't a "set it and forget it" control. It needs to be reviewed the same way you'd review IAM policies in AWS — regularly, with tooling, and with a bias toward removing what isn't justified.
If you want to build these permissions interactively, I have an RBAC Generator that outputs correct Role + RoleBinding YAML, and a Kubernetes YAML Linter that catches common structural issues before you apply.


