Automating TLS with cert-manager and Let's Encrypt
Manual TLS certificate management doesn't scale. cert-manager automates issuance, renewal, and rotation using Let's Encrypt — and integrates directly with your Ingress resources. This tutorial covers HTTP-01 and DNS-01 challenges, debugging stuck certificates, and rotating to production after testing with staging.
Before you begin
- Kubernetes cluster with an Ingress controller (nginx or similar)
- kubectl and Helm installed
- A real domain name with DNS access
- A public-facing cluster (for HTTP-01) or DNS provider API access (for DNS-01)
Manual certificate management is a 2am pager alert waiting to happen. Someone forgets to renew, the cert expires on a Friday afternoon, and now you're scrambling to run openssl commands in production while users are seeing browser warnings. I've been there. cert-manager eliminates this entirely — it issues, renews, and rotates certificates automatically, and integrates directly with Kubernetes Ingress. This tutorial wires it up end-to-end, from a first install through wildcard certs, with the debugging steps you'll actually need.
What You'll Build
By the end of this tutorial you'll have:
- cert-manager installed via Helm with CRDs managed properly
- A Let's Encrypt staging ClusterIssuer for testing (much higher rate limits than production)
- A Certificate issued via HTTP-01 challenge wired to a real Ingress
- A production ClusterIssuer switch once staging is confirmed working
- A DNS-01 challenge example for wildcard certificates using Cloudflare
The full flow takes about 45 minutes the first time. Once you have the pattern down, replicating it across domains takes 5 minutes.
Step 1: Install cert-manager
cert-manager ships as a Helm chart. On v1.14, the installCRDs=true flag tells Helm to install the CRDs as part of the release, so they are upgraded and removed with the chart. From v1.15 onward the equivalent flag is crds.enabled=true, paired with crds.keep (default true), which leaves the CRDs in place on uninstall.

```bash
helm repo add jetstack https://charts.jetstack.io
helm repo update

helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.14.0 \
  --set installCRDs=true
```
Verify the pods are running before moving on:
```bash
kubectl get pods -n cert-manager
```
You should see three pods all in Running state:
```
NAME                                      READY   STATUS    RESTARTS   AGE
cert-manager-7d75b9b5b5-xkjqp             1/1     Running   0          90s
cert-manager-cainjector-6c9f7b5b8-4frcl   1/1     Running   0          90s
cert-manager-webhook-6b9b7b5b5-z7vdm      1/1     Running   0          90s
```
If the webhook pod is stuck in Init or ContainerCreating, wait another 30 seconds — it needs to inject its own CA before it can serve requests. If it stays stuck, check kubectl logs -n cert-manager deployment/cert-manager-webhook.
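If you are scripting the install, a small wait helper can replace eyeballing the pod list. This is a minimal sketch: the function name wait_for_cert_manager is my own, and it assumes the release name cert-manager in the cert-manager namespace, as installed above.

```bash
# Block until each cert-manager deployment reports a completed rollout.
# Assumes the Helm release is named "cert-manager" (hypothetical helper).
wait_for_cert_manager() {
  namespace="${1:-cert-manager}"
  timeout="${2:-120s}"
  for deploy in cert-manager cert-manager-cainjector cert-manager-webhook; do
    # rollout status exits non-zero if the deployment is not ready in time
    kubectl rollout status "deployment/${deploy}" \
      --namespace "${namespace}" --timeout="${timeout}" || return 1
  done
}

# Usage: wait_for_cert_manager            # defaults: cert-manager, 120s
#        wait_for_cert_manager cert-manager 300s
```

The function stops at the first deployment that fails to become ready, which is usually the one whose logs you want to read.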
Step 2: Create a Staging ClusterIssuer
Always start with Let's Encrypt staging. Production enforces strict rate limits: at the time of writing, 5 certificates per exact set of hostnames per week, plus a cap on failed validations per hostname per hour. If you make a configuration mistake (and you will the first time), repeated duplicate or failed attempts can lock you out for hours or days. Staging limits are far higher. The only other difference is that staging issues certificates from a fake CA that browsers don't trust, which is exactly what you want for testing.
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: your@email.com
    privateKeySecretRef:
      name: letsencrypt-staging-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
```
Apply it:
```bash
kubectl apply -f clusterissuer-staging.yaml
```
Check that the issuer registered successfully:
```bash
kubectl describe clusterissuer letsencrypt-staging
```
Look for a condition of Type: Ready with Status: True. If you see Failed to register ACME account, check that the email is valid and the ACME server URL is correct.
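For scripts, the same readiness check can be reduced to a single value with a JSONPath filter. A minimal sketch, with a helper name (issuer_ready_status) of my own choosing:

```bash
# Print the status of the issuer's Ready condition:
# "True", "False", or an empty string if the condition is not set yet.
issuer_ready_status() {
  kubectl get clusterissuer "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Usage: issuer_ready_status letsencrypt-staging
```

An empty result usually just means cert-manager hasn't processed the issuer yet; give it a few seconds before digging into the describe output.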
Step 3: Issue a Certificate via HTTP-01
There are two ways to trigger cert-manager: annotate your Ingress directly, or create a standalone Certificate resource. I'll show both — use annotations for simple cases, the Certificate resource when you need more control (multiple SANs, custom duration, etc.).
Approach 1: Ingress annotation
Add these to your existing Ingress:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress
  namespace: default
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-staging"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - yourdomain.com
      secretName: yourdomain-tls
  rules:
    - host: yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```
cert-manager watches for Ingress resources with that annotation and automatically creates a Certificate resource for you.
Approach 2: Certificate resource
When you want explicit control — multiple domains, custom renewal window, or a cert not tied to a specific Ingress:
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: yourdomain-cert
  namespace: default
spec:
  secretName: yourdomain-tls
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer
  dnsNames:
    - yourdomain.com
```
Apply and watch the status:
```bash
kubectl apply -f certificate.yaml
kubectl get certificate -w
```
The READY column should flip to True within 60–90 seconds if everything is configured correctly. If it stays False, move to the next step.
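In a pipeline you probably want a command that blocks and then exits, rather than an open-ended watch. kubectl wait can do that against the Certificate's Ready condition. This is a sketch; the helper name and the 180s default timeout are my own assumptions.

```bash
# Block until the Certificate reports Ready=True, or fail after the timeout.
# The 180s default is an assumption; DNS-01 issuance may need longer.
wait_for_certificate() {
  name="$1"
  namespace="${2:-default}"
  timeout="${3:-180s}"
  kubectl wait --for=condition=Ready "certificate/${name}" \
    --namespace "${namespace}" --timeout="${timeout}"
}

# Usage: wait_for_certificate yourdomain-cert default 180s
```

A non-zero exit here is your cue to start walking the resource chain.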
Step 4: Debug the Stuck Certificate
This is where everyone gets lost. cert-manager creates a chain of resources during issuance: Certificate → CertificateRequest → Order → Challenge. If something fails, it fails at one level of this chain. You need to walk down the chain to find where.
```bash
# Level 1: Certificate (overall status and any top-level errors)
kubectl describe certificate yourdomain-cert -n default

# Level 2: CertificateRequest (the actual CSR and approval status)
kubectl describe certificaterequest -n default

# Level 3: Order (the ACME order state with Let's Encrypt)
kubectl describe order -n default

# Level 4: Challenge (the individual challenge attempt)
kubectl describe challenge -n default
```
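When describe output gets long, it can help to see the whole chain at a glance and then pull just the state and reason from each Challenge. A rough sketch with two helpers of my own naming; the state and reason fields are read from the Challenge status.

```bash
# One-line-per-resource overview of the issuance chain in a namespace.
issuance_chain() {
  kubectl get certificate,certificaterequest,order,challenge \
    --namespace "${1:-default}"
}

# Print "<challenge-name> <state> <reason>" (tab-separated) per Challenge.
challenge_summary() {
  kubectl get challenges.acme.cert-manager.io --namespace "${1:-default}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.state}{"\t"}{.status.reason}{"\n"}{end}'
}

# Usage: issuance_chain default
#        challenge_summary default
```

No Challenge rows at all usually means the failure is higher up the chain, at the Order or CertificateRequest.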
The Challenge resource is usually where you find the actual error. Common failure reasons:
HTTP-01 challenge URL not reachable. Let's Encrypt sends a GET request to http://yourdomain.com/.well-known/acme-challenge/<token>. If port 80 isn't publicly reachable, or your Ingress controller isn't routing that path, the challenge fails. Check it yourself:
```bash
curl http://yourdomain.com/.well-known/acme-challenge/<token>
```
You should get back the challenge's key authorization, a string that begins with the token. If you get a 404 or connection refused, the Ingress isn't routing the challenge path correctly. Some Ingress controllers block /.well-known/ by default, so check your nginx configuration.
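To make that check repeatable, you can build the challenge URL from the domain and token and ask curl for just the status code. A minimal sketch; both function names are mine, and the token is assumed to come from the pending Challenge resource (its spec carries a token field).

```bash
# Build the URL Let's Encrypt requests during an HTTP-01 challenge.
acme_challenge_url() {
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$1" "$2"
}

# Print only the HTTP status code returned for that URL.
probe_acme_challenge() {
  curl --silent --output /dev/null --max-time 10 \
    --write-out '%{http_code}' "$(acme_challenge_url "$1" "$2")"
}

# Usage (token copied from the Challenge spec):
#   probe_acme_challenge yourdomain.com "<token>"
```

Run the probe from outside the cluster, since that is where Let's Encrypt's validation requests originate.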
ingressClassName mismatch. The ingressClassName in your ClusterIssuer solver must match the class your Ingress controller is actually using. Check with:
```bash
kubectl get ingressclass
```
DNS not propagated. If you just created the DNS record for the domain, Let's Encrypt might be checking before it resolves. Wait for propagation and then delete the stuck Challenge resource to trigger a retry:
```bash
kubectl delete challenge -n default <challenge-name>
```
cert-manager will create a new one automatically.
cert-manager logs. When all else fails:
```bash
kubectl logs -n cert-manager deployment/cert-manager -f
```
The log lines are verbose but the errors are clear — look for lines containing Error or Failed.
Step 5: Switch to Production
Once your staging certificate is issued, kubectl get certificate shows READY=True. The cert itself will be from Let's Encrypt's staging CA (browsers will warn — that's expected). At this point you know the full pipeline works. Now switch to production.
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: your@email.com
    privateKeySecretRef:
      name: letsencrypt-production-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
```

```bash
kubectl apply -f clusterissuer-production.yaml
```
Update your Ingress annotation or Certificate resource to reference letsencrypt-production. Recent cert-manager releases generally notice the issuer change and re-issue on their own, but deleting the old TLS secret is a reliable way to force a fresh certificate from the production CA:
```bash
kubectl delete secret yourdomain-tls -n default
```
cert-manager detects the missing secret and triggers a new issuance immediately. Watch it:
```bash
kubectl get certificate -w -n default
```
Within 60–90 seconds, READY flips to True and the secret is recreated with a valid, browser-trusted certificate. Your Ingress controller picks it up without any restart.
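To confirm which issuer produced the secret now in place, you can read the annotation cert-manager writes onto the secrets it manages. A small sketch; the helper name is mine, and it assumes the cert-manager.io/issuer-name annotation is present on the secret.

```bash
# Print the name of the issuer recorded on a cert-manager-managed secret.
# Assumes the secret carries the cert-manager.io/issuer-name annotation.
secret_issuer_name() {
  kubectl get secret "$1" --namespace "${2:-default}" \
    -o jsonpath='{.metadata.annotations.cert-manager\.io/issuer-name}'
}

# Usage: secret_issuer_name yourdomain-tls default
```

After the switch this should print letsencrypt-production; if it still prints letsencrypt-staging, the secret hasn't been re-issued yet.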
Step 6: DNS-01 for Wildcard Certificates
HTTP-01 challenges only work for exact hostnames. If you want *.yourdomain.com — so every subdomain gets TLS without individual certificates — you need DNS-01. Instead of serving a file over HTTP, cert-manager creates a DNS TXT record to prove domain ownership. This means it needs API access to your DNS provider.
I'll use Cloudflare here since it's common, but cert-manager supports Route53, Google Cloud DNS, Azure DNS, and many others via the same pattern.
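Create a Cloudflare API token with Zone:DNS:Edit permissions for your domain (cert-manager's Cloudflare documentation also lists Zone:Zone:Read). Then store it as a secret in the cert-manager namespace, not your app namespace. A ClusterIssuer is cluster-scoped, so cert-manager looks up the secrets it references in its cluster resource namespace, which defaults to the namespace cert-manager is installed in: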
```bash
kubectl create secret generic cloudflare-api-token \
  --from-literal=api-token=<your-cloudflare-api-token> \
  -n cert-manager
```
Create the DNS-01 ClusterIssuer:
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: your@email.com
    privateKeySecretRef:
      name: letsencrypt-dns01-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token
              key: api-token
```

```bash
kubectl apply -f clusterissuer-dns01.yaml
```
Request the wildcard certificate. The wildcard doesn't cover the apex domain, so list both *.yourdomain.com and yourdomain.com as separate SANs if you want one certificate to serve both:
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-cert
  namespace: default
spec:
  secretName: wildcard-tls
  issuerRef:
    name: letsencrypt-dns01
    kind: ClusterIssuer
  dnsNames:
    - "*.yourdomain.com"
    - "yourdomain.com"
```

```bash
kubectl apply -f wildcard-certificate.yaml
kubectl get certificate wildcard-cert -w
```
DNS-01 issuance takes longer than HTTP-01 — up to 2–3 minutes while cert-manager creates the TXT record and Let's Encrypt waits for DNS propagation. If it gets stuck, check the Challenge resource:
```bash
kubectl describe challenge -n default
```
Look for errors around DNS propagation or API token permissions. A common mistake is using a Cloudflare API key (account-level) instead of an API token (zone-level) — they're different things in the Cloudflare dashboard.
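When a DNS-01 challenge stalls, it is worth checking whether the TXT record actually exists. ACME validates a wildcard name at the _acme-challenge label of its base domain, so *.yourdomain.com and yourdomain.com share one record name. The helper below (a name I made up) derives that record name; querying it with dig is shown as a comment and assumes dig is installed.

```bash
# Map a certificate DNS name to the TXT record name used for DNS-01.
# A leading "*." is stripped: wildcards validate against the base domain.
acme_txt_record_name() {
  domain=${1#\*.}
  printf '_acme-challenge.%s\n' "${domain}"
}

# Usage (requires dig):
#   dig +short TXT "$(acme_txt_record_name '*.yourdomain.com')"
```

If the query returns nothing while the Challenge is pending, suspect the API token or zone permissions before DNS propagation.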
Verification
After issuance, verify the certificate contents:
```bash
# List all certificates across namespaces
kubectl get certificate -A

# Inspect the issued cert: check Subject, SANs, and expiry
kubectl get secret yourdomain-tls -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | \
  openssl x509 -noout -text | \
  grep -E "Subject:|DNS:|Not After"

# Verify the live HTTPS endpoint
curl -v https://yourdomain.com 2>&1 | grep -E "SSL|issuer|expire"
```
For the production issuer, the issuer line from curl should reference a Let's Encrypt intermediate CA such as R10 or R11 (the active intermediates rotate over time), not the fake staging CA. If you still see the staging CA, the secret hasn't been re-issued from production yet. Delete it again and wait.
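If you want that staging-versus-production check in a script, one rough heuristic is to look for the "(STAGING)" marker that Let's Encrypt's staging hierarchy carries in its CA names at the time of writing. This is a sketch under that assumption, with a function name of my own:

```bash
# Succeed (exit 0) if an issuer string looks like Let's Encrypt staging.
# Heuristic: staging CA names contain the literal "(STAGING)".
is_staging_issuer() {
  case "$1" in
    *"(STAGING)"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   issuer="$(kubectl get secret yourdomain-tls -o jsonpath='{.data.tls\.crt}' \
#     | base64 -d | openssl x509 -noout -issuer)"
#   is_staging_issuer "$issuer" && echo "still on the staging CA"
```

Because this only matches a string, treat a negative result as "probably production" and confirm with a browser or the curl check above.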
cert-manager automatically renews certificates 30 days before expiry by default. You don't need to do anything — it handles the renewal challenge the same way it handled the initial issuance. You can verify renewal is configured by checking the Certificate status:
```bash
kubectl describe certificate yourdomain-cert -n default | grep -A5 "Renewal Time"
```
Common Mistakes
Using production before verifying with staging. You will hit a configuration issue on your first attempt. Always validate with staging first, or you'll burn through production's rate limits (5 duplicate certificates per week, plus a cap on failed validations) while debugging. Staging behaves the same as production during issuance; the differences are the untrusted CA root and the much higher rate limits.
Reusing the same secretName across Certificates. Each Certificate resource must have a unique secretName. If two Certificates point to the same secret, cert-manager will fight over it and produce undefined behavior. This is a subtle bug that only surfaces when you have multiple domains.
HTTP-01 with no port 80 exposure. If your Ingress controller only accepts HTTPS (port 443 only), HTTP-01 challenges will never complete. Let's Encrypt needs to reach port 80. Either open port 80 on your load balancer for the challenge path, or switch to DNS-01.
Not re-issuing after switching issuers. If your Ingress still serves the staging certificate after you point it at the production issuer, the old secret is still in place. Recent cert-manager releases generally re-issue when the issuer reference changes, but if yours doesn't, delete the secret manually to force a fresh issuance from production.
Issuer vs ClusterIssuer namespace mismatch. Issuer is namespace-scoped — a Certificate in namespace: production can't reference an Issuer in namespace: staging. Use ClusterIssuer (cluster-scoped) unless you have a specific reason to scope issuers per namespace. Most teams use ClusterIssuer exclusively.
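The duplicate secretName mistake above is easy to scan for once you have more than a handful of Certificates. A minimal sketch (the function name is mine) that lists every namespace/secretName pair claimed by more than one Certificate:

```bash
# Print each "<namespace>/<secretName>" referenced by two or more Certificates.
# Empty output means no Certificates share a secret.
find_duplicate_secret_names() {
  kubectl get certificates.cert-manager.io --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.spec.secretName}{"\n"}{end}' \
    | sort | uniq -d
}

# Usage: find_duplicate_secret_names
```

Pairs are compared within a namespace because a secretName only collides with another Certificate in the same namespace.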
Cleanup
```bash
kubectl delete certificate yourdomain-cert wildcard-cert -n default
kubectl delete secret yourdomain-tls wildcard-tls -n default
kubectl delete clusterissuer letsencrypt-staging letsencrypt-production letsencrypt-dns01
kubectl delete secret cloudflare-api-token -n cert-manager

helm uninstall cert-manager -n cert-manager
kubectl delete namespace cert-manager
```
Note that when the chart installed the CRDs (installCRDs=true, or crds.enabled=true with crds.keep=false on v1.15 and later), uninstalling the Helm release also deletes the CRDs and every cert-manager resource in the cluster. If you have other tooling depending on those CRDs, skip the Helm uninstall and clean up manually.
From here, the natural next step is integrating cert-manager with an internal PKI for private certificates using the CA issuer type, or setting up trust-manager to distribute CA bundles across namespaces. But for public-facing services, what you've built here covers the full lifecycle — issuance, renewal, wildcard support, and the debugging path when things go wrong.