14 min read · May 6, 2026

Platform Engineering: Building an Internal Developer Platform That Gets Used

Most internal developer platforms fail the same way: the platform team builds what they think developers need, not what developers will actually use. Here's how to design an IDP that reduces cognitive load on application teams without creating a platform team bottleneck.

Ajeet Yadav
Platform & Cloud Engineer

Platform engineering is the practice of building internal tools and processes that make product engineering teams more productive. An internal developer platform (IDP) is the product that results: the scaffolding, CI/CD templates, infrastructure provisioning, documentation, and services that developers use to ship software.

Most IDPs fail slowly. They start with a platform team that's technically excellent building infrastructure that works — and then watch as developers route around it, build their own scripts, or just ask the platform team directly instead of using the self-service tooling. The problem is almost never technical.

The problem is product design. Platform teams that treat their platform as an engineering project rather than a product end up with infrastructure that developers are grateful exists but don't actually use.


The IDP Mental Model

An IDP has two audiences with fundamentally different relationships to it:

Application developers: Use the platform to deploy, observe, and operate their services. They interact with the platform daily. They care about speed, simplicity, and not being blocked by platform limitations. They will route around the platform if it slows them down.

Platform engineers: Build and operate the platform. They care about security, compliance, cost, and standardisation. They set the guardrails that application developers work within.

The tension: platform engineers want control, application developers want freedom. An IDP that's too controlled becomes a bottleneck; too much freedom undermines the platform's value (security, standardisation, cost management).

The resolution is golden paths — opinionated, well-supported ways to do common things. Not the only way, but the default path that works for 80% of cases with minimal friction. Teams that need something different can deviate, but they take on the maintenance burden of their deviation.


The Platform Product Backlog

Before building, understand what problems your platform should solve. Run a developer survey (one specific question works better than a long form):

"In the last two weeks, what took longer than you expected due to infrastructure, tooling, or processes you don't control?"

Cluster the answers. Common themes become backlog items. Platform work that doesn't address real developer pain points is waste, even if it's technically interesting.

Typical high-value IDP components, in rough priority order:

  1. Service scaffolding — new service bootstrap in minutes, not days
  2. CI/CD templates — standard pipelines that work, with known patterns for secrets, caching, testing
  3. Namespace/environment provisioning — self-service namespace creation with sane defaults (quotas, RBAC, network policies, PSA)
  4. Infrastructure provisioning — managed databases, queues, caches without filing tickets
  5. Observability — logs, metrics, traces discoverable without per-service setup
  6. Secret management — inject secrets without writing YAML plumbing
  7. Service catalog — find what services exist, who owns them, how to reach them

Golden Paths in Practice

A golden path is a default path with enough support and quality to be the obvious first choice for most use cases.

Service Scaffolding

The classic golden path: run one command to create a new service that already has CI/CD, Kubernetes deployment manifests, a ServiceMonitor for Prometheus, a Sentry project, a Slack webhook for alerts, and a Confluence page.

```bash
# Example using Backstage CLI
npx @backstage/create-app
# Or a custom scaffolder:
platform scaffold service \
  --name payment-processor \
  --team payments \
  --language go \
  --database postgres \
  --queue sqs
```

What the scaffold creates:

  • Git repository with CI/CD workflow (GitHub Actions or GitLab CI)
  • Kubernetes manifests (Deployment, Service, HPA, PDB, ServiceMonitor)
  • Dockerfile (optimised multi-stage build for the language)
  • Helm chart or Kustomize overlays for dev/staging/prod
  • Terraform module call for the database and queue (or Crossplane claim)
  • README.md with runbook template and links to dashboards

The quality bar: the scaffolded service should deploy successfully on the first git push without any modification. If developers have to fix the scaffold before they can work, the golden path is broken.
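One way to keep that quality bar from silently eroding is a scheduled pipeline that exercises the scaffold end to end and fails loudly when the golden path breaks. A sketch as a GitHub Actions workflow, assuming a hypothetical `platform` CLI with `scaffold`, `deploy status`, and `delete` subcommands; the real commands and flags will differ:

```yaml
# Hypothetical nightly smoke test for the scaffolding golden path.
# The `platform` CLI subcommands here are placeholders.
name: golden-path-smoke-test
on:
  schedule:
    - cron: "0 4 * * *"  # nightly at 04:00 UTC

jobs:
  scaffold-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Scaffold a throwaway service
        run: |
          SERVICE="smoke-test-$(date +%s)"
          echo "SERVICE=$SERVICE" >> "$GITHUB_ENV"
          platform scaffold service --name "$SERVICE" --team platform --language go
      - name: Wait for the first deployment to go healthy
        run: platform deploy status --name "$SERVICE" --wait 30m
      - name: Tear down
        if: always()
        run: platform delete service --name "$SERVICE"
```

If the scaffolded service stops deploying cleanly on first push, the platform team finds out the next morning instead of hearing it from a frustrated developer weeks later.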

CI/CD Templates

Reusable CI/CD templates prevent each team from reinventing pipeline configuration — and more importantly, prevent each team from making different security decisions:

```yaml
# GitHub Actions reusable workflow
# .github/workflows/deploy.yml in each service calls the template
name: Deploy
on:
  push:
    branches: [main]

jobs:
  deploy:
    uses: platform-team/workflows/.github/workflows/deploy-service.yml@main
    with:
      service-name: ${{ github.event.repository.name }}
      environment: production
    secrets: inherit
```

The template handles:

  • Docker image build and push to ECR
  • Image signing (cosign)
  • SBOM generation (syft/trivy)
  • Helm upgrade to the production cluster
  • Slack deployment notification

Teams get all of this by calling the template. If the platform team improves the deploy pipeline (adds SLSA provenance, switches to a faster builder), all teams get the improvement automatically.
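For reference, the platform-owned side of that contract could look like the following sketch of deploy-service.yml. The `workflow_call` trigger and input declarations mirror the caller; the registry variable, chart name, and exact steps are assumptions, not a definitive implementation:

```yaml
# Sketch: platform-team/workflows/.github/workflows/deploy-service.yml
# Registry variable, chart name, and commands are illustrative.
name: deploy-service
on:
  workflow_call:
    inputs:
      service-name:
        required: true
        type: string
      environment:
        required: true
        type: string

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push image
        run: |
          docker build -t "${{ vars.ECR_REGISTRY }}/${{ inputs.service-name }}:${{ github.sha }}" .
          docker push "${{ vars.ECR_REGISTRY }}/${{ inputs.service-name }}:${{ github.sha }}"
      - name: Sign image with cosign
        run: cosign sign --yes "${{ vars.ECR_REGISTRY }}/${{ inputs.service-name }}:${{ github.sha }}"
      - name: Generate SBOM with syft
        run: syft "${{ vars.ECR_REGISTRY }}/${{ inputs.service-name }}:${{ github.sha }}" -o spdx-json > sbom.json
      - name: Deploy via Helm
        run: |
          helm upgrade --install "${{ inputs.service-name }}" platform/service-chart \
            --namespace "${{ inputs.environment }}" \
            --set image.tag="${{ github.sha }}"
```

Because the template is versioned (`@main` or a pinned tag), the platform team can evolve these steps centrally without touching any service repository.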

Environment Provisioning

The self-service namespace request flow removes the platform team from the critical path for environment creation:

```yaml
# Developer submits a PR to the environments repo:
# environments/team-payments/staging.yaml
apiVersion: platform.codingprotocols.com/v1
kind: Environment
metadata:
  name: payments-staging
spec:
  team: payments
  tier: staging
  resources:
    cpu: "8"
    memory: 16Gi
  databases:
    - type: postgres
      size: small
  queues:
    - type: sqs
      name: payment-events
```

A platform-side controller (an Argo CD ApplicationSet driving Crossplane) reads this manifest and provisions:

  • Kubernetes namespace with RBAC, quotas, NetworkPolicies
  • RDS PostgreSQL instance (or PVC-backed if staging uses in-cluster Postgres)
  • SQS queue
  • IAM role with Pod Identity binding for the team's service accounts

The developer submits a PR, gets it reviewed (automated policy check + team lead approval), merges it, and the environment is ready in 20 minutes. No ticket, no platform team involvement.
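One way to wire the controller side of this flow is an Argo CD ApplicationSet with a Git file generator, so that every Environment file merged into the repo automatically becomes an Argo CD Application. A sketch, with the repository URL, paths, and project name as placeholders:

```yaml
# Sketch: ApplicationSet that turns each environments/**/*.yaml file
# into an Argo CD Application. Repo URL and paths are illustrative.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: team-environments
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/my-org/environments.git
        revision: main
        files:
          - path: "environments/**/*.yaml"
  template:
    metadata:
      # Fields from the Environment file are exposed as parameters
      name: "env-{{metadata.name}}"
    spec:
      project: platform
      source:
        repoURL: https://github.com/my-org/environments.git
        targetRevision: main
        path: "manifests/{{metadata.name}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{metadata.name}}"
      syncPolicy:
        automated:
          prune: true
```

Merging the PR is the provisioning action; deleting the file deprovisions the environment on the next sync.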


Backstage as the IDP Portal

Backstage (CNCF, open-source) is the most widely used IDP portal — a developer-facing UI that brings together the service catalog, software templates (scaffolding), and plugins for CI/CD, observability, and docs.

Core Backstage Components

Software Catalog: Every service, library, API, pipeline, and team registered in a YAML manifest (catalog-info.yaml) checked into the repo. The catalog becomes the authoritative source of truth for what exists and who owns it.

```yaml
# catalog-info.yaml in each service repository
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-processor
  annotations:
    github.com/project-slug: my-org/payment-processor
    backstage.io/techdocs-ref: dir:.
    pagerduty.com/service-id: PABC123
    grafana/dashboard-url: https://grafana.example.com/d/payment-processor
spec:
  type: service
  owner: group:payments
  lifecycle: production
  providesApis:
    - payment-api
  consumesApis:
    - fraud-detection-api
  dependsOn:
    - resource:default/payment-postgres-db
```

Software Templates: The Backstage UI for service scaffolding. A template YAML defines the input form and the series of actions (GitHub repo creation, file template rendering, Slack notification) that run when a developer creates a new service:

```yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: go-service-template
  title: Go Microservice
  description: Creates a new Go microservice with CI/CD, K8s manifests, and monitoring
spec:
  parameters:
    - title: Service Information
      required: [name, owner]
      properties:
        name:
          title: Name
          type: string
          pattern: '^[a-z-]+$'
        owner:
          title: Owning Team
          type: string
          ui:field: OwnerPicker
        description:
          title: Description
          type: string
  steps:
    - id: fetch-template
      name: Apply Service Template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}
    - id: publish
      name: Create Repository and Push
      action: publish:github
      input:
        repoUrl: github.com?repo=${{ parameters.name }}&owner=my-org
        defaultBranch: main
```

What to Put in Backstage vs Not

Good Backstage use cases:

  • Service catalog (discovery, ownership, API contracts)
  • Software templates (guided scaffolding)
  • TechDocs (documentation co-located with services)
  • Links to CI/CD runs, dashboards, runbooks

Less good Backstage use cases:

  • Deploying services (use GitOps — Backstage can trigger it, but shouldn't own it)
  • Managing secrets (use ESO/Vault)
  • Kubernetes RBAC management (use GitOps)

Backstage is a portal and catalog, not an orchestration engine. Route actions through GitOps (PR to an environments repo) rather than calling APIs directly from Backstage plugins — PRs are auditable, reviewable, and revertible.
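As a concrete sketch of that pattern, a Backstage template can end with the `publish:github:pull-request` scaffolder action, so a self-service request lands as a reviewable PR against the environments repo instead of a direct API call. The skeleton path, repo, and parameter names here are illustrative:

```yaml
# Sketch: scaffolder steps that route an environment request through GitOps.
# Skeleton path, repo, and parameters are placeholders.
steps:
  - id: render-env
    name: Render Environment Manifest
    action: fetch:template
    input:
      url: ./environment-skeleton
      values:
        team: ${{ parameters.team }}
        tier: ${{ parameters.tier }}
  - id: open-pr
    name: Open Pull Request
    action: publish:github:pull-request
    input:
      repoUrl: github.com?repo=environments&owner=my-org
      branchName: env-request-${{ parameters.team }}-${{ parameters.tier }}
      title: "Environment request: ${{ parameters.team }} ${{ parameters.tier }}"
      description: Requested via the Backstage environment template
```

The developer still gets a one-click experience in the portal, but the audit trail, review gate, and rollback path all live in Git.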


Crossplane for Self-Service Infrastructure

Crossplane lets platform teams define infrastructure abstractions as Kubernetes CRDs. Application teams provision infrastructure by creating Kubernetes objects (Claims) — no Terraform, no cloud console access.

```yaml
# Platform team defines an abstraction: PostgresDatabaseClaim
# This hides whether it's RDS, Cloud SQL, or in-cluster Postgres
apiVersion: database.platform.example.com/v1alpha1
kind: PostgresDatabaseClaim
metadata:
  name: payment-db
  namespace: payments-prod
spec:
  storageGB: 50
  tier: production
  backupsEnabled: true
```

Crossplane Compositions define what actually gets provisioned behind the claim — the team doesn't need to know.

```yaml
# Platform team's Composition (simplified):
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: postgres-production
spec:
  compositeTypeRef:
    apiVersion: database.platform.example.com/v1alpha1
    kind: PostgresDatabase
  resources:
    - name: rds-instance
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            dbInstanceClass: db.t3.medium
            engine: postgres
            # ... cloud-specific details hidden from developers
```

The XRD defines two types: PostgresDatabase (the Composite Resource / XR, referenced by Compositions) and PostgresDatabaseClaim (the Claim, used by application teams in their namespaces). The Composition binds to the XR — when a developer creates a PostgresDatabaseClaim, Crossplane creates the corresponding PostgresDatabase XR, which the Composition then reconciles into actual cloud resources.
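A minimal sketch of that XRD, assuming the same group and fields as the claim above; `claimNames` is what exposes PostgresDatabaseClaim to application namespaces:

```yaml
# Sketch of the XRD behind the claim (schema trimmed to the fields shown earlier)
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: postgresdatabases.database.platform.example.com
spec:
  group: database.platform.example.com
  names:
    kind: PostgresDatabase
    plural: postgresdatabases
  claimNames:
    kind: PostgresDatabaseClaim
    plural: postgresdatabaseclaims
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                storageGB:
                  type: integer
                tier:
                  type: string
                backupsEnabled:
                  type: boolean
```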

The application team sees a simple interface. The platform team controls the cloud-specific details. When the platform team migrates from RDS to Aurora, they update the Composition — application teams notice nothing.

Crossplane is powerful but has a learning curve for the platform team, particularly around Composition design and provider management. Start with one resource type (databases or queues), prove the abstraction works, then expand.


Measuring Platform Success

A platform that nobody uses is not a platform — it's abandoned infrastructure. Measure adoption and developer experience, not just technical metrics:

Adoption metrics:

  • % of services using golden path CI/CD templates vs custom pipelines
  • % of environments provisioned via self-service vs tickets
  • Time to first deployment for new services (target: < 30 minutes)

Developer experience (measured via survey):

  • "How much cognitive load does infrastructure add to your work?" (1-5)
  • "Did you have to wait for the platform team to make progress this week?"
  • "Would you recommend the platform to a new colleague?"

Platform reliability:

  • Golden path uptime (CI/CD template availability, scaffolding success rate)
  • Time to resolve platform incidents that block developer work

Survey quarterly. If adoption is low, talk to developers to understand why — the answer is almost always that the golden path has friction you didn't anticipate (documentation is missing, a specific use case isn't covered, the error messages are confusing).


Common IDP Failure Modes

Building for compliance rather than developers. The platform satisfies security audits but is so restrictive that developers can't do their jobs without filing tickets. Security requirements should be encoded as guardrails enforced automatically at the boundary, not as walls that require exception processes.

No migration path for existing services. A new golden path that existing services can't adopt is only valuable for greenfield work. Design the golden path so that incremental adoption is possible — swap in the CI/CD template, then the namespace config, then the infrastructure provisioning — not "rewrite your service to use the platform."

Platform team as bottleneck. If deploying to production requires a platform team member to approve a ticket, the platform is failing its fundamental goal. Self-service is the point. If something requires human review, make it a PR review — not a ticket.

Documentation as an afterthought. The golden path needs to be documented to the level where a new developer can use it independently. If the answer to "how do I deploy?" is "ask the platform team," documentation has failed.


Frequently Asked Questions

Should we build our own IDP or use a commercial platform?

Commercial platforms (Port.io, OpsLevel, Cortex) provide the catalog and portal layer out of the box. They reduce the "build the portal" work but don't reduce the "design the golden path" work — the hard part of platform engineering is the abstractions and self-service workflows, not the UI. If your team has limited platform engineering capacity, a commercial portal reduces investment in tooling so you can focus on the abstractions. If you have capacity, Backstage gives more flexibility.

How many platform engineers do you need?

A rule of thumb: one platform engineer per 10–20 application engineers is enough to run the platform without becoming a bottleneck, assuming the platform is well-designed. If you have fewer platform engineers than that, be very selective about what you invest in: prioritise the golden paths with the highest adoption potential.

Should platform infrastructure be in the same cluster as applications?

Platform tools (Argo CD, Prometheus, Backstage, cert-manager, Kyverno) can run on a dedicated management cluster and manage workload clusters via federation. This separates platform concerns from application concerns, prevents a misbehaving application from affecting platform tools, and allows different upgrade cycles. The trade-off is operational complexity — more clusters to manage. For small organisations (< 5 clusters), running platform tools on the same cluster with namespace isolation is simpler and acceptable.

How do you handle teams that need to deviate from the golden path?

Define what "deviation" means: teams can deviate from defaults as long as they satisfy non-negotiable requirements (mTLS enabled, PSA enforced, resource requests set, Prometheus metrics exposed). Requirements are enforced by admission control (Kyverno) — passing the requirements check is the gate, not conforming to the golden path's specific implementation. This gives teams flexibility while maintaining the security and reliability baseline.
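One such guardrail, sketched as a Kyverno policy that rejects Deployments whose containers omit resource requests, whether or not they came through the golden path. The policy name and scope are illustrative:

```yaml
# Sketch: admission-time enforcement of one non-negotiable requirement.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-requests
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-requests
      match:
        any:
          - resources:
              kinds: [Deployment]
      validate:
        message: "All containers must set CPU and memory requests."
        pattern:
          spec:
            template:
              spec:
                containers:
                  - resources:
                      requests:
                        cpu: "?*"
                        memory: "?*"
```

Whether a team uses the standard Helm chart or a hand-rolled manifest, the baseline holds; the golden path simply makes passing the check effortless.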


For the Kubernetes multi-tenancy model that underpins namespace-level self-service, see Kubernetes Multi-Tenancy: Namespaces, Quotas, and Network Isolation. For the GitOps approach to managing platform configuration, see GitOps with Argo CD: Production Setup Guide. For the Golden Path templates that make IDPs self-service — Backstage scaffolding, Crossplane compositions, and ArgoCD ApplicationSets — see Platform Engineering: Golden Paths and Developer Self-Service.

Building an internal developer platform? Talk to us at Coding Protocols — we help platform teams design IDPs that developers actually adopt, not just admire.

Related Topics

Platform Engineering
IDP
Backstage
Developer Experience
DevOps
Kubernetes
Cloud Infrastructure
Team Topologies
