Multi-Cluster Kubernetes with GitOps and Crossplane: A Composable Cloud Architecture
Managing multiple Kubernetes clusters across different cloud providers used to be a nightmare. You’d have separate Terraform configs for each cloud, different deployment pipelines, and no unified way to track what’s running where. Teams spent more time fighting infrastructure than building features.
That’s changing fast. Organizations are moving toward multi-cluster, multi-cloud setups managed through GitOps and Crossplane. This approach gives you a single control plane that can provision and manage clusters across AWS, Azure, and GCP using the same Kubernetes-native tools you already know.
Here’s the thing: GitOps isn’t just about deploying applications anymore. It’s becoming the standard way to manage your entire infrastructure lifecycle. And Crossplane? It’s turning into the Kubernetes-native alternative to Terraform that actually makes sense.
Why This Matters Now
The complexity is real. You might have production workloads on AWS EKS, staging on Azure AKS, and development clusters on GCP GKE. Each environment needs different configurations, security policies, and monitoring setups. Without a unified approach, you end up with configuration drift, security gaps, and teams that can’t move fast.
GitOps solves the control problem. Instead of running commands manually or through separate CI/CD pipelines, you define everything in Git. When you push changes, ArgoCD automatically syncs them across your clusters. It’s declarative, auditable, and reversible.
Crossplane handles the infrastructure side. Instead of writing Terraform modules for each cloud provider, you define CompositeResourceDefinitions (XRDs) that work across all clouds. Want a Kubernetes cluster? Define it once, deploy it anywhere.
The combination is powerful. You get infrastructure defined as versioned, reviewable Kubernetes resources instead of a separate toolchain. You get GitOps workflows that work for both apps and infrastructure. And you get a control plane that scales with your organization.
How GitOps and Crossplane Work Together
Think of GitOps as your control mechanism and Crossplane as your infrastructure engine. GitOps ensures your desired state matches your actual state. Crossplane makes sure your infrastructure matches your desired state.
Here’s how they connect:
GitOps provides the control loop. You define your desired state in Git repositories. ArgoCD watches these repos and continuously reconciles your clusters to match the desired state. If someone changes something manually, ArgoCD notices and fixes it.
Crossplane provides the infrastructure abstraction. Instead of writing cloud-specific code, you define high-level resources like “KubernetesCluster” or “DatabaseInstance.” Crossplane translates these into the right API calls for each cloud provider.
The Kubernetes API becomes your interface. Everything looks like Kubernetes resources. You use kubectl to manage infrastructure the same way you manage applications. No more switching between different tools and interfaces.
This separation matters because it makes your system composable. You can swap out the GitOps controller (ArgoCD, Flux, etc.) without changing your infrastructure definitions. You can add new cloud providers to Crossplane without changing your GitOps workflows.
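To make that concrete, here's a minimal sketch of the pattern: an ArgoCD Application that points at a Git path full of Crossplane claims, so infrastructure changes flow through the same pull-request workflow as application changes. The repository URL and path are placeholders chosen to match the repo layout used later in this article.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: infrastructure-claims
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/cluster-configs
    targetRevision: HEAD
    path: crossplane/claims
  destination:
    # Claims are applied to the management cluster, where Crossplane runs
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true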
Control Plane vs Data Plane
Understanding this separation is crucial for building scalable multi-cluster architectures.
The control plane runs in your management cluster. It contains:
- Crossplane core and providers
- ArgoCD for GitOps coordination
- Policy engines for governance
- Observability and monitoring tools
The data plane consists of your workload clusters. These run your applications and handle the actual traffic. They’re managed by the control plane but operate independently.
This separation gives you several benefits:
Centralized management. You configure everything from one place. Policies, security settings, and monitoring configurations flow down to all clusters automatically.
Independent operation. If your control plane goes down, your workload clusters keep running. They don’t depend on the management cluster for day-to-day operations.
Scalable governance. You can enforce policies across hundreds of clusters without managing each one individually. Compliance becomes a configuration problem, not an operational one.
Cost optimization. You can spin up clusters in different regions or cloud providers based on workload requirements. The control plane handles the complexity of managing multiple environments.
Setting Up Your Multi-Cluster Control Plane
Let’s build this step by step. We’ll start with a management cluster that can provision and manage other clusters across different cloud providers.
Installing Crossplane
First, install Crossplane in your management cluster:
helm repo add crossplane-stable https://charts.crossplane.io/stable
helm repo update
helm install crossplane crossplane-stable/crossplane \
  --namespace crossplane-system \
  --create-namespace
Now install the cloud providers you need:
# AWS Provider
kubectl crossplane install provider xpkg.upbound.io/crossplane-contrib/provider-aws:v0.44.0
# Azure Provider
kubectl crossplane install provider xpkg.upbound.io/crossplane-contrib/provider-azure:v0.44.0
# GCP Provider
kubectl crossplane install provider xpkg.upbound.io/crossplane-contrib/provider-gcp:v0.44.0
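If you'd rather manage the providers through GitOps as well, the same installation can be expressed declaratively with Crossplane's Provider package resource (shown here for AWS; the others follow the same shape):
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws
spec:
  package: xpkg.upbound.io/crossplane-contrib/provider-aws:v0.44.0
Either way, wait until the providers report Healthy before continuing:
kubectl get providers.pkg.crossplane.io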
Configuring Cloud Providers
Each provider needs credentials and configuration. Here’s how to set up AWS:
apiVersion: v1
kind: Secret
metadata:
  name: aws-creds
  namespace: crossplane-system
type: Opaque
data:
  credentials: <base64-encoded-aws-credentials>
---
apiVersion: aws.crossplane.io/v1beta1
kind: ProviderConfig
metadata:
  name: aws-provider
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: aws-creds
      key: credentials
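The credentials secret itself can be created from a standard AWS credentials file. A quick sketch, assuming a local file named aws-credentials.txt containing a [default] profile with aws_access_key_id and aws_secret_access_key:
kubectl create secret generic aws-creds \
  --namespace crossplane-system \
  --from-file=credentials=./aws-credentials.txt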
For GCP, you’ll need a service account key:
apiVersion: v1
kind: Secret
metadata:
  name: gcp-creds
  namespace: crossplane-system
type: Opaque
data:
  credentials: <base64-encoded-service-account-key>
---
apiVersion: gcp.crossplane.io/v1beta1
kind: ProviderConfig
metadata:
  name: gcp-provider
spec:
  projectID: your-gcp-project
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: gcp-creds
      key: credentials
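Azure follows the same pattern: store the service principal credentials in a secret and point a ProviderConfig at it. A minimal sketch, assuming a secret named azure-creds that holds the service principal JSON (field names can vary slightly between provider versions):
apiVersion: azure.crossplane.io/v1beta1
kind: ProviderConfig
metadata:
  name: azure-provider
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: azure-creds
      key: credentials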
Defining Composite Resources
Now create CompositeResourceDefinitions that abstract away cloud-specific details. Here’s one for a Kubernetes cluster:
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xkubernetesclusters.infrastructure.example.org
spec:
  group: infrastructure.example.org
  names:
    kind: XKubernetesCluster
    plural: xkubernetesclusters
  claimNames: # lets teams create namespaced KubernetesCluster claims, used later in this article
    kind: KubernetesCluster
    plural: kubernetesclusters
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                parameters:
                  type: object
                  properties:
                    region:
                      type: string
                    nodeCount:
                      type: integer
                    nodeSize:
                      type: string
                    cloudProvider:
                      type: string
                      enum: ["aws", "azure", "gcp"]
              required:
                - parameters
          required:
            - spec
This XRD defines a high-level resource that works across all cloud providers. The cloudProvider parameter is used to route each claim to the matching Composition, which determines which cloud-specific resources actually get created.
Creating Compositions
Compositions define how to translate the high-level resource into cloud-specific resources. Here’s one for AWS EKS:
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: eks-cluster
spec:
  writeConnectionSecretsToNamespace: crossplane-system
  compositeTypeRef:
    apiVersion: infrastructure.example.org/v1alpha1
    kind: XKubernetesCluster
  resources:
    - name: eks-cluster
      base:
        apiVersion: eks.aws.crossplane.io/v1beta1
        kind: Cluster
        spec:
          forProvider:
            region: us-west-2
            roleArnRef:
              name: eks-cluster-role
            version: "1.28"
          writeConnectionSecretToRef:
            name: eks-cluster-connection
            namespace: crossplane-system
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.region
          toFieldPath: spec.forProvider.region
    - name: eks-nodegroup
      base:
        apiVersion: eks.aws.crossplane.io/v1beta1
        kind: NodeGroup
        spec:
          forProvider:
            region: us-west-2
            clusterNameRef:
              name: eks-cluster
            nodeRoleArnRef:
              name: eks-node-role
            scalingConfig:
              desiredSize: 3
              maxSize: 10
              minSize: 1
            instanceTypes:
              - t3.medium
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.region
          toFieldPath: spec.forProvider.region
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.nodeCount
          toFieldPath: spec.forProvider.scalingConfig.desiredSize
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.nodeSize
          toFieldPath: spec.forProvider.instanceTypes[0]
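To cover the other clouds, you'd write additional Compositions for AKS and GKE against the same XKubernetesCluster type and distinguish them with labels, so each claim can pick its cloud. A sketch of the idea, using an illustrative cloudProvider label:
# Added to each Composition's metadata, e.g. the EKS one above:
#   labels:
#     cloudProvider: aws
#
# A claim can then select the matching Composition explicitly:
apiVersion: infrastructure.example.org/v1alpha1
kind: KubernetesCluster
metadata:
  name: dev-cluster
spec:
  compositionSelector:
    matchLabels:
      cloudProvider: aws
  parameters:
    region: us-west-2
    nodeCount: 2
    nodeSize: t3.medium
    cloudProvider: aws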
Connecting ArgoCD
Now set up ArgoCD to manage your clusters. Install it in your management cluster:
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
Create an ApplicationSet that automatically manages clusters based on Git repository structure:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-management
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            argocd.argoproj.io/secret-type: cluster
  template:
    metadata:
      name: '{{name}}-cluster'
    spec:
      project: default
      source:
        repoURL: https://github.com/your-org/cluster-configs
        targetRevision: HEAD
        path: 'clusters/{{name}}'
      destination:
        server: '{{server}}'
        namespace: default
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
This ApplicationSet watches for new cluster secrets in ArgoCD and automatically creates applications to manage them.
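Under the hood, each registered cluster is just a Secret in the argocd namespace labeled argocd.argoproj.io/secret-type: cluster; the generator above turns every one of them into an Application. A trimmed example of such a secret, with credentials redacted:
apiVersion: v1
kind: Secret
metadata:
  name: staging-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: staging-cluster
  server: https://staging-cluster.example.com
  config: |
    {
      "bearerToken": "<redacted>",
      "tlsClientConfig": {
        "caData": "<redacted>"
      }
    }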
Real-World Workflow
Here’s how this works in practice. A developer needs a new staging environment:
Step 1: Developer creates a claim
apiVersion: infrastructure.example.org/v1alpha1
kind: KubernetesCluster
metadata:
  name: staging-cluster
spec:
  parameters:
    region: us-east-1
    nodeCount: 3
    nodeSize: t3.large
    cloudProvider: aws
Step 2: Crossplane provisions infrastructure
- Creates EKS cluster in AWS
- Sets up node groups with specified configuration
- Configures networking and security groups
- Generates kubeconfig for the new cluster
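You can watch the provisioning from the management cluster with ordinary kubectl queries (the resource names assume the claim above and the claimNames defined in the XRD):
# Watch the claim until it reports READY=True
kubectl get kubernetescluster staging-cluster -w

# List the underlying cloud resources Crossplane created
kubectl get managed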
Step 3: ArgoCD registers the cluster
argocd cluster add <new-cluster-context> \
  --name staging-cluster \
  --kubeconfig /path/to/kubeconfig
The positional argument is a kubeconfig context, not a raw endpoint. In a fully automated setup this step is usually handled by converting Crossplane's connection secret into an ArgoCD cluster secret, so no CLI command is needed.
Step 4: ArgoCD syncs applications
- Deploys monitoring stack (Prometheus, Grafana)
- Configures ingress controllers
- Sets up security policies
- Deploys application workloads
The entire process takes about 10-15 minutes. The developer gets a fully configured cluster without touching any cloud consoles or running manual commands.
Use Cases That Make Sense
Staging/Production Separation: Keep your staging and production environments completely isolated. Use different cloud providers or regions to test disaster recovery scenarios.
Cost Optimization: Spin up clusters in different regions based on workload requirements. Use spot instances for development, reserved instances for production.
Disaster Recovery: Automatically provision backup clusters in different regions. Use GitOps to keep them in sync with your primary environment.
Multi-Tenant Applications: Give each customer their own cluster while managing them from a central control plane. Enforce consistent policies across all tenants.
Security, RBAC, and Secret Management
Security in multi-cluster environments is complex. You need to manage secrets across multiple clusters, delegate permissions appropriately, and ensure consistent security policies.
External Secrets Operator
Instead of manually copying secrets to each cluster, use External Secrets Operator to sync them from a central store:
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore # cluster-scoped, so ExternalSecrets in any namespace can reference it
metadata:
  name: vault-backend
spec:
  provider:
    vault:
      server: "https://vault.example.com"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "external-secrets"
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: default
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: database-secret
    creationPolicy: Owner
  data:
    - secretKey: username
      remoteRef:
        key: database/credentials
        property: username
    - secretKey: password
      remoteRef:
        key: database/credentials
        property: password
This approach keeps your secrets in one place (Vault, AWS Secrets Manager, etc.) and automatically syncs them to all clusters that need them.
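You can confirm that a given secret has synced on a cluster by checking the ExternalSecret's status:
# The Ready condition shows whether the last sync succeeded
kubectl describe externalsecret database-credentials -n default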
Cluster-Level RBAC Delegation
Use ArgoCD’s RBAC system to delegate cluster management to different teams:
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.default: role:readonly
  policy.csv: |
    p, role:platform-admin, applications, *, */*, allow
    p, role:platform-admin, clusters, *, *, allow
    p, role:team-lead, applications, *, team-*/*, allow
    p, role:team-lead, applications, sync, team-*/*, allow
    g, platform-team, role:platform-admin
    g, team-leads, role:team-lead
This configuration gives platform admins full access to all clusters and applications, while team leads can only manage applications in ArgoCD projects whose names start with team-.
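The team-*/* patterns refer to ArgoCD projects, so each team also needs an AppProject that scopes which repos and destinations its applications may use. A minimal sketch for a hypothetical team-payments project:
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-payments
  namespace: argocd
spec:
  description: Applications owned by the payments team
  sourceRepos:
    - https://github.com/your-org/app-configs
  destinations:
    - server: https://staging-cluster.example.com
      namespace: 'payments-*'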
Network Policies
Enforce consistent network policies across all clusters using GitOps:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring
  namespace: default
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: monitoring
      ports:
        - protocol: TCP
          port: 8080
ArgoCD will automatically apply these policies to all clusters, ensuring consistent security posture.
Observability and Health Management
Managing observability across multiple clusters requires a different approach than single-cluster setups. You need to aggregate metrics, traces, and logs from all clusters while maintaining the ability to drill down into specific environments.
Prometheus Federation
Set up Prometheus federation to collect metrics from all clusters:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-federation
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: 'federate'
        scrape_interval: 15s
        honor_labels: true
        metrics_path: '/federate'
        params:
          'match[]':
            - '{job=~".+"}'
        static_configs:
          - targets:
              - 'staging-prometheus.monitoring.svc.cluster.local:9090'
              - 'prod-prometheus.monitoring.svc.cluster.local:9090'
              - 'dr-prometheus.monitoring.svc.cluster.local:9090'
This configuration tells your central Prometheus to scrape metrics from all cluster-specific Prometheus instances. Note that the .svc.cluster.local targets only work if the remote Prometheus services are reachable from the central cluster (for example through a service mesh or multi-cluster DNS); otherwise, expose each /federate endpoint through an ingress and point the targets at those addresses.
OpenTelemetry for Distributed Tracing
Use OpenTelemetry to trace requests across clusters:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          args:
            - --config=/etc/otel-collector-config.yaml
          volumeMounts:
            - name: otel-collector-config
              mountPath: /etc/otel-collector-config.yaml
              subPath: otel-collector-config.yaml
      volumes:
        - name: otel-collector-config
          configMap:
            name: otel-collector-config
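The Deployment above mounts a ConfigMap named otel-collector-config that isn't shown. Here's a minimal sketch of it, assuming traces arrive over OTLP and are forwarded to a central backend at a placeholder endpoint:
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: monitoring
data:
  otel-collector-config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch: {}
    exporters:
      otlp:
        endpoint: central-collector.example.com:4317
        tls:
          insecure: true  # assumption: plain gRPC inside a trusted network
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]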
Grafana Dashboards
Create dashboards that show the health of your entire multi-cluster setup:
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-multicluster
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  multicluster-overview.json: |
    {
      "dashboard": {
        "title": "Multi-Cluster Overview",
        "panels": [
          {
            "title": "Cluster Health",
            "type": "stat",
            "targets": [
              {
                "expr": "up{job=\"kubernetes-cluster\"}",
                "legendFormat": "{{cluster}}"
              }
            ]
          },
          {
            "title": "Resource Usage by Cluster",
            "type": "graph",
            "targets": [
              {
                "expr": "sum(rate(container_cpu_usage_seconds_total[5m])) by (cluster)",
                "legendFormat": "{{cluster}} CPU"
              }
            ]
          }
        ]
      }
    }
This dashboard gives you a bird’s-eye view of all your clusters and their health status.
Best Practices and Common Gotchas
After working with this setup for a while, here are the things that matter:
Version Control Layout
Organize your Git repositories logically:
cluster-configs/
├── clusters/
│ ├── staging/
│ │ ├── applications/
│ │ ├── policies/
│ │ └── monitoring/
│ ├── production/
│ │ ├── applications/
│ │ ├── policies/
│ │ └── monitoring/
│ └── dr/
│ ├── applications/
│ ├── policies/
│ └── monitoring/
├── crossplane/
│ ├── xrds/
│ ├── compositions/
│ └── claims/
└── argocd/
├── applicationsets/
└── projects/
This structure makes it easy to find what you’re looking for and understand the relationships between different components.
Avoiding Configuration Drift
Use ArgoCD’s sync policies to prevent drift:
spec:
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - PruneLast=true
The selfHeal: true option automatically fixes any manual changes. The prune: true option removes resources that are no longer defined in Git.
Rollback Safety
Always test changes in staging before applying to production:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: staging-app
spec:
  source:
    repoURL: https://github.com/your-org/app-configs
    targetRevision: HEAD
    path: staging
  destination:
    server: https://staging-cluster.example.com
    namespace: default
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production-app
spec:
  source:
    repoURL: https://github.com/your-org/app-configs
    targetRevision: v1.2.3 # Pinned to stable version
    path: production
  destination:
    server: https://production-cluster.example.com
    namespace: default
Pin production to specific versions and only promote after testing in staging.
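Promotion then becomes a Git operation rather than a deployment step: tag the revision that passed in staging and move production's targetRevision to it through a pull request (the tag name here is illustrative):
# Tag the commit that was validated in staging
git tag v1.2.4 && git push origin v1.2.4

# Then open a PR that bumps targetRevision to v1.2.4 in the
# production-app manifest and let ArgoCD sync the change.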
The Future of Cloud-Native Control Planes
This approach represents a fundamental shift in how we think about infrastructure management. Instead of treating infrastructure as a separate concern from applications, we’re treating it as just another type of application that runs on Kubernetes.
The benefits are clear:
Consistency. The same tools and processes work across all environments and cloud providers.
Scalability. You can manage hundreds of clusters with the same effort it used to take to manage one.
Reliability. GitOps ensures your desired state matches your actual state, automatically fixing drift and configuration errors.
Developer Experience. Teams can provision and manage their own infrastructure using the same Kubernetes tools they already know.
Cost Control. You can optimize costs by automatically scaling clusters based on demand and using the most cost-effective cloud provider for each workload.
GitOps and Crossplane aren’t just tools—they’re the foundation for the next generation of DevOps automation. As organizations move toward cloud-native architectures, this approach will become the standard way to manage infrastructure at scale.
The future belongs to composable, declarative systems that can adapt to changing requirements without requiring manual intervention. This is how you build infrastructure that actually works for your organization instead of against it.