By Ali Abdelrahman

Multi-Cluster Kubernetes with GitOps and Crossplane: A Composable Cloud Architecture

kubernetes, gitops, crossplane, multi-cluster, infrastructure, devops, cloud-native

Managing multiple Kubernetes clusters across different cloud providers used to be a nightmare. You’d have separate Terraform configs for each cloud, different deployment pipelines, and no unified way to track what’s running where. Teams spent more time fighting infrastructure than building features.

That’s changing fast. Organizations are moving toward multi-cluster, multi-cloud setups managed through GitOps and Crossplane. This approach gives you a single control plane that can provision and manage clusters across AWS, Azure, and GCP using the same Kubernetes-native tools you already know.

Here’s the thing: GitOps isn’t just about deploying applications anymore. It’s becoming the standard way to manage your entire infrastructure lifecycle. And Crossplane? It’s turning into the Kubernetes-native alternative to Terraform that actually makes sense.

Why This Matters Now

The complexity is real. You might have production workloads on AWS EKS, staging on Azure AKS, and development clusters on GCP GKE. Each environment needs different configurations, security policies, and monitoring setups. Without a unified approach, you end up with configuration drift, security gaps, and teams that can’t move fast.

GitOps solves the control problem. Instead of running commands manually or through separate CI/CD pipelines, you define everything in Git. When you push changes, ArgoCD automatically syncs them across your clusters. It’s declarative, auditable, and reversible.

Crossplane handles the infrastructure side. Instead of writing Terraform modules for each cloud provider, you define CompositeResourceDefinitions (XRDs) that work across all clouds. Want a Kubernetes cluster? Define it once, deploy it anywhere.

The combination is powerful. You get infrastructure as code that’s actually code, not configuration files. You get GitOps workflows that work for both apps and infrastructure. And you get a control plane that scales with your organization.

How GitOps and Crossplane Work Together

Think of GitOps as your control mechanism and Crossplane as your infrastructure engine. GitOps ensures your desired state matches your actual state. Crossplane makes sure your infrastructure matches your desired state.

Here’s how they connect:

GitOps provides the control loop. You define your desired state in Git repositories. ArgoCD watches these repos and continuously reconciles your clusters to match the desired state. If someone changes something manually, ArgoCD notices and fixes it.

Crossplane provides the infrastructure abstraction. Instead of writing cloud-specific code, you define high-level resources like “KubernetesCluster” or “DatabaseInstance.” Crossplane translates these into the right API calls for each cloud provider.

The Kubernetes API becomes your interface. Everything looks like Kubernetes resources. You use kubectl to manage infrastructure the same way you manage applications. No more switching between different tools and interfaces.
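
For example, once the pieces described later in this article are installed, day-to-day infrastructure work is ordinary kubectl (the resource names here assume the XKubernetesCluster XRD and staging-cluster claim defined below):

# Everything Crossplane provisions shows up as a Kubernetes object
kubectl get managed

# Inspect a cluster claim the same way you would a Deployment
kubectl describe kubernetescluster staging-cluster

# Watch the underlying composite resource reconcile
kubectl get xkubernetesclusters -w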

This separation matters because it makes your system composable. You can swap out the GitOps controller (ArgoCD, Flux, etc.) without changing your infrastructure definitions. You can add new cloud providers to Crossplane without changing your GitOps workflows.

Control Plane vs Data Plane

Understanding this separation is crucial for building scalable multi-cluster architectures.

The control plane runs in your management cluster. It contains:

  • Crossplane core and providers
  • ArgoCD for GitOps coordination
  • Policy engines for governance
  • Observability and monitoring tools

The data plane consists of your workload clusters. These run your applications and handle the actual traffic. They’re managed by the control plane but operate independently.

This separation gives you several benefits:

Centralized management. You configure everything from one place. Policies, security settings, and monitoring configurations flow down to all clusters automatically.

Independent operation. If your control plane goes down, your workload clusters keep running. They don’t depend on the management cluster for day-to-day operations.

Scalable governance. You can enforce policies across hundreds of clusters without managing each one individually. Compliance becomes a configuration problem, not an operational one.
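
To make that concrete, here is what a policy distributed through this pipeline can look like. The stack above doesn't prescribe a policy engine, so treat the choice of Kyverno and the rule itself as an illustrative sketch; once committed to a repo that ArgoCD syncs, it lands on every cluster automatically.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-team-label
    match:
      any:
      - resources:
          kinds:
          - Namespace
    validate:
      message: "Every namespace needs a team label for ownership tracking."
      pattern:
        metadata:
          labels:
            team: "?*"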

Cost optimization. You can spin up clusters in different regions or cloud providers based on workload requirements. The control plane handles the complexity of managing multiple environments.

Setting Up Your Multi-Cluster Control Plane

Let’s build this step by step. We’ll start with a management cluster that can provision and manage other clusters across different cloud providers.

Installing Crossplane

First, install Crossplane in your management cluster:

helm repo add crossplane-stable https://charts.crossplane.io/stable
helm repo update
helm install crossplane crossplane-stable/crossplane \
  --namespace crossplane-system \
  --create-namespace

Now install the cloud providers you need. Rather than using the kubectl crossplane plugin, declare them as Provider objects, which are plain manifests you can keep in Git alongside everything else:

apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws
spec:
  package: xpkg.upbound.io/crossplane-contrib/provider-aws:v0.44.0
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-azure
spec:
  package: xpkg.upbound.io/crossplane-contrib/provider-azure:v0.44.0
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-gcp
spec:
  package: xpkg.upbound.io/crossplane-contrib/provider-gcp:v0.44.0
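
Give the packages a minute to pull, then check that every provider reports installed and healthy before wiring up credentials:

kubectl get providers.pkg.crossplane.io
kubectl get pods -n crossplane-system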

Configuring Cloud Providers

Each provider needs credentials and configuration. Here’s how to set up AWS:

apiVersion: v1
kind: Secret
metadata:
  name: aws-creds
  namespace: crossplane-system
type: Opaque
data:
  credentials: <base64-encoded-aws-credentials>

---
apiVersion: aws.crossplane.io/v1beta1
kind: ProviderConfig
metadata:
  name: aws-provider
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: aws-creds
      key: credentials
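
Rather than base64-encoding values by hand, you can build that secret from a standard AWS shared-credentials file; the file name here is just an example:

cat > aws-credentials.txt <<EOF
[default]
aws_access_key_id = <your-access-key-id>
aws_secret_access_key = <your-secret-access-key>
EOF

kubectl create secret generic aws-creds \
  --namespace crossplane-system \
  --from-file=credentials=./aws-credentials.txt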

For GCP, you’ll need a service account key:

apiVersion: v1
kind: Secret
metadata:
  name: gcp-creds
  namespace: crossplane-system
type: Opaque
data:
  credentials: <base64-encoded-service-account-key>

---
apiVersion: gcp.crossplane.io/v1beta1
kind: ProviderConfig
metadata:
  name: gcp-provider
spec:
  projectID: your-gcp-project
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: gcp-creds
      key: credentials

Defining Composite Resources

Now create CompositeResourceDefinitions that abstract away cloud-specific details. Here’s one for a Kubernetes cluster:

apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xkubernetesclusters.infrastructure.example.org
spec:
  group: infrastructure.example.org
  names:
    kind: XKubernetesCluster
    plural: xkubernetesclusters
  claimNames:
    kind: KubernetesCluster
    plural: kubernetesclusters
  versions:
  - name: v1alpha1
    served: true
    referenceable: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              parameters:
                type: object
                properties:
                  region:
                    type: string
                  nodeCount:
                    type: integer
                  nodeSize:
                    type: string
                  cloudProvider:
                    type: string
                    enum: ["aws", "azure", "gcp"]
            required:
            - parameters
        required:
        - spec

This XRD defines a high-level resource that works across all cloud providers, and the claimNames block lets developers request one through a namespaced KubernetesCluster claim. The cloudProvider parameter records which cloud the cluster should land on; you pair it with one Composition per provider and select the right one from the claim.

Creating Compositions

Compositions define how to translate the high-level resource into cloud-specific resources. Here’s one for AWS EKS:

apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: eks-cluster
spec:
  writeConnectionSecretsToNamespace: crossplane-system
  compositeTypeRef:
    apiVersion: infrastructure.example.org/v1alpha1
    kind: XKubernetesCluster
  resources:
  - name: eks-cluster
    base:
      apiVersion: eks.aws.crossplane.io/v1beta1
      kind: Cluster
      spec:
        forProvider:
          region: us-west-2
          roleArnRef:
            name: eks-cluster-role
          version: "1.28"
        writeConnectionSecretToRef:
          name: eks-cluster-connection
          namespace: crossplane-system
    patches:
    - type: FromCompositeFieldPath
      fromFieldPath: spec.parameters.region
      toFieldPath: spec.forProvider.region
  - name: eks-nodegroup
    base:
      apiVersion: eks.aws.crossplane.io/v1beta1
      kind: NodeGroup
      spec:
        forProvider:
          region: us-west-2
          clusterNameRef:
            name: eks-cluster
          nodeRoleArnRef:
            name: eks-node-role
          scalingConfig:
            desiredSize: 3
            maxSize: 10
            minSize: 1
          instanceTypes:
          - t3.medium
    patches:
    - type: FromCompositeFieldPath
      fromFieldPath: spec.parameters.region
      toFieldPath: spec.forProvider.region
    - type: FromCompositeFieldPath
      fromFieldPath: spec.parameters.nodeCount
      toFieldPath: spec.forProvider.scalingConfig.desiredSize
    - type: FromCompositeFieldPath
      fromFieldPath: spec.parameters.nodeSize
      toFieldPath: spec.forProvider.instanceTypes[0]
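
Two things this Composition quietly assumes: the IAM roles eks-cluster-role and eks-node-role already exist as Crossplane-managed resources (or get added to the Composition), and Crossplane can tell which Composition to use once you add Azure and GCP equivalents. A common pattern for the latter is to label each Composition (for example, provider: aws on the one above) and select it from the claim, sketched here:

apiVersion: infrastructure.example.org/v1alpha1
kind: KubernetesCluster
metadata:
  name: staging-cluster
spec:
  compositionSelector:
    matchLabels:
      provider: aws
  parameters:
    region: us-east-1
    nodeCount: 3
    nodeSize: t3.large
    cloudProvider: aws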

Connecting ArgoCD

Now set up ArgoCD to manage your clusters. Install it in your management cluster:

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

Create an ApplicationSet that automatically manages clusters based on Git repository structure:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-management
  namespace: argocd
spec:
  generators:
  - clusters:
      selector:
        matchLabels:
          argocd.argoproj.io/secret-type: cluster
  template:
    metadata:
      name: '{{name}}-cluster'
    spec:
      project: default
      source:
        repoURL: https://github.com/your-org/cluster-configs
        targetRevision: HEAD
        path: 'clusters/{{name}}'
      destination:
        server: '{{server}}'
        namespace: default
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
        - CreateNamespace=true

This ApplicationSet watches for new cluster secrets in ArgoCD and automatically creates applications to manage them.
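
The cluster secrets this generator watches don't have to come from the argocd CLI. You can register a cluster declaratively with a labeled Secret, which is also the hook for automating registration when Crossplane hands you a new cluster's connection details. Endpoint and credentials below are placeholders:

apiVersion: v1
kind: Secret
metadata:
  name: staging-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: staging-cluster
  server: https://<staging-cluster-api-endpoint>
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": {
        "caData": "<base64-encoded-ca-certificate>"
      }
    }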

Real-World Workflow

Here’s how this works in practice. A developer needs a new staging environment:

Step 1: Developer creates a claim

apiVersion: infrastructure.example.org/v1alpha1
kind: KubernetesCluster
metadata:
  name: staging-cluster
spec:
  parameters:
    region: us-east-1
    nodeCount: 3
    nodeSize: t3.large
    cloudProvider: aws

Step 2: Crossplane provisions infrastructure

  • Creates EKS cluster in AWS
  • Sets up node groups with specified configuration
  • Configures networking and security groups
  • Generates kubeconfig for the new cluster

Step 3: ArgoCD registers the cluster

argocd cluster add <new-cluster-context> \
  --name staging-cluster \
  --kubeconfig /path/to/kubeconfig

Note that the positional argument is a kubeconfig context name, not a raw endpoint. In practice you would automate this step with a declarative cluster secret like the one shown earlier, built from the connection secret Crossplane writes.

Step 4: ArgoCD syncs applications

  • Deploys monitoring stack (Prometheus, Grafana)
  • Configures ingress controllers
  • Sets up security policies
  • Deploys application workloads

The entire process takes about 10-15 minutes. The developer gets a fully configured cluster without touching any cloud consoles or running manual commands.

Use Cases That Make Sense

Staging/Production Separation: Keep your staging and production environments completely isolated. Use different cloud providers or regions to test disaster recovery scenarios.

Cost Optimization: Spin up clusters in different regions based on workload requirements. Use spot instances for development, reserved instances for production.

Disaster Recovery: Automatically provision backup clusters in different regions. Use GitOps to keep them in sync with your primary environment.

Multi-Tenant Applications: Give each customer their own cluster while managing them from a central control plane. Enforce consistent policies across all tenants.

Security, RBAC, and Secret Management

Security in multi-cluster environments is complex. You need to manage secrets across multiple clusters, delegate permissions appropriately, and ensure consistent security policies.

External Secrets Operator

Instead of manually copying secrets to each cluster, use External Secrets Operator to sync them from a central store:

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-backend
  namespace: default
spec:
  provider:
    vault:
      server: "https://vault.example.com"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "external-secrets"
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: default
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: database-secret
    creationPolicy: Owner
  data:
  - secretKey: username
    remoteRef:
      key: database/credentials
      property: username
  - secretKey: password
    remoteRef:
      key: database/credentials
      property: password

This approach keeps your secrets in one place (Vault, AWS Secrets Manager, etc.) and automatically syncs them to all clusters that need them. Note that a namespaced SecretStore can only be referenced by ExternalSecrets in its own namespace, which is why both objects live in default here; in a larger setup you would typically use a ClusterSecretStore so a single store definition serves every namespace.

Cluster-Level RBAC Delegation

Use ArgoCD’s RBAC system to delegate cluster management to different teams:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.default: role:readonly
  policy.csv: |
    p, role:platform-admin, applications, *, */*, allow
    p, role:platform-admin, clusters, *, *, allow
    p, role:team-lead, applications, *, team-*/*, allow
    p, role:team-lead, applications, sync, team-*/*, allow
    g, platform-team, role:platform-admin
    g, team-leads, role:team-lead

This configuration gives platform admins full access to all clusters and applications, while team leads can only manage and sync applications in ArgoCD projects prefixed with team-.
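
Since those patterns refer to ArgoCD projects, this scheme assumes each team owns a project named with a team- prefix. A minimal project for a hypothetical payments team might look like this:

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-payments
  namespace: argocd
spec:
  description: Payments team applications
  sourceRepos:
  - https://github.com/your-org/app-configs
  destinations:
  - server: '*'
    namespace: 'payments-*'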

Network Policies

Enforce consistent network policies across all clusters using GitOps:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
    ports:
    - protocol: TCP
      port: 8080

ArgoCD will automatically apply these policies to all clusters, ensuring a consistent security posture.

Observability and Health Management

Managing observability across multiple clusters requires a different approach than single-cluster setups. You need to aggregate metrics, traces, and logs from all clusters while maintaining the ability to drill down into specific environments.

Prometheus Federation

Set up Prometheus federation to collect metrics from all clusters:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-federation
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
    - job_name: 'federate'
      scrape_interval: 15s
      honor_labels: true
      metrics_path: '/federate'
      params:
        'match[]':
          - '{job=~".+"}'
      static_configs:
      - targets:
        - 'prometheus.staging.example.com:9090'
        - 'prometheus.prod.example.com:9090'
        - 'prometheus.dr.example.com:9090'

This configuration tells your central Prometheus to scrape the /federate endpoint of each cluster-specific Prometheus. The targets must be addresses the management cluster can actually reach, typically an ingress or load balancer in front of each cluster's Prometheus, since another cluster's internal service DNS names won't resolve from here.
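
One detail that is easy to miss: the dashboards later in this section group metrics by a cluster label, and federated series only carry that label if each workload cluster's Prometheus sets it as an external label. In the per-cluster Prometheus config, with the value being whatever you call that cluster:

global:
  scrape_interval: 15s
  external_labels:
    cluster: staging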

OpenTelemetry for Distributed Tracing

Use OpenTelemetry to trace requests across clusters:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector-contrib:latest
        args:
        - --config=/etc/otel-collector-config.yaml
        volumeMounts:
        - name: otel-collector-config
          mountPath: /etc/otel-collector-config.yaml
          subPath: otel-collector-config.yaml
      volumes:
      - name: otel-collector-config
        configMap:
          name: otel-collector-config
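
The Deployment above mounts a ConfigMap named otel-collector-config that isn't shown. A minimal version that accepts OTLP traces and forwards them to a backend could look like the following; the Tempo endpoint is a placeholder for whatever tracing backend you actually run:

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: monitoring
data:
  otel-collector-config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
    processors:
      batch: {}
    exporters:
      otlp:
        # Placeholder: point this at your tracing backend
        endpoint: tempo.monitoring.svc.cluster.local:4317
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]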

Grafana Dashboards

Create dashboards that show the health of your entire multi-cluster setup:

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-multicluster
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  multicluster-overview.json: |
    {
      "title": "Multi-Cluster Overview",
      "panels": [
        {
          "title": "Cluster Health",
          "type": "stat",
          "targets": [
            {
              "expr": "up{job=\"kubernetes-cluster\"}",
              "legendFormat": "{{cluster}}"
            }
          ]
        },
        {
          "title": "Resource Usage by Cluster",
          "type": "graph",
          "targets": [
            {
              "expr": "sum(rate(container_cpu_usage_seconds_total[5m])) by (cluster)",
              "legendFormat": "{{cluster}} CPU"
            }
          ]
        }
      ]
    }

This dashboard gives you a bird’s-eye view of all your clusters and their health status.

Best Practices and Common Gotchas

After working with this setup for a while, here are the things that matter:

Version Control Layout

Organize your Git repositories logically:

cluster-configs/
├── clusters/
│   ├── staging/
│   │   ├── applications/
│   │   ├── policies/
│   │   └── monitoring/
│   ├── production/
│   │   ├── applications/
│   │   ├── policies/
│   │   └── monitoring/
│   └── dr/
│       ├── applications/
│       ├── policies/
│       └── monitoring/
├── crossplane/
│   ├── xrds/
│   ├── compositions/
│   └── claims/
└── argocd/
    ├── applicationsets/
    └── projects/

This structure makes it easy to find what you’re looking for and understand the relationships between different components.

Avoiding Configuration Drift

Use ArgoCD’s sync policies to prevent drift:

spec:
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true
    - PrunePropagationPolicy=foreground
    - PruneLast=true

The selfHeal: true option automatically reverts changes made directly to the cluster. The prune: true option removes resources that are no longer defined in Git.
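
One gotcha with selfHeal: fields that another controller legitimately mutates, such as a Deployment's replica count under a HorizontalPodAutoscaler, will be reverted on every sync unless you tell ArgoCD to ignore them:

spec:
  ignoreDifferences:
  - group: apps
    kind: Deployment
    jsonPointers:
    - /spec/replicas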

Rollback Safety

Always test changes in staging before applying to production:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: staging-app
spec:
  source:
    repoURL: https://github.com/your-org/app-configs
    targetRevision: HEAD
    path: staging
  destination:
    server: https://staging-cluster.example.com
    namespace: default
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production-app
spec:
  source:
    repoURL: https://github.com/your-org/app-configs
    targetRevision: v1.2.3  # Pinned to stable version
    path: production
  destination:
    server: https://production-cluster.example.com
    namespace: default

Pin production to specific versions and only promote after testing in staging.

The Future of Cloud-Native Control Planes

This approach represents a fundamental shift in how we think about infrastructure management. Instead of treating infrastructure as a separate concern from applications, we’re treating it as just another type of application that runs on Kubernetes.

The benefits are clear:

Consistency. The same tools and processes work across all environments and cloud providers.

Scalability. You can manage hundreds of clusters with the same effort it used to take to manage one.

Reliability. GitOps ensures your desired state matches your actual state, automatically fixing drift and configuration errors.

Developer Experience. Teams can provision and manage their own infrastructure using the same Kubernetes tools they already know.

Cost Control. You can optimize costs by automatically scaling clusters based on demand and using the most cost-effective cloud provider for each workload.

GitOps and Crossplane aren’t just tools—they’re the foundation for the next generation of DevOps automation. As organizations move toward cloud-native architectures, this approach will become the standard way to manage infrastructure at scale.

The future belongs to composable, declarative systems that can adapt to changing requirements without requiring manual intervention. This is how you build infrastructure that actually works for your organization instead of against it.
