Secondary Clusters Guide

Secondary clusters run a minimal Kubecost deployment that sends their metrics to a central storage bucket (also called durable storage). The primary cluster reads from this bucket to provide a single-pane-of-glass view of aggregated costs across all clusters. This aggregated cluster view is exclusive to Kubecost Enterprise.

Kubecost's UI will appear broken when set to a secondary cluster. On secondary clusters, the UI should only be used for troubleshooting.

This guide explains the settings you can tune so that a secondary cluster runs only the minimum set of Kubecost components and therefore uses fewer resources.

See the Additional resources section below for complete examples in our GitHub repo.

Kubecost global

Disable product caching and reduce query concurrency with the following parameters:

--set federatedETL.federatedCluster=true
--set federatedETL.agentOnly=true
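
If you keep your settings in a values file rather than passing --set flags, the two flags above correspond to the following YAML. This is only a direct translation of the flags shown here, not an additional setting:

```yaml
# Values-file equivalent of the two --set flags above.
federatedETL:
  federatedCluster: true
  agentOnly: true
```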

Grafana

Grafana is not needed on secondary clusters.

--set global.grafana.enabled=false
--set global.grafana.proxy=false

Prometheus

Kubecost and its accompanying Prometheus collect a reduced set of metrics, which allows for lower resource and storage usage than a standard Prometheus deployment.

The following configuration options further reduce resource consumption when not using the Kubecost frontend:

--set prometheus.server.retention=2d

Retention can potentially be reduced even further, because metrics are sent to the storage bucket every 2 hours.

You can tune prometheus.server.persistentVolume.size depending on scale, or outright disable persistent storage.
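
As a rough sketch of the tuning described above, the values below shrink the Prometheus volume. The 8Gi figure is purely illustrative and not a recommendation from this guide. The persistentVolume.enabled key is the toggle used by the upstream Prometheus Helm chart, and it is assumed here that the bundled chart exposes it unchanged:

```yaml
prometheus:
  server:
    retention: 2d
    persistentVolume:
      # Illustrative size only; pick a value that fits your cluster's scale.
      size: 8Gi
      # Assumption: uncommenting this disables persistent storage entirely.
      # Metrics not yet sent to the storage bucket may then be lost if the
      # Prometheus pod is rescheduled.
      # enabled: false
```

Whether disabling the volume is acceptable depends on how much short-term data loss you can tolerate between uploads to the storage bucket.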

Node-Exporter

Node-exporter is disabled by default. You should keep it this way if cluster/node right-sizing recommendations are not required.

Node-exporter must remain disabled if the cluster already has an existing node-exporter DaemonSet. More info here.

Helm values

For reference, this secondary-clusters.yaml snippet lists the most common settings for efficient secondary clusters:

kubecostProductConfigs:
  clusterName: kubecostProductConfigs_clusterName
  # productKey not needed on secondary clusters
kubecostModel:
  warmCache: false
  warmSavingsCache: false
  etl: false
  etlCloudAsset: false
  maxQueryConcurrency: 1
global:
  grafana:
    enabled: false
    proxy: false
prometheus:
  server:
    global:
      external_labels:
        # cluster_id should be unique for all clusters and the same value as .kubecostProductConfigs.clusterName
        cluster_id: kubecostProductConfigs_clusterName
    retention: 2d
  # nodeExporter:
  #   enabled: false
  # serviceAccounts:
  #   nodeExporter:
  #     create: false
thanos:
  compact:
    enabled: false
  bucket:
    enabled: false
  query:
    enabled: false
  queryFrontend:
    enabled: false
  store:
    enabled: false

Additional resources

You can find complete installation guides and sample files in our repo.

Additional considerations for properly tuning resource consumption are here.
