Secondary Clusters Guide
Secondary clusters use a minimal Kubecost deployment to send their metrics to a central storage-bucket (aka durable storage) that is accessed by the primary cluster to provide a single-pane-of-glass view into all aggregated cluster costs globally. This aggregated cluster view is exclusive to Kubecost Enterprise.
The Kubecost UI will appear broken when pointed at a secondary cluster. Use it there only for troubleshooting.
This guide explains the settings you can tune so that a secondary cluster runs only the minimum set of Kubecost components and consumes fewer resources.
See the Additional resources section below for complete examples in our GitHub repo.
Kubecost Global
Disable product caching and reduce query concurrency with the following parameters:
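As a minimal sketch of what these parameters can look like in a Helm values file: the key names below (kubecostModel.warmCache, kubecostModel.warmSavingsCache, kubecostModel.maxQueryConcurrency) are assumed from the cost-analyzer chart and should be checked against the values.yaml of your chart version.

```yaml
# Assumed key names; verify against your chart version's values.yaml.
kubecostModel:
  warmCache: false         # skip pre-warming the cost model cache
  warmSavingsCache: false  # skip pre-warming the savings cache
  maxQueryConcurrency: 1   # limit concurrent queries issued to Prometheus
```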
Grafana
Grafana is not needed on secondary clusters.
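A sketch of disabling it through Helm values, assuming the chart exposes Grafana under global.grafana (confirm the key names for your chart version):

```yaml
# Assumed key names; verify against your chart version's values.yaml.
global:
  grafana:
    enabled: false  # do not deploy the bundled Grafana
    proxy: false    # do not proxy Grafana through the Kubecost frontend
```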
Prometheus
Kubecost and its accompanying Prometheus collect a reduced set of metrics that allow for lower resource/storage usage than a standard Prometheus deployment.
The following configuration options further reduce resource consumption when not using the Kubecost frontend:
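One such option is a shorter Prometheus retention window. The sketch below assumes the bundled Prometheus is configured under prometheus.server; the 2d value is only an illustrative choice, not a recommendation from this guide.

```yaml
# Assumed key name; the retention value is illustrative.
prometheus:
  server:
    retention: 2d  # keep local metrics briefly; long-term data lives in the storage-bucket
```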
Retention can potentially be reduced even further, because metrics are sent to the storage-bucket every 2 hours.
You can tune prometheus.server.persistentVolume.size depending on scale, or disable persistent storage outright.
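For illustration, the persistent volume settings might look like the following. The size shown is a placeholder to adjust for your scale, and persistentVolume.enabled is assumed to be the switch that disables persistent storage.

```yaml
prometheus:
  server:
    persistentVolume:
      size: 8Gi         # placeholder; size this for your cluster's scale
      # enabled: false  # assumed switch for disabling persistent storage entirely
```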
Thanos
Disable Thanos components. These are only used for troubleshooting on secondary clusters. See this guide for troubleshooting via kubectl logs.
Secondary clusters write to the global storage-bucket via the thanos-sidecar on the prometheus-server pod.
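A sketch of turning off the Thanos components while leaving the sidecar in place. The component key names are assumed from the Thanos subchart bundled with Kubecost and may differ between chart versions.

```yaml
# Assumed key names; verify against your chart version's values.yaml.
# The thanos-sidecar on the prometheus-server pod is not touched here,
# since it is what writes to the storage-bucket.
thanos:
  compact:
    enabled: false
  bucket:
    enabled: false
  query:
    enabled: false
  queryFrontend:
    enabled: false
  store:
    enabled: false
```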
Node-Exporter
You can disable node-exporter and the service account if cluster/node rightsizing recommendations are not required.
node-exporter must be disabled if there is an existing node-exporter DaemonSet. More info here.
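A sketch of disabling both node-exporter and its service account, assuming the key names used by the bundled Prometheus chart:

```yaml
# Assumed key names; verify against your chart version's values.yaml.
prometheus:
  nodeExporter:
    enabled: false   # do not deploy the node-exporter DaemonSet
  serviceAccounts:
    nodeExporter:
      create: false  # do not create its service account
```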
Helm values
For reference, this secondary-clusters.yaml snippet lists the most common settings for efficient secondary clusters:
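The snippet below is a reconstruction under stated assumptions rather than the exact file from the repo: it collects the settings described in the sections above, with key names assumed from the cost-analyzer chart and an illustrative retention value.

```yaml
# secondary-clusters.yaml (sketch; verify every key against your chart version)
kubecostModel:
  warmCache: false
  warmSavingsCache: false
  maxQueryConcurrency: 1
global:
  grafana:
    enabled: false
    proxy: false
prometheus:
  nodeExporter:
    enabled: false   # only if rightsizing recommendations are not required
  serviceAccounts:
    nodeExporter:
      create: false
  server:
    retention: 2d    # illustrative value
thanos:
  compact:
    enabled: false
  bucket:
    enabled: false
  query:
    enabled: false
  queryFrontend:
    enabled: false
  store:
    enabled: false
```

A file like this is typically passed to Helm with the -f flag alongside the values that configure the storage-bucket.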
Additional resources
You can find complete installation guides and sample files on our repo.
Additional considerations for properly tuning resource consumption are here.