Kubecost can run on clusters with thousands of nodes when resource consumption is properly tuned. Below are some of the steps you can take to tune Kubecost, along with a description of each.
CloudCost allows Kubecost to pull in spend data from your integrated cloud service providers.
Cloud cost metrics for all accounts can be pulled in on your primary cluster by pointing Kubecost to one or more management accounts. Therefore, you can disable CloudCost on secondary clusters by setting the following Helm value:
--set cloudCost.enabled=false
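For example, this could be applied to an existing secondary-cluster installation with a Helm upgrade; the release name, namespace, and chart reference below are illustrative and may differ in your environment:
helm upgrade --install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --reuse-values \
  --set cloudCost.enabled=false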
Secondary clusters can be configured strictly as metric emitters to save memory. Learn more about how to best configure them in our Secondary Clusters Guide.
This method is only available for AWS cloud billing integrations. Kubecost is capable of tracking each individual cloud billing line item; however, on certain accounts this data can be quite large. If provider IDs are excluded, Kubecost won't cache granular data. Instead, it caches aggregate data and makes ad-hoc queries against the AWS Cost and Usage Report for granular data, resulting in slower load times but lower memory consumption.
Lowering query concurrency for the Kubecost ETL build means the ETL takes longer to build but consumes less memory. The default value is 5. This can be adjusted with the Helm value:
--set kubecostModel.maxQueryConcurrency=1
Lowering query duration results in Kubecost querying smaller windows when building ETL data. This can lead to slower ETL build times but lower memory peaks because of the smaller datasets. The default value is 1440. This can be tuned with the Helm value:
--set kubecostModel.maxPrometheusQueryDurationMinutes=300
Lowering query resolution will reduce memory consumption but will cause short-running pods to be sampled and rounded to the nearest interval for their runtime. The default value is 300s. This can be tuned with the Helm value:
--set kubecostModel.etlResolutionSeconds=600
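If you prefer a values file over repeated --set flags, the three ETL tuning values above can be grouped together. The keys below simply mirror the --set paths shown in this section, and the numbers are illustrative starting points rather than recommendations:
kubecostModel:
  maxQueryConcurrency: 1
  maxPrometheusQueryDurationMinutes: 300
  etlResolutionSeconds: 600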
Fewer data points scraped from Prometheus means less data to collect and store, at the cost of Kubecost making estimations that may miss usage spikes or short-running pods. The default value is 60s. This can be tuned in our Helm values for the Prometheus scrape job.
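As a sketch, assuming you run the Prometheus server bundled with the Kubecost chart, the scrape interval is typically exposed through the Prometheus subchart's global settings; confirm the exact path and default against your chart version before applying:
--set prometheus.server.global.scrape_interval=120s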
Node-exporter is optional. Disabling it turns off some health alerts, but savings recommendations and core cost allocation will function normally. Node-exporter can be disabled with the following Helm values:
--set prometheus.server.nodeExporter.enabled=false
--set prometheus.serviceAccounts.nodeExporter.create=false
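The same two settings expressed in a values file (the keys simply mirror the --set paths above):
prometheus:
  server:
    nodeExporter:
      enabled: false
  serviceAccounts:
    nodeExporter:
      create: false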
Optionally setting a soft memory limit instructs the Go runtime garbage collector to collect more aggressively as memory use approaches the limit. There is no one-size-fits-all value here, and users tuning this parameter should be aware that setting it too low may reduce overall performance. If the resources.requests memory value is set appropriately, using the same value for softMemoryLimit will instruct the Go runtime to keep its heap acquisition and release within the same bounds as the expected pod memory use. This can be tuned with the Helm value:
--set kubecostModel.softMemoryLimit=<amount><unit>
where <unit> is one of B, KiB, MiB, or GiB.
More info on this environment variable can be found in A Guide to the Go Garbage Collector.
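As an illustrative sketch only: assuming the cost-model container's memory request is managed under kubecostModel.resources in your values (verify this path and the request size against your chart version and workload), the request and soft limit could be aligned like this:
--set kubecostModel.resources.requests.memory=2Gi
--set kubecostModel.softMemoryLimit=2GiB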