Tuning Resource Consumption

Kubecost can run on clusters with thousands of nodes when resource consumption is properly tuned. Below are some of the steps you can take to tune Kubecost, along with a description of each.

Disable Cloud Costs on secondary clusters

Cloud Costs allows Kubecost to pull in spend data from your integrated cloud service providers.

Cloud cost metrics for all accounts can be pulled in on your primary cluster by pointing Kubecost at one or more management accounts, so Cloud Costs can be disabled on secondary clusters by setting the following Helm value:

--set cloudCost.enabled=false

Secondary clusters can be configured strictly as metric emitters to save memory. Learn more about how to best configure them in our Secondary Clusters Guide.
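
For example, the value could be applied to a secondary cluster during a normal Helm upgrade; a minimal sketch, assuming a release named kubecost installed from the kubecost/cost-analyzer chart in the kubecost namespace:

helm upgrade kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --set cloudCost.enabled=false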

Lower query concurrency

Lowering query concurrency for the Kubecost ETL build means the ETL takes longer to build but consumes less memory. The default value is 5. This can be adjusted with the Helm value:

--set kubecostModel.maxQueryConcurrency=1

Lower query duration

Lowering query duration results in Kubecost querying smaller windows when building ETL data. This can lead to slower ETL build times but lower memory peaks because of the smaller datasets. The default value is 1440 (24 hours). This can be tuned with the Helm value:

--set kubecostModel.maxPrometheusQueryDurationMinutes=300

Lower query resolution

Lowering query resolution will reduce memory consumption, but will cause short-running pods to be sampled and rounded to the nearest interval for their runtime. The default value is 300s. This can be tuned with the Helm value:

--set kubecostModel.etlResolutionSeconds=600
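
For reference, the three ETL settings above (concurrency, duration, and resolution) can also be collected in a single values file; a minimal sketch, using the same example values as the flags above:

kubecostModel:
  maxQueryConcurrency: 1                  # default 5; fewer concurrent ETL queries, lower peak memory
  maxPrometheusQueryDurationMinutes: 300  # default 1440 (24h); smaller query windows per ETL step
  etlResolutionSeconds: 600               # default 300; coarser sampling of short-running pods

The file could then be applied with helm upgrade kubecost kubecost/cost-analyzer --namespace kubecost -f etl-tuning.yaml, where the release name, namespace, and file name are illustrative.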

Lengthen scrape interval

Fewer data points scraped from Prometheus means less data to collect and store, at the cost of Kubecost making estimations that may miss usage spikes or short-running pods. The default value is 60s. This can be tuned in our Helm values for the Prometheus scrape job.
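
For example, the interval could be lengthened to two minutes with a values-file entry like the following; a sketch that assumes Kubecost's bundled Prometheus chart, which exposes the standard server.global.scrape_interval value:

prometheus:
  server:
    global:
      scrape_interval: 2m  # default is 1m (60s); longer intervals collect fewer samples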

Keep node exporter disabled

The node exporter is an optional feature and is disabled by default. Some health alerts will be disabled while the node exporter is off, but savings recommendations and core cost allocation will function normally. If you need it, you can enable the node exporter with the following Helm values:

--set prometheus.server.nodeExporter.enabled=true
--set prometheus.serviceAccounts.nodeExporter.create=true

Soft memory limit field

Optionally setting a soft memory limit causes the Go runtime's garbage collector to run more aggressively as memory use reaches or approaches the limit. There is no one-size-fits-all value here, and setting the value too low may reduce overall performance. If you have set the resources.requests memory value appropriately, using the same value for softMemoryLimit will instruct the Go runtime to keep its heap acquisition and release within the same bounds as the pod's expected memory use. This can be tuned with the Helm value (where the unit suffix is B, KiB, MiB, or GiB):

--set kubecostModel.softMemoryLimit=<amount><unit>

More info on this environment variable can be found in A Guide to the Go Garbage Collector.
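
As an illustration, the soft limit could be pinned to the same bound as the pod's memory request; a minimal values-file sketch, assuming the chart's kubecostModel.resources block and using 6GiB purely as an example figure:

kubecostModel:
  resources:
    requests:
      memory: 6Gi        # the pod's Kubernetes memory request
  softMemoryLimit: 6GiB  # sets the Go runtime's soft memory limit to match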
