Kubecost Aggregator
Aggregator is the primary query backend for Kubecost. It is enabled in all configurations of Kubecost. In a default installation, it runs within the cost-analyzer pod, but a multi-cluster installation of Kubecost requires some settings to be changed. Multi-cluster Kubecost uses the Federated ETL configuration without Thanos, with Aggregator replacing the Federator component.
Existing documentation for Kubecost APIs will use endpoints for non-Aggregator environments unless otherwise specified, but will still be compatible after configuring Aggregator.
Configuring Aggregator
Prerequisites
- Multi-cluster Aggregator can only be configured in a Federated ETL environment.
- All clusters in your Federated ETL environment must be configured to build and push ETL files to the object store via `.Values.federatedETL.federatedCluster` and `.Values.kubecostModel.federatedStorageConfigSecret`. See our Federated ETL doc for more details.
- If you've enabled Cloud Integration, it must be configured via the cloud integration secret. Other methods are now deprecated. See our Multi-Cloud Integrations doc for more details.
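As a sketch, the two Helm values above might look like the following in your values.yaml. The secret name `federated-store` is illustrative; use the name of the secret you created for your object-store configuration.

```yaml
# Illustrative values.yaml fragment; key paths per the prerequisites above.
federatedETL:
  federatedCluster: true
kubecostModel:
  # Name of the Kubernetes secret containing your object-store config;
  # "federated-store" is a placeholder for your own secret name.
  federatedStorageConfigSecret: federated-store
```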
This documentation is for Kubecost v2 and higher.
If you are upgrading to Kubecost v2 from the following environments, see our specialized migration guides instead:
Basic configuration
This configuration is estimated to be sufficient for environments monitoring < 20k unique containers or $50k cloud spend per day. You can check this metric on the `/diagnostics` page.
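A minimal sketch of a basic configuration, assuming the `kubecostAggregator.deployMethod` value from the cost-analyzer Helm chart. Verify key names against the values.yaml of your chart version.

```yaml
# Illustrative sketch; verify key names against your chart version.
kubecostAggregator:
  # "statefulset" runs Aggregator as its own StatefulSet;
  # the default runs it inside the cost-analyzer pod.
  deployMethod: statefulset
```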
Aggregator Optimizations
For larger deployments of Kubecost, Aggregator can be tuned. The settings below are in addition to the basic configuration above.
This configuration is estimated to be sufficient for environments monitoring < 60k unique containers per day. You can check this metric on the `/diagnostics` page.
Aggregator is a memory and disk-intensive process. Ensure that your cluster has enough resources to support the configuration below.
Because the Aggregator PV is relatively small, the least expensive performance gain is usually moving the storage class to a faster SSD. StorageClass names vary by provider; common terms are gp3, extreme, and premium.
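For example, on AWS with the EBS CSI driver, a gp3 StorageClass can be defined as follows. The class name is hypothetical, and the parameters are illustrative; tune them to your workload and reference the class from the Aggregator's storage settings in your values.yaml.

```yaml
# Illustrative gp3 StorageClass for AWS EBS CSI; name and parameters are examples.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kubecost-aggregator-gp3   # hypothetical name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
```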
Running the upgrade
If you have not already, create the required Kubernetes secrets. Refer to the Federated ETL doc and Cloud Integration doc for more details.
Finally, upgrade your existing Kubecost installation. This command will install Kubecost if it does not already exist.
If you are upgrading from an existing installation, make sure to append your existing `values.yaml` configurations to the ones described above.
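The upgrade itself is a standard Helm command. The release name, namespace, and chart repository below are common defaults and may differ in your environment:

```sh
# Install or upgrade Kubecost; release name and namespace are common defaults.
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update
helm upgrade --install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  -f values.yaml
```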
Validating Aggregator pod is running successfully
When first enabled, the aggregator pod will ingest the last 90 days (if applicable) of ETL data from the federated-store. Because the combined folder is ignored, the legacy Federator pod is not used here, but it can still run if needed. As `ETL_DAILY_STORE_DURATION_DAYS` increases, the amount of time it takes for Aggregator to make data available will increase. You can run `kubectl get pods` and ensure the `aggregator` pod is running, but you should still wait for all data to be ingested.
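For example, assuming Kubecost is installed in the `kubecost` namespace:

```sh
# Confirm the aggregator pod is Running; ingestion may still be in progress.
kubectl get pods -n kubecost | grep aggregator
```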
Troubleshooting Aggregator
Understanding the state of the Aggregator
This is a common endpoint for debugging the state of the Aggregator. It returns a JSON response with details such as:

- What is Aggregator's current state? If ingesting, it is downloading and processing ETL files into the DB. If deriving, it is pre-computing commonly used queries into saved tables.
- What is the ingestion progress for each of the data types (e.g. asset, allocation, cloud cost)?
- How fresh is my read database? An epoch timestamp can be found in `readDBPath`.
- How frequently is my newly ingested data being promoted into the read database? Reference `currentBucketRefreshInterval`.
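One way to reach the endpoint is to port-forward the Aggregator service and query it locally. The service name, port, and endpoint path below are all placeholders; check your deployment and Kubecost version for the actual values:

```sh
# Placeholders throughout: service name, port, and endpoint path vary by version.
kubectl port-forward -n kubecost service/kubecost-aggregator 9004:9004
curl -s http://localhost:9004/<state-endpoint> | jq .
```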
Resetting Aggregator StatefulSet data
When deploying the Aggregator as a StatefulSet, it is possible to perform a reset of the Aggregator data. The Aggregator itself doesn't store any data, and relies on object storage. As such, a reset involves removing that Aggregator's local storage, and allowing it to re-ingest data from the object store. The procedure is as follows:
1. Scale down the Aggregator StatefulSet to 0.
2. When the Aggregator pod is gone, delete the `aggregator-db-storage-xxx-0` PVC.
3. Scale the Aggregator StatefulSet back to 1. This will re-create the PVC, empty.
4. Wait for Kubecost to re-ingest data from the object store. This could take from several minutes to several hours, depending on your data size and retention settings.
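The procedure above can be sketched with kubectl. The StatefulSet name and namespace are illustrative, and `xxx` is the placeholder from your own PVC name:

```sh
# 1. Scale the Aggregator StatefulSet down to 0 replicas (names illustrative).
kubectl scale statefulset kubecost-aggregator -n kubecost --replicas=0

# 2. Once the pod is gone, delete its PVC (xxx is your environment's placeholder).
kubectl delete pvc aggregator-db-storage-xxx-0 -n kubecost

# 3. Scale back up; the PVC is re-created empty and re-ingestion begins.
kubectl scale statefulset kubecost-aggregator -n kubecost --replicas=1
```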
Checking the database for node metadata
Confirming whether node metadata exists in your database can be useful when troubleshooting missing data. Run the following command which will open a shell into the Aggregator pod:
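For example (the pod name is illustrative; identify the actual Aggregator pod with `kubectl get pods` first):

```sh
# Open a shell in the Aggregator pod; adjust pod name and namespace.
kubectl exec -it kubecost-aggregator-0 -n kubecost -- /bin/sh
```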
1. Point to the path where your database exists.
2. Copy the database to a new file for testing, to avoid modifying the original data.
3. Open a DuckDB REPL pointed at the copied database.
Run the following debugging queries to check if node data is available:
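A sketch of the session inside the pod. The database path, filenames, and table/column names below are assumptions and will differ in your installation:

```sh
# Paths, filenames, and table names are placeholders; adjust to your environment.
cd /var/configs/db               # assumed location of the Aggregator database
cp kubecost.duckdb copy.duckdb   # work on a copy, not the live database
duckdb copy.duckdb

# Inside the DuckDB REPL, list tables and probe for node rows (names hypothetical):
#   SELECT table_name FROM information_schema.tables;
#   SELECT * FROM assets WHERE type = 'Node' LIMIT 5;
```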