Kubecost's extract, transform, load (ETL) data is a computed cache built upon Prometheus metrics and cloud billing data, on which nearly all API requests made by the user and the Kubecost frontend currently rely.
The ETL data is stored in a PersistentVolume mounted to the kubecost-cost-analyzer pod. In most multicluster environments, the ETL data is pushed to object storage. If you lose the ETL data or need to rebuild it, use the following endpoints.
1. Repair Asset ETL
The Asset ETL builds upon the Prometheus metrics listed here. It's important to ensure that you are able to query for Prometheus or Thanos data for the specified window you use. Otherwise, an absence of metrics will result in an empty ETL.
If the window parameter is within .Values.kubecostModel.etlHourlyStoreDurationHours, this endpoint will repair both the daily [1d] and hourly [1h] Asset ETL.
# Repair: /model/etl/asset/repair?window=
$ curl "https://kubecost.your.com/model/etl/asset/repair?window=2023-01-01T00:00:00Z,2023-01-04T00:00:00Z"
{"code":200,"data":"Repairing Asset ETL"}

# Check logs to watch this job run until completion
$ kubectl logs deploy/kubecost-cost-analyzer | grep "Asset\[1d\]"
INF ETL: Asset[1d]: ETLStore.Repair[cfDKJ]: repairing 2023-01-01 00:00:00 +0000 UTC, 2023-01-04 00:00:00 +0000 UTC
INF ETL: Asset[1d]: AggregatedStore.Run[fvkKR]: run: aggregated [2023-01-01T00:00:00+0000, 2023-01-02T00:00:00+0000) from 19 to 3 in 68.417µs
INF ETL: Asset[1d]: AggregatedStore.Run[fvkKR]: run: aggregated [2023-01-02T00:00:00+0000, 2023-01-03T00:00:00+0000) from 19 to 3 in 68.417µs
INF ETL: Asset[1d]: AggregatedStore.Run[fvkKR]: run: aggregated [2023-01-03T00:00:00+0000, 2023-01-04T00:00:00+0000) from 19 to 3 in 68.417µs
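One way to confirm that a repair covered every day in the window is to count the daily aggregation lines in the logs. The sketch below assumes the log format matches the sample output in this doc, which may differ between Kubecost versions; `count_daily_runs` is a hypothetical helper name, not part of Kubecost.

```shell
# Hypothetical helper: count "run: aggregated" log lines for one daily ETL store.
# Reads log lines on stdin. $1 = store name, e.g. Asset or Allocation.
# Assumes the log format shown in this doc.
count_daily_runs() {
  # First keep only lines for the daily store, then count the aggregation lines.
  # "|| true" keeps a count of zero from being reported as a failure.
  grep -F "${1}[1d]:" | grep -c -F "run: aggregated" || true
}

# Example usage (not run here):
# kubectl logs deploy/kubecost-cost-analyzer | count_daily_runs Asset
```

For the three-day window used in this doc (2023-01-01 to 2023-01-04), a count of 3 would suggest each day was aggregated once. A higher count can simply mean the logs also contain earlier runs.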
2. Repair Allocation ETL
The Allocation ETL builds upon all previous Asset data to compute cost and resource allocations for Kubernetes entities. Read our Kubecost Diagnostics doc for more info.
If the window parameter is within .Values.kubecostModel.etlHourlyStoreDurationHours, this endpoint will repair both the daily [1d] and hourly [1h] Allocation ETL.
# Repair: /model/etl/allocation/repair?window=
$ curl "https://kubecost.your.com/model/etl/allocation/repair?window=2023-01-01T00:00:00Z,2023-01-04T00:00:00Z"
{"code":200,"data":"Repairing Allocation ETL"}

# Check logs to watch this job run until completion
$ kubectl logs deploy/kubecost-cost-analyzer | grep "Allocation\[1d\]"
INF ETL: Allocation[1d]: ETLStore.Repair[lSGre]: repairing 2023-01-01 00:00:00 +0000 UTC, 2023-01-04 00:00:00 +0000 UTC
INF Allocation[1d]: AggregatedStoreDriver[hvfrl]: run: aggregated [2023-01-01T00:00:00+0000, 2023-01-02T00:00:00+0000) from 283 to 70 in 4.917963ms
INF Allocation[1d]: AggregatedStoreDriver[hvfrl]: run: aggregated [2023-01-02T00:00:00+0000, 2023-01-03T00:00:00+0000) from 130 to 62 in 983.216µs
INF Allocation[1d]: AggregatedStoreDriver[hvfrl]: run: aggregated [2023-01-03T00:00:00+0000, 2023-01-04T00:00:00+0000) from 130 to 62 in 1.462092ms
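Because the Allocation ETL builds on Asset data, it seems reasonable to repair the Asset ETL first when both need rebuilding for the same window. The sketch below only prints the two repair URLs in that order. `repair_urls` is a hypothetical helper, and waiting for the Asset repair to finish before starting the Allocation repair is an assumption rather than a documented requirement.

```shell
# Hypothetical helper: print the Asset and Allocation repair URLs, in that order.
# $1 = base URL, $2 = window start, $3 = window end (RFC 3339, UTC)
# The endpoint paths are the ones shown in this doc.
repair_urls() {
  for etl in asset allocation; do
    printf '%s/model/etl/%s/repair?window=%s,%s\n' "$1" "$etl" "$2" "$3"
  done
}

# Example usage (not run here). Each request only starts a repair job,
# so check the logs for completion before sending the next one.
# repair_urls "https://kubecost.your.com" 2023-01-01T00:00:00Z 2023-01-04T00:00:00Z
```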
3. Repair CloudCost ETL
The CloudCost ETL pulls information from your cloud billing integration. Ensure it has been configured properly; otherwise, no data will be retrieved. Review our Cloud Billing Integrations doc for more info.
# Repair: /model/cloudCost/repair?window=
$ curl "https://kubecost.your.com/model/cloudCost/repair?window=2023-01-01T00:00:00Z,2023-01-04T00:00:00Z"
{"code":200,"data":"Rebuilding Cloud Usage For All Providers"}

# Check logs to watch this job run until completion
$ kubectl logs deploy/kubecost-cost-analyzer | grep CloudCost
By default, CloudUsage ETL is disabled (.Values.kubecostModel.etlCloudUsage, .Values.kubecostModel.etlCloudAsset). If you are using CloudUsage ETL, use the commands below:
# Repair: /model/etl/cloudUsage/repair?window=
$ curl "https://kubecost.your.com/model/etl/cloudUsage/repair?window=2023-01-01T00:00:00Z,2023-01-04T00:00:00Z"
{"code":200,"data":"Cloud Usage Repair process has begun for [2023-01-01T00:00:00+0000, 2023-01-04T00:00:00+0000) for all providers"}

# Check logs to watch this job run until completion
$ kubectl logs deploy/kubecost-cost-analyzer | grep CloudUsage
4. Run Reconciliation Pipeline
The Reconciliation Pipeline reconciles the existing ETL with the newly gathered data in the CloudUsage ETL, further ensuring parity between Kubecost and your cloud bill.
Reconciliation repairs should be automatically triggered after a CloudCost repair.
# Repair: /model/etl/asset/reconciliation/repair?window=
$ curl "https://kubecost.your.com/model/etl/asset/reconciliation/repair?window=2023-01-01T00:00:00Z,2023-01-04T00:00:00Z"
{"code":200,"data":"Reconciliation Repair process has begun for [2023-01-01T00:00:00+0000, 2023-01-04T00:00:00+0000) for all providers"}

# Check logs to watch this job run until completion
$ kubectl logs deploy/kubecost-cost-analyzer | grep Reconciliation
Repairs in Federated ETL environments
In a Federated ETL environment, each individual Kubecost deployment builds its own ETL data before pushing it to the bucket. Therefore, the repair commands above must be run on each affected cluster.
After a repair has been completed on any cluster in your environment, the Kubecost Federator will detect the new data, re-federate the ETL data, then place the merged data into the /federated/combined directory in the bucket. The Federator runs 5 minutes after startup, and every 30 minutes afterwards.
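To estimate how long repaired data may take to appear in the combined directory, the schedule above can be turned into simple arithmetic. The sketch below is only an illustration of that schedule (first run 5 minutes after startup, then every 30 minutes). `seconds_until_next_federation` is a hypothetical helper, and you would need to supply the Federator's uptime yourself.

```shell
# Hypothetical helper: seconds until the next Federator run.
# $1 = seconds elapsed since the Federator started
# Assumes runs at 5 minutes after startup and every 30 minutes after that.
seconds_until_next_federation() {
  uptime_s="$1"
  first=300      # first run: 5 minutes after startup
  period=1800    # later runs: every 30 minutes
  if [ "$uptime_s" -le "$first" ]; then
    echo $(( first - uptime_s ))
  else
    # Seconds elapsed since the most recent scheduled run.
    since_last=$(( (uptime_s - first) % period ))
    # A result of 0 means a run is scheduled right now.
    echo $(( (period - since_last) % period ))
  fi
}

# Example: the Federator has been up for 1000 seconds.
seconds_until_next_federation 1000
```

This gives the wait until the next run starts. It does not account for how long the federation itself takes.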
In this doc, we reference repairs using the https://kubecost.your.com URL. If that is not accessible to you, you can instead port-forward to the kubecost-cost-analyzer pod and use localhost.
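As a rough sketch of the port-forward approach, the helper below rewrites one of the repair URLs from this doc so that it targets localhost. The local port 9090 is an assumption about your deployment, and `to_localhost` is a hypothetical helper name. Add a namespace flag to `kubectl` if your install needs one.

```shell
# Hypothetical helper: rewrite a Kubecost URL to go through a local port-forward.
# $1 = original URL, $2 = local port (defaults to 9090, which is an assumption)
to_localhost() {
  # Strip the scheme and host (e.g. "https://kubecost.your.com/"),
  # keeping the path and query string.
  printf 'http://localhost:%s/%s\n' "${2:-9090}" "${1#*://*/}"
}

# Example usage (not run here):
# kubectl port-forward deploy/kubecost-cost-analyzer 9090 &
# curl "$(to_localhost 'https://kubecost.your.com/model/etl/asset/repair?window=2023-01-01T00:00:00Z,2023-01-04T00:00:00Z')"
```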
Verify that Prometheus metrics exist consistently during the time window you wish to repair.
For installs using Prometheus, verify that retention is long enough to cover the requested repair window. By default, .Values.prometheus.server.retention is set to 15 days.
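A quick sanity check before a repair is to compare the start of the window against the retention period. The sketch below assumes GNU `date` (BSD/macOS `date` uses different flags) and takes the retention as a whole number of days. `within_retention` is a hypothetical helper, and passing this check does not prove the metrics actually exist for the window.

```shell
# Hypothetical helper: succeed if the window start is still inside retention.
# $1 = window start (RFC 3339, e.g. 2023-01-01T00:00:00Z)
# $2 = retention in days (e.g. 15)
# $3 = "now" as epoch seconds (optional; defaults to the current time)
# Assumes GNU date for the -d flag.
within_retention() {
  start_s=$(date -u -d "$1" +%s) || return 2
  now_s="${3:-$(date -u +%s)}"
  # Oldest timestamp Prometheus would still hold under this retention.
  oldest_s=$(( now_s - $2 * 86400 ))
  [ "$start_s" -ge "$oldest_s" ]
}

# Example usage:
# if within_retention 2023-01-01T00:00:00Z 15; then
#   echo "window start is within retention"
# else
#   echo "window start is older than retention; expect missing metrics"
# fi
```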