Kubecost diagnostics run a series of tests to determine if resources necessary for accurate cost reporting are available.
You can access the Diagnostics page in the Kubecost UI by selecting Settings from the left navigation, then selecting View Full Diagnostics.
cAdvisor metrics are generated by cAdvisor directly and are required for core application functionality, including Kubecost Allocation and Savings insights.
A limited set of KSM data is required for core application functionality, including Kubecost Allocation and Savings insights.
Kubecost metrics are generated by the kubecost-cost-analyzer pod and are required for the core application to function. Specifically, these metrics are used for Kubecost Allocations, Assets, and Savings functionality.
Node exporter metrics are used for the following features:
- Reserved Instance Recommendations in Savings
- Showing a compute 'breakdown' on Overview's Resource Efficiency graph, i.e. system vs idle vs user. The Compute bar on this graph will appear as a single solid-colored bar when this diagnostic is failing.
- Various Kubecost Grafana dashboards
This diagnostic requires kube-state-metrics >= v1.6.0. The version check is completed by verifying the existence of the kube_persistentvolume_capacity_bytes metric. If this diagnostic test is failing, we recommend you:
1. Confirm that the kube-state-metrics version requirement is met.
2. Verify that this metric, and potentially other kube-state-metrics metrics, are not being dropped by Prometheus relabel rules.
3. Determine whether any persistent volumes are present in this cluster. If there are none, you can ignore this diagnostic check.
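One way to check for the metric outside the Diagnostics page is to run an instant query against the Prometheus HTTP API and see whether any series come back. The sketch below only builds the query URL and interprets a response body; the helper names and the example Prometheus address are illustrative assumptions, not part of Kubecost.

```python
import json
from urllib.parse import urlencode

METRIC = "kube_persistentvolume_capacity_bytes"


def build_query_url(prometheus_url: str, metric: str = METRIC) -> str:
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    return f"{prometheus_url.rstrip('/')}/api/v1/query?{urlencode({'query': metric})}"


def metric_present(response_body: str) -> bool:
    """Return True if the query response contains at least one series."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        return False
    return len(payload.get("data", {}).get("result", [])) > 0


# Hypothetical Prometheus address, for illustration only.
print(build_query_url("http://prometheus.example:9090"))
```

An empty result does not say which of the three causes above applies: it is returned both when the metric is dropped or unsupported and when the cluster simply has no persistent volumes.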
A diagnostic view is provided for both the Allocation and Assets pipelines and is designed to assist in diagnosing missing data found in the Allocation or Assets views. Kubecost's ETL pipelines run in the background to build a daily composition of the data required to build the cost model. For each day the data is collected, a file is written to disk containing the results. These files are used as both a cache and data backup, which the diagnostic view displays:
ETL Allocation Status with cache files
If a problem occurs, the diagnostic view helps you identify the specific days on which the ETL pipeline failed to collect data.
ETL Allocation Status with collection failures
The file on Nov 20, 2020 in the above image appears in red. This is because the data in this file has been flagged by our diagnostics page as empty (it failed to pass a minimum size threshold). This could happen if the database was temporarily unavailable while that day's file was being built.
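The idea of a minimum size threshold can be sketched as follows. This is not Kubecost's implementation: the function name, the 1024-byte threshold, and the file sizes are all hypothetical and serve only to illustrate how an undersized daily file would be singled out.

```python
def flag_empty_files(file_sizes: dict[str, int], min_bytes: int = 1024) -> list[str]:
    """Return the days whose file is smaller than min_bytes, in sorted order.

    file_sizes maps a day (ISO date string) to the file size in bytes.
    The default threshold is a made-up value for illustration.
    """
    return sorted(day for day, size in file_sizes.items() if size < min_bytes)


# Illustrative sizes only; real file sizes depend on the cluster.
sizes = {"2020-11-19": 48_213, "2020-11-20": 12, "2020-11-21": 51_007}
print(flag_empty_files(sizes))  # prints ['2020-11-20']
```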
The ETL pipelines provide a way to repair a specific day in the pipeline using the following URL:
To repair the file for the problematic date above (note that it is an Allocation file), navigate to the following in a browser:
Previous versions of Kubecost (1.81.0 and prior) provided a similar repair feature under the /rebuild endpoint by passing a window:
Once cloud integrations have been set up, each Cloud Store will have its own diagnostic view, which includes its provider key in the title. This view shows the Cloud Connection Status and metrics for the Reconciliation and Cloud Asset processes of that provider, including:
- Coverage: The window of time that the historical subprocess has covered
- LastRun: The last time that the process ran, updates each time the periodic subprocess runs
- NextRun: Next scheduled run of the periodic subprocess
- Progress: Ratio of Coverage to Total amount of time to be covered
- RefreshRate: The interval at which the periodic subprocess runs
- Resolution: The size of the assets being retrieved
- StartTime: When the Cloud Process was started
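The Progress value described above is a simple ratio. A minimal sketch, assuming both quantities are expressed as durations and that the result is clamped to the range 0 to 1 (the clamping is an assumption, not something the diagnostic view is documented to do):

```python
from datetime import timedelta


def progress(coverage: timedelta, total: timedelta) -> float:
    """Ratio of the covered window to the total window, clamped to [0, 1]."""
    if total <= timedelta(0):
        raise ValueError("total must be a positive duration")
    return max(0.0, min(1.0, coverage / total))


# 3 days covered out of a 12-day window.
print(progress(timedelta(days=3), timedelta(days=12)))  # prints 0.25
```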
Below are the minimum required versions:
node-exporter - v0.16 (May 18)
kube-state-metrics - v1.6.0 (May 19)
cAdvisor - kubelet v1.11.0 (May 18)
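If you want to compare a reported version string against these minimums programmatically, a small helper along the following lines may be useful. The function names and dictionary keys are illustrative; only the version numbers come from the list above.

```python
import re

# Minimum versions from the list above; the keys are illustrative names.
MINIMUM_VERSIONS = {
    "node-exporter": (0, 16, 0),
    "kube-state-metrics": (1, 6, 0),
    "kubelet": (1, 11, 0),
}


def parse_version(text: str) -> tuple:
    """Parse 'v1.6.0', '0.16', or 'v1.11.0-gke.1' into (major, minor, patch)."""
    match = re.match(r"v?(\d+)(?:\.(\d+))?(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # Missing minor or patch components are treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(component: str, version: str) -> bool:
    """Return True if version is at least the minimum for component."""
    return parse_version(version) >= MINIMUM_VERSIONS[component]


print(meets_minimum("kube-state-metrics", "v1.5.0"))  # prints False
print(meets_minimum("kubelet", "v1.11.0-gke.1"))      # prints True
```

Any suffix after the numeric part, such as a vendor build tag, is ignored by this sketch.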
Confirm that each pod is in a Running state for the particular metric exporter. For example, you can confirm that a kube-state-metrics pod is Running with the following command:
kubectl get pod -l app.kubernetes.io/name=kube-state-metrics --all-namespaces
You can see this information directly on the Kubecost Diagnostics page (screenshot below) or by visiting your Prometheus console and then Status > Targets in the top navigation bar.
Kubecost Diagnostics page
If the necessary scrape target is not added to your Prometheus, then refer to this resource to learn how to add a new job under your Prometheus scrape_configs block. You can visit <your-prometheus-console-url>/config to view the current scrape_configs block being passed to your Prometheus.
You can see information on recent Prometheus scrape errors directly on the Kubecost Diagnostics page when present or by visiting your Prometheus console and then Status > Targets in the top navigation bar.
If metrics are being collected on a supported version of the desired metrics exporter, the final step is to verify that individual metrics are not being dropped in your Prometheus pipeline. This could be in the form of a keep or drop rule under a metric_relabel_configs block in your Prometheus .yaml configuration files.
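To reason about whether a given metric would survive a set of relabel rules, it can help to model the rules by hand. The sketch below is a simplification under stated assumptions: it looks only at rules whose source_labels is exactly [__name__], handles only the keep and drop actions, and uses Python's re module as an approximation of Prometheus's fully anchored RE2 matching. The function name and the example rules are hypothetical.

```python
import re


def is_dropped(rules: list, metric_name: str) -> bool:
    """Return True if a keep/drop rule on __name__ would discard metric_name.

    rules is a list of dicts shaped like metric_relabel_configs entries.
    Rules on other labels, and other actions, are ignored in this sketch.
    """
    for rule in rules:
        if rule.get("source_labels") != ["__name__"]:
            continue
        # Prometheus anchors relabel regexes at both ends, hence fullmatch.
        matched = re.fullmatch(rule.get("regex", "(.*)"), metric_name) is not None
        action = rule.get("action", "replace")
        if action == "drop" and matched:
            return True
        if action == "keep" and not matched:
            return True
    return False


# Hypothetical rule that would hide the metric used by the version check.
drop_rules = [
    {"source_labels": ["__name__"], "regex": "kube_persistentvolume_.*", "action": "drop"},
]
print(is_dropped(drop_rules, "kube_persistentvolume_capacity_bytes"))  # prints True
print(is_dropped(drop_rules, "kube_pod_info"))                         # prints False
```

Note that a keep rule discards every metric that does not match it, so an overly narrow keep list can remove required metrics just as a drop rule can.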