Kubecost Diagnostics

Kubecost diagnostics run a series of tests to determine if resources necessary for accurate cost reporting are available.

You can access the Diagnostics page in the Kubecost UI by selecting Settings from the left navigation, then selecting View Full Diagnostics.

cAdvisor metrics available

cAdvisor metrics are generated by cAdvisor directly and are required for core application functionality, including Kubecost Allocation and Savings insights.
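
One quick way to confirm these metrics are present is to run a cAdvisor query in your Prometheus console. The series below is a standard cAdvisor metric used here purely as an example; any non-empty result suggests cAdvisor data is being scraped:

container_cpu_usage_seconds_total{container!=""}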

Kube-state-metrics (KSM) metrics available

A limited set of KSM data is required for core application functionality, including Kubecost Allocation and Savings insights.

Kubecost metrics available

Kubecost metrics are generated by the kubecost-cost-analyzer pod and are required for the core application to function. Specifically, these metrics are used for Kubecost Allocations, Assets, and Savings functionality.
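
As a quick check, you can query Prometheus for one of the cost series emitted by the cost-analyzer; node_total_hourly_cost is used here as an example and assumes a default Kubecost install:

node_total_hourly_cost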

Node-exporter metrics available

Node exporter metrics are used for the following features:

  • Reserved Instance Recommendations in Savings

  • Showing a compute 'breakdown' on Overview's Resource Efficiency graph, i.e. system vs idle vs user. The Compute bar on this graph will appear as a single solid-colored bar when this diagnostic is failing.

  • Various Kubecost Grafana dashboards

These metrics are not used in the core Assets and Allocation views and can therefore be considered optional. Learn how to disable here.
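
To confirm node-exporter data is available, for example for the system/idle/user breakdown mentioned above, you can query a node-exporter series in your Prometheus console; the metric below is used purely as an example:

node_cpu_seconds_total{mode="idle"}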

If any of the above diagnostic tests fail, see the Troubleshooting missing metrics section below.

Expected kube-state-metrics version found

Kubecost requires kube-state-metrics >= v1.6.0. This version check is completed by verifying the existence of the kube_persistentvolume_capacity_bytes metric. If this diagnostic test is failing, we recommend you:

  1. Confirm the kube-state-metrics version requirement is met (an example check is shown after this list).

  2. Verify that this metric (and potentially other kube-state-metrics metrics) is not being dropped by Prometheus relabel rules.

  3. Determine whether any persistent volumes are present in this cluster. If there are none, this metric will not exist and you can ignore this diagnostic check.
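
As an example of the first two checks, you could inspect the running kube-state-metrics image and then query Prometheus for the metric Kubecost looks for. The deployment name and namespace below are assumptions and may differ in your cluster:

kubectl get deployment kube-state-metrics -n kube-system -o jsonpath='{.spec.template.spec.containers[0].image}'

Then, in the Prometheus console, confirm the following query returns data:

kube_persistentvolume_capacity_bytes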

Kubecost ETL pipeline metrics

A diagnostic view is provided for both the Allocation and Assets pipelines and is designed to assist in diagnosing missing data found in the Allocation or Assets views. Kubecost's ETL pipelines run in the background to build a daily composition of the data required to build the cost model. For each day the data is collected, a file is written to disk containing the results. These files are used as both a cache and data backup, which the diagnostic view displays:

In the event of a problem, the diagnostic view would help you identify specific days where the ETL pipeline failed to collect data.

In the image above, the file for Nov 20, 2020 appears in red. This is because the data in this file has been flagged by our diagnostics page as empty (it failed to pass a minimum size threshold). This could happen if the database was temporarily unavailable while that day's data was being built.

The ETL pipelines provide a way to repair a specific day in the pipeline using the following URL:

http://<kubecost-url>:<port>/model/etl/[allocation|asset]/repair?window=<RFC3339-start>,<RFC3339-end>

To repair the file for the problematic date above (note that it belongs to the Allocation pipeline), navigate to the following URL in a browser:

http://<kubecost-url>:<port>/model/etl/allocation/repair?window=2020-11-20T00:00:00Z,2020-11-21T00:00:00Z
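
If the Kubecost UI is not exposed outside the cluster, one option is to port-forward to the cost-analyzer service and issue the same request with curl. The service name, namespace, and port below reflect a default Helm install and may differ in your environment:

kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090
curl "http://localhost:9090/model/etl/allocation/repair?window=2020-11-20T00:00:00Z,2020-11-21T00:00:00Z"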

Previous versions of Kubecost (1.81.0 and prior) provided a similar repair feature under the /rebuild endpoint by passing a window:

http://<kubecost-url>:<port>/model/etl/[allocation|assets]/rebuild?window=<RFC3339-start>,<RFC3339-end>&commit=true

Kubecost ETL pipeline cloud metrics

Once cloud integrations have been set up, each Cloud Store will have its own diagnostic view, which will include its provider key in the title. This view includes the Cloud Connection Status and metrics for the Reconciliation and Cloud Asset processes of that provider, including:

  • Coverage: The window of time that the historical subprocess has covered

  • LastRun: The last time that the process ran; updated each time the periodic subprocess runs

  • NextRun: Next scheduled run of the periodic subprocess

  • Progress: Ratio of Coverage to the total amount of time to be covered

  • RefreshRate: The interval that the periodic subprocess runs

  • Resolution: The size of the assets being retrieved

  • StartTime: When the Cloud Process was started

For more information about Cloud Integration and related APIs, read the cloud-integration documentation.

Troubleshooting missing metrics

Step 1: Confirm you are running the correct version of the metric exporter

Below are the minimum required versions:

node-exporter - v0.16 (May 2018)
kube-state-metrics - v1.6.0 (May 2019)
cAdvisor - kubelet v1.11.0 (May 2018)

Step 2: Confirm pod(s) are currently running

Confirm that each pod is in a Running state for the particular metric exporter.
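
For example, assuming a default Kubecost namespace and standard exporter names (both are assumptions that may differ in your install):

kubectl get pods -n kubecost
kubectl get pods --all-namespaces | grep -iE 'kube-state-metrics|node-exporter'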

Step 3: Confirm the required Prometheus Targets are available

You can see this information directly on the Kubecost Diagnostics page (screenshot below) or by visiting your Prometheus console and then Status > Targets in the top navigation bar.

If the necessary scrape target is not added to your Prometheus, then refer to this resource to learn how to add a new job under your Prometheus scrape_configs block. You can visit <your-prometheus-console-url>/config to view the current scrape_configs block being passed to your Prometheus.
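
If you need to add a job, a minimal sketch of a node-exporter scrape entry using Kubernetes service discovery is shown below; the job name, discovery role, and regex are illustrative and should be adapted to your environment:

scrape_configs:
  - job_name: kubecost-node-exporter
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels: [__meta_kubernetes_endpoints_name]
        regex: .*node-exporter.*
        action: keep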

Step 4: Confirm there are no recent Prometheus scrape errors

You can see information on recent Prometheus scrape errors directly on the Kubecost Diagnostics page when present or by visiting your Prometheus console and then Status > Targets in the top navigation bar.

Contact support@kubecost.com or send a message in our Slack workspace if you encounter an error that you do not recognize.

Step 5: Confirm metrics are not being dropped by Prometheus relabel rules

If metrics are being collected by a supported version of the desired metrics exporter, the final step is to verify that individual metrics are not being dropped in your Prometheus pipeline. This could take the form of a drop rule under a metric_relabel_configs block in your Prometheus .yaml configuration files.
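
For reference, a rule of the following shape would silently drop metrics Kubecost depends on; the metric name pattern here is only an illustration of what to search for in your configuration:

metric_relabel_configs:
  - source_labels: [__name__]
    regex: 'container_.*'
    action: drop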
