Kubecost Metrics
Kubecost Cost Model
The Cost Model both exports and consumes the following metrics.
Metric | Description |
---|---|
| Hourly cost per vCPU on this node |
| Hourly cost per GPU on this node |
| Hourly cost per GB of memory on this node |
| Total node cost per hour |
| Hourly cost of a load balancer |
| Hourly cost paid as a cluster management fee |
| Hourly cost per GB on a persistent volume |
| Number of GPUs available on node |
| Average number of CPUs requested/used over last 1m |
| Average number of GPUs requested over last 1m |
| Average bytes of RAM requested/used over last 1m |
| Bytes provisioned for a PVC attached to a pod |
| Cloud provider info about node preemptibility |
| Total cost per GB egress across zones |
| Total cost per GB egress across regions |
| Total cost per GB of internet egress |
| Service Selector Labels |
| Deployment Match Labels |
| StatefulSet Match Labels |
| (Created by recording rule) |
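
The per-unit hourly costs and the allocation metrics above are naturally combined by multiplying each allocation by its unit price. The following is a minimal sketch of that arithmetic, not Kubecost's actual cost formula. The function name, its parameters, and the choice of 1024^3 bytes per GB are assumptions made for illustration.

```python
# Hypothetical helper: not part of Kubecost. It only illustrates how
# per-unit hourly prices and allocations could be multiplied together.

BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def estimate_hourly_cost(cpu_cores, ram_bytes, gpus,
                         cpu_hourly_cost, ram_gb_hourly_cost, gpu_hourly_cost):
    """Estimate the hourly cost of a workload from its allocations.

    cpu_cores, ram_bytes, gpus: average allocations over the window.
    *_hourly_cost: hourly price per vCPU, per GB of memory, and per GPU.
    """
    ram_gb = ram_bytes / BYTES_PER_GB
    return (cpu_cores * cpu_hourly_cost
            + ram_gb * ram_gb_hourly_cost
            + gpus * gpu_hourly_cost)
```

For example, a container averaging 2 vCPUs and 4 GB of RAM with no GPU, at illustrative prices of 0.03 per vCPU-hour and 0.004 per GB-hour, would come to roughly 0.076 per hour under these assumptions.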
Kubecost Network Costs
The Kubecost network-costs DaemonSet collects node network data and exports the egress, ingress, and performance statistics.
Metric | Description |
---|---|
| egressed byte counts by pod |
| ingressed byte counts by pod |
| total parsed conntrack entries |
| total time in milliseconds it took to parse conntrack entries |
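
Egressed byte counts become a cost only once they are multiplied by a per-GB egress price, such as the zone, region, or internet egress costs listed in the Cost Model table. The sketch below shows that conversion. The function name and the decimal definition of a GB (10^9 bytes) are assumptions, and this is not presented as how Kubecost itself performs the calculation.

```python
# Hypothetical helper for illustration only.

def egress_cost(egress_bytes, cost_per_gb, bytes_per_gb=10 ** 9):
    """Convert an egressed byte count into a cost.

    egress_bytes: bytes sent by a pod over the period of interest.
    cost_per_gb: price per GB for the relevant egress class.
    bytes_per_gb: assumed size of one GB (decimal by default).
    """
    return egress_bytes / bytes_per_gb * cost_per_gb
```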
cAdvisor
cAdvisor (Container Advisor) provides container users an understanding of the resource usage and performance characteristics of their running containers. It is a running daemon that collects, aggregates, processes, and exports information about running containers.
GitHub: https://github.com/google/cadvisor
Metric | Description |
---|---|
| Current memory usage, including all memory regardless of when it was accessed |
| Number of bytes that can be consumed by the container on this filesystem |
| Number of bytes that are consumed by the container on this filesystem |
| Current working set |
| Cumulative count of bytes received |
| Cumulative count of bytes transmitted |
| Cumulative CPU time consumed |
| Number of elapsed enforcement period intervals |
| Number of throttled period intervals |
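
Several of the cAdvisor metrics above are cumulative counters, so they are usually read as the increase between two samples rather than as absolute values. As a minimal sketch, the fraction of enforcement periods in which a container was throttled can be derived from the last two rows of the table. The function names are hypothetical, and the counter-reset handling (treating a drop as a restart from zero) mirrors the common Prometheus convention rather than anything specific to Kubecost.

```python
# Hypothetical helpers for illustration only.

def counter_increase(start, end):
    """Increase of a cumulative counter between two samples.

    If the counter went down, assume it was reset to zero (for example
    after a container restart) and count only the post-reset value.
    """
    return end - start if end >= start else end


def throttle_ratio(periods_start, periods_end, throttled_start, throttled_end):
    """Fraction of elapsed enforcement periods that were throttled."""
    periods = counter_increase(periods_start, periods_end)
    throttled = counter_increase(throttled_start, throttled_end)
    if periods == 0:
        return 0.0  # no enforcement periods elapsed in this window
    return throttled / periods
```

A ratio close to 1 over a window suggests the container hit its CPU limit in nearly every enforcement period.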
Kube-State-Metrics (KSM)
Although the default Kubecost installation does not include a KSM deployment, Kubecost calculates and emits the metrics below. These metrics and their labels follow the conventions of KSMv1, not KSMv2.
Metric | Description |
---|---|
| Number of pods specified for a Deployment |
| Number of pods currently available for a Deployment |
| The number of pods which reached Phase Failed and the reason for failure |
| Kubernetes annotations converted to Prometheus labels |
| Kubernetes labels converted to Prometheus labels |
| Kubernetes labels converted to Prometheus labels |
| The allocatable for different resources of a node that are available for scheduling |
| Total allocatable CPU cores of the node (Deprecated in ksm 2.0.0) |
| Total allocatable memory bytes of the node (Deprecated in ksm 2.0.0) |
| The capacity for different resources of a node |
| Total CPU cores available on the node (Deprecated in ksm 2.0.0) |
| Total memory available on the node (bytes) (Deprecated in ksm 2.0.0) |
| The condition of a cluster node |
| Total capacity of a persistent volume (bytes) |
| Status of a persistent volume (Bound |
| Information about persistent volume claim |
| The capacity of storage requested by the persistent volume claim |
| Kubernetes annotations converted to Prometheus labels |
| The number of requested limit resource by a container |
| Limit on CPU cores that can be used by the container. (Deprecated in ksm 2.0.0) |
| Limit on the amount of memory that can be used by the container. (Deprecated in ksm 2.0.0) |
| The number of requested request resource by a container |
| The number of container restarts per container |
| Describes whether the container is currently in running state |
| Describes the reason the container is currently in terminated state |
| Kubernetes labels converted to Prometheus labels |
| Information about the Pod's owner |
| The pods current phase (Pending |
| Information about the ReplicaSet's owner |
Node exporter
Prometheus exporter for hardware and OS metrics exposed by *NIX kernels, written in Go with pluggable metric collectors.
The Node Exporter is disabled by default. You can enable it with the flags:
GitHub: https://github.com/prometheus/node_exporter
Metric | Description |
---|---|
| Seconds the CPUs spent in each mode |
| The total number of reads completed successfully |
| The total number of reads completed successfully |
| The total number of writes completed successfully |
| The total number of writes completed successfully |
| Whether an error occurred while getting statistics for the given device |
| Memory information field Buffers_bytes |
| Memory information field Cached_bytes |
| Memory information field MemAvailable_bytes |
| Memory information field MemFree_bytes |
| Memory information field MemTotal_bytes |
| Network device statistic transmit_bytes |
Prometheus
Prometheus emits metrics which are used by Kubecost for diagnostic purposes:
Metric | Description |
---|---|
| Scrape target status |
| Amount of time between target scrapes |
NVIDIA GPUs
NVIDIA GPU monitoring support is explained in more detail in the Kubecost Docs: NVIDIA GPU Monitoring Configurations and on the Kubecost Blog: Monitoring NVIDIA GPU Usage in Kubernetes with Prometheus. Monitoring NVIDIA GPUs requires DCGM Exporter. Although all metrics exposed by DCGM Exporter are collected, Kubecost currently uses only the following:
Metric | Description |
---|---|
| GPU utilization |