Allocation API

Note: Throughout our API documentation, we use localhost:9090 as the default Kubecost URL, but your Kubecost instance may be exposed by a service or ingress. To reach Kubecost at port 9090, run: kubectl port-forward deployment/kubecost-cost-analyzer -n kubecost 9090. When querying the cost-model container directly (e.g. localhost:9003), the /model part of the URI should be removed.
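As a minimal sketch of the note above, the following Python helper builds the allocation endpoint URL and omits the /model prefix when the cost-model container is queried directly. The function name allocation_url and its parameters are hypothetical and exist only for illustration; they are not part of Kubecost.

```python
from urllib.parse import urlencode


def allocation_url(address, params=None, direct_cost_model=False):
    """Build an Allocation API URL (hypothetical helper, not a Kubecost API).

    address: host and port, e.g. "localhost:9090".
    direct_cost_model: True when talking to the cost-model container
    itself (e.g. localhost:9003), in which case "/model" is omitted.
    """
    prefix = "" if direct_cost_model else "/model"
    url = f"http://{address}{prefix}/allocation"
    if params:
        # Keep ":" and "," readable, e.g. aggregate=label:app
        url += "?" + urlencode(params, safe=":,")
    return url


print(allocation_url("localhost:9090", {"window": "3d", "aggregate": "namespace"}))
print(allocation_url("localhost:9003", {"window": "today"}, direct_cost_model=True))
```

The first call targets the port-forwarded deployment and the second targets the cost-model container, so only the first URL contains /model.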
Allocation API: GET http://<your-kubecost-address>/model/allocation

Allocation schema

| Field | Description |
| --- | --- |
| name | Name of each relevant Kubernetes concept described by the allocation, delimited by slashes, e.g. "cluster/node/namespace/pod/container". |
| properties | Map of name-to-value for all relevant property fields, including: cluster, node, namespace, controller, controllerKind, pod, container, labels, annotation, etc. Note: Prometheus only supports underscores (_) in label names. Dashes (-) and dots (.), while supported by Kubernetes, will be translated to underscores by Prometheus. This may cause the merging of labels, which could result in aggregated costs being charged to a single label. |
| window | Period of time over which the allocation is defined. |
| start | Precise starting time of the allocation. By definition must be within the window. |
| end | Precise ending time of the allocation. By definition must be within the window. |
| minutes | Number of minutes running; i.e. the minutes from start until end. |
| cpuCores | Average number of CPU cores allocated while running. |
| cpuCoreRequestAverage | Average number of CPU cores requested while running. |
| cpuCoreUsageAverage | Average number of CPU cores used while running. |
| cpuCoreHours | Cumulative CPU core-hours allocated. |
| cpuCost | Cumulative cost of allocated CPU core-hours. |
| cpuCostAdjustment | Change in cost after allocated CPUs have been reconciled with updated node cost. |
| cpuEfficiency | Ratio of cpuCoreUsageAverage-to-cpuCoreRequestAverage, meant to represent the fraction of requested resources that were used. |
| gpuCount | Number of GPUs allocated to the workload. |
| gpuHours | Cumulative GPU-hours allocated. |
| gpuCost | Cumulative cost of allocated GPU-hours. |
| gpuCostAdjustment | Change in cost after allocated GPUs have been reconciled with updated node cost. |
| networkTransferBytes | Total bytes sent from the workload. |
| networkReceiveBytes | Total bytes received by the workload. |
| networkCost | Cumulative cost of network usage. |
| networkCrossZoneCost | Cumulative cost of Cross-zone network egress usage. |
| networkCrossRegionCost | Cumulative cost of Cross-region network egress usage. |
| networkInternetCost | Cumulative cost of internet egress usage. |
| networkCostAdjustment | Updated network cost. |
| loadBalancerCost | Cumulative cost of allocated load balancers. |
| loadBalancerCostAdjustment | Updated load balancer cost. |
| pvBytes | Average number of bytes of PersistentVolumes allocated while running. |
| pvByteHours | Cumulative PersistentVolume byte-hours allocated. |
| pvCost | Cumulative cost of allocated PersistentVolume byte-hours. |
| pvs | Map of PersistentVolumeClaim costs that have been allocated to the workload. |
| pvCostAdjustment | Updated persistent volume cost. |
| ramBytes | Average number of RAM bytes allocated. An allocated resource is the source of cost, according to Kubecost - regardless of if a requested resource is used. |
| ramByteRequestAverage | Average of the RAM requested by the workload. Requests are a Kubernetes tool for preallocating/reserving resources for a given container. |
| ramByteUsageAverage | Average of the RAM used by the workload. This comes from moment-to-moment measurements of live RAM byte usage of each container. This is roughly the number you see under RAM if you pull up Task Manager (Windows), top on Linux, or Activity Monitor (MacOS). |
| ramByteHours | Cumulative RAM byte-hours allocated. |
| ramCost | Cumulative cost of allocated RAM byte-hours. |
| ramEfficiency | Ratio of ramByteUsageAverage-to-ramByteRequestAverage, meant to represent the fraction of requested resources that were used. |
| sharedCost | Cumulative cost of shared resources, including shared namespaces, shared labels, shared overhead. |
| externalCost | Cumulative cost of external resources. |
| totalCost | Total cumulative cost. |
| totalEfficiency | Cost-weighted average of cpuEfficiency and ramEfficiency. In equation form: ((cpuEfficiency * cpuCost) + (ramEfficiency * ramCost)) / (cpuCost + ramCost) |
| rawAllocationOnly | Object with fields cpuCoreUsageMax and ramByteUsageMax, which are the maximum usages in the window for the Allocation. If the Allocation query is aggregated or accumulated, this object will be null because the meaning of maximum is ambiguous in these situations. Consider aggregating by namespace: should the maximum be the maximum of each Allocation individually, or the maximum combined usage of all Allocations (at any point in time in the window) in the namespace? |
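To make the efficiency fields concrete, here is a small Python sketch that recomputes cpuEfficiency, ramEfficiency, and totalEfficiency from the other schema fields, following the ratios and the equation given in the schema. The function names, the zero-request and zero-cost fallbacks, and the sample values are assumptions made for illustration, not Kubecost behavior.

```python
def efficiency(usage_avg, request_avg):
    """Ratio of average usage to average request.

    Returning 0.0 when nothing was requested is an assumption of this sketch.
    """
    return usage_avg / request_avg if request_avg > 0 else 0.0


def total_efficiency(alloc):
    """Cost-weighted average of cpuEfficiency and ramEfficiency."""
    cpu_eff = efficiency(alloc["cpuCoreUsageAverage"], alloc["cpuCoreRequestAverage"])
    ram_eff = efficiency(alloc["ramByteUsageAverage"], alloc["ramByteRequestAverage"])
    cost = alloc["cpuCost"] + alloc["ramCost"]
    if cost == 0:
        return 0.0  # assumption: no CPU or RAM cost means nothing to weight
    return (cpu_eff * alloc["cpuCost"] + ram_eff * alloc["ramCost"]) / cost


# Made-up sample values, for illustration only.
sample = {
    "cpuCoreUsageAverage": 0.5, "cpuCoreRequestAverage": 1.0, "cpuCost": 3.0,
    "ramByteUsageAverage": 1.5e9, "ramByteRequestAverage": 2.0e9, "ramCost": 1.0,
}
print(total_efficiency(sample))  # (0.5 * 3.0 + 0.75 * 1.0) / 4.0
```

Because CPU carries three quarters of the cost in this sample, the result sits closer to the CPU efficiency (0.5) than to the RAM efficiency (0.75).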

Quick start

Request allocation data for each 24-hour period in the last three days, aggregated by namespace:
Request:
$ curl http://localhost:9090/model/allocation \
-d window=3d \
-d aggregate=namespace \
-d accumulate=false \
-d shareIdle=false \
-G
Response:
{
"code": 200,
"data": [
{
"__idle__": { ... },
"default": { ... },
"kube-system": { ... },
"kubecost": { ... }
},
{
"__idle__": { ... },
"default": { ... },
"kube-system": { ... },
"kubecost": { ... }
},
{
"__idle__": { ... },
"default": { ... },
"kube-system": { ... },
"kubecost": { ... }
},
{
"__idle__": { ... },
"default": { ... },
"kube-system": { ... },
"kubecost": { ... }
}
]
}
Note: Querying for window=3d will likely return a range of four sets because the queried range will overlap with four precomputed 24-hour sets, each aligned to the configured time zone. For example, querying window=3d on 2021/01/04 12:00:00 will return:
  • 2021/01/04 00:00:00 until 2021/01/04 12:00:00 (now)
  • 2021/01/03 00:00:00 until 2021/01/04 00:00:00
  • 2021/01/02 00:00:00 until 2021/01/03 00:00:00
  • 2021/01/01 00:00:00 until 2021/01/02 00:00:00
See Querying for the full list of arguments and Examples for more example queries.
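The four-set behavior described in the note can be reproduced with a short Python sketch. It assumes day-aligned precomputed sets and uses naive datetimes as a stand-in for the configured time zone; the function name daily_sets is hypothetical.

```python
from datetime import datetime, timedelta


def daily_sets(now, days):
    """Return (start, end) pairs for the day-aligned sets that overlap
    the last `days` days, oldest first. The final set is cut off at `now`."""
    window_start = now - timedelta(days=days)
    # Align to midnight of the day that contains the start of the window.
    day = window_start.replace(hour=0, minute=0, second=0, microsecond=0)
    sets = []
    while day < now:
        sets.append((day, min(day + timedelta(days=1), now)))
        day += timedelta(days=1)
    return sets


now = datetime(2021, 1, 4, 12, 0, 0)
for start, end in daily_sets(now, 3):
    print(start, "until", end)
```

With the timestamp from the note, this yields four sets: three full days starting 2021/01/01 and a partial set for 2021/01/04 that ends at the query time.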

Special types of allocation

  • __idle__ refers to resources on a cluster that were not dedicated to a Kubernetes object (e.g. unused CPU core-hours on a node). An idle resource can be shared (proportionally or evenly) with the other allocations from the same cluster. (See the argument shareIdle.)
  • __unallocated__ refers to aggregated allocations without the selected aggregate field; e.g. aggregating by label:app might produce an __unallocated__ allocation composed of allocations without the app label.
  • __unmounted__ (or "Unmounted PVs") refers to the resources used by PersistentVolumes that aren't mounted to a pod using a PVC, and thus cannot be allocated to a pod.
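When post-processing a response, it is often useful to separate these special allocations from regular workloads. The sketch below assumes one set from the "data" array has already been parsed into a Python dict; the helper name split_special and the sample costs are made up for illustration.

```python
SPECIAL_NAMES = {"__idle__", "__unallocated__", "__unmounted__"}


def split_special(allocation_set):
    """Split one allocation set into (special, regular) dicts keyed by name."""
    special = {k: v for k, v in allocation_set.items() if k in SPECIAL_NAMES}
    regular = {k: v for k, v in allocation_set.items() if k not in SPECIAL_NAMES}
    return special, regular


# Made-up totals standing in for the elided { ... } objects.
sample_set = {
    "__idle__": {"totalCost": 4.0},
    "default": {"totalCost": 1.5},
    "kubecost": {"totalCost": 2.5},
}
special, regular = split_special(sample_set)
print(sorted(special))
print(sum(a["totalCost"] for a in regular.values()))
```

Keeping __idle__ separate makes it easy to report workload cost and idle cost side by side when shareIdle is not used.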

Query examples

Allocation data for today unaggregated:
Request:
$ curl http://localhost:9090/model/allocation \
-d window=today \
-G
Response:
{
"code": 200,
"data": [
{
"__idle__": { ... },
"cluster-one/gke-niko-pool-2-9182dfa7-okb2/kubecost/kubecost-cost-analyzer-94dc86fc-lwvrm/cost-model": { ... },
"cluster-one/gke-niko-pool-2-9182dfa7-okb2/kubecost/kubecost-cost-analyzer-94dc86fc-lwvrm/cost-analyzer-frontend": { ... },
"cluster-one/gke-niko-pool-2-9182dfa7-okb2/kubecost/kubecost-grafana-6df5cc66b6-dzszt/grafana": { ... }
}
]
}
Allocation data for last week, per day, aggregated by cluster:
Request:
$ curl http://localhost:9090/model/allocation \
-d window=lastweek \
-d aggregate=cluster \
-G
Response:
{
"code": 200,
"data": [
{
"__idle__": { ... },
"cluster-one": { ... },
"cluster-two": { ... }
},
{
"__idle__": { ... },
"cluster-one": { ... },
"cluster-two": { ... }
},
{
"__idle__": { ... },
"cluster-one": { ... },
"cluster-two": { ... }
},
{
"__idle__": { ... },
"cluster-one": { ... },
"cluster-two": { ... }
},
{
"__idle__": { ... },
"cluster-one": { ... },
"cluster-two": { ... }
},
{
"__idle__": { ... },
"cluster-one": { ... },
"cluster-two": { ... }
},
{
"__idle__": { ... },
"cluster-one": { ... },
"cluster-two": { ... }
}
]
}
Allocation data for the last 30 days, aggregated by the "app" label, sharing idle allocation, sharing allocations from two namespaces, sharing $100/mo in overhead, and accumulated into one allocation for the entire window:
Request:
$ curl http://localhost:9090/model/allocation \
-d window=30d \
-d aggregate=label:app \
-d accumulate=true \
-d shareIdle=weighted \
-d shareNamespaces=kube-system,kubecost \
-d shareCost=100 \
-G
Response:
{
"code": 200,
"data": [
{
"__unallocated__": { ... },
"app=redis": { ... },
"app=cost-analyzer": { ... },
"app=prometheus": { ... },
"app=grafana": { ... },
"app=nginx": { ... },
"app=helm": { ... }
}
]
}
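When aggregating by label as in the query above, keep in mind the schema note that Prometheus translates dashes and dots in label names to underscores, which can merge distinct Kubernetes labels. The following Python sketch detects such collisions ahead of time. The exact replacement rule used here (every character outside letters, digits, and underscore becomes an underscore) is an assumption based on that note, and both function names are hypothetical.

```python
import re
from collections import defaultdict


def sanitize_label_name(name):
    """Approximate the sanitized label name (assumed rule, see lead-in)."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", name)


def find_collisions(label_names):
    """Map each sanitized name to the original names that collapse onto it,
    keeping only names shared by more than one original label."""
    groups = defaultdict(list)
    for name in label_names:
        groups[sanitize_label_name(name)].append(name)
    return {k: v for k, v in groups.items() if len(v) > 1}


print(find_collisions(["app-name", "app.name", "team"]))
```

Here app-name and app.name both become app_name, so costs for the two labels would be charged to a single aggregated label.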
Allocation data for 2021-03-10T00:00:00 to 2021-03-11T00:00:00 (i.e. 24h), multi-aggregated by namespace and the "app" label, filtering by properties.cluster == "cluster-one", and accumulated into one allocation for the entire window.
Request:
$ curl http://localhost:9090/model/allocation \
-d window=2021-03-10T00:00:00Z,2021-03-11T00:00:00Z \
-d aggregate=namespace,label:app \
-d accumulate=true \
-d filterClusters=cluster-one \
-G
Response:
{
"code": 200,
"data": [
{
"default/app=redis": { ... },
"kubecost/app=cost-analyzer": { ... },
"kubecost/app=prometheus": { ... },
"kubecost/app=grafana": { ... },
"kube-system/app=helm": { ... }
}
]
}
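The explicit window in the request above is a pair of UTC timestamps joined by a comma. A minimal Python sketch for producing that string from timezone-aware datetimes follows; the helper name window_param is hypothetical, and rejecting naive datetimes is a design choice of this sketch.

```python
from datetime import datetime, timezone


def window_param(start, end):
    """Format two timezone-aware datetimes as "start,end" in UTC,
    matching the timestamp style used in the request above."""
    parts = []
    for t in (start, end):
        if t.tzinfo is None:
            # Naive datetimes are rejected to avoid silently assuming local time.
            raise ValueError("datetimes must be timezone-aware")
        parts.append(t.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
    return ",".join(parts)


print(window_param(
    datetime(2021, 3, 10, tzinfo=timezone.utc),
    datetime(2021, 3, 11, tzinfo=timezone.utc),
))
```

The printed value can be passed directly as the window argument of the query.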
Allocation data for today, aggregated by annotation. See Enabling Annotation Emission to enable annotations.
Request:
$ curl http://localhost:9090/model/allocation \
-d window=today \
-d aggregate=annotation:my_annotation \
-G
Response:
{
"code": 200,
"data": [
{
"__unallocated__": { ... },
"my_annotation=foo": { ... },
"my_annotation=bar": { ... }
}
]
}

Allocation of asset costs

Both the reconcile and shareTenancyCosts flags start processes that distribute the costs of Assets to the Allocations related to them. For the reconcile flag, these connections can be straightforward, like the connection between a node Asset and an Allocation, where the CPU, GPU, and RAM usage can be used to distribute a proportion of the node's cost to the Allocations that run on it. For Assets and Allocations where the connection is less well-defined, such as network Assets, we have opted for a method of distributing the cost that we call Distribution by Usage Hours.
Distribution by Usage Hours takes the windows (start time and end time) of an Asset and of all the Allocations connected to it and finds the number of hours that both the Allocation and the Asset were running. The number of hours for each Allocation related to an Asset is called Alloc_Usage_Hours. The sum of all Alloc_Usage_Hours for a single Asset is Total_Usage_Hours. With these values, an Asset's cost is distributed to each connected Allocation using the formula Asset_Cost * Alloc_Usage_Hours / Total_Usage_Hours. Depending on the Asset type, an Allocation can receive proportions of multiple Asset costs.
Asset types that use this distribution method include:
  • Network (reconcile): When the network pod is not enabled, cost is distributed by usage hours. If the network pod is enabled, cost is distributed to Allocations in proportion to usage.
  • Load Balancer (reconcile)
  • Cluster Management (shareTenancyCosts)
  • Attached disks (shareTenancyCosts): Does not include PVs, which are handled by reconcile
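Distribution by Usage Hours can be illustrated with a short Python sketch. It represents each window as a (start, end) pair of hour offsets, computes the overlap of every Allocation with the Asset, and applies Asset_Cost * Alloc_Usage_Hours / Total_Usage_Hours. The function names, the hour-offset representation, and the sample numbers are assumptions made for illustration, not Kubecost's implementation.

```python
def overlap_hours(a_start, a_end, b_start, b_end):
    """Hours during which both windows were running (0.0 if they never overlap)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def distribute_by_usage_hours(asset_cost, asset_window, alloc_windows):
    """Split asset_cost across allocations in proportion to overlapping hours."""
    usage = {
        name: overlap_hours(asset_window[0], asset_window[1], w[0], w[1])
        for name, w in alloc_windows.items()
    }
    total = sum(usage.values())  # Total_Usage_Hours
    if total == 0:
        return {name: 0.0 for name in usage}
    return {name: asset_cost * hours / total for name, hours in usage.items()}


# A load balancer that cost 12.0 over hours 0-24, shared by two allocations.
shares = distribute_by_usage_hours(
    12.0,
    (0.0, 24.0),
    {"default/web": (0.0, 24.0), "default/batch": (12.0, 36.0)},
)
print(shares)
```

In this example default/web overlaps the Asset for 24 hours and default/batch for 12 hours, so Total_Usage_Hours is 36 and the cost is split two thirds to one third.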

Querying on-demand (experimental)

Warning: Querying on-demand with high resolution for long windows can cause serious Prometheus performance issues, including OOM errors. Start with short windows (1d or less) and proceed with caution.
Computing allocation data on-demand allows for greater flexibility with respect to step size and accuracy-versus-performance. (See resolution and error bounds for details.) The standard endpoint can only serve results from precomputed sets with predefined step sizes (e.g. 24h aligned to the UTC time zone), so asking it for a "7d" query will almost certainly result in 8 sets, including "today" and the final set, which might span 6.5d-7.5d ago. With this endpoint, however, you will be computing everything on-demand, so "7d" will return exactly seven days of data, counted back from the moment the query is received. (You can still use window keywords like "today" and "lastweek", of course, which should align perfectly with the same queries of the standard ETL-driven endpoint.)
Additionally, unlike the standard endpoint, querying on-demand will not use reconciled asset costs. Therefore, the results returned will show all adjustments (e.g. CPU, GPU, RAM) to be 0.
Allocation On-Demand API: GET http://<kubecost>/model/allocation/compute

On-demand query examples

Allocation data for the last 60m, in steps of 10m, with resolution 1m, aggregated by cluster.
Request:
$ curl http://localhost:9090/model/allocation/compute \
-d window=60m \
-d step=10m \
-d resolution=1m \
-d aggregate=cluster \
-d accumulate=false \
-G
Response:
{
"code": 200,
"data": [
{
"cluster-one": { ... },
"cluster-two": { ... }
},
{
"cluster-one": { ... },
"cluster-two": { ... }
},
{
"cluster-one": { ... },
"cluster-two": { ... }
}
]
}
Allocation data for the last 9d, in steps of 3d, with a 10m resolution, aggregated by namespace.
Request:
$ curl http://localhost:9090/model/allocation/compute \
-d window=9d \
-d step=3d \
-d resolution=10m \
-d aggregate=namespace \
-d accumulate=false \
-G
Response:
{
"code": 200,
"data": [
{
"default": { ... },
"kubecost": { ... },
"kube-system": { ... }
},
{
"default": { ... },
"kubecost": { ... },
"kube-system": { ... }
},
{
"default": { ... },
"kubecost": { ... },
"kube-system": { ... }
}
]
}
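The relationship between window, step, and resolution in the query above can be sketched in Python. The helper below assumes duration strings that use only the m, h, and d suffixes, counts a partial final step as its own set, and treats "resolution points per set" as a rough proxy for query load; all three are assumptions of this sketch rather than documented Kubecost behavior.

```python
UNIT_MINUTES = {"m": 1, "h": 60, "d": 24 * 60}


def to_minutes(duration):
    """Parse strings such as "10m", "6h", or "3d" into minutes."""
    return int(duration[:-1]) * UNIT_MINUTES[duration[-1]]


def query_shape(window, step, resolution):
    """Return (number of sets, resolution points per set)."""
    window_m, step_m, res_m = (to_minutes(d) for d in (window, step, resolution))
    sets = -(-window_m // step_m)      # ceiling division
    points_per_set = step_m // res_m   # rough proxy for work per set
    return sets, points_per_set


print(query_shape("9d", "3d", "10m"))
```

For the 9d query in steps of 3d this gives three sets, matching the response shown, with each set covering 432 resolution points at 10m.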

Theoretical error bounds

Tuning the resolution parameter allows the querier to make tradeoffs between accuracy and performance. For long-running pods (>1d) resolution can be tuned aggressively low (>10m) with relatively little effect on accuracy. However, even modestly low resolutions (5m) can result in significant accuracy degradation for short-running pods (<1h).
Here, we provide theoretical error bounds for different resolution values given pods of differing running durations. Each tuple gives the lower and upper bounds of the reported value as a fraction of the actual value. For example:
  • 1.00, 1.00 means that results should always be accurate to less than 0.5% error
  • 0.83, 1.00 means that results should never be high by more than 0.5% error, but could be low by as much as 17% error
  • -1.00, 10.00 means that the result could be as high as 1000% of the actual value (e.g. 30s pod being counted for 5m) or the pod could be missed altogether, i.e. -100% error.
| resolution | 30s pod | 5m pod | 1h pod | 1d pod | 7d pod |
| --- | --- | --- | --- | --- | --- |
| 1m | -1.00, 2.00 | 0.80, 1.00 | 0.98, 1.00 | 1.00, 1.00 | 1.00, 1.00 |
| 2m | -1.00, 4.00 | 0.80, 1.20 | 0.97, 1.00 | 1.00, 1.00 | 1.00, 1.00 |
| 5m | -1.00, 10.00 | -1.00, 1.00 | 0.92, 1.00 | 1.00, 1.00 | 1.00, 1.00 |
| 10m | -1.00, 20.00 | -1.00, 2.00 | 0.83, 1.00 | 0.99, 1.00 | 1.00, 1.00 |
| 30m | -1.00, 60.00 | -1.00, 6.00 | 0.50, 1.00 | 0.98, 1.00 | 1.00, 1.00 |
| 60m | -1.00, 120.00 | -1.00, 12.00 | -1.00, 1.00 | 0.96, 1.00 | 0.99, 1.00 |
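One simple sampling model happens to reproduce the figures in this table, and it may help build intuition for them. It assumes a pod is observed at either ceil(duration / resolution) or one fewer sample points, and that each observed point is counted as a full resolution interval. This model is an assumption inferred from the table values, not a derivation stated by Kubecost, and the function name accuracy_bounds is hypothetical.

```python
def accuracy_bounds(pod_seconds, resolution_seconds):
    """Return (lower, upper) bounds of measured/actual run time, rounded
    to two decimals. A lower bound of -1.0 means the pod can be missed."""
    hits = -(-pod_seconds // resolution_seconds)  # ceil(duration / resolution)
    lower = (hits - 1) * resolution_seconds / pod_seconds
    upper = hits * resolution_seconds / pod_seconds
    if lower == 0:
        lower = -1.0  # no sample point observed the pod at all
    return round(lower, 2), round(upper, 2)


MINUTE, HOUR, DAY = 60, 3600, 86400
print(accuracy_bounds(30, 1 * MINUTE))         # 30s pod at 1m resolution
print(accuracy_bounds(5 * MINUTE, 2 * MINUTE))  # 5m pod at 2m resolution
print(accuracy_bounds(HOUR, 10 * MINUTE))       # 1h pod at 10m resolution
```

Under this model, short pods are either missed or heavily overcounted at coarse resolutions, while pods that run for a day or more lose at most one resolution interval, which is why their bounds stay close to 1.00.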