Network Traffic Cost Allocation
This document summarizes Kubecost network cost allocation, how to enable it, and what it provides.
When this feature is enabled, Kubecost combines network traffic metrics with provider-specific network costs to provide insight into network data sources as well as the aggregate costs of transfers.
Metrics include egress and ingress data transfers by pod and are classified as internet, cross-region and cross-zone.
If using the included Prometheus instance, the scrape is automatically configured.
If you are integrating with an existing Prometheus, you can set networkCosts.prometheusScrape=true and the network costs service should be auto-discovered.
Note: Network cost, which is disabled by default, needs to be run as a privileged pod to access the relevant networking kernel module on the host machine.
To reduce resource usage, Kubecost recommends setting a CPU limit on the network-costs daemonset. This causes a delay of a few seconds during peak usage and does not affect overall accuracy. This is done by default in Kubecost 1.99+.
For existing deployments, these are the recommended values:
The network-simulator was used to simulate conntrack entry updates in real time while simultaneously running a network-costs instance against the simulated cluster. To profile the heap, a heap profile of 1,000,000 conntrack entries was gathered and examined after a warmup of roughly five minutes.
Each conntrack entry is equivalent to two transport directions, so every conntrack entry is two map entries (connections).
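As a small illustration of the sentence above, the sketch below expands one conntrack entry into its two directional map keys. The ConntrackEntry type and the key layout are hypothetical and chosen only for illustration; they are not taken from the network-costs source.

```python
from typing import List, NamedTuple, Tuple

# Hypothetical key layout: (source IP, source port, destination IP, destination port).
ConnectionKey = Tuple[str, int, str, int]


class ConntrackEntry(NamedTuple):
    """Simplified stand-in for a conntrack entry (illustrative only)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def to_map_keys(entry: ConntrackEntry) -> List[ConnectionKey]:
    """Return one key per transport direction for a single conntrack entry."""
    forward = (entry.src_ip, entry.src_port, entry.dst_ip, entry.dst_port)
    reverse = (entry.dst_ip, entry.dst_port, entry.src_ip, entry.src_port)
    return [forward, reverse]


# Example addresses and ports are made up.
entry = ConntrackEntry("10.0.1.5", 43512, "10.0.2.9", 443)
keys = to_map_keys(entry)
```

Under this reading, a profile of 1,000,000 conntrack entries corresponds to 2,000,000 map entries.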
After network-costs was modified to parallelize the delta and dispatch, large map comparisons were significantly lighter in memory. The same tests were performed against simulated data with the following footprint results.
The primary source of network metrics is a DaemonSet Pod hosted on each of the nodes in a cluster. Each daemonset pod uses hostNetwork: true so that it can leverage an underlying kernel module to capture network data. Network traffic data is gathered and the destination of any outbound networking is labeled as:
- Internet Egress: Network target destination was not identified within the cluster.
- Cross Region Egress: Network target destination was identified, but not in the same provider region.
- Cross Zone Egress: Network target destination was identified, and was part of the same region but not the same zone.
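The three labels above amount to a short decision procedure, sketched here in Python. This is a minimal sketch and not Kubecost's implementation: the Location type, the function name, and the "in-zone" fallback for same-zone traffic are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    """Hypothetical region/zone pair for a traffic endpoint."""
    region: str
    zone: str


def classify_egress(source: Location, destination: Optional[Location]) -> str:
    """Label outbound traffic following the three rules listed above.

    `destination` is None when the target was not identified within the cluster.
    """
    if destination is None:
        return "internet"
    if destination.region != source.region:
        return "cross-region"
    if destination.zone != source.zone:
        return "cross-zone"
    # Assumption: same region and same zone falls outside the three egress labels.
    return "in-zone"


# Example region and zone names are made up.
src = Location(region="us-east-1", zone="us-east-1a")
labels = [
    classify_egress(src, None),
    classify_egress(src, Location("eu-west-1", "eu-west-1b")),
    classify_egress(src, Location("us-east-1", "us-east-1c")),
    classify_egress(src, Location("us-east-1", "us-east-1a")),
]
```

The checks run from the broadest boundary to the narrowest, so each destination receives exactly one label.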
These classifications are important because they correlate with network costing models for most cloud providers. To see more detail on these metric classifications, you can view pod logs with the following command:
kubectl logs kubecost-network-costs-<pod-identifier> -n kubecost
This will show you the top source and destination IP addresses and bytes transferred on the node where this Pod is running. To disable logs, you can set the helm value
For traffic routed to addresses outside of your cluster but inside your VPC, Kubecost can directly classify network traffic to a particular IP address or CIDR block. This feature can be configured in your values.yaml under networkCosts.config. Classifications are defined as follows:
- In-zone: A list of destination addresses/ranges that will be classified as in-zone traffic, which is free for most providers.
- In-region: A list of addresses/ranges that will be classified as being in the same region as the source but in a different zone.
- Cross-region: A list of addresses/ranges that will be classified as being in a different region from the source.
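To make the matching concrete, here is a minimal Python sketch of classifying a destination address against lists of addresses and CIDR blocks. The rule layout, the example ranges, and the first-match-wins order are assumptions for illustration only; they do not describe the actual schema under networkCosts.config or how Kubecost resolves overlapping ranges.

```python
import ipaddress
from typing import Dict, List, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical rules: labels follow the list above, addresses are examples only.
EXAMPLE_RULES: Dict[str, List[str]] = {
    "in-zone": ["10.0.1.0/24"],
    "in-region": ["10.0.0.0/16"],
    "cross-region": ["10.1.0.0/16", "192.0.2.10"],
}


def parse_rules(raw: Dict[str, List[str]]) -> Dict[str, List[Network]]:
    """Convert address/CIDR strings into network objects (a bare IP becomes a /32)."""
    return {
        label: [ipaddress.ip_network(item, strict=False) for item in items]
        for label, items in raw.items()
    }


def classify(ip: str, rules: Dict[str, List[Network]]) -> Optional[str]:
    """Return the first label whose ranges contain `ip`, or None if nothing matches."""
    addr = ipaddress.ip_address(ip)
    for label, networks in rules.items():
        if any(addr in net for net in networks):
            return label
    return None


rules = parse_rules(EXAMPLE_RULES)
```

With these example rules, classify("10.0.1.7", rules) returns "in-zone" because that list is checked first, while an address that matches no range returns None and would fall back to the default classification.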
When traffic is directed towards a cloud provider's service, the network traffic pod can tag the traffic with the relevant service name (e.g. AWS S3, Azure Storage, Google Cloud Storage).
To enable this feature, set the following Helm values:
To verify this feature is functioning properly, you can complete the following steps:
- 1. Confirm the kubecost-network-costs Pods are Running. If these Pods are not in a Running state, kubectl describe them and/or view their logs for errors.
- 2. Confirm the kubecost-networking target is Up in your Prometheus Targets list. View any visible errors if this target is not Up. You can further verify data is being scraped by the presence of the kubecost_pod_network_egress_bytes_total metric in Prometheus.
- 3. Verify Network Costs are available in your Kubecost Allocation view. If costs are not shown, check your browser's Developer Console on this page for any access/permissions errors.
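The metric check in the second step can also be scripted against the Prometheus HTTP API. The sketch below only builds the instant-query URL and inspects a response body; it makes no network calls. The base URL (for example, a local port-forward on port 9090) and the sample response are assumptions for illustration.

```python
from typing import Any, Dict
from urllib.parse import urlencode

METRIC = "kubecost_pod_network_egress_bytes_total"


def instant_query_url(base_url: str, metric: str = METRIC) -> str:
    """Build a Prometheus instant-query URL (/api/v1/query) for `metric`."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': metric})}"


def has_samples(response: Dict[str, Any]) -> bool:
    """Return True if a decoded query response succeeded and holds at least one series."""
    return response.get("status") == "success" and bool(
        response.get("data", {}).get("result")
    )


# Assumed base URL; adjust to wherever your Prometheus is reachable.
url = instant_query_url("http://localhost:9090/")

# Hand-written sample response in the Prometheus instant-query format (not real data).
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"__name__": METRIC}, "value": [1700000000, "1024"]}],
    },
}
empty = {"status": "success", "data": {"resultType": "vector", "result": []}}
```

An empty result list with a success status usually means the query ran but no series exist yet, which points back to the target check in the second step.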
- Failed to locate network pods -- Error message displayed when the Kubecost app is unable to locate the network pods, which we search for by a label that includes our release name. In particular, we depend on the label app=<release-name>-network-costs to locate the pods. This issue may occur if the app has a blank release name.
- Resource usage is a function of unique src and dest IP/port combinations. Most deployments use a small fraction of a CPU, and it is also acceptable for this Pod to be CPU throttled. Throttling should increase parse times but should not have other impacts. The following Prometheus metrics are available in v15.3 for determining the scale and the impact of throttling:
kubecost_network_costs_parsed_entries is the last number of conntrack entries parsed
kubecost_network_costs_parse_time is the last recorded parse time
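For the "Failed to locate network pods" item above, the following sketch shows why a blank release name breaks the lookup. It only builds the label selector described in that item and a matching kubectl argument list. The function names are hypothetical, and the kubecost namespace is an assumption carried over from the logs command earlier in this document.

```python
from typing import List


def network_costs_selector(release_name: str) -> str:
    """Build the label selector app=<release-name>-network-costs."""
    return f"app={release_name}-network-costs"


def kubectl_get_pods_args(release_name: str, namespace: str = "kubecost") -> List[str]:
    """Argument list for listing Pods that carry the selector label."""
    return ["kubectl", "get", "pods", "-n", namespace, "-l", network_costs_selector(release_name)]


good = network_costs_selector("kubecost")
blank = network_costs_selector("")
args = kubectl_get_pods_args("kubecost")
```

With a blank release name the selector becomes app=-network-costs, which would not match Pods labeled with a real release name.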
- Today this feature is supported on Unix-based images with conntrack
- Actively tested against GCP, AWS, and Azure
- Daemonsets have shared IP addresses on certain clusters