Kubecost uses the AWS SigV4 proxy to communicate securely with AMP. The proxy enables passwordless authentication through service roles, which reduces the risk of exposing credentials.
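For readers unfamiliar with SigV4, the sketch below shows the signing-key derivation defined by AWS Signature Version 4, which is the kind of work a signing proxy performs on behalf of the client. It is illustrative only and is not Kubecost's or the proxy's own code. The secret, date, and region values are made-up placeholders, and `aps` is assumed here as the signing service name for AMP.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of msg under key."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive a SigV4 signing key scoped to one day, one region, and one service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# Placeholder inputs: in practice the secret comes from temporary role credentials.
signing_key = derive_signing_key(
    secret_key="EXAMPLE-SECRET-NOT-REAL",
    date_stamp="20240101",   # YYYYMMDD, UTC
    region="us-west-2",      # assumed region
    service="aps",           # assumed signing name for AMP
)
print(signing_key.hex())
```

Because the derived key is scoped to a date, region, and service, the long-lived secret itself never travels with the request.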
Federated architecture
To support large-scale infrastructure (more than 100 clusters), Kubecost uses a Federated ETL architecture. In addition to writing metrics to the Amazon Managed Service for Prometheus (AMP) workspace, Kubecost stores its data in a streamlined format (ETL) and ships it to a central S3 bucket. Kubecost's ETL data is a computed cache built from Prometheus metrics, and all Kubecost queries can be served from it. Storing the ETL data in an S3 bucket makes your cost allocation data more resilient, improves performance, and enables a high-availability architecture for your Kubecost setup.
Support
See the troubleshooting section of this article if you run into any errors while setting up the integration. For support from AWS, you can submit a support request through your existing AWS support contract.
Add recording rules (optional)
You can add these recording rules to improve performance. Recording rules let you precompute frequently needed or computationally expensive expressions and save the results as a new set of time series. Querying the precomputed result is often much faster than running the original expression every time it is needed. Follow these AWS instructions to add the following rules:
    groups:
      - name: CPU
        rules:
          - expr: sum(rate(container_cpu_usage_seconds_total{container_name!=""}[5m]))
            record: cluster:cpu_usage:rate5m
          - expr: rate(container_cpu_usage_seconds_total{container_name!=""}[5m])
            record: cluster:cpu_usage_nosum:rate5m
          - expr: avg(irate(container_cpu_usage_seconds_total{container_name!="POD", container_name!=""}[5m])) by (container_name,pod_name,namespace)
            record: kubecost_container_cpu_usage_irate
          - expr: sum(container_memory_working_set_bytes{container_name!="POD",container_name!=""}) by (container_name,pod_name,namespace)
            record: kubecost_container_memory_working_set_bytes
          - expr: sum(container_memory_working_set_bytes{container_name!="POD",container_name!=""})
            record: kubecost_cluster_memory_working_set_bytes
      - name: Savings
        rules:
          - expr: sum(avg(kube_pod_owner{owner_kind!="DaemonSet"}) by (pod) * sum(container_cpu_allocation) by (pod))
            record: kubecost_savings_cpu_allocation
            labels:
              daemonset: "false"
          - expr: sum(avg(kube_pod_owner{owner_kind="DaemonSet"}) by (pod) * sum(container_cpu_allocation) by (pod)) / sum(kube_node_info)
            record: kubecost_savings_cpu_allocation
            labels:
              daemonset: "true"
          - expr: sum(avg(kube_pod_owner{owner_kind!="DaemonSet"}) by (pod) * sum(container_memory_allocation_bytes) by (pod))
            record: kubecost_savings_memory_allocation_bytes
            labels:
              daemonset: "false"
          - expr: sum(avg(kube_pod_owner{owner_kind="DaemonSet"}) by (pod) * sum(container_memory_allocation_bytes) by (pod)) / sum(kube_node_info)
            record: kubecost_savings_memory_allocation_bytes
            labels:
              daemonset: "true"
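Before uploading a rules file, it can help to sanity-check its structure. The following is a minimal sketch, assuming the rules have already been parsed into Python dictionaries (for example with a YAML library). The function name `validate_rule_groups` is hypothetical, and only two of the rules listed above are reproduced for brevity. The checks are a basic local lint, not a substitute for validation by Prometheus or AMP.

```python
import re

# Metric names in Prometheus may contain letters, digits, underscores, and colons,
# and must not start with a digit.
METRIC_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def validate_rule_groups(doc: dict) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    for group in doc.get("groups", []):
        group_name = group.get("name", "<unnamed>")
        for index, rule in enumerate(group.get("rules", [])):
            where = f"group {group_name!r}, rule {index}"
            if not str(rule.get("expr") or "").strip():
                problems.append(f"{where}: missing expr")
            record = str(rule.get("record") or "")
            if not METRIC_NAME.match(record):
                problems.append(f"{where}: invalid record name {record!r}")
            # Label values are expected to be strings, hence the quoted "true"/"false".
            for key, value in (rule.get("labels") or {}).items():
                if not isinstance(value, str):
                    problems.append(f"{where}: label {key!r} must be a string")
    return problems


rules = {
    "groups": [
        {"name": "CPU", "rules": [
            {"expr": 'sum(rate(container_cpu_usage_seconds_total{container_name!=""}[5m]))',
             "record": "cluster:cpu_usage:rate5m"},
        ]},
        {"name": "Savings", "rules": [
            {"expr": 'sum(avg(kube_pod_owner{owner_kind!="DaemonSet"}) by (pod)'
                     ' * sum(container_cpu_allocation) by (pod))',
             "record": "kubecost_savings_cpu_allocation",
             "labels": {"daemonset": "false"}},
        ]},
    ]
}

for problem in validate_rule_groups(rules):
    print(problem)
```

A rule whose `daemonset` label was written without quotes would typically be parsed as a boolean and flagged by the string check.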
Troubleshooting
The RunDiagnostic logs in the cost-model container will contain the most useful information.
Verify the connection to AMP and confirm that the container_memory_working_set_bytes metric is available:
If you have set kubecostModel.promClusterIDLabel, change the label name used in the query (CLUSTER_ID) to match that label (typically cluster or alpha_eksctl_io_cluster_name).
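To make the label substitution concrete, here is a minimal sketch that builds a Prometheus instant-query URL for the metric with a configurable cluster label. The helper name `build_query_url`, the base URL, and the cluster ID are hypothetical placeholders. `/api/v1/query` is the standard Prometheus HTTP API path. The sketch only constructs the URL and does not send or sign the request.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, metric: str, cluster_label: str, cluster_id: str) -> str:
    """Build an instant-query URL that filters a metric by the cluster label."""
    selector = f'{metric}{{{cluster_label}="{cluster_id}"}}'
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': selector})}"


# Placeholder values: substitute your own endpoint, label name, and cluster ID.
url = build_query_url(
    base_url="http://localhost:9090",
    metric="container_memory_working_set_bytes",
    cluster_label="cluster",   # should match kubecostModel.promClusterIDLabel
    cluster_id="my-cluster",
)
print(url)
```

If the query returns no series, a mismatch between the label name in the selector and the label configured in kubecostModel.promClusterIDLabel is one thing to check.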