Getting Started

Welcome to Kubecost! This page provides commonly used product configurations and feature overviews to help get you up and running after the Kubecost product has been installed.

Configuration

  • Configuring metric storage
  • Setting Requests & Limits
  • Using an existing Prometheus or Grafana installation
  • Using an existing node exporter installation
  • Exposing Kubecost with an Ingress
  • Adding a spot instance configuration (AWS only)
  • Allocating out of cluster costs

Next Steps

  • Measure cluster cost efficiency
  • Cost monitoring best practices
  • Understanding cost allocation metrics

Storage configuration

The default Kubecost installation comes with a 32GB persistent volume and a 15-day retention period for Prometheus metrics. This is enough space to retain data for roughly 300 pods, depending on your exact node and container count. See the Kubecost Helm chart configuration options to adjust both the retention period and storage size. Note: we do not recommend retaining more than 30 days of data in Prometheus. For long-term data retention, contact us (team@kubecost.com) about using Kubecost with durable storage enabled.
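
Both settings can be overridden in your Helm values. A minimal values.yaml sketch, assuming the bundled Prometheus subchart's standard key names (verify them against your chart version):

prometheus:
  server:
    retention: 15d            # how long to keep metrics (we recommend <= 30d)
    persistentVolume:
      size: 32Gi              # increase for larger clusters or longer retention

Apply the override with a Helm upgrade, e.g. helm upgrade kubecost kubecost/cost-analyzer -n kubecost -f values.yaml.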

Custom Prometheus & Grafana

The Kubecost Prometheus deployment is used as both the source and sink for cost and capacity metrics. It is optimized not to interfere with other observability products and, by default, only contains metrics that are useful to the Kubecost product.

While we generally recommend using the bundled Prometheus and Grafana, our paid products support using your existing Grafana and Prometheus installation today. You can see basic setup instructions here. In our free product, we provide only best-effort support for this integration because of the nuances involved in completing it successfully. Please contact us (team@kubecost.com) if you want to learn more or if you think we can help!
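
As a sketch of what that setup looks like, the bundled Prometheus can be disabled and Kubecost pointed at your own instance via Helm values. The global.prometheus keys below reflect common chart versions and the service address is a placeholder; confirm both against the setup instructions linked above:

global:
  prometheus:
    enabled: false   # do not deploy the bundled Prometheus
    fqdn: http://prometheus-server.monitoring.svc:80   # placeholder: your Prometheus service URL

Note that your Prometheus must also be configured to scrape the Kubecost metrics endpoints for cost data to appear.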

Setting Requests & Limits

It’s recommended that users set and/or update requests and limits before taking Kubecost into production. This can be done in the Kubecost Helm values.yaml for Kubecost modules and subcharts. The exact recommended values for these parameters depend on the size of your cluster, your availability requirements, and your usage of the Kubecost product. Suggested values for each container can be found within Kubecost itself on the namespace page. More info on these recommendations is available here.
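
As an illustration, requests and limits for the main Kubecost containers can be set in values.yaml. The key names below match common chart versions and the numbers are placeholders only; use the suggested values from the Kubecost namespace page instead:

kubecostModel:
  resources:
    requests:
      cpu: 200m               # placeholder; size from in-product suggestions
      memory: 512Mi
    limits:
      cpu: 800m
      memory: 1Gi
kubecostFrontend:
  resources:
    requests:
      cpu: 10m                # placeholder; size from in-product suggestions
      memory: 55Mi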

Using an existing node exporter

You can use an existing node exporter DaemonSet, instead of installing another, by toggling the Kubecost Helm chart config options (prometheus.nodeExporter.enabled and prometheus.serviceAccounts.nodeExporter.create) shown here and sketched below. Note: to do this successfully, your existing node exporter must be configured to export metrics on the default endpoint and port.
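
For reference, the relevant overrides look like this in values.yaml (option names taken from the chart config linked above):

prometheus:
  nodeExporter:
    enabled: false            # skip deploying the bundled node exporter DaemonSet
  serviceAccounts:
    nodeExporter:
      create: false           # skip creating its service account as well

By default, node exporter serves metrics on port 9100 at /metrics; your existing DaemonSet should match this.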

Kubecost Ingress example

Enabling external access to the Kubecost product requires exposing port 9090 on the kubecost-cost-analyzer pod. This can be accomplished with a number of approaches, including Ingress or Service definitions. The following definition provides an example Ingress with basic auth.

Note: on GCP, you will need to update the kubecost-cost-analyzer service to be a NodePort instead of a ClusterIP type service; see the values sketch after the example below.

apiVersion: v1
kind: Secret
metadata:
  name: kubecost-auth
  namespace: kubecost
type: Opaque
data:
  # htpasswd entry, base64-encoded; default is admin:admin -- replace before use
  auth: YWRtaW46JGFwcjEkZ2tJenJxU2ckMWx3RUpFN1lFcTlzR0FNN1VtR1djMAo=
---
# networking.k8s.io/v1 requires Kubernetes 1.19+; on older clusters, use the
# extensions/v1beta1 form of this resource instead.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kubecost-ingress
  namespace: kubecost
  labels:
    app: kubecost
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: kubecost-auth
    nginx.ingress.kubernetes.io/auth-realm: "Authentication Required - ok"
spec:
  defaultBackend:
    service:
      name: kubecost-cost-analyzer
      port:
        number: 9090
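
For the GCP case noted above, one approach is to switch the service type through Helm values; a sketch, assuming the chart exposes a service.type option (verify against your chart version):

service:
  type: NodePort   # GCE Ingress routes to NodePort services, not ClusterIP

GCE Ingress load balancers route to NodePort (or LoadBalancer) services, which is why the default ClusterIP type does not work there.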

Spot Instance Configuration (AWS only)

For more accurate Spot pricing data, visit Settings in the Kubecost frontend to configure a data feed for AWS Spot instances. This enables the Kubecost product to use actual Spot node prices instead of user-provided estimates.

[Screenshot: AWS Spot data feed settings in Kubecost]

Necessary Steps

  1. Enable the AWS Spot Instance data feed.
  2. Provide the required S3 bucket information in Settings (example shown above).
  3. Create and attach an IAM role or user that can read this bucket. Here’s an example policy, scoped to the data feed bucket (replace the <your-spot-data-feed-bucket> placeholder with your bucket name):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
                "arn:aws:s3:::<your-spot-data-feed-bucket>",
                "arn:aws:s3:::<your-spot-data-feed-bucket>/*"
            ]
        }
    ]
}

Spot Verification

View logs from the cost-model container in the kubecost-cost-analyzer pod (e.g. kubectl logs <kubecost-cost-analyzer-pod> -c cost-model -n kubecost) to confirm there are no Spot data feed access errors. You should also see a confirmation log statement like this:

I1104 00:21:02.905327       1 awsprovider.go:1397] Found spot info {Timestamp:2019-11-03 20:35:48 UTC UsageType:USE2-SpotUsage:t2.micro Operation:RunInstances:SV050 InstanceID:i-05487b228492b1a54 MyBidID:sir-9s4rvgbj MyMaxPrice:0.010 USD MarketPrice:0.004 USD Charge:0.004 USD Version:1}
I1104 00:21:02.922372       1 awsprovider.go:1376] Spot feed version is "#Version: 1.0"

The Charge figures in these logs should be reflected in the node_total_hourly_cost metric in Prometheus.

Allocating out of cluster costs

[AWS] Provide your configuration info in Settings. The information needed includes S3 bucket name, Athena table name, Athena table region, and Athena database name (see the sketch below). View this page for more information on completing this process.
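
If you prefer configuration as code, recent chart versions also accept these values under kubecostProductConfigs in values.yaml; a sketch with placeholder names (key names and support may vary by chart version, so verify against your chart):

kubecostProductConfigs:
  athenaBucketName: s3://<your-athena-results-bucket>   # placeholder
  athenaRegion: <your-athena-region>                    # placeholder, e.g. us-east-1
  athenaDatabase: <your-athena-database>                # placeholder
  athenaTable: <your-athena-table>                      # placeholder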

[GCP] Provide configuration info by selecting “Add key” from the Cost Allocation Page. View this page for more information on completing this process.

Measuring cluster cost efficiency

For teams interested in reducing their Kubernetes costs, we have seen it be beneficial to first understand how efficiently provisioned resources have been used. This can be answered by measuring the cost of idle resources (e.g. compute, memory, etc.) as a percentage of your overall cluster spend. For example, a cluster spending $1,000 per month where $300 of provisioned resources goes unused has 30% idle spend. This figure represents the impact of many infrastructure and application-level decisions, e.g. machine type selection, bin packing efficiency, and more. The Kubecost product (Cluster Overview page) provides a view into this data for an initial assessment of resource efficiency and the cost of waste.

With an overall understanding of idle spend, you will have a better sense of where to focus your efficiency efforts. Each resource type can then be tuned for your business. Most teams we’ve seen target utilization in the following ranges:

  • CPU: 50%-65%
  • Memory: 45%-60%
  • Storage: 65%-80%

Target figures are highly dependent on the predictability and distribution of your resource usage (e.g. P99 vs. median), the impact of high utilization on your core product and business metrics, and more. While utilization that is too low is wasteful, utilization that is too high can lead to latency increases, reliability issues, and other negative behavior.