If interested in filtering or aggregating by Kubernetes Annotations when using the Allocation API, you will need to enable annotation emission. This will configure your Kubecost installation to generate the kube_pod_annotations and kube_namespace_annotations metrics as listed in our Kubecost Metrics doc.
You can enable it in your values.yaml:
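A minimal values.yaml sketch, assuming the kubecostMetrics.emitPodAnnotations and kubecostMetrics.emitNamespaceAnnotations keys (verify the exact key names against your chart version):

```yaml
kubecostMetrics:
  # emits kube_pod_annotations
  emitPodAnnotations: true
  # emits kube_namespace_annotations
  emitNamespaceAnnotations: true
```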
You can also enable it via your helm install or helm upgrade command:
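For example (a sketch; the release name, chart reference, and namespace are assumptions):

```sh
helm upgrade kubecost kubecost/cost-analyzer --namespace kubecost \
  --set kubecostMetrics.emitPodAnnotations=true \
  --set kubecostMetrics.emitNamespaceAnnotations=true
```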
These flags can be set independently. If you set one to true and the other to false, only the enabled metric will be emitted.
SSO and RBAC are only officially supported on Kubecost Enterprise plans.
Kubecost supports single sign-on (SSO) and role-based access control (RBAC) with SAML 2.0. Kubecost works with most identity providers including Okta, Auth0, Microsoft Entra ID (formerly Azure AD), PingID, and KeyCloak.
User authentication (.Values.saml): SSO provides a simple mechanism to restrict application access internally and externally
Pre-defined user roles (.Values.saml.rbac):
admin: Full control with permissions to manage users, configure model inputs, and application settings.
readonly: User role with read-only permission.
editor: Can create and modify alerts and reports, but cannot edit application settings; otherwise functions as read-only.
Custom access roles (filters.json): Limit users based on attributes or group membership to view a set of namespaces, clusters, or other aggregations
All SAML 2.0 providers also work. The above guides can be used as templates for what is required.
When SAML SSO is enabled in Kubecost, ports 9090 and 9003 of service/kubecost-cost-analyzer will require authentication. Therefore, user API requests will need to be authenticated with a token. The token can be obtained by logging into the Kubecost UI and copying the token from the browser's local storage. Alternatively, a long-term token can be issued to users from your identity provider.
For admins, Kubecost additionally exposes an unauthenticated API on port 9004 of service/kubecost-cost-analyzer.
You will be able to view your current SAML Group in the Kubecost UI by selecting Settings from the left navigation, then scrolling to 'SAML Group'. Your access level will be displayed in the 'Current SAML Group' box.
Disable SAML and confirm that the cost-analyzer pod starts.
If step 1 is successful, but the pod is crashing or never enters the ready state when SAML is added, it is likely that there is a panic while loading or parsing SAML data.
kubectl logs deployment/kubecost-cost-analyzer -c cost-model -n kubecost
If you're supplying the SAML metadata from the address of an Identity Provider server, curl the SAML metadata endpoint from within the Kubecost pod and ensure that a valid XML EntityDescriptor is being returned and downloaded. The response should be in this format:
The URL returns a 404 error or returns HTML
Contact your SAML admin to find the URL on your identity provider that serves the raw XML file.
Returning an EntitiesDescriptor instead of an EntityDescriptor
Certain metadata URLs could potentially return an EntitiesDescriptor, instead of an EntityDescriptor. While Kubecost does not currently support using an EntitiesDescriptor, you can instead copy the EntityDescriptor into a new file you create called metadata.xml:
Download the XML from the metadata URL into a file called metadata.xml
Copy all attributes from the EntitiesDescriptor to the EntityDescriptor that are not already present.
Remove the <EntitiesDescriptor> tag from the beginning.
Remove the </EntitiesDescriptor> tag from the end of the XML file.
You are left with data in a similar format to the example below:
Then, you can upload the EntityDescriptor to a secret in the same namespace as kubecost and use that directly.
kubectl create secret generic metadata-secret --from-file=./metadata.xml --namespace kubecost
To use this secret, in your helm values set metadataSecretName to the name of the secret created above, and set idpMetadataURL to the empty string:
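A sketch of the corresponding values.yaml fragment, assuming these keys live under the saml block (verify against your chart version):

```yaml
saml:
  metadataSecretName: "metadata-secret"
  idpMetadataURL: ""
```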
Invalid NameID format
On Keycloak, if you receive an “Invalid NameID format” error, you should set the option “force nameid format” in Keycloak. See Keycloak docs for more details.
Users of CSI driver for storing SAML secret
For users who want to use CSI driver for storing SAML secret, we suggest this guide.
InvalidNameIDPolicy format
From a PingIdentity article:
An alternative solution is to add an attribute called "SAML_SP_NAME_QUALIFIER" to the connection's attribute contract with a TEXT value of the requested SPNameQualifier. When you do this, select the following for attribute name format:
urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified
On the PingID side: specify an attribute contract "SAML_SP_NAME_QUALIFIER" with the format urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified.
On the Kubecost side: in your Helm values, set saml.nameIDFormat to the same format set by PingID:
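For example, the corresponding values.yaml fragment:

```yaml
saml:
  nameIDFormat: "urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified"
```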
Make sure audienceURI and appRootURL match the entityID configured within PingFed.
OIDC and RBAC are only officially supported on Kubecost Enterprise plans.
The OIDC integration in Kubecost is fulfilled via the .Values.oidc configuration parameters in the Helm chart.
authURL may require additional request parameters depending on the provider. Some commonly required parameters are client_id=*** and response_type=code. Please check the provider documentation for more information.
Please refer to the following references to find out more about how to configure the Helm parameters to suit each OIDC identity provider integration.
Auth0 does not support Introspection; therefore, we can only validate the access token by calling /userinfo within our current remote token validation flow. This will cause the Kubecost UI to not function under an Auth0 integration, as it makes a large number of continuous calls to load the various components on the page and the Auth0 /userinfo endpoint is rate limited. Independent calls against Kubecost endpoints (e.g. via cURL or Postman) should still be supported.
Once the Kubecost application has been successfully integrated with OIDC, we will expect requests to Kubecost endpoints to contain the JWT access token, either:
As a cookie named token
As a cookie named id_token (set .Values.oidc.useIDToken = true)
Or as part of the Authorization header (Bearer token)
The token is then validated remotely in one of two ways:
A POST request to the Introspect URL configured by the identity provider
If no Introspect URL is configured, a GET request to the /userinfo endpoint configured by the identity provider
If skipOnlineTokenValidation is set to true, Kubecost will skip accessing the OIDC introspection endpoint for online token validation and will instead attempt to locally validate the JWT claims.
Setting skipOnlineTokenValidation to true will prevent tokens from being manually revoked.
This parameter is only supported if using the Google OAuth 2.0 identity provider.
If the hostedDomain parameter is configured in the Helm chart, the application will deny access to users whose identified domain is not equal to the specified domain. The domain is read from the hd claim in the ID token commonly returned alongside the access token.
If the domain is configured alongside the access token, then requests should contain the JWT ID token, either:
As a cookie named id_token
As part of an Identification header
The JWT ID token must contain a field (claim) named hd with the desired domain value. We verify that the token has been properly signed (using provider certificates) and has not expired before processing the claim.
To remove a previously set Helm value, you will need to set the value to an empty string: .Values.oidc.hostedDomain = "". To validate that the config has been removed, you can check the /var/configs/oidc/oidc.json file inside the cost-model container.
Kubecost's OIDC supports read-only mode. This leverages OIDC for authentication, then assigns all authenticated users as read-only users.
Use your browser's devtools to observe network requests made between you, your Identity Provider, and Kubecost. Pay close attention to cookies and headers.
Search for oidc in your logs to follow events
Pay attention to any WRN related to OIDC
Search for Token Response, and try decoding both the access_token and id_token to ensure they are well formed (https://jwt.io/)
Code reference for the below example can be found here.
For further assistance, reach out to support@kubecost.com and provide both logs and a HAR file.
You can apply your product key at any time within the product UI or during an install or upgrade process. More details on both options are provided below.
If you have a multi-cluster setup, you only need to apply your product key on the Kubecost primary cluster, and not on any of the Kubecost secondary clusters.
kubecostToken is a different concept from your product key and is used for managing trial access.
Many Kubecost product configuration options can be specified at install-time, including your product key.
To create a secret you will need to create a JSON file called productkey.json with the following format. Be sure to replace <YOUR_PRODUCT_KEY> with your Kubecost product key.
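A sketch of productkey.json; the single key field shown here is an assumption, so confirm the expected format against the current Kubecost docs:

```json
{
  "key": "<YOUR_PRODUCT_KEY>"
}
```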
Run the following command to create the secret. Replace <SECRET_NAME> with a name for the secret (example: productkeysecret):
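For example (a sketch using standard kubectl flags; the kubecost namespace is assumed):

```sh
kubectl create secret generic <SECRET_NAME> \
  --from-file=./productkey.json --namespace kubecost
```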
Update your values.yaml to enable the product key and specify the secret name:
kubecostProductConfigs.productKey.enabled=true
kubecostProductConfigs.productKey.secretname=<SECRET_NAME>
Run a helm upgrade command to start using your product key.
This specific parameter can be configured under kubecostProductConfigs.productKey.key in your values.yaml.
You must also set kubecostProductConfigs.productKey.enabled=true when using this option. Note that this will leave your secrets unencrypted in values.yaml. Use a Kubernetes secret as in the previous method to avoid this.
To apply your license key within the Kubecost UI, visit the Overview page, then select Upgrade in the page header.
Next, select Add Key in the dialog menu shown below.
You can then supply your Kubecost provided license key in the input box that is now visible.
To verify that your key has been applied successfully, visit Settings to confirm the final digits are as expected:
SSO and RBAC are only officially supported on Kubecost Enterprise plans.
This guide will show you how to configure Kubecost integrations for SSO and RBAC with Okta.
To enable SSO for Kubecost, this tutorial will show you how to create an application in Okta.
Go to the Okta admin dashboard (https://[your-subdomain]okta.com/admin/dashboard) and select Applications from the left navigation. On the Applications page, select Create App Integration > SAML 2.0 > Next.
On the 'Create SAML Integration' page, provide a name for your app. Feel free to also use this official Kubecost logo for the App logo field. Then, select Next.
Your SSO URL should be your application root URL followed by '/saml/acs', like: https://[your-kubecost-address].com/saml/acs
Your Audience URI (SP Entity ID) should be set to your application root without a trailing slash: https://[your-kubecost-address].com
(Optional) If you intend to use RBAC: under Group Attribute Statements, enter a name (ex: kubecost_group) and a filter based on your group naming standards (example: Starts with kubecost_). Then, select Next.
Provide any feedback as needed, then select Finish.
Return to the Applications page, select your newly-created app, then select the Sign On tab. Copy the URL for Identity Provider metadata, and add that value to .Values.saml.idpMetadataURL in this values-saml.yaml file.
To fully configure SAML 2.0, select View Setup Instructions, download the X.509 certificate, and name the file myservice.cert.
Create a secret using the certificate with the following command:
kubectl create secret generic kubecost-okta --from-file myservice.cert --namespace kubecost
For configuring single app logout, read Okta's documentation on the subject. Then, update the values.saml:redirectURL value in your values.yaml file.
Use this Okta document to assign individuals or groups access to your Kubecost application.
Finally, add -f values-saml.yaml to your Kubecost Helm upgrade command:
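For example (release name, chart reference, and namespace are assumptions):

```sh
helm upgrade kubecost kubecost/cost-analyzer --namespace kubecost \
  -f values.yaml -f values-saml.yaml
```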
At this point, test your SSO to ensure it is working properly before moving on to the next section.
The simplest form of RBAC in Kubecost is to have two groups: admin and readonly. If your goal is to simply have these two groups, you do not need to configure filters. This will result in the log message file corruption: '%!s(MISSING)', but this is expected.
The values-saml.yaml file contains the admin and readonly groups in the RBAC section:
The assertionName: "kubecost_group" value needs to match the name given in Step 5 of the Okta SSO Configuration section.
Filters are used to give visibility to a subset of objects in Kubecost. Examples of the various filters available are in filters.json and filters-examples.json. RBAC filtering is capable of all the same types of filtering features as that of the Allocation API.
It's possible to combine filtering with admin/readonly rights
These filters can be configured using groups or user attributes in your Okta directory. It is also possible to assign filters to specific users. The example below is using groups.
Filtering is configured very similarly to the admin/readonly roles above. The same group pattern match (kubecost_group) can be used for both, as is the case in this example:
The array of groups obtained during the authorization request will be matched to the subject key in the filters.json:
As an example, we will configure the following:
Admins will have full access to the Kubecost UI and have visibility to all resources
Kubecost users, by default, will not have visibility to any namespace and will be readonly. If a group doesn't have access to any resources, the Kubecost UI may appear to be broken.
The dev-namespaces group will have read-only access to the Kubecost UI and only have visibility to namespaces that are prefixed with dev- or are exactly nginx-ingress.
Go to the Okta admin dashboard (https://[your-subdomain]okta.com/admin/dashboard) and select Directory > Groups from the left navigation. On the Groups page, select Add group.
Create groups for kubecost_users, kubecost_admin and kubecost_dev-namespaces by providing each value as the name with an optional description, then select Save. You will need to perform this step three times, one for each group.
Select each group, then select Assign people and add the appropriate users for testing. Select Done to confirm edits to a group. Kubecost admins will be part of both the read only kubecost_users and kubecost_admin groups. Kubecost will assign the most rights if there are conflicts.
Return to the Groups page. Select kubecost_users, then in the Applications tab, assign the Kubecost application. You do not need to assign the other kubecost_ groups to the Kubecost application because all users already have access in the kubecost_users group.
Modify filters.json as depicted above.
Create the ConfigMap using the following command:
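A sketch of the command; the ConfigMap name group-filters is an assumption, so use the name your chart expects:

```sh
kubectl create configmap group-filters --from-file filters.json -n kubecost
```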
You can modify the ConfigMap without restarting any pods.
Generate an X509 certificate and private key. Below is an example using OpenSSL:
openssl genpkey -algorithm RSA -out saml-encryption-key.pem -pkeyopt rsa_keygen_bits:2048
Generate a certificate signing request (CSR)
openssl req -new -key saml-encryption-key.pem -out request.csr
Request your organization's domain owner to sign the certificate, or generate a self-signed certificate:
openssl x509 -req -days 365 -in request.csr -signkey saml-encryption-key.pem -out saml-encryption-cert.cer
Go to your application, then under the General tab, edit the following SAML Settings:
Assertion Encryption: Encrypted
In the Encryption Algorithm box that appears, select AES256-CBC.
Select Browse Files in the Encryption Certificate field and upload your certificate file.
Create a secret with the certificate. The file name must be saml-encryption-cert.cer.
kubectl create secret generic kubecost-saml-cert --from-file saml-encryption-cert.cer --namespace kubecost
Create a secret with the private key. The file name must be saml-encryption-key.pem.
kubectl create secret generic kubecost-saml-decryption-key --from-file saml-encryption-key.pem --namespace kubecost
Pass the following values via Helm into your values.yaml:
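A hedged sketch of the values.yaml fragment; the key names below are assumptions to verify against your chart version:

```yaml
saml:
  encryptionCertSecret: "kubecost-saml-cert"
  decryptionKeySecret: "kubecost-saml-decryption-key"
```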
You can view the logs on the cost-model container. In this example, the assumption is that the prefix for Kubecost groups is kubecost_. This command is currently a work in progress.
kubectl logs deployment/kubecost-cost-analyzer -c cost-model --follow |grep -v -E 'resourceGroup|prometheus-server'|grep -i -E 'group|xmlname|saml|login|audience|kubecost_'
When the group has been matched, you will see:
This is what you should expect to see:
The network costs DaemonSet is an optional utility that gives Kubecost more detail to attribute costs to the correct pods.
When networkCost is enabled, Kubecost gathers pod-level network traffic metrics to allocate network transfer costs to the pod responsible for the traffic.
See this doc for more detail on network cost allocation methodology.
The network costs metrics are collected using a DaemonSet (one pod per node) that uses source and destination detail to determine egress and ingress data transfers by pod, classified as internet, cross-region, and cross-zone.
With the network costs DaemonSet enabled, the Network column on the Allocations page will reflect the portion of network transfer costs based on the chart-level aggregation.
When using Kubecost version 1.99 and above: greater detail can be accessed through the Allocations UI only when aggregating by namespace and selecting the link on that namespace. This opens the namespace detail page, where there is a network costs detail card at the bottom.
A Grafana dashboard is included with the Kubecost installation, but you can also find it in our cost-analyzer-helm-chart repository.
To enable this feature, set the following parameter in values.yaml during or after Helm installation:
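For example:

```yaml
networkCosts:
  enabled: true
```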
You can view a list of common config options in this values.yaml template.
If using the Kubecost-bundled Prometheus instance, the scrape is automatically configured.
If you are integrating with an existing Prometheus, you can set networkCosts.prometheusScrape=true and the network costs service should be auto-discovered. Alternatively, a ServiceMonitor is also available.
You can adjust log level using the extraArgs config:
The levels range from 0 to 5, with 0 being the least verbose (only showing panics) and 5 being the most verbose (showing trace-level information).
Ref: sig-instrumentation
Service tagging allows Kubecost to identify network activity between the pods and various cloud services (e.g. AWS S3, EC2, RDS, Azure Storage, Google Cloud Storage).
To enable this, set the following Helm values:
In order to reduce resource usage, Kubecost recommends setting a CPU limit on the network costs DaemonSet. This will cause a few seconds of delay during peak usage and does not affect overall accuracy. This is done by default in Kubecost 1.99+.
For existing deployments, these are the recommended values:
The network-simulator was used to simulate real-time updates of ConnTrack entries while simultaneously running a simulated-cluster network costs instance. To profile the heap, after a warmup of roughly five minutes, a heap profile of 1,000,000 ConnTrack entries was gathered and examined.
Each ConnTrack entry is equivalent to two transport directions, so every ConnTrack entry is two map entries (connections).
After modifications were made to the network costs to parallelize the delta and dispatch, large map comparisons were significantly lighter in memory. The same tests were performed against simulated data with the following footprint results.
The primary source of network metrics is a DaemonSet Pod hosted on each of the nodes in a cluster. Each DaemonSet pod uses hostNetwork: true such that it can leverage an underlying kernel module to capture network data. Network traffic data is gathered and the destination of any outbound networking is labeled as:
Internet Egress: Network target destination was not identified within the cluster.
Cross Region Egress: Network target destination was identified, but not in the same provider region.
Cross Zone Egress: Network target destination was identified, and was part of the same region but not the same zone.
These classifications are important because they correlate with network costing models for most cloud providers. To see more detail on these metric classifications, you can view pod logs with the following command:
This will show you the top source and destination IP addresses and bytes transferred on the node where this Pod is running. To disable logs, you can set the Helm value networkCosts.trafficLogging to false.
For traffic routed to addresses outside of your cluster but inside your VPC, Kubecost supports the ability to directly classify network traffic to a particular IP address or CIDR block. This feature can be configured in values.yaml under networkCosts.config. Classifications are defined as follows:
As of Kubecost 1.101, LoadBalancers that proxy traffic to the Internet (ingresses and gateways) can be specifically classified.
In-zone: A list of destination addresses/ranges that will be classified as in-zone traffic, which is free for most providers.
In-region: A list of addresses/ranges that will be classified as the same region between source and destinations but different zones.
Cross-region: A list of addresses/ranges that will be classified as different regions from the source regions.
Internet: By design, all IP addresses not in a specific list are considered internet. This list can include IPs that would otherwise be "in-zone" or local to be classified as Internet traffic.
The network costs DaemonSet requires a privileged spec.containers[*].securityContext and hostNetwork: true in order to leverage an underlying kernel module to capture network data.
Additionally, the network costs DaemonSet mounts the following directories on the host filesystem. It needs both read and write access. The network costs DaemonSet will only write to the filesystem to enable conntrack (docs ref):
/proc/net/
/proc/sys/net/netfilter
To verify this feature is functioning properly, you can complete the following steps:
Confirm the kubecost-network-costs pods are Running. If these Pods are not in a Running state, run kubectl describe on them and/or view their logs for errors.
Ensure the kubecost-networking target is Up in your Prometheus Targets list. View any visible errors if this target is not Up. You can further verify data is being scraped by the presence of the kubecost_pod_network_egress_bytes_total metric in Prometheus.
Verify Network Costs are available in your Kubecost Allocation view. View your browser's Developer Console on this page for any access/permissions errors if costs are not shown.
Failed to locate network pods: This error message is displayed when the Kubecost app is unable to locate the network pods, which are searched for by a label that includes the release name. In particular, Kubecost depends on the label app=<release-name>-network-costs to locate the pods. If the app has a blank release name, this issue may happen.
Resource usage is a function of unique src and dest IP/port combinations. Most deployments use a small fraction of a CPU and it is also ok to have this Pod CPU throttled. Throttling should increase parse times but should not have other impacts. The following Prometheus metrics are available in v15.3 for determining the scale and the impact of throttling:
kubecost_network_costs_parsed_entries is the last number of ConnTrack entries parsed
kubecost_network_costs_parse_time is the last recorded parse time
Today this feature is supported on Unix-based images with ConnTrack
Actively tested against GCP, AWS, and Azure
Pods that use hostNetwork share the host IP address
SSO and RBAC are only officially supported on Kubecost Enterprise plans.
This guide will show you how to configure Kubecost integrations for SAML and RBAC with Microsoft Entra ID.
In the Azure Portal, go to the Microsoft Entra ID Overview page and select Enterprise applications in the left navigation underneath Manage.
On the Enterprise applications page, select New application.
On the Browse Microsoft Entra ID Gallery page, select Create your own application and select Create. The 'Create your own application window' opens.
Provide a custom name for your app. Then, select Integrate any other application you don't find in the gallery. Select Create.
Return to the Enterprise applications page from Step 1.2. Find and select your Enterprise application from the table.
Select Properties in the left navigation under Manage to begin editing the application. Start by updating the logo, then select Save. Feel free to use an official Kubecost logo.
Select Users and groups in the left navigation. Assign any users or groups you want to have access to Kubecost, then select Assign.
Select Single sign-on from the left navigation. In the 'Basic SAML Configuration' box, select Edit. Populate both the Identifier and Reply URL with the URL of your Kubecost environment without a trailing slash (ex: http://localhost:9090), then select Save. If your application is using OpenId Connect and OAuth, most of the SSO configuration will have already been completed.
(Optional) If you intend to use RBAC, you also need to add a group claim. Without leaving the SAML-based Sign-on page, select Edit next to Attributes & Claims. Select Add a group claim. Configure your group association, then select Save. The claim name will be used as the assertionName value in the values-saml.yaml file.
On the SAML-based Sign-on page, in the SAML Certificates box, copy the 'App Federation Metadata Url' and add it to your values-saml.yaml as the value of idpMetadataURL.
In the SAML Certificates box, select the Download link next to Certificate (Base64) to download the X.509 cert. Name the file myservice.cert.
Create a secret using the cert with the following command:
With your existing Helm install command, append -f values-saml.yaml to the end.
At this point, test your SSO configuration to make sure it works before moving on to the next section. There is a Troubleshooting section at the end of this doc for help if you are experiencing problems.
The simplest form of RBAC in Kubecost is to have two groups: admin and read only. If your goal is to simply have these two groups, you do not need to configure filters. If you do not configure filters, this message in the logs is expected: file corruption: '%!s(MISSING)'
The values-saml.yaml file contains the admin and readonly groups in the RBAC section:
Remember, the value of assertionName needs to match the claim name given in Step 2.5 above.
Filters are used to give visibility to a subset of objects in Kubecost. RBAC filtering supports the same filtering features as the Allocation API. Examples of the various filters available are in these files:
These filters can be configured using groups or user attributes in your Entra ID directory. It is also possible to assign filters to specific users. The example below is using groups.
You can combine filtering with admin/read only rights, and it can be configured the same way. The same assertionName and values will be used, as is the case in this example.
The values-saml.yaml file contains this customGroups section for filtering:
The array of groups obtained during the authentication request will be matched to the subject key in filters.json. See the example filters.json (linked above) to understand how your created groups will be formatted:
As an example, we will configure the following:
Admins will have full access to the Kubecost UI and have visibility to all resources
Kubecost users, by default, will not have visibility to any namespace and will be read only. If a group doesn't have access to any resources, the Kubecost UI may appear to be broken.
The dev-namespaces group will have read only access to the Kubecost UI and only have visibility to namespaces that are prefixed with dev- or are exactly nginx-ingress.
In the Entra ID left navigation, select Groups. Select New group to create a new group.
For Group type, select Security. Enter a name for your group. For this demonstration, create groups for kubecost_users, kubecost_admin, and kubecost_dev-namespaces. By selecting No members selected, Azure will pull up a list of all users in your organization for you to add (you can also add or remove members after creating the group). Add all users to the kubecost_users group, and the appropriate users to each of the other groups for testing. Kubecost admins will be part of both the read only kubecost_users and kubecost_admin groups. Kubecost will assign the most rights/least restrictions when there are conflicts.
When you are done, select Create at the bottom of the page. Repeat Steps 1-2 as needed for all groups.
Return to your created Enterprise application and select Users and groups from the left navigation. Select Add user/group. Select and add all relevant groups you created. Then select Assign at the bottom of the page to confirm.
Modify filters.json as depicted above.
Replace {group-object-id-a} with the Object Id for kubecost_admin
Replace {group-object-id-b} with the Object Id for kubecost_users
Replace {group-object-id-c} with the Object Id for kubecost_dev-namespaces
Create the ConfigMap:
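As in the Okta guide above, a sketch of the command (the group-filters ConfigMap name is an assumption):

```sh
kubectl create configmap group-filters --from-file filters.json -n kubecost
```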
You can modify the ConfigMap without restarting any pods.
You can look at the logs on the cost-model container. This script is currently a work in progress.
When the group has been matched, you will see:
This is what a normal output looks like:
Gluu is an open-source Identity and Access Management (IAM) platform that can be used to authenticate and authorize users for applications and services. It can be configured to use the OpenID Connect (OIDC) protocol, which is an authentication layer built on top of OAuth 2.0 that allows applications to verify the identity of users and obtain basic profile information about them.
To configure a Gluu server with OIDC, you will need to install and set up the Gluu server software on a suitable host machine. This will typically involve performing the following steps:
Install the necessary dependencies and packages.
Download and extract the Gluu server software package.
Run the installation script to set up the Gluu server.
Configure the Gluu server by modifying the /etc/gluu/conf/gluu.properties file and setting the values for various properties, such as the hostname, LDAP bind password, and OAuth keys.
Start the Gluu server by running the /etc/init.d/gluu-serverd start command.
You can read the Gluu documentation for more detailed help with these steps.
Note: Later versions of Gluu Server also support deployment to Kubernetes environments. You can read more about their Kubernetes support in the Gluu documentation.
Once the Gluu server is up and running, you can connect it to a Kubecost cluster by performing the following steps:
Obtain the OIDC client ID and client secret for the Gluu server. These can be found in the /etc/gluu/conf/gluu.properties file under the oxAuthClientId and oxAuthClientPassword properties, respectively.
In the Kubecost cluster, create a new OIDC identity provider by running the kubectl apply -f oidc-provider.yaml command, where oidc-provider.yaml is a configuration file that specifies the OIDC client ID and client secret, as well as the issuer URL and authorization and token endpoints for the Gluu server.
In this file, you will need to replace the following placeholders with the appropriate values:
<OIDC_CLIENT_ID>: The OIDC client ID for the Gluu server. This can be found in the /etc/gluu/conf/gluu.properties file under the oxAuthClientId property.
<OIDC_CLIENT_SECRET>: The OIDC client secret for the Gluu server. This can be found in the /etc/gluu/conf/gluu.properties file under the oxAuthClientPassword property.
<GLUU_SERVER_HOSTNAME>: The hostname of the Gluu server.
<BASE64_ENCODED_OIDC_CLIENT_ID>: The OIDC client ID, encoded in base64.
<BASE64_ENCODED_OIDC_CLIENT_SECRET>: The OIDC client secret, encoded in base64.
Set up a Kubernetes service account and bind it to the OIDC identity provider. This can be done by running the kubectl apply -f service-account.yaml
command, where service-account.yaml is a configuration file that specifies the name of the service account and the OIDC identity provider.
In this file, you will need to replace the following placeholders with the appropriate values:
<SERVICE_ACCOUNT_NAME>: The name of the service account. This can be any name that you choose.
<GLUU_SERVER_HOSTNAME>: The hostname of the Gluu server.
<OIDC_CLIENT_ID>: The OIDC client ID for the Gluu server. This can be found in the /etc/gluu/conf/gluu.properties file under the oxAuthClientId property.
Note: You should also ensure that the kubernetes.io/oidc-issuer-url, kubernetes.io/oidc-client-id, kubernetes.io/oidc-username-claim, and kubernetes.io/oidc-groups-claim annotations are set to the correct values for your Gluu server and configuration. These annotations specify the issuer URL and client ID for the OIDC identity provider, as well as the claims to use for the username and group membership of authenticated users.
Once these steps are completed, the Gluu server should be configured to use OIDC and connected to the Kubecost cluster, allowing users to authenticate and authorize themselves using their Gluu credentials.
OIDC is only officially supported on Kubecost Enterprise plans.
This guide will take you through configuring OIDC for Kubecost using a Microsoft Entra ID (formerly Azure AD) integration for SSO and RBAC.
Before following this guide, ensure that:
Kubecost is already installed
Kubecost is accessible via a TLS-enabled ingress
You have an appropriate admin role in your Microsoft account. Insufficient permissions may prevent you from accessing certain features required in this tutorial.
In the Azure Portal, select Microsoft Entra ID (Azure AD).
In the left navigation, select Applications > App registrations. Then, on the App registrations page, select New registration.
Select an appropriate name, and provide supported account types for your app.
To configure Redirect URI, select Web from the dropdown, then provide the URI as https://{your-kubecost-address}/model/oidc/authorize.
Select Register at the bottom of the page to finalize your changes.
After creating your application, you should be taken directly to the app's Overview page. If not, return to the App registrations page, then select the application you just created.
On the Overview page for your application, obtain the Application (client) ID and the Directory (tenant) ID. These will be needed in a later step.
Next to 'Client credentials', select Add a certificate or secret. The 'Certificates & secrets' page opens.
Select New client secret. Provide a description and expiration time, then select Add.
Obtain the value created with your secret.
Add the three saved values, as well as any other values required relating to your Kubecost/Microsoft account details, into the following values.yaml template:
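A hedged values.yaml sketch; the oidc key names and Microsoft endpoint formats below are assumptions to confirm against your chart version and tenant:

```yaml
oidc:
  enabled: true
  clientID: "<Application (client) ID>"
  clientSecret: "<client secret value>"
  authURL: "https://login.microsoftonline.com/<Directory (tenant) ID>/oauth2/v2.0/authorize"
  loginRedirectURL: "https://{your-kubecost-address}/model/oidc/authorize"
  discoveryURL: "https://login.microsoftonline.com/<Directory (tenant) ID>/v2.0/.well-known/openid-configuration"
```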
Return to the Overview page for the application you created in Step 1.
Select App roles > Create app role. Provide the following values:
Display name: admin
Allowed member types: Users/Groups
Value: admin
Description: Admins have read/write permissions via the Kubecost frontend (or provide a custom description as needed)
Do you want to enable this app role?: Select the checkbox
Select Apply.
Then, you need to attach the role you just created to users and groups.
In the Azure AD left navigation, select Applications > Enterprise applications. Select the application you created in Step 1.
Select Users & groups.
Select Add user/group. Select the desired group. Select the admin role you created, or another relevant role. Then, select Assign to finalize changes.
Update your existing values.yaml with this template:
Run the following command:
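For example (release name, chart reference, and namespace are assumptions):

```sh
helm upgrade kubecost kubecost/cost-analyzer --namespace kubecost -f values.yaml
```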
Staging builds for the Kubecost Helm Chart are produced at least daily before changes are moved to production. To upgrade an existing Kubecost Helm Chart deployment to the latest staging build, follow these quick steps:
Add the Kubecost staging Helm repository with the following command:
Upgrade Kubecost to use the staging repo:
Create a new Keycloak realm.
Navigate to Realm Settings > General > Endpoints > OpenID Endpoint Configuration > Clients.
Select Create to add Kubecost to the list of clients. Define a clientID. Ensure the Client Protocol is set to openid-connect.
Select your newly created client, then go to Settings.
Set Access Type to confidential.
Set Valid Redirect URIs to http://YOUR_KUBECOST_ADDRESS/model/oidc/authorize.
Set Base URL to http://YOUR_KUBECOST_ADDRESS.
The OIDC Helm values for Keycloak should be as follows:
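A hedged sketch, assuming a recent Keycloak release (realm endpoints without the legacy /auth prefix) and the same oidc keys used elsewhere in this doc; substitute your Keycloak address, realm, and client credentials:

```yaml
oidc:
  enabled: true
  clientID: "<your clientID>"
  clientSecret: "<client secret from the client's Credentials tab>"
  authURL: "http://YOUR_KEYCLOAK_ADDRESS/realms/YOUR_REALM/protocol/openid-connect/auth"
  loginRedirectURL: "http://YOUR_KUBECOST_ADDRESS/model/oidc/authorize"
  discoveryURL: "http://YOUR_KEYCLOAK_ADDRESS/realms/YOUR_REALM/.well-known/openid-configuration"
```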
Kubecost can run on clusters with thousands of nodes when resource consumption is properly tuned. Here's a chart with some of the steps you can take to tune Kubecost, along with descriptions of each.
Cloud cost metrics for all accounts can be pulled in on your primary cluster by pointing Kubecost to one or more management accounts. Therefore, you can disable CloudCost on secondary clusters by setting the following Helm value:
--set cloudCost.enabled=false
This method is only available for AWS cloud billing integrations. Kubecost is capable of tracking each individual cloud billing line item; however, on certain accounts this data can be quite large. If provider IDs are excluded, Kubecost won't cache granular data. Instead, Kubecost caches aggregate data and makes ad-hoc queries to the AWS Cost and Usage Report to get granular data, resulting in slower load times but less memory consumption.
--set kubecostModel.maxQueryConcurrency=1
--set kubecostModel.maxPrometheusQueryDurationMinutes=300
Lowering query resolution will reduce memory consumption, but will cause short-running pods to be sampled and rounded to the nearest interval for their runtime. The default value is 300s. This can be tuned with the Helm value:
--set kubecostModel.etlResolutionSeconds=600
--set prometheus.server.nodeExporter.enabled=false
--set prometheus.serviceAccounts.nodeExporter.create=false
Optionally, enabling impactful memory thresholds can ensure the Go runtime garbage collector throttles at more aggressive frequencies at or approaching the soft limit. There is no one-size-fits-all value here, and users looking to tune this parameter should be aware that setting the value too low may reduce overall performance. If you set the resources.requests memory values appropriately, using the same value for softMemoryLimit will instruct the Go runtime to keep its heap acquisition and release within the same bounds as the expected pod memory use. This can be tuned with the Helm value:
--set kubecostModel.softMemoryLimit=<Units><B, KiB, MiB, GiB>
The Cluster Controller is currently in beta. Please read the documentation carefully.
Kubecost's Cluster Controller allows you to access additional Savings features through automated processes. To function, the Cluster Controller requires write permission to certain resources on your cluster, and for this reason, the Cluster Controller is disabled by default.
The Cluster Controller enables features like:
The Cluster Controller can be enabled on any cluster type, but certain functionality will only be enabled based on the cloud service provider (CSP) of the cluster and its type:
The Cluster Controller can only be enabled on your primary cluster.
The Controller itself and container RRS are available for all cluster types and configurations.
Cluster turndown, cluster right-sizing, and Kubecost Actions are only available for GKE, EKS, and Kops-on-AWS clusters, after setting up a provider service key.
Therefore, the 'Provider service key setup' section below is optional depending on your cluster environment, but will limit functionality if you choose to skip it. Read the caution banner in the below section for more details.
If you are enabling the Cluster Controller for a GKE/EKS/Kops-on-AWS cluster, follow the specialized instructions for your CSP(s) below. If you aren't using a GKE/EKS/Kops-on-AWS cluster, skip ahead to the section below.
You can now enable the Cluster Controller in the Helm chart by finding the clusterController Helm flag and setting enabled: true.
You may also enable via --set when running Helm install:
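For example (release name and chart reference are assumptions):

```sh
helm upgrade kubecost kubecost/cost-analyzer --namespace kubecost \
  --set clusterController.enabled=true
```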
You can verify that the Cluster Controller is running by issuing the following:
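A sketch (the exact pod name depends on your release name):

```sh
kubectl get pods -n kubecost | grep cluster-controller
```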
Once the Cluster Controller has been enabled successfully, you should automatically have access to the listed Savings features.
If you are using one Entra ID app to authenticate multiple Kubecost endpoints, you must pass an additional redirect_uri parameter in your authURL, which includes the URI you configured in Step 1.4. Otherwise, Entra ID may redirect to an incorrect endpoint. You can read more about this in Microsoft Entra ID's documentation. View the example below to see how you should format your URI:
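A hypothetical illustration of the format, assuming the Microsoft identity platform v2.0 authorize endpoint; note the redirect_uri value is URL-encoded:

```
https://login.microsoftonline.com/<Directory (tenant) ID>/oauth2/v2.0/authorize?redirect_uri=https%3A%2F%2F{your-kubecost-address}%2Fmodel%2Foidc%2Fauthorize
```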
First, you need to configure an admin role for your app. For more information on this step, see Microsoft's documentation on adding app roles.
Use your browser's devtools to observe network requests made between you, your Identity Provider, and Kubecost. Pay close attention to cookies and headers.
Search for oidc in your logs to follow events. Pay attention to any WRN related to OIDC. Search for Token Response, and try decoding both the access_token and id_token to ensure they are well formed (https://jwt.io/).
You can find more details on these flags in Kubecost's values.yaml.
Cloud Costs allows Kubecost to pull in spend data from your integrated cloud service providers.
Secondary clusters can be configured strictly as metric emitters to save memory. Learn more about how to best configure them in our secondary clusters guide.
Lowering query concurrency for the Kubecost ETL build will mean ETL takes longer to build, but consumes less memory. The default value is 5. This can be adjusted with the Helm flag:
Lowering query duration results in Kubecost querying for smaller windows when building ETL data. This can lead to slower ETL build times, but lower memory peaks because of the smaller datasets. The default value is 1440. This can be tuned with the Helm flag:
Fewer data points scraped from Prometheus means less data to collect and store, at the cost of Kubecost making estimations that possibly miss spikes of usage or short-running pods. The default value is 60s. This can be tuned in our values.yaml for the Prometheus scrape job.
Node-exporter is optional. Some health alerts will be disabled if node-exporter is disabled, but savings recommendations and core cost allocation will function normally. This can be disabled with the following Helm flags:
More info on this environment variable can be found in the related Kubecost documentation.
The following command performs the steps required to set up a service account.
To use the script, provide the following required parameters:
For EKS cluster provisioning, if using eksctl, make sure that you use the --managed option when creating the cluster. Unmanaged node groups should be upgraded to managed node groups.
Cluster turndown is currently in beta. Please read the documentation carefully.
Cluster turndown is an automated scale down and scale up of a Kubernetes cluster's backing nodes based on a custom schedule and turndown criteria. This feature can be used to reduce spend during down hours and/or reduce surface area for security reasons. The most common use case is to scale non-production environments (e.g. development clusters) to zero during off hours.
If you are upgrading from a pre-1.94 version of the Kubecost Helm chart, you will have to migrate your custom resources. turndownschedules.kubecost.k8s.io has been changed to turndownschedules.kubecost.com and finalizers.kubecost.k8s.io has been changed to finalizers.kubecost.com. See the TurndownSchedule Migration Guide for an explanation.
Cluster turndown is only available for clusters on GKE, EKS, or Kops-on-AWS.
Enable the Cluster Controller
You will receive full turndown functionality once the Cluster Controller is enabled via a provider service key setup and Helm upgrade. Review the Cluster Controller doc linked above under Prerequisites for more information, then return here when you've confirmed the Cluster Controller is running.
You can verify that the cluster-turndown pod is running with the following command:
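A sketch (the exact pod name depends on your release):

```sh
kubectl get pods -n kubecost | grep cluster-turndown
```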
Turndown uses a Kubernetes Custom Resource Definition to create schedules. Here is an example resource located at artifacts/example-schedule.yaml:
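A sketch of such a resource, based on the turndownschedules.kubecost.com CRD and finalizers.kubecost.com finalizer named above; the apiVersion and field layout should be confirmed against artifacts/example-schedule.yaml:

```yaml
apiVersion: kubecost.com/v1alpha1
kind: TurndownSchedule
metadata:
  name: example-schedule
  finalizers:
    - finalizers.kubecost.com
spec:
  start: 2024-06-01T00:00:00Z   # RFC3339, offset to UTC
  end: 2024-06-01T12:00:00Z
  repeat: daily
```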
This definition will create a schedule that starts by turning down at the designated start date-time and turning back up at the designated end date-time. Both the start and end times should be in RFC3339 format, i.e. times based on offsets to UTC. There are three possible values for repeat:
none: Single schedule turndown and turnup.
daily: Start and end times will reschedule every 24 hours.
weekly: Start and end times will reschedule every 7 days.
To create this schedule, you may modify example-schedule.yaml to your desired schedule and run:
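For example:

```sh
kubectl apply -f artifacts/example-schedule.yaml
```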
Currently, updating a resource is not supported, so if the scheduling of the example-schedule.yaml fails, you will need to delete the resource via:
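For example:

```sh
kubectl delete -f artifacts/example-schedule.yaml
```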
Then make the modifications to the schedule and re-apply.
The turndownschedule resource can be listed via kubectl as well:
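For example:

```sh
kubectl get turndownschedules
```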
or using the shorthand:
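For example:

```sh
kubectl get tds
```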
Details regarding the status of the turndown schedule can be found by outputting as a JSON or YAML:
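For example (substitute your schedule's name):

```sh
kubectl get tds example-schedule -o yaml
```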
The status field displays the current status of the schedule, including next schedule times, specific schedule identifiers, and the overall state of the schedule.
state: The state of the turndown schedule. This can be:
ScheduleSuccess: The schedule has been set and is waiting to run.
ScheduleFailed: The scheduling failed due to a schedule already existing or scheduling for a date-time in the past.
ScheduleCompleted: For schedules with repeat: none, the schedule will move to a completed state after turn up.
current: The next action to run.
lastUpdated: The last time the status was updated on the schedule.
nextScaleDownTime: The next time a turndown will be executed.
nextScaleUpTime: The next time a turn up will be executed.
scaleDownId: Specific identifier assigned by the internal scheduler for turndown.
scaleUpId: Specific identifier assigned by the internal scheduler for turn up.
scaleDownMetadata: Metadata attached to the scaledown job, assigned by the turndown scheduler.
scaleUpMetadata: Metadata attached to the scale up job, assigned by the turndown scheduler.
A turndown can be canceled before turndown actually happens or after. This is performed by deleting the resource:
Canceling while turndown is currently scaling down or scaling up will result in a delayed cancellation, as the schedule must complete its operation before processing the deletion/cancellation.
If the turndown schedule is canceled between a turndown and turn up, the turn up will occur automatically upon cancellation.
Cluster turndown has limited functionality via the Kubecost UI. To access cluster turndown in the UI, you must first enable Kubecost Actions. Once this is completed, you will be able to create and delete turndown schedules instantaneously for your supported clusters. Read more about turndown's UI functionality in this section of the above Kubecost Actions doc. Review the entire doc for more information on Kubecost Actions functionality and limitations.
The internal scheduler only allows one schedule at a time to be used. Any additional schedule resources created will fail (kubectl get tds -o yaml will display the status).
Do not attempt to kubectl edit a turndown schedule. This is currently not supported. The recommended approach for modifying a schedule is to delete it and then create a new one.
There is a 20-minute minimum time window between start and end of turndown schedule.
High availability mode is only officially supported on Kubecost Enterprise plans.
Running Kubecost in high availability (HA) mode is a feature that relies on multiple Kubecost replica pods implementing the ETL Bucket Backup feature, combined with a leader/follower implementation which ensures that there always exists exactly one leader across all replicas.
The leader/follower implementation leverages a coordination.k8s.io/v1 Lease resource to manage the election of a leader when necessary. To control access to the backup from the ETL pipelines, a RWStorageController is implemented to ensure the following:
Followers block on all backup reads, and poll bucket storage for any backup reads every 30 seconds.
Followers no-op on any backup writes.
Followers who receive queries against a backup store will not stack on pending reads, preventing external queries from blocking.
Followers promoted to Leader will drop all locks and receive write privileges.
Leaders behave identically to a single Kubecost install.
In order to enable the leader/follower and HA features, the following must also be configured:
Replicas are set to a value greater than 1
ETL FileStore is Enabled (enabled by default)
ETL Bucket Backup is configured
For example, using our Helm chart, the following is an acceptable configuration:
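A hedged sketch using --set flags; the replica and backup-secret value names are assumptions to verify against your chart version:

```sh
helm upgrade kubecost kubecost/cost-analyzer --namespace kubecost \
  --set kubecostDeployment.replicas=3 \
  --set kubecostModel.etlBucketConfigSecret=kubecost-etl-backup-bucket
```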
This can also be done in the values.yaml file within the chart:
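The equivalent values.yaml sketch (same assumptions as above):

```yaml
kubecostDeployment:
  replicas: 3
kubecostModel:
  etlBucketConfigSecret: kubecost-etl-backup-bucket
```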
This feature is currently in alpha. Please read the documentation carefully.
Kubecost's Kubescaler implements continuous request right-sizing: the automatic application of Kubecost's high-fidelity recommendations to your containers' resource requests. This provides an easy way to automatically improve your allocation of cluster resources by improving efficiency.
Kubescaler can be enabled and configured on a per-workload basis so that only the workloads you want edited will be edited.
Kubescaler is part of Cluster Controller, and should be configured after the Cluster Controller is enabled.
Kubescaler is configured on a workload-by-workload basis via annotations. Currently, only deployment workloads are supported.
Annotation | Description | Example(s) |
---|---|---|
Notable Helm values:
Helm value | Description | Example(s) |
---|---|---|
Kubescaler supports:
apps/v1 Deployments
apps/v1 DaemonSets
batch/v1 CronJobs (K8s v1.21+). No attempt will be made to autoscale a CronJob until it has run at least once.
Kubescaler cannot support:
"Uncontrolled" Pods. Learn more here.
Kubescaler will take care of the rest. It will apply the best-available recommended requests to the annotated controller every 11 hours. If the recommended requests exceed the current limits, the update is currently configured to set the request to the current limit.
To check current requests for your Deployments, use the following command:
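A sketch using standard kubectl JSONPath output (names are placeholders):

```sh
kubectl get deployment <deployment-name> -n <namespace> \
  -o jsonpath='{.spec.template.spec.containers[*].resources.requests}'
```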
This feature is only officially supported on Kubecost Enterprise plans.
The following steps allow Kubecost to use custom prices with a CSV pipeline. This feature allows for individual assets (e.g. nodes) to be supplied at unique prices. Common uses are for on-premise clusters, service-providers, or for external enterprise discounts.
Create a CSV file in this format (also in the below table). CSV changes are picked up hourly by default.
EndTimeStamp: currently unused
InstanceID: identifier used to match the asset
Region: filter match based on topology.kubernetes.io/region
AssetClass: node, pv, and gpu are supported
InstanceIDField: field in spec or metadata that will contain the relevant InstanceID. For nodes, often spec.providerID; for PVs, often metadata.name
InstanceType: optional field to define the asset type, e.g. m5.12xlarge
MarketPriceHourly: hourly price to charge this asset
Version: field for schema version, currently unused
If the node label topology.kubernetes.io/region is present, it must also be in the Region column.
This section is only required for nodes with GPUs.
The node the GPU is attached to must be matched by a CSV node price. Typically this will be matched on instance type (node.kubernetes.io/instance-type)
Supported GPU labels are currently:
gpu.nvidia.com/class
nvidia.com/gpu_type
Verification:
Connect to the Kubecost Prometheus: kubectl port-forward --namespace kubecost services/kubecost-cost-analyzer 9090:9090
Run the following query: curl localhost:9090/model/prometheusQuery?query=node_gpu_hourly_cost
You should see output similar to this: {instance="ip-192-168-34-166.us-east-2.compute.internal",instance_type="test.xlarge",node="ip-192-168-34-166.us-east-2.compute.internal",provider_id="aws:///us-east-2b/i-055274d3576800444",region="us-east-2"} 10 | YOUR_HOURLY_COST
Provide a file path for your CSV pricing data in your values.yaml. This path can reference a local PV or an S3 bucket.
Alternatively, mount a ConfigMap with the CSV:
Then set the following Helm values:
For S3 locations, provide file access. Required IAM permissions:
There are two options for adding the credentials to the Kubecost pod:
Service key: Create an S3 service key with the permissions above, then add its ID and access key as a K8s secret:
kubectl create secret generic pricing-schema-access-secret -n kubecost --from-literal=AWS_ACCESS_KEY_ID=id --from-literal=AWS_SECRET_ACCESS_KEY=key
The name of this secret should be the same as csvAccessCredentials in values.yaml above
AWS IAM (IRSA) service account annotation
Negotiated discounts are applied after cost metrics are written to Prometheus. Discounts will apply to all node pricing data, including pricing data read directly from the custom provider CSV pipeline. Additionally, all discounts can be updated at any time and changes are applied retroactively.
The following logic is used to match node prices accurately:
First, search for an exact match in the CSV pipeline
If an exact match is not available, search for an existing CSV data point that matches region, instanceType, and AssetClass
If neither is available, fall back to pricing estimates
You can check a summary of the number of nodes that have matched with the CSV by visiting /model/pricingSourceCounts. The response is a JSON object of the form:
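To query the endpoint, for example (reusing the port-forward from the verification steps above):

```sh
kubectl port-forward --namespace kubecost services/kubecost-cost-analyzer 9090:9090
curl localhost:9090/model/pricingSourceCounts
```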
Cloud provider service keys can be used in various aspects of the Kubecost installation. This includes configuring cloud billing integrations, multi-cluster object storage, and ETL backups. While automated IAM authentication via a Kubernetes service account like AWS IRSA is recommended, there are some scenarios where key-based authentication is preferred. When this method is used, rotating the keys at a pre-defined interval is a security best practice. Combinations of these features can be used, and therefore you may need to follow one or more of the below steps.
There are multiple methods for adding cloud provider keys to Kubecost when configuring a cloud integration. This article will cover all three procedures. Be sure to use the same method that was used during the initial installation of Kubecost when rotating keys. See the cloud integration doc for additional details.
The preferred and most common is via the multi-cloud cloud-integration.json Kubernetes secret.
The second method is to define the appropriate secret in Kubecost's values.yaml.
The final method to configure keys is via the Kubecost Settings page.
The primary sequence for setting up your key is:
Modify the appropriate Kubernetes secret, Helm value, or update via the Settings page.
Restart the Kubecost cost-analyzer pod.
Verify the new key is working correctly. Any authentication errors should be present early in the cost-model container logs from the cost-analyzer pod. Additionally, you can check the status of the cloud integration in the Kubecost UI via Settings > View Full Diagnostics.
There are two methods for enabling multi-clustering in Kubecost: Federated ETL and Thanos Federation.
Depending on which method you are using, the key rotation process differs.
With Federated ETL objects, storage keys can be provided in two ways. The preferred method is using the secret defined by the Helm value .Values.kubecostModel.federatedStorageConfigSecret. The alternate method is to re-use the ETL backup secret defined with the .Values.kubecostModel.etlBucketConfigSecret Helm value.
Update the appropriate Kubernetes secret with the new key on each cluster.
Restart the Kubecost cost-analyzer pod.
Restart the Kubecost federator pod.
Verify the new key is working correctly by checking the cost-model container logs from the cost-analyzer pod for any object storage authentication errors. Additionally, verify there are no object storage errors in the federator pod logs.
Update the kubecost-thanos Kubernetes secret with the new key on each cluster.
Restart the prometheus server pod installed with Kubecost on all clusters (including the primary cluster) that write data to the Thanos object store. This will ensure the Thanos sidecar has the new key.
On the primary Kubecost cluster, restart the thanos-store pod.
Verify the new key is working correctly by checking the thanos-sidecar logs in the prometheus server pods for authentication errors, to ensure they are able to write new block data to the object storage.
Verify the new key is working correctly by checking thanos-store
pod logs on the primary cluster for authentication errors to ensure it is able to read block data from the object storage.
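A sketch for Thanos federation, assuming the default kubecost namespace and Helm release name; the thanos-store workload name and the secret's file key vary by install, so treat them as placeholders:

```bash
# On every cluster writing to the Thanos object store:
kubectl -n kubecost create secret generic kubecost-thanos \
  --from-file=object-store.yaml --dry-run=client -o yaml | kubectl apply -f -
kubectl -n kubecost rollout restart deployment/kubecost-prometheus-server

# On the primary cluster only: restart the thanos-store pod (name is illustrative).
kubectl -n kubecost delete pod <thanos-store-pod-name>

# Verify: check the sidecar and store logs for authentication errors.
kubectl -n kubecost logs deployment/kubecost-prometheus-server -c thanos-sidecar | grep -iE "auth|denied|err"
kubectl -n kubecost logs <thanos-store-pod-name> | grep -iE "auth|denied|err"
```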
Modify the appropriate Kubernetes secret.
Restart the Kubecost cost-analyzer pod.
Verify the backups are still being written to the object storage.
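A sketch for ETL backups, assuming the default kubecost namespace and a secret named kubecost-object-store (substitute the name referenced by .Values.kubecostModel.etlBucketConfigSecret in your install):

```bash
# Update the ETL backup bucket secret with the rotated key (names are assumptions).
kubectl -n kubecost create secret generic kubecost-object-store \
  --from-file=object-store.yaml --dry-run=client -o yaml | kubectl apply -f -

# Restart the cost-analyzer pod, then confirm new backup files keep appearing in the bucket.
kubectl -n kubecost rollout restart deployment/kubecost-cost-analyzer
```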
The metrics listed below are emitted by Kubecost and scraped by Prometheus to help monitor the status of Kubecost data pipelines:
kubecost_allocation_data_status, which presents the time series status of the active allocation data
kubecost_asset_data_status, which presents the time series status of the active asset data
These metrics expose data status through Prometheus so you can proactively alert on and analyze the allocation and asset data at a point in time.
The metrics below depict the status of active allocation data at a point in time. The resolution is either daily or hourly, aligning one-to-one with the data status of the allocation daily and hourly stores. Each hourly and daily store has four status types:
Empty: Depicts the total number of empty allocationSets in each store (hourly or daily) at a point in time.
Error: Depicts the total number of errors in the allocationSets in each store (hourly or daily) at a point in time.
Success: Depicts the total number of successful allocationSets in each store (hourly or daily) at a point in time.
Warning: Depicts the total number of warnings across allocationSets in each store (hourly or daily) at a point in time.
The metrics below depict the status of active asset data at a point in time. The resolution is either daily or hourly, aligning one-to-one with the data status of the asset daily and hourly stores. Each hourly and daily store has four status types:
Empty: Depicts the total number of empty assetSets in each store (hourly or daily) at a point in time.
Error: Depicts the total number of errors in the assetSets in each store (hourly or daily) at a point in time.
Success: Depicts the total number of successful assetSets in each store (hourly or daily) at a point in time.
Warning: Depicts the total number of warnings across assetSets in each store (hourly or daily) at a point in time.
kubecost_asset_data_status is written to Prometheus during the assetSet and assetLoad events.
kubecost_allocation_data_status is written to Prometheus during the allocationSet and allocationLoad events.
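As a quick way to inspect these series, you can query the bundled Prometheus directly (the service name and port below assume a default Helm install; label names on the series vary by Kubecost version):

```bash
# Port-forward the bundled Prometheus (default service name assumed) and query
# the data status metrics.
kubectl -n kubecost port-forward svc/kubecost-prometheus-server 9080:80 &
curl -s 'http://localhost:9080/api/v1/query?query=kubecost_allocation_data_status'
curl -s 'http://localhost:9080/api/v1/query?query=kubecost_asset_data_status'
```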
During the cleanup operation, the corresponding entries for each allocation and asset are deleted so that the metrics remain in parity with the respective allocation and asset stores.
Availability Tiers impact capacity recommendations, health ratings, and more in the Kubecost product. As an example, production jobs receive higher resource request recommendations than dev workloads. Another example: health scores for high-availability workloads are heavily penalized when multiple replicas are not available.
Today our product supports the following tiers:
| Tier | Priority | Default |
|---|---|---|
| | 0 | If true, recommendations and health scores heavily prioritize availability. This is the default tier if none is supplied. |
| | 1 | Intended for production jobs that are not necessarily mission-critical. |
| | 2 | Meant for experimental or development resources. Redundancy or availability is not a high priority. |
To apply a namespace tier, add a tier namespace label to reflect the desired value.
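For example, an illustrative label command (the namespace and tier value shown are placeholders; use a tier value supported by your Kubecost version):

```bash
# Label a namespace with the desired availability tier.
kubectl label namespace my-namespace tier=production --overwrite
```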
Kubecost can run on clusters with mixed Linux and Windows nodes. The Kubecost pods must run on a Linux node.
When using a Helm install, this can be done simply with:
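A sketch of such an install; the chart repo URL and flag syntax shown are typical for the cost-analyzer chart, but verify them against your chart version:

```bash
helm upgrade --install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost --create-namespace \
  --set nodeSelector."kubernetes\.io/os"=linux
```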
The cluster must have at least one Linux node for the Kubecost pods to run on:
Use a nodeSelector for all Kubecost deployments:
For DaemonSets, set the affinity to only allow scheduling on Linux nodes:
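The following sketch writes illustrative YAML fragments for both cases using standard Kubernetes scheduling fields; the exact Helm value paths for each Kubecost Deployment and DaemonSet depend on your chart version:

```bash
# Illustrative scheduling fragments only; merge the nodeSelector into the values
# for Kubecost Deployments and the nodeAffinity into the values for any Kubecost
# DaemonSets before running your Helm upgrade.
cat > kubecost-linux-scheduling.yaml <<'EOF'
nodeSelector:
  kubernetes.io/os: linux
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/os
              operator: In
              values:
                - linux
EOF
```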
Collecting data about Windows nodes is supported by Kubecost as of v1.93.0.
Accurate node and pod data exists by default, since they come from the Kubernetes API.
Kubecost requires cAdvisor for pod utilization data to determine costs at the container level.
Currently, pods on Windows nodes are billed based on request size.
In v1.94 of Kubecost, the turndownschedules.kubecost.k8s.io/v1alpha1 Custom Resource Definition (CRD) was renamed to turndownschedules.kubecost.com/v1alpha1 to adhere to Kubernetes API group naming conventions. This is a breaking change for users of Cluster Controller's turndown functionality. Please follow this guide for a successful migration of your turndown schedule resources.
Note: As part of this change, the CRD was updated to use apiextensions.k8s.io/v1 because v1beta1 was removed in K8s v1.22. If using Kubecost v1.94+, Cluster Controller's turndown functionality will not work on K8s versions before the introduction of apiextensions.k8s.io/v1.
In this situation, you've deployed Kubecost's Cluster Controller at some point using --set clusterController.enabled=true, but you don't use the turndown functionality.
That means that this command should return one line:
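For example (an illustrative check; the grep simply narrows the output to the old CRD):

```bash
# The old CRD should exist if Cluster Controller was ever enabled.
kubectl get crd | grep turndownschedules.kubecost.k8s.io
```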
And this command should return no resources:
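For example (illustrative):

```bash
# No old-API turndown schedules should exist in this scenario.
kubectl get turndownschedules.kubecost.k8s.io
```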
This situation is easy! You can do nothing, and turndown should continue to behave correctly because kubectl get turndownschedule and related commands will correctly default to the new turndownschedules.kubecost.com/v1alpha1 CRD after you upgrade to Kubecost v1.94 or higher.
If you would like to be fastidious and clean up the old CRD, simply run kubectl delete crd turndownschedules.kubecost.k8s.io after upgrading Kubecost to v1.94 or higher.
In this situation, you've deployed Kubecost's Cluster Controller at some point using --set clusterController.enabled=true and you have at least one turndownschedule.kubecost.k8s.io resource currently present in your cluster.
That means that this command should return one line:
And this command should return at least one resource:
We have a few steps to perform if you want Cluster Controller's turndown functionality to continue to behave according to your already-defined turndown schedules.
Upgrade Kubecost to v1.94 or higher with --set clusterController.enabled=true
Make sure the new CRD has been defined after your Kubecost upgrade
This command should return a line:
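For example (illustrative):

```bash
# The new CRD should be present after upgrading to v1.94 or higher.
kubectl get crd | grep turndownschedules.kubecost.com
```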
Copy your existing turndownschedules.kubecost.k8s.io resources into the new CRD
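A minimal sketch of the copy, assuming you review the exported YAML (and strip server-populated metadata such as resourceVersion and uid) before applying:

```bash
# Export the old-API resources, rewrite the API group, and re-create them under the new CRD.
kubectl get turndownschedules.kubecost.k8s.io -o yaml > old-turndownschedules.yaml
sed 's|kubecost.k8s.io/v1alpha1|kubecost.com/v1alpha1|g' old-turndownschedules.yaml > new-turndownschedules.yaml
kubectl apply -f new-turndownschedules.yaml
```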
(optional) Delete the old turndownschedules.kubecost.k8s.io CRD
Note: The following command may be unnecessary because Helm should automatically remove the turndownschedules.kubecost.k8s.io resource during the upgrade. The removal will remain in a pending state until the finalizer patch is applied (see the note on finalizers below).
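If you do want to remove it manually, the command (from the cleanup step above) is:

```bash
# Delete the old CRD; if this hangs, see the note on finalizers below.
kubectl delete crd turndownschedules.kubecost.k8s.io
```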
Thanos federation makes use of the kubecost-thanos Kubernetes secret as described in the Thanos setup documentation.
ETL backups rely on the secret defined by the Helm value .Values.kubecostModel.etlBucketConfigSecret. More details can be found in the ETL backup documentation.
See the list of all deployments and DaemonSets in this file.
Because the CRDs have a finalizer on them, we have to remove the finalizer from our old resources before they can be deleted. This lets us clean up without locking up.
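A sketch of removing the finalizer from an old resource (the resource name is a placeholder):

```bash
# Clear the finalizers so the old resource, and then the old CRD, can be deleted.
kubectl patch turndownschedules.kubecost.k8s.io <schedule-name> \
  --type=merge -p '{"metadata":{"finalizers":null}}'
```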
| Setting | Description | Example |
|---|---|---|
| request.autoscaling.kubecost.com/enabled | Whether to autoscale the workload. See note on KUBESCALER_RESIZE_ALL_DEFAULT. | true, false |
| request.autoscaling.kubecost.com/frequencyMinutes | How often to autoscale the workload, in minutes. If unset, a conservative default is used. | 73 |
| request.autoscaling.kubecost.com/scheduleStart | Optional augmentation to the frequency parameter. If both are set, the workload will be resized on the scheduled frequency, aligned to the start. If frequency is 24h and the start is midnight, the workload will be rescheduled at (about) midnight every day. Formatted as RFC3339. | 2022-11-28T00:00:00Z |
| cpu.request.autoscaling.kubecost.com/targetUtilization | Target utilization (CPU) for the recommendation algorithm. If unset, the backing recommendation service's default is used. | 0.8 |
| memory.request.autoscaling.kubecost.com/targetUtilization | Target utilization (Memory/RAM) for the recommendation algorithm. If unset, the backing recommendation service's default is used. | 0.8 |
| request.autoscaling.kubecost.com/recommendationQueryWindow | Value of the window parameter to be used when acquiring recommendations. See the Request sizing API for an explanation of the window parameter. If setting up autoscaling for a CronJob, it is strongly recommended to set this to a value greater than the duration between Job runs. For example, if you have a weekly CronJob, this parameter should be set to a value greater than 7d to ensure a recommendation is available. | 2d |
| clusterController.kubescaler.resizeAllDefault | If true, Kubescaler will switch to default-enabled for all workloads unless they are annotated with request.autoscaling.kubecost.com/enabled=false. This is recommended for low-stakes clusters where you want to prioritize workload efficiency without reworking deployment specs for all workloads. | true |
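For example, a sketch of enabling autoscaling on a hypothetical Deployment using the annotations above (the workload name and frequency are placeholders):

```bash
# Annotate a workload to opt it into request autoscaling roughly once per day.
kubectl annotate deployment my-app \
  request.autoscaling.kubecost.com/enabled=true \
  request.autoscaling.kubecost.com/frequencyMinutes=1440 \
  --overwrite
```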