1 of 3

ETL Backup

Kubecost's extract, transform, load (ETL) data is a computed cache based on Prometheus's metrics, from which the user can perform all possible Kubecost queries. The ETL data is stored in a persistent volume mounted to the kubecost-cost-analyzer pod.

There are a number of reasons why you may want to backup this ETL data:

To ensure a copy of your Kubecost data exists, so you can restore the data if needed
To reduce the amount of historical data stored in Prometheus/Thanos, and instead retain historical ETL data

Beginning in v1.100, this feature is enabled by default if you have Thanos enabled. To opt out, set .Values.kubecostModel.etlBucketConfigSecret="".

Option 1: Automated durable ETL backups and monitoring

Kubecost provides cloud storage backups for ETL backing storage. Backups are not the typical approach of "halt all reads/writes and dump the database." Instead, the backup system is a transparent feature that will always ensure that local ETL data is backed up, and if local data is missing, it can be retrieved from backup storage. This feature protects users from accidental data loss by ensuring that previously backed-up data can be restored at runtime.

Durable backup storage functionality is supported with a Kubecost Enterprise plan.

When the ETL pipeline collects data, it stores daily and hourly (if configured) cost metrics on a configured storage. This defaults to a PV-based disk storage, but can be configured to use external durable storage on the following providers:

AWS S3
Azure Blob Storage
Google Cloud Storage

Step 1: Create storage configuration secret

You will need to create a file named object-store.yaml using the chosen storage provider configuration (documented below), and run the following command to create the secret from this file:

The file must be named object-store.yaml.

Existing Thanos users

Setting .Values.kubecostModel.etlBucketConfigSecret=kubecost-thanos will enable the backup feature. This will back up all ETL data to the same bucket being used by Thanos.

Google Cloud Storage

Azure

S3 compatible tooling

Storj

Hitachi Content Platform (HCP)

For bucket, the value should be the folder created in the HCP endpoint bucket, not the pre-existing bucket name.

Step 2: Enable ETL backup in Helm values

If Kubecost was installed via Helm, ensure the following value is set.

Compatibility

If you are using an existing disk storage option for your ETL data, enabling the durable backup feature will retroactively back up all previously stored data*. This feature is also fully compatible with the existing S3 backup feature.

If you are using a memory store for your ETL data with a local disk backup (kubecostModel.etlFileStoreEnabled: false), the backup feature will simply replace the local backup. In order to take advantage of the retroactive backup feature, you will need to update to file store (kubecostModel.etlFileStoreEnabled: true). This option is now enabled by default in the Helm chart.

Option 2: Manual backup via Bash script

To restore the backup, untar the results of the ETL backup script into the ETL directory pod.

Monitoring

Currently, this feature is still in development, but there is currently a status card available on the Diagnostics page that will eventually show the status of the backup system:

Troubleshooting

Sharing ETL Backups

This document will describe why your Kubecost instance’s data can be useful to share with us, what content is in the data, and how to share it.

Kubecost product releases are tested and verified against a combination of generated/synthetic Kubernetes cluster data and examples of customer data that have been shared with us. Customers who share snapshots of their data with us help to ensure that product changes handle their specific use cases and scales. Because the Kubecost product for many customers is run as an on-prem service, with no data sharing back to us, we do not inherently have this data for many of our customers.

Query Service Replicas

This feature is only supported on Kubecost Enterprise plans.

The query service replica (QSR) is a scale-out query service that reduces load on the cost-model pod. It allows for improved horizontal scaling by being able to handle queries for larger intervals, and multiple simultaneous queries.

Overview

The query service will forward /model/allocation and /model/assets requests to the Query Services StatefulSet.

The diagram below demonstrates the backing architecture of this query service and its functionality.

Requirements

ETL data source

There are three options that can be used for the source ETL Files:

For environments that have Kubecost Federated ETL enabled, this store will be used, no additional configuration is required.
For single cluster environments, QSR can target the ETL backup store. To learn more about ETL backups, see the ETL Backup doc.
Alternatively, an object-store containing the ETL dataset to be queried can be configured using a secret kubecostDeployment.queryServiceConfigSecret. The file name of the secret must be object-store.yaml. Examples can be found in our Configuring Thanos doc.

Persistent volume on Kubecost primary instance

QSR uses persistent volume storage to avoid excessive S3 transfers. Data is retrieved from S3 hourly as new ETL files are created and stored in these PVs. The databaseVolumeSize should be larger than the size of the data in the S3 bucket.

When the pods start, data from the object-store is synced and this can take a significant time in large environments. During the sync, parts of the Kubecost UI will appear broken or have missing data. You can follow the pod logs to see when the sync is complete.

The default of 100Gi is enough storage for 1M pods and 90 days of retention. This can be adjusted:

kubecostDeployment:
  queryServiceReplicas: 2
  queryService:
    # default storage class
    storageClass: ""
    databaseVolumeSize: 100Gi
    configVolumeSize: 1G

Enabling QSR

Once the data store is configured, set kubecostDeployment.queryServiceReplicas to a non-zero value and perform a Helm upgrade.

Usage

Once QSR has been enabled, the new pods will automatically handle all API requests to /model/allocation and /model/assets.

Query Service Replicas

This feature is only supported on Kubecost Enterprise plans.

Overview

The query service will forward /model/allocation and /model/assets requests to the Query Services StatefulSet.

The diagram below demonstrates the backing architecture of this query service and its functionality.

Requirements

ETL data source

There are three options that can be used for the source ETL Files:

For environments that have Kubecost Federated ETL enabled, this store will be used, no additional configuration is required.
For single cluster environments, QSR can target the ETL backup store. To learn more about ETL backups, see the ETL Backup doc.
Alternatively, an object-store containing the ETL dataset to be queried can be configured using a secret kubecostDeployment.queryServiceConfigSecret. The file name of the secret must be object-store.yaml. Examples can be found in our Configuring Thanos doc.

Persistent volume on Kubecost primary instance

The default of 100Gi is enough storage for 1M pods and 90 days of retention. This can be adjusted:

kubecostDeployment:
  queryServiceReplicas: 2
  queryService:
    # default storage class
    storageClass: ""
    databaseVolumeSize: 100Gi
    configVolumeSize: 1G

Enabling QSR

Once the data store is configured, set kubecostDeployment.queryServiceReplicas to a non-zero value and perform a Helm upgrade.

Usage

Once QSR has been enabled, the new pods will automatically handle all API requests to /model/allocation and /model/assets.

Sharing ETL Backups

This document will describe why your Kubecost instance’s data can be useful to share with us, what content is in the data, and how to share it.

Sharing data with us requires an ETL backup executed by the customer in their own environment before the resulting data can be sent out. Kubecost's ETL is a computed cache built upon Prometheus metrics and cloud billing data, on which nearly all API requests made by the user and the Kubecost frontend currently rely. Therefore, the ETL data will contain metric data and identifying information for that metric (e.g. a container name, pod name, namespace, and cluster name) during a time window, but will not contain other information about containers, pods, clusters, cloud resources, etc. You can read more about these metric details in our doc.

The full methodology for creating the ETL backup can be found in our doc. Once these files have been backed up, the content will look as follows before compressing the data:

├── etl
│   ├── bingen
│   │   ├── allocations
│   │   │   ├── 1d # data chunks of 1 day
│   │   │   │   ├── filename: {start timestamp}-{end timestamp}
│   │   │   ├── 1h # data chunks of 1 hour
│   │   │   │   ├── filename: {start timestamp}-{end timestamp}
│   │   ├── assets
│   │   │   ├── 1d # data chunks of 1 day
│   │   │   │   ├── filename: {start timestamp}-{end timestamp}
│   │   │   ├── 1h # data chunks of 1 hour
│   │   │   │   ├── filename: {start timestamp}-{end timestamp}

Once the data is downloaded to the local disk from either the automated or manual ETL backup methods, the data must be converted to a gzip file. A suggested method for downloading the ETL backup and compressing it quickly is to use . Check out the tar syntax in that script if doing this manually without the script. When the compressed ETL backup is ready to share, please work with a Kubecost support engineer on sharing the file with us. Our most common approach is to use a Google Drive folder with access limited to you and the support engineer, but we recognize not all companies are open to this and will work with you to determine the most business-appropriate method.

If you are interested in reviewing the contents of the data, either before or after sending the ETL backup to us, you can find an example Golang implementation on how to read the .

ETL Backup

We do not recommend enabling ETL Backup in conjunction with .

There are a number of reasons why you may want to backup this ETL data:

To ensure a copy of your Kubecost data exists, so you can restore the data if needed
To reduce the amount of historical data stored in Prometheus/Thanos, and instead retain historical ETL data

Beginning in v1.100, this feature is enabled by default if you have Thanos enabled. To opt out, set .Values.kubecostModel.etlBucketConfigSecret="".

Option 1: Automated durable ETL backups and monitoring

Durable backup storage functionality is supported with a Kubecost Enterprise plan.

AWS S3
Azure Blob Storage
Google Cloud Storage

Step 1: Create storage configuration secret

This configuration secret follows the same layout documented for Thanos .

You will need to create a file named object-store.yaml using the chosen storage provider configuration (documented below), and run the following command to create the secret from this file:

kubectl create secret generic <YOUR_SECRET_NAME> -n kubecost --from-file=object-store.yaml

The file must be named object-store.yaml.

Existing Thanos users

If you have already followed our guide, you can reuse the previously created bucket configuration secret.

Setting .Values.kubecostModel.etlBucketConfigSecret=kubecost-thanos will enable the backup feature. This will back up all ETL data to the same bucket being used by Thanos.

The configuration schema for S3 can be found in this . For reference, here's an example:

type: S3
config:
  bucket: "my-bucket"
  endpoint: "s3.amazonaws.com"
  region: "us-west-2"
  access_key: "<AWS_ACCESS_KEY>"
  secret_key: "<AWS_SECRET_KEY>"
  insecure: false
  signature_version2: false
  put_user_metadata:
    "X-Amz-Acl": "bucket-owner-full-control"
prefix: ""  # Optional. Specify a path within the bucket (e.g. "kubecost/etlbackup").

Google Cloud Storage

The configuration schema for Google Cloud Storage can be found in this . For reference, here's an example:

type: GCS
config:
  bucket: "my-bucket"
  service_account: |-
    {
      "type": "service_account",
      "project_id": "project",
      "private_key_id": "abcdefghijklmnopqrstuvwxyz12345678906666",
      "private_key": "-----BEGIN PRIVATE KEY-----\...\n-----END PRIVATE KEY-----\n",
      "client_email": "project@kubecost.iam.gserviceaccount.com",
      "client_id": "123456789012345678901",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/kubecost%40gitpods.iam.gserviceaccount.com"
    }
prefix: ""  # Optional. Specify a path within the bucket (e.g. "kubecost/etlbackup").

Azure

The configuration schema for Azure can be found in this . For reference, here's an example:

type: AZURE
config:
  storage_account: "<STORAGE_ACCOUNT>"
  storage_account_key: "<STORAGE_ACCOUNT_KEY>"
  container: "my-bucket"
  endpoint: ""
prefix: ""  # Optional. Specify a path within the bucket (e.g. "kubecost/etlbackup").

S3 compatible tooling

Storj

Because Storj is , it can be used as a drop-in replacement for S3. After an S3 Compatible Access Grant has been created, an example configuration would be:

type: S3
config:
  bucket: "my-bucket"
  endpoint: "gateway.storjshare.io"
  access_key: "<STORJ_ACCESS_KEY>"
  secret_key: "<STORJ_SECRET_KEY>"
  insecure: false
  signature_version2: false
  http_config:
    idle_conn_timeout: 90s
    response_header_timeout: 2m
    insecure_skip_verify: false
  trace:
    enable: true
  part_size: 134217728
prefix: ""  # Optional. Specify a path within the bucket (e.g. "kubecost/etlbackup").

Hitachi Content Platform (HCP)

Because HCP is , it can be used as a drop-in replacement for S3. To obtain the necessary S3 User Credentials, see . Afterwards, follow the example below to configure the secret.

For bucket, the value should be the folder created in the HCP endpoint bucket, not the pre-existing bucket name.

type: S3
config:
  bucket: "folder name"
  endpoint: "gateway.storjshare.io"
  access_key: "<HITACHI_ACCESS_KEY>"
  secret_key: "<HITACHI_SECRET_KEY>"
  insecure: false
  signature_version2: false
  http_config:
    idle_conn_timeout: 90s
    response_header_timeout: 2m
    insecure_skip_verify: false
  trace:
    enable: true
  part_size: 134217728
prefix: ""  # Optional. Specify a path within the bucket (e.g. "kubecost/etlbackup").

Step 2: Enable ETL backup in Helm values

If Kubecost was installed via Helm, ensure the following value is set.

kubecostModel:
  etlBucketConfigSecret: <YOUR_SECRET_NAME>

Compatibility

Option 2: Manual backup via Bash script

The simplest way to backup Kubecost's ETL is to copy the pod's ETL store to your local disk. You can then send that file to any other storage system of your choice. We provide a to do that.

To restore the backup, untar the results of the ETL backup script into the ETL directory pod.

kubectl cp -c cost-model <untarred-results-of-script>/bingen <kubecost-namespace>/<kubecost-pod-name>:/var/configs/db/etl

There is also a Bash script available to restore the backup in .

Monitoring

Currently, this feature is still in development, but there is currently a status card available on the Diagnostics page that will eventually show the status of the backup system:

Troubleshooting

In some scenarios like when using Memory store, setting kubecostModel.etlHourlyStoreDurationHours to a value of 48 hours or less will cause ETL backup files to become truncated. The current recommendation is to keep at its default of 49 hours.