ETL Backup
We do not recommend enabling ETL Backup in conjunction with .
Kubecost's extract, transform, load (ETL) data is a computed cache based on Prometheus's metrics, from which the user can perform all possible Kubecost queries. The ETL data is stored in a persistent volume mounted to the kubecost-cost-analyzer pod.
There are a number of reasons why you may want to back up this ETL data:
To ensure a copy of your Kubecost data exists, so you can restore the data if needed
To reduce the amount of historical data stored in Prometheus/Thanos, and instead retain historical ETL data
Beginning in v1.100, this feature is enabled by default if you have Thanos enabled. To opt out, set .Values.kubecostModel.etlBucketConfigSecret="".
Kubecost provides cloud storage backups for ETL backing storage. Backups are not the typical approach of "halt all reads/writes and dump the database." Instead, the backup system is a transparent feature that will always ensure that local ETL data is backed up, and if local data is missing, it can be retrieved from backup storage. This feature protects users from accidental data loss by ensuring that previously backed-up data can be restored at runtime.
Durable backup storage functionality is supported with a Kubecost Enterprise plan.
When the ETL pipeline collects data, it writes daily and (if configured) hourly cost metrics to the configured storage. This defaults to PV-based disk storage, but can be configured to use external durable storage on the following providers:
AWS S3
Azure Blob Storage
Google Cloud Storage
This configuration secret follows the same layout documented for Thanos.
You will need to create a file named object-store.yaml using the chosen storage provider configuration (documented below), and run the following command to create the secret from this file:
The file must be named object-store.yaml.
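A minimal sketch of creating the secret with `kubectl create secret generic`. The default secret name `kubecost-etl-backup` and the `kubecost` namespace are assumptions for illustration; substitute your own values.

```shell
# Create the bucket configuration secret from object-store.yaml.
# The default secret name and namespace are illustrative assumptions.
create_etl_backup_secret() {
  secret_name="${1:-kubecost-etl-backup}"
  namespace="${2:-kubecost}"
  # The key inside the secret takes the file's name,
  # which is why the file must be called object-store.yaml.
  kubectl create secret generic "$secret_name" \
    --namespace "$namespace" \
    --from-file=object-store.yaml
}

# Usage (run from the directory containing object-store.yaml):
#   create_etl_backup_secret kubecost-etl-backup kubecost
```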
If Kubecost was installed via Helm, ensure the following value is set.
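As a sketch, the relevant Helm value is `kubecostModel.etlBucketConfigSecret` (the same key used to opt out earlier on this page), set to the name of the secret. The secret name `kubecost-etl-backup` and the file name `values-etl-backup.yaml` are hypothetical.

```shell
# Write a small Helm values override that points Kubecost at the secret.
cat > values-etl-backup.yaml <<'EOF'
kubecostModel:
  # Name of the secret that contains object-store.yaml (hypothetical name).
  etlBucketConfigSecret: kubecost-etl-backup
EOF
```

The file can then be passed to your usual `helm upgrade` invocation with `-f values-etl-backup.yaml`.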
If you are using an existing disk storage option for your ETL data, enabling the durable backup feature will retroactively back up all previously stored data. This feature is also fully compatible with the existing S3 backup feature.
If you are using a memory store for your ETL data with a local disk backup (kubecostModel.etlFileStoreEnabled: false), the backup feature will simply replace the local backup. To take advantage of the retroactive backup feature, you will need to switch to the file store (kubecostModel.etlFileStoreEnabled: true). This option is now enabled by default in the Helm chart.
To restore the backup, untar the results of the ETL backup script into the ETL directory of the pod.
This feature is still in development, but a status card on the Diagnostics page will eventually show the status of the backup system.
If you have already followed our guide, you can reuse the previously created bucket configuration secret.
The configuration schema for S3 follows the Thanos object storage configuration layout. For reference, here's an example:
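A minimal sketch of an S3 `object-store.yaml`, assuming the Thanos S3 object storage fields. The bucket name, region, and credentials are placeholders, not real values.

```shell
# Write an example S3 bucket configuration (all values are placeholders).
cat > object-store.yaml <<'EOF'
type: S3
config:
  bucket: "my-kubecost-etl-bucket"
  endpoint: "s3.amazonaws.com"
  region: "us-east-1"
  access_key: "<ACCESS_KEY_ID>"
  secret_key: "<SECRET_ACCESS_KEY>"
  insecure: false
EOF
```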
The configuration schema for Google Cloud Storage follows the Thanos object storage configuration layout. For reference, here's an example:
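A minimal sketch of a Google Cloud Storage `object-store.yaml`, assuming the Thanos GCS object storage fields. The bucket name and the service account key contents are placeholders; a real key file contains more fields than shown.

```shell
# Write an example GCS bucket configuration (all values are placeholders).
cat > object-store.yaml <<'EOF'
type: GCS
config:
  bucket: "my-kubecost-etl-bucket"
  # Contents of the service account JSON key, embedded as a block scalar.
  service_account: |-
    {
      "type": "service_account",
      "project_id": "<PROJECT_ID>",
      "private_key_id": "<PRIVATE_KEY_ID>",
      "private_key": "<PRIVATE_KEY>",
      "client_email": "<CLIENT_EMAIL>"
    }
EOF
```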
The configuration schema for Azure follows the Thanos object storage configuration layout. For reference, here's an example:
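A minimal sketch of an Azure Blob Storage `object-store.yaml`, assuming the Thanos Azure object storage fields. The storage account, key, and container name are placeholders.

```shell
# Write an example Azure bucket configuration (all values are placeholders).
cat > object-store.yaml <<'EOF'
type: AZURE
config:
  storage_account: "<STORAGE_ACCOUNT>"
  storage_account_key: "<STORAGE_ACCOUNT_KEY>"
  container: "<CONTAINER_NAME>"
  endpoint: ""
  max_retries: 0
EOF
```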
Because Storj is S3-compatible, it can be used as a drop-in replacement for S3. After an S3 Compatible Access Grant has been created, an example configuration would be:
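A minimal sketch of a Storj `object-store.yaml`, reusing the S3 layout. The endpoint `gateway.storjshare.io` is an assumption about the hosted Storj gateway; confirm it, along with the bucket name and the credentials from your access grant, against your own Storj account.

```shell
# Write an example Storj configuration using the S3 provider type.
# The endpoint is an assumption; bucket and keys are placeholders.
cat > object-store.yaml <<'EOF'
type: S3
config:
  bucket: "my-kubecost-etl-bucket"
  endpoint: "gateway.storjshare.io"
  access_key: "<ACCESS_KEY>"
  secret_key: "<SECRET_KEY>"
  insecure: false
EOF
```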
Because HCP is S3-compatible, it can be used as a drop-in replacement for S3. Once you have obtained the necessary S3 User Credentials, follow the example below to configure the secret.
The simplest way to back up Kubecost's ETL is to copy the pod's ETL store to your local disk. You can then send that file to any other storage system of your choice. We provide a script to do that.
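The copy step can be sketched roughly as follows. This is not the script referred to above, only an illustration: the label selector `app=cost-analyzer`, the container name `cost-model`, and the ETL path `/var/configs/db/etl` are assumptions that should be checked against your deployment.

```shell
# Copy the ETL store out of the cost-analyzer pod and archive it locally.
# Selector, container name, and ETL path are assumptions; verify them first.
backup_kubecost_etl() {
  namespace="${1:-kubecost}"
  out_dir="${2:-./kubecost-etl-backup}"
  etl_path="/var/configs/db/etl"

  # Look up the name of the first pod matching the assumed label.
  pod="$(kubectl get pods -n "$namespace" -l app=cost-analyzer \
    -o jsonpath='{.items[0].metadata.name}')"

  # Copy the ETL directory from the pod to the local disk.
  kubectl cp -c cost-model "$namespace/$pod:$etl_path" "$out_dir"

  # Bundle the copy into a single archive that can be shipped elsewhere.
  tar -czf "$out_dir.tar.gz" -C "$(dirname "$out_dir")" "$(basename "$out_dir")"
}

# Usage:
#   backup_kubecost_etl kubecost ./kubecost-etl-backup
```

The resulting `.tar.gz` file can then be uploaded to whichever storage system you prefer.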
There is also a Bash script available to restore the backup.
In some scenarios, such as when using the memory store, setting kubecostModel.etlHourlyStoreDurationHours to a value of 48 hours or less will cause ETL backup files to become truncated. The current recommendation is to keep it at its default of 49 hours.
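If you manage this setting explicitly, a values override along these lines pins it to the default. The file name `values-etl-hourly.yaml` is hypothetical.

```shell
# Write a Helm values override that keeps the hourly store at 49 hours.
cat > values-etl-hourly.yaml <<'EOF'
kubecostModel:
  # Values of 48 or less can truncate ETL backup files.
  etlHourlyStoreDurationHours: 49
EOF
```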