Federated ETL Architecture is only officially supported on Kubecost Enterprise plans.
This doc provides recommendations to improve the stability and recoverability of your Kubecost data when deploying in a Federated ETL architecture.
Kubecost can rebuild its extract, transform, load (ETL) data using Prometheus metrics from each cluster. We strongly recommend retaining local cluster Prometheus metrics for a period that meets your organization's disaster recovery requirements.
For long-term storage of Prometheus metrics, we recommend setting up a Thanos sidecar container to push Prometheus metrics to a cloud storage bucket.
You can configure the Thanos sidecar following this example or this example. Additionally, ensure you configure the following:
object-store.yaml, so the Thanos sidecar has permission to read from and write to the cloud storage bucket.
.Values.prometheus.server.global.external_labels.cluster_id, so Kubecost can distinguish which metrics belong to which cluster in the Thanos bucket.
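As a rough illustration of the two settings above, the fragments below sketch what they might look like. The first follows the general Thanos object storage configuration format for an S3 bucket; the bucket name, endpoint, and region are placeholder assumptions, and other providers use a different `type` and `config` keys. Credentials are omitted on the assumption that the sidecar picks them up from its environment (for example, an IAM role).

```yaml
# object-store.yaml (sketch; all values are placeholders, not real resources)
type: S3
config:
  bucket: "kubecost-thanos-metrics"        # hypothetical bucket name
  endpoint: "s3.us-east-1.amazonaws.com"   # assumed region endpoint
  region: "us-east-1"                      # assumed region
```

The second is a Helm values fragment that sets the external label at the path named above. The value `cluster-one` is a hypothetical identifier; each cluster should use its own unique value.

```yaml
# Helm values fragment (sketch)
prometheus:
  server:
    global:
      external_labels:
        cluster_id: cluster-one   # hypothetical; must be unique per cluster
```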
Enable your cloud service provider's bucket versioning feature on the bucket holding your Kubecost data (ETL files and Prometheus metrics), so that earlier versions of overwritten or deleted objects can be restored.
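How versioning is turned on depends on the provider. As one possible sketch, assuming the bucket lives in AWS S3 and is managed with CloudFormation, versioning could be enabled as follows. The resource and bucket names are hypothetical; other providers offer an equivalent setting through their own tooling.

```yaml
# CloudFormation template fragment (sketch, assuming AWS S3)
Resources:
  KubecostStorageBucket:                      # hypothetical logical name
    Type: AWS::S3::Bucket
    Properties:
      BucketName: kubecost-federated-storage  # hypothetical bucket name
      VersioningConfiguration:
        Status: Enabled                       # keep prior versions of each object
```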
Configure Prometheus alerting rules and Alertmanager to notify you when metrics are being lost or when they deviate from a known baseline.
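A minimal sketch of such rules, in the standard Prometheus rule file format, is shown below. It assumes the Kubecost scrape job is labeled `job="kubecost"`; that label, the alert names, the durations, and the severity values are illustrative assumptions to adapt to your own scrape configuration.

```yaml
# Prometheus rule file (sketch; job label and thresholds are assumptions)
groups:
  - name: kubecost-metrics-health
    rules:
      # Fires when the scrape target exists but has been failing for 10 minutes.
      - alert: KubecostTargetDown
        expr: up{job="kubecost"} == 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Kubecost scrape target is down"
      # Fires when no series for the job exists at all, i.e. metrics are missing.
      - alert: KubecostMetricsAbsent
        expr: absent(up{job="kubecost"})
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "No Kubecost metrics are being collected"
```

Routing these alerts to email, chat, or paging is then a matter of Alertmanager receiver configuration, which is specific to each environment.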