Backups and Alerting

Federated ETL Architecture is only officially supported on Kubecost Enterprise plans.

This doc provides recommendations to improve the stability and recoverability of your Kubecost data when deploying in a Federated ETL architecture.

Option 1: Increase Prometheus retention

Kubecost can rebuild its extract, transform, load (ETL) data from the Prometheus metrics stored on each cluster. We strongly recommend retaining each cluster's local Prometheus metrics for a period long enough to meet your organization's disaster recovery requirements.

prometheus:
  server:
    retention: 21d
  # Ensure the volume is large enough to hold all metrics
  persistentVolume:
    size: 32Gi
    enabled: true
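
As a rough check that the volume can hold the full retention window, the Prometheus documentation suggests estimating disk usage as retention time multiplied by ingestion rate multiplied by bytes per sample (typically 1 to 2 bytes). The worked example below is only a sketch: the ingestion rate of 8,000 samples per second and the 2 bytes per sample are assumed figures, not measurements from any real cluster.

```latex
\begin{align*}
\text{disk} &\approx t_{\text{retention}} \times r_{\text{ingest}} \times b_{\text{sample}} \\
            &= (21 \times 86{,}400\ \text{s}) \times 8{,}000\ \tfrac{\text{samples}}{\text{s}} \times 2\ \tfrac{\text{bytes}}{\text{sample}} \\
            &= 1{,}814{,}400 \times 16{,}000\ \text{bytes} \\
            &\approx 2.9 \times 10^{10}\ \text{bytes} \approx 27\ \text{GiB}
\end{align*}
```

Under these assumptions a 32Gi volume leaves only modest headroom. To measure your actual ingestion rate, you can query `rate(prometheus_tsdb_head_samples_appended_total[1h])` on the cluster's Prometheus.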

Option 2: Metrics backup

For long-term storage of Prometheus metrics, we recommend setting up a Thanos sidecar container that pushes Prometheus metrics to a cloud storage bucket.

# This is an abridged example. Full example in link below.
prometheus:
  server:
    extraArgs:
      storage.tsdb.min-block-duration: 2h
      storage.tsdb.max-block-duration: 2h
    extraVolumes:
    - name: object-store-volume
      secret:
        secretName: kubecost-thanos
    sidecarContainers:
    - name: thanos-sidecar
      image: thanosio/thanos:v0.30.2
      args:
        - sidecar
        - --prometheus.url=http://127.0.0.1:9090
        - --objstore.config-file=/etc/config/object-store.yaml
      volumeMounts:
      - name: object-store-volume
        mountPath: /etc/config
      - name: storage-volume
        mountPath: /data
        subPath: ""

You can configure the Thanos sidecar following this example or this example. Additionally, ensure you configure the following:
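
The sidecar configuration above reads its bucket settings from `/etc/config/object-store.yaml`, which is supplied by the `kubecost-thanos` secret. A minimal sketch of such a secret follows, assuming an AWS S3 bucket and credentials provided by the pod's IAM role. The bucket name, endpoint, and region are placeholders, not values from this guide.

```yaml
# Hypothetical secret backing the object-store-volume mount shown above.
apiVersion: v1
kind: Secret
metadata:
  name: kubecost-thanos          # must match secretName in extraVolumes
type: Opaque
stringData:
  # The key becomes the file name under /etc/config
  object-store.yaml: |
    type: S3
    config:
      bucket: "example-kubecost-metrics"   # placeholder bucket name
      endpoint: "s3.amazonaws.com"         # assumed AWS endpoint
      region: "us-east-1"                  # assumed region
```

Create the secret in the same namespace as the Prometheus server pod so that the volume mount can resolve it. Other providers use a different `type` and `config` layout in the Thanos object storage configuration.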

Option 3: Bucket versioning

Use your cloud service provider's bucket versioning feature on the bucket holding your Kubecost data (ETL files and Prometheus metrics), so that earlier versions of overwritten or deleted objects can be restored.
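
How versioning is enabled depends on the provider and on how the bucket is managed. As one illustration, the sketch below assumes an AWS S3 bucket defined in a CloudFormation template. The logical ID and bucket name are hypothetical.

```yaml
# Hypothetical CloudFormation fragment; adapt to your provider and tooling.
Resources:
  KubecostDataBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: example-kubecost-federated-etl   # placeholder name
      VersioningConfiguration:
        Status: Enabled    # keep prior versions of overwritten or deleted objects
```

Versioned buckets keep growing as old versions accumulate, so consider pairing versioning with a lifecycle rule that expires noncurrent versions after your recovery window.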

Option 4: Alerting

Configure Prometheus alerting rules and Alertmanager to get notified when metrics are being lost or when they deviate from a known baseline.
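
The rule file below is a minimal sketch of both kinds of alert, written in the standard Prometheus rule format. The group name, alert names, thresholds, and durations are assumptions to tune for your environment. It also assumes the Kubecost metric `node_total_hourly_cost` is scraped by this Prometheus. How the file is loaded depends on how your Prometheus is deployed.

```yaml
groups:
  - name: kubecost-metric-health        # hypothetical group name
    rules:
      # A scrape target has been unreachable, so its metrics are being lost.
      - alert: ScrapeTargetDown
        expr: up == 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Target {{ $labels.instance }} of job {{ $labels.job }} is down"

      # Kubecost cost metrics are missing entirely (assumed metric name).
      - alert: KubecostMetricsAbsent
        expr: absent(node_total_hourly_cost)
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "No node_total_hourly_cost series found in Prometheus"

      # Ingestion rate fell below half of its trailing one-day average.
      - alert: IngestionRateDropped
        expr: |
          rate(prometheus_tsdb_head_samples_appended_total[5m])
            < 0.5 * avg_over_time(rate(prometheus_tsdb_head_samples_appended_total[5m])[1d:5m])
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus sample ingestion dropped well below its daily average"
```

The first two rules cover outright metric loss. The third compares current ingestion against a trailing baseline, which is one way to express "deviating beyond a known standard" without hard-coding an absolute threshold.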
