Alerts
Kubecost alerts allow teams to receive updates on real-time Kubernetes spend. They are configurable via the Kubecost UI or Helm values. This resource gives an overview of how to configure alerts sent through email, Slack, and Microsoft Teams using Kubecost Helm chart values. Most alert types are created to monitor specific data sets and trends; the two health monitoring alerts can only be toggled on or off. The following alert types are supported:
Allocation Budget: Sends an alert when spending crosses a defined threshold.
Allocation Efficiency: Detects when a Kubernetes tenant is operating below a target cost-efficiency threshold.
Allocation Recurring Update: Sends an alert with cluster spending across all or a subset of Kubernetes resources.
Allocation Spend Change: Sends an alert reporting unexpected spend increases relative to moving averages.
Asset Budget: Sends an alert when spend for a particular set of assets crosses a defined threshold.
Asset Recurring Update: Sends an alert with asset spend across all or a subset of cloud resources.
Cloud Cost Budget: Sends an alert when the total cost of cloud spend goes over a set budget limit.
Monitor Cluster Health: Used to determine if the cluster's health score changes by a specific threshold. Can only be toggled on/off.
Monitor Kubecost Health: Used for production monitoring for the health of Kubecost itself. Can only be toggled on/off.
Configuring alerts in Helm
values.yaml is the source of truth. Alerts set through values.yaml overwrite any alert settings configured manually through the Kubecost UI, for example when the pod restarts.
Global alert parameters
The alert settings, under global.notifications.alertConfigs in cost-analyzer/values.yaml, accept four global fields:
frontendUrl: optional, your cost analyzer front-end URL used for linkbacks in alert bodies
globalSlackWebhookUrl: optional, a global Slack webhook used for alerts, enabled by default if provided
globalMsTeamsWebhookUrl: optional, a global Microsoft Teams webhook used for alerts, enabled by default if provided
globalAlertEmails: a global list of emails for alerts
Example Helm values.yaml:
Configuring each alert type
In addition to the global fields, every alert allows optional individual ownerContact (a list of email addresses), slackWebhookUrl (if different from globalSlackWebhookUrl), and msTeamsWebhookUrl (if different from globalMsTeamsWebhookUrl) fields. Alerts will default to the global settings if these optional fields are not supplied.
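The fallback rule above (per-alert fields win, global fields fill the gaps) can be sketched in Python. This is a minimal illustration, not Kubecost's implementation: the helper name resolve_recipients, the plain-dict input shapes, and the example URLs and addresses are hypothetical; only the field names come from this page. The sketch also assumes an empty per-alert list counts as "not supplied".

```python
from typing import Any, Dict, List, Optional


def resolve_recipients(alert: Dict[str, Any], global_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return the emails and webhooks a single alert would use.

    Hypothetical helper: a per-alert field is used when present and non-empty,
    otherwise the matching global field from alertConfigs is used.
    """
    emails: List[str] = alert.get("ownerContact") or global_cfg.get("globalAlertEmails", [])
    slack: Optional[str] = alert.get("slackWebhookUrl") or global_cfg.get("globalSlackWebhookUrl")
    teams: Optional[str] = alert.get("msTeamsWebhookUrl") or global_cfg.get("globalMsTeamsWebhookUrl")
    return {"emails": emails, "slack": slack, "teams": teams}


global_cfg = {
    "globalAlertEmails": ["team@example.com"],
    "globalSlackWebhookUrl": "https://hooks.slack.com/services/GLOBAL",
}
# This alert overrides the Slack webhook but has no ownerContact,
# so its email recipients fall back to globalAlertEmails.
alert = {"type": "budget", "slackWebhookUrl": "https://hooks.slack.com/services/TEAM-A"}
print(resolve_recipients(alert, global_cfg))
```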
Allocation Budget
Defines spend budgets and alerts on budget overruns.
type: budget -- Alert type.
window: <N>d or <M>h -- The date range over which to query items. Configurable where 1 ≤ N ≤ 7, or 1 ≤ M ≤ 24.
filter: <value>,<value2>... -- Optional. Configurable, accepts any 1 or more values of aggregate type as comma-separated values.
threshold: <amount> -- Cost threshold in configured currency units.
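The window rule and the budget check can be sketched as follows. This is a minimal sketch under stated assumptions, not Kubecost's code: parse_window and over_budget are hypothetical helpers, the costs dictionary stands in for allocation data already aggregated over the window, and the threshold is assumed to apply to each aggregated item with a strict "greater than" comparison.

```python
import re
from datetime import timedelta
from typing import Dict, Optional


def parse_window(window: str) -> timedelta:
    """Parse an alert window of the form <N>d (1 <= N <= 7) or <M>h (1 <= M <= 24)."""
    match = re.fullmatch(r"(\d+)([dh])", window)
    if match is None:
        raise ValueError(f"unsupported window {window!r}")
    n, unit = int(match.group(1)), match.group(2)
    if unit == "d":
        if not 1 <= n <= 7:
            raise ValueError(f"day window out of range: {window!r}")
        return timedelta(days=n)
    if not 1 <= n <= 24:
        raise ValueError(f"hour window out of range: {window!r}")
    return timedelta(hours=n)


def over_budget(costs: Dict[str, float], threshold: float,
                filter_csv: Optional[str] = None) -> Dict[str, float]:
    """Return aggregated items (e.g. namespaces) whose cost is above the threshold.

    filter_csv mirrors the comma-separated filter parameter; when it is
    omitted, every item is checked.
    """
    allowed = {v.strip() for v in filter_csv.split(",")} if filter_csv else None
    return {
        name: cost
        for name, cost in costs.items()
        if (allowed is None or name in allowed) and cost > threshold
    }


print(parse_window("7d"), parse_window("12h"))
print(over_budget({"kubecost": 120.0, "default": 30.0}, threshold=100.0))
```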
Example Helm values.yaml:
Allocation Efficiency
Alerts when Kubernetes tenants, e.g. namespaces or label sets, are running below defined cost-efficiency thresholds.
type: efficiency -- Alert type.
window: <N>d or <M>h -- The date range over which to query items. Configurable where 1 ≤ N ≤ 7, or 1 ≤ M ≤ 24.
filter: <value>,<value2>... -- Optional. Configurable, accepts any 1 or more values of aggregate type as comma-separated values.
efficiencyThreshold: <value> -- Optional. Efficiency threshold ranging from 0.0 to 1.0.
spendThreshold: <amount> -- The cost threshold (i.e., budget) in configured currency units.
For example, an efficiency alert can send a Slack alert whenever any namespace is running below 40% cost efficiency and has spent more than $100 during the last day.
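The trigger condition described above amounts to a two-part filter, sketched here in Python. Assumptions: efficiency_alerts is a hypothetical helper; each tenant's efficiency is taken as a precomputed fraction between 0.0 and 1.0 (how Kubecost computes it is not covered on this page); both comparisons are assumed to be strict; the namespace names and costs are made up.

```python
from typing import Dict, List, Tuple


def efficiency_alerts(
    allocations: Dict[str, Tuple[float, float]],
    efficiency_threshold: float,
    spend_threshold: float,
) -> List[str]:
    """Return tenants that are both below the efficiency target and above the spend threshold.

    allocations maps a tenant (e.g. a namespace) to (total_cost, efficiency).
    """
    if not 0.0 <= efficiency_threshold <= 1.0:
        raise ValueError("efficiencyThreshold must be between 0.0 and 1.0")
    return sorted(
        name
        for name, (cost, efficiency) in allocations.items()
        if efficiency < efficiency_threshold and cost > spend_threshold
    )


# Mirrors the example: below 40% efficiency and more than $100 spent over the last day.
last_day = {"checkout": (150.0, 0.25), "search": (220.0, 0.65), "batch": (40.0, 0.10)}
print(efficiency_alerts(last_day, efficiency_threshold=0.4, spend_threshold=100.0))
```

Requiring both conditions keeps small, idle namespaces from generating noise: "batch" is very inefficient but spends too little to matter.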
Allocation Recurring Update
Sends a recurring alert with a summary report of cost and efficiency metrics.
type: recurringUpdate -- Alert type.
window: <N>d or <M>h -- The date range over which to query items. Configurable where 1 ≤ N ≤ 7, or 1 ≤ M ≤ 24.
filter: <value>,<value2>... -- Optional. Configurable, accepts any 1 or more values of aggregate type as comma-separated values.
Additional window values:
<N>d where N in [1, 7) -- for every N days
7d or weekly -- for 0:00:00 UTC every Monday
30d or monthly -- for 0:00:00 UTC on the first day of the month
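Read literally, the window values above translate into the following next-send calculation. This is a sketch of that reading, not the Kubecost scheduler: next_recurring_send is a hypothetical helper, it expects a timezone-aware UTC datetime, and how Kubecost handles the very first run after an alert is created is not documented on this page.

```python
from datetime import datetime, timedelta, timezone


def _midnight(ts: datetime) -> datetime:
    """Round a UTC timestamp down to 00:00 of the same day."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def next_recurring_send(window: str, now: datetime) -> datetime:
    """Next send time (UTC) for a recurring-update window; `now` must be UTC-aware."""
    if window in ("7d", "weekly"):
        # Monday is weekday() == 0: find the next Monday 00:00 strictly after now.
        candidate = _midnight(now) + timedelta(days=(7 - now.weekday()) % 7)
        if candidate <= now:
            candidate += timedelta(days=7)
        return candidate
    if window in ("30d", "monthly"):
        # 00:00 on the first day of the following month.
        if now.month == 12:
            return datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
        return datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)
    if window.endswith("d") and window[:-1].isdigit() and 1 <= int(window[:-1]) < 7:
        # Every N days, at midnight UTC.
        return _midnight(now) + timedelta(days=int(window[:-1]))
    raise ValueError(f"unsupported recurring window {window!r}")


wednesday = datetime(2024, 1, 3, 15, 0, tzinfo=timezone.utc)
print(next_recurring_send("weekly", wednesday), next_recurring_send("monthly", wednesday))
```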
Additional aggregation values:
label -- requires the following format: label:<label_name>
annotation -- requires the following format: annotation:<annotation_name>
This example sends a recurring alert for allocation data for all namespaces every seven days:
Allocation Spend Change
Detects unexpected spend increases/decreases relative to historical moving averages.
type: spendChange -- Alert type.
window: <N>d or <M>h -- The date range over which to query items. Configurable where 1 ≤ N ≤ 7, or 1 ≤ M ≤ 24.
filter: <value>,<value2>... -- Optional. Configurable, accepts any 1 or more values of aggregate type as comma-separated values.
baselineWindow: <N>d -- Collect data from N days prior to queried items to establish cost baseline. Configurable, where N ≥ 1.
relativeThreshold: <N> -- Percentage of change from the baseline (positive or negative) which will trigger the alert. Configurable where N ≥ -1.
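The arithmetic behind a spend change alert can be sketched as below. Several details are assumptions rather than documented behavior: relativeThreshold is treated as a fraction (0.2 means 20%), which fits the lower bound of -1 (a 100% drop); the baseline is a simple average of daily costs over baselineWindow; and a positive threshold is assumed to watch for increases while a negative one watches for decreases. relative_change and spend_change_triggered are hypothetical helpers.

```python
from typing import Sequence


def relative_change(current_daily_cost: float, baseline_daily_costs: Sequence[float]) -> float:
    """Fractional change of the current cost against the baseline average."""
    if not baseline_daily_costs:
        raise ValueError("baseline window has no data")
    baseline = sum(baseline_daily_costs) / len(baseline_daily_costs)
    if baseline == 0:
        raise ValueError("baseline average is zero; relative change is undefined")
    return (current_daily_cost - baseline) / baseline


def spend_change_triggered(change: float, relative_threshold: float) -> bool:
    """Positive thresholds watch for increases, negative ones for decreases."""
    if relative_threshold < -1:
        raise ValueError("relativeThreshold must be >= -1")
    if relative_threshold >= 0:
        return change >= relative_threshold
    return change <= relative_threshold


# Baseline: 7 days averaging $100/day; today's cost is $130, a 30% increase.
change = relative_change(130.0, [100.0] * 7)
print(change, spend_change_triggered(change, 0.2))
```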
Example Helm values.yaml:
Asset Budget
Defines asset budgets and alerts when Kubernetes assets overrun the threshold set.
type: assetBudget -- Alert type.
window: <N>d or <M>h -- The date range over which to query items. Configurable where 1 ≤ N ≤ 7, or 1 ≤ M ≤ 24.
filter: <value>,<value2>... -- Optional. Configurable, accepts any 1 or more values of aggregate type as comma-separated values.
threshold: <amount> -- The cost threshold (i.e., budget) in configured currency units.
Example Helm values.yaml:
Asset Recurring Update
Sends a recurring alert with a Kubernetes assets summary report.
type: cloudReport -- Alert type.
window: <N>d or <M>h -- The date range over which to query items. Configurable where 1 ≤ N ≤ 7, or 1 ≤ M ≤ 24.
filter: <value>,<value2>... -- Optional. Configurable, accepts any 1 or more values of aggregate type as comma-separated values.
Additional window values:
<N>d where N in [1, 7) -- for every N days
7d or weekly -- for 0:00:00 UTC every Monday
30d or monthly -- for 0:00:00 UTC on the first day of the month
Additional aggregation values:
label -- requires the following format: label:<label_name>
annotation -- requires the following format: annotation:<annotation_name>
Two example alerts, one which provides weekly summaries of Kubernetes asset spend data aggregated by cluster, and one which provides weekly summaries of asset spend data for one specific cluster:
Cloud Cost Budget
Defines cloud cost budgets and alerts when cloud spend overruns the threshold set.
type: cloudCostBudget -- Alert type.
window: <N>d or <M>h -- The date range over which to query items. Configurable where 1 ≤ N ≤ 7, or 1 ≤ M ≤ 24.
aggregation: <agg-parameter> -- Configurable, accepts service, account, provider, invoiceEntity, or label.
filter: <value>,<value2>... -- Optional. Configurable, accepts any 1 or more values of aggregate type as comma-separated values.
threshold: <amount> -- The cost threshold (i.e., budget) in configured currency units.
costMetric: <metric-type> -- Cost metric type. Accepts ListCost, NetCost, AmortizedNetCost, InvoicedCost, and AmortizedCost.
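Because misspelled values are a common cause of silent misconfiguration, a pre-flight check of a cloudCostBudget definition against the table above can help. This is a hypothetical helper, not part of Kubecost: Kubecost performs its own validation, which may be stricter or more lenient, and the accepted values and their casing are taken from this page.

```python
import re
from typing import Any, Dict, List

CLOUD_COST_AGGREGATIONS = {"service", "account", "provider", "invoiceEntity", "label"}
COST_METRICS = {"ListCost", "NetCost", "AmortizedNetCost", "InvoicedCost", "AmortizedCost"}


def validate_cloud_cost_budget(alert: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a cloudCostBudget alert definition."""
    problems: List[str] = []
    if alert.get("type") != "cloudCostBudget":
        problems.append("type must be cloudCostBudget")
    match = re.fullmatch(r"(\d+)([dh])", str(alert.get("window", "")))
    if match is None:
        problems.append("window must look like <N>d or <M>h")
    else:
        n, unit = int(match.group(1)), match.group(2)
        limit = 7 if unit == "d" else 24
        if not 1 <= n <= limit:
            problems.append(f"window {n}{unit} is outside 1..{limit}")
    if str(alert.get("aggregation")) not in CLOUD_COST_AGGREGATIONS:
        problems.append("aggregation must be one of " + ", ".join(sorted(CLOUD_COST_AGGREGATIONS)))
    if str(alert.get("costMetric")) not in COST_METRICS:
        problems.append("costMetric must be one of " + ", ".join(sorted(COST_METRICS)))
    threshold = alert.get("threshold")
    if not isinstance(threshold, (int, float)) or isinstance(threshold, bool):
        problems.append("threshold must be a number")
    return problems


alert = {"type": "cloudCostBudget", "window": "7d", "aggregation": "service",
         "costMetric": "AmortizedNetCost", "threshold": 500}
print(validate_cloud_cost_budget(alert))  # []
```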
Monitor Cluster Health
Cluster health alerts occur when the cluster health score changes by a specific threshold. The health score is calculated based on the following criteria:
Low Cluster Memory
Low Cluster CPU
Too Many Pods
Crash Looping Pods
Out of Memory Pods
Failed Jobs
Example Helm values.yaml:
Monitor Kubecost Health
When diagnostic alerts are enabled, Kubecost sends an alert whenever an event impacts product uptime. This feature can be enabled in seconds from a values file. The following events are grouped into distinct categories that each result in a separate alert notification:
Prometheus is unreachable
Kubecost Metrics Availability:
Kubecost exported metrics missing over last 5 minutes
cAdvisor exported metrics missing over last 5 minutes
cAdvisor exported metrics missing expected labels in the last 5 minutes
Kubestate Metrics (KSM) exported metrics missing over last 5 minutes
Kubestate Metrics (KSM) unexpected version
Node Exporter metrics are missing over last 5 minutes.
Scrape Interval: Prometheus self-scraped metrics missing over last 5 minutes
CPU Throttling detected on cost-model in the last 10 minutes
Clusters Added/Removed (Enterprise Multicluster Support Only)
Required parameters:
type: diagnostic
window: <N>m -- configurable, N > 0
Optional parameters:
diagnostics -- object containing specific diagnostic checks to run (default is true for all). See the configuration example below for options:
Example Helm values.yaml:
Configuring alerts in the Kubecost UI
Cluster and Kubecost Health Alerts
Cluster Health Alerts and Kubecost Health Alerts work differently from other alert types. While other alerts monitor cost data for cost or efficiency anomalies, these two monitor the health of Kubecost itself and the health of the cluster running Kubecost. For this reason, you cannot create multiple instances of these alert types. In the UI, each has a switch that toggles a single instance on or off, and the settings of that single instance can be adjusted.
Kubecost does not validate Cluster Health Alert configurations. An invalid Health Alert configuration will appear to save but will not take effect, so check carefully that the alert has a Window and Threshold properly specified.
Global recipients
Global recipients specify a default fallback recipient for each type of message. If an alert does not define any email recipients, its messages will be sent to any emails specified in the Global Recipients email list. Likewise, if an alert does not define a webhook, its messages will be sent to the global webhook, if one is present. Alerts that do define recipients will ignore the global setting for recipients of that type.
Budget, efficiency, spend change, and recurring update alerts
The remaining alert types all target a set of allocation data with window, aggregation, and filter parameters, and trigger based on the target data. The table results can be filtered using the Filter alerts search bar next to + Create Alert. This input can be used to filter based on alert name, type, aggregation, window, and/or filter.
Select + Create Alert to open the Create Alert window where you configure the details of your alert.
The fields for each alert type should resemble their corresponding Helm values in the above tables.
Alerts can also be edited, removed, and tested from the table. Editing opens a dialog similar to the alert creation dialog, for editing the chosen alert.
When creating an alert, you can have these alerts sent through email, Slack, or Microsoft Teams. You can customize the subject field for an email, and attach multiple recipients. Alerts sent via email will contain a PDF of your report which shows the Kubecost UI for your Allocation/Asset page(s). This can be helpful for distributing visual information to those without immediate access to Kubecost.
Testing alerts
The Test arrow icons, as well as a separate Test Alert button in the Edit Alert window, can be used to issue a "test" alert. This can be useful to ensure that alerting infrastructure is working correctly and that an alert is properly configured. Issuing a test from the alert edit modal tests the alert with any modifications that have not yet been saved.
Alerts scheduler
All times are in UTC. Alert send times are determined by parsing the supplied window parameter. Alert diagnostics with the next and last scheduled run times are available via <your-kubecost-url>/model/alerts/status.
Supported: weekly and daily special cases, <N>d, <M>h (1 ≤ N ≤ 7, 1 ≤ M ≤ 24).
Currently unsupported: time zone adjustments, windows greater than 7d, windows less than 1h.
Scheduler behavior
An <N>d alert sends at 00:00 UTC N day(s) from now, i.e., N days from now rounded down to midnight.
For example, a 5d alert scheduled on Monday will send on Saturday at 00:00, and subsequently the next Thursday at 00:00.
An <N>h alert sends at the earliest time of day after now that is a multiple of N.
For example, a 6h alert scheduled at any time between 12 pm and 6 pm will send next at 6 pm and subsequently at 12 am the next day.
If 24 is not divisible by the hourly window, the alert is scheduled at the next multiple of <N>h after now, counting from 00:00 on the current day.
For example, a 7h alert scheduled at 22:00 checks 00:00, 7:00, 14:00, and 21:00, before arriving at the next send time of 4:00 tomorrow.
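The scheduling rules above can be sketched as a single function that reproduces each worked example. It is an illustration of the documented behavior, not Kubecost's scheduler code: next_send is a hypothetical name, `now` is expected to be a timezone-aware UTC datetime, and the weekly and daily special cases are left out.

```python
from datetime import datetime, timedelta, timezone


def next_send(window: str, now: datetime) -> datetime:
    """Next send time (UTC) for an <N>d or <N>h window, following the rules above."""
    unit, n = window[-1], int(window[:-1])
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "d" and 1 <= n <= 7:
        # N days from now, rounded down to midnight.
        return midnight + timedelta(days=n)
    if unit == "h" and 1 <= n <= 24:
        # Walk multiples of N hours from today's 00:00 until one falls after now.
        # This also covers windows such as 7h that do not divide 24.
        step = timedelta(hours=n)
        candidate = midnight
        while candidate <= now:
            candidate += step
        return candidate
    raise ValueError(f"unsupported window {window!r}")


# The 7h example: scheduled at 22:00, next send is 04:00 the following day.
print(next_send("7h", datetime(2024, 1, 1, 22, 0, tzinfo=timezone.utc)))  # 2024-01-02 04:00:00+00:00
```

2024-01-01 is a Monday, so `next_send("5d", ...)` from that day returns Saturday 2024-01-06, and from that Saturday returns Thursday 2024-01-11, matching the 5d example.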
Troubleshooting
Review these steps to verify alerts are being passed to the Kubecost application correctly.
Check /model/alerts/configs to ensure the alerts system has been configured properly.
Check /model/alerts/status to ensure alerts have been scheduled correctly. The status endpoint returns all of the running alerts including schedule metadata:
scheduledOn: The date and time (UTC) that the alert was scheduled.
lastRun: The date and time (UTC) that the alert last ran checks (will be set to 0001-01-01T00:00:00Z if the alert has never run).
nextRun: The date and time (UTC) that the alert will next run checks.
lastError: If running the alert checks fails for unexpected reasons, this field will contain the error message.
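The status fields above lend themselves to a quick automated check. This is a minimal sketch with assumptions: find_status_problems and fetch_statuses are hypothetical helpers, and the response shape (a JSON list of objects carrying the four fields, possibly wrapped in a "data" key) is assumed, not documented here. The network call is defined but not executed in the example, which runs on sample data instead.

```python
import json
import urllib.request
from typing import Any, Dict, List

NEVER_RUN = "0001-01-01T00:00:00Z"


def find_status_problems(statuses: List[Dict[str, Any]]) -> List[str]:
    """Flag alerts that were never scheduled or whose last check failed."""
    problems: List[str] = []
    for index, status in enumerate(statuses):
        if status.get("nextRun", NEVER_RUN) == NEVER_RUN:
            problems.append(f"alert #{index}: nextRun was never set")
        if status.get("lastError"):
            problems.append(f"alert #{index}: lastError: {status['lastError']}")
    return problems


def fetch_statuses(kubecost_url: str) -> List[Dict[str, Any]]:
    """GET <kubecost_url>/model/alerts/status (response shape is an assumption)."""
    with urllib.request.urlopen(f"{kubecost_url.rstrip('/')}/model/alerts/status") as response:
        body = json.load(response)
    # Assumption: either a bare JSON list, or an object wrapping the list in "data".
    return body.get("data", []) if isinstance(body, dict) else body


# Offline example with made-up status entries.
sample = [
    {"scheduledOn": "2024-01-01T09:30:00Z", "lastRun": NEVER_RUN,
     "nextRun": "2024-01-02T00:00:00Z", "lastError": ""},
    {"scheduledOn": "2024-01-01T09:30:00Z", "lastRun": NEVER_RUN,
     "nextRun": NEVER_RUN, "lastError": "example error"},
]
print(find_status_problems(sample))
```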
If using Helm:
Run kubectl get configmap alert-configs -n kubecost -o json to view the alerts ConfigMap.
Ensure that the Helm values are successfully read into the ConfigMap under alerts.json under the data field. See below:
Ensure that the JSON string is successfully mapped to the appropriate configs.
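The two ConfigMap checks above can be scripted. This sketch decodes the alerts.json string stored under the ConfigMap's data field and flags alert types that are not camelCase or are not among the types named on this page. Assumptions: the decoded document keeps the alert list under an "alerts" key, and the helpers are hypothetical. The Cluster Health alert's type name is not given on this page, so it is not in KNOWN_TYPES and would be reported as unknown; extend the set as needed.

```python
import json
import re
from typing import Any, Dict, List

# Alert type names that appear on this page.
KNOWN_TYPES = {"budget", "efficiency", "recurringUpdate", "spendChange",
               "assetBudget", "cloudReport", "cloudCostBudget", "diagnostic"}
CAMEL_CASE = re.compile(r"[a-z]+(?:[A-Z][a-z0-9]*)*")


def alerts_from_configmap(configmap_json: str) -> List[Dict[str, Any]]:
    """Decode alerts.json from `kubectl get configmap alert-configs -n kubecost -o json` output."""
    configmap = json.loads(configmap_json)
    config = json.loads(configmap["data"]["alerts.json"])  # a JSON document stored as a string
    # Assumption: the Helm alert list is kept under an "alerts" key.
    return config.get("alerts", [])


def type_problems(alerts: List[Dict[str, Any]]) -> List[str]:
    """Report type names that are not camelCase or not in KNOWN_TYPES."""
    problems: List[str] = []
    for alert in alerts:
        alert_type = alert.get("type", "")
        if not isinstance(alert_type, str) or CAMEL_CASE.fullmatch(alert_type) is None:
            problems.append(f"{alert_type!r} is not camelCase")
        elif alert_type not in KNOWN_TYPES:
            problems.append(f"{alert_type!r} is not one of the alert types in KNOWN_TYPES")
    return problems


# Offline example: a ConfigMap body with one valid and one miscapitalized type.
sample = json.dumps({"data": {"alerts.json": json.dumps(
    {"alerts": [{"type": "budget"}, {"type": "SpendChange"}]})}})
print(type_problems(alerts_from_configmap(sample)))
```

To check a live cluster, pass the stdout of the kubectl command above to alerts_from_configmap.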
Confirm that Kubecost has received configuration data:
Visit the Alerts page in the Kubecost UI to view configured alert settings as well as any of the alerts configured from Helm.
Alerts set up in the UI will be overwritten by Helm values.yaml if the pod restarts.
Additionally, confirm that the alerts scheduler has properly parsed and scheduled the next run for each alert by visiting <your-kubecost-url>/model/alerts/status to view individual alert parameters as well as the next and last scheduled run times for individual alerts.
Confirm that nextRun has been updated from "0001-01-01T00:00:00Z".
If nextRun fails to update, or alerts are not sent at the nextRun time, check pod logs by running:
kubectl logs $(kubectl get pods -n kubecost | awk '{print $1}' | grep "^kubecost-cost-analyzer.\{16\}") -n kubecost -c cost-model > kubecost-logs.txt
Common causes of misconfiguration include the following:
Unsupported CSV filters: spendChange alerts accept multiple filter values when comma-separated; other alert types do not.
Unsupported alert type: all alert type names are in camelCase. Check spelling and capitalization for all alert parameters.
Unsupported aggregation parameters: see the Allocation API doc for details.