Amazon Managed Service for Prometheus
Overview
Kubecost uses the open-source Prometheus project as a time series database and post-processes the data in Prometheus to perform cost allocation calculations and provide optimization insights for your Kubernetes clusters, such as those running on Amazon Elastic Kubernetes Service (Amazon EKS). Prometheus runs as a single-machine, statically resourced container, so depending on your cluster size, or when your cluster scales out, the metrics volume could exceed the scraping capacity of a single Prometheus server. In collaboration with Amazon Web Services (AWS), Kubecost integrates with Amazon Managed Service for Prometheus (AMP), a managed Prometheus-compatible monitoring service, so that customers can monitor Kubernetes costs at scale.
Reference resources
Architecture
The architecture of this integration is similar to Amazon EKS cost monitoring with Kubecost, which is described in the previous blog post, with some enhancements as follows:
In this integration, an additional AWS SigV4 container is added to the cost-analyzer pod. It acts as a proxy that queries metrics from Amazon Managed Service for Prometheus using the AWS SigV4 signing process. This enables passwordless authentication and reduces the risk of exposing your AWS credentials.
When the Amazon Managed Service for Prometheus integration is enabled, the bundled Prometheus server in the Kubecost Helm chart is configured in remote_write mode. The bundled Prometheus server sends the collected metrics to Amazon Managed Service for Prometheus using the AWS SigV4 signing process. All metrics and data are stored in Amazon Managed Service for Prometheus, and Kubecost queries the metrics directly from Amazon Managed Service for Prometheus instead of the bundled Prometheus. As a result, customers do not need to maintain and scale the local Prometheus instance.
There are two architectures you can deploy:
The Quick-Start architecture supports a small multi-cluster setup of up to 100 clusters.
The Federated architecture supports a large multi-cluster setup for over 100 clusters.
Quick-Start architecture
The infrastructure can manage up to 100 clusters. The following architecture diagram illustrates the small-scale infrastructure setup:
Federated architecture
To support a large-scale infrastructure of over 100 clusters, Kubecost uses a Federated ETL architecture. In addition to the Amazon Managed Service for Prometheus workspace, Kubecost stores its extract, transform, and load (ETL) data in a central S3 bucket. Kubecost's ETL data is a computed cache based on Prometheus metrics, from which users can perform all possible Kubecost queries. Storing the ETL data in an S3 bucket makes your cost allocation data more resilient, improves performance, and enables a high-availability architecture for your Kubecost setup.
The following architecture diagram illustrates the large-scale infrastructure setup:
Instructions
Prerequisites
You have an existing AWS account.
You have IAM credentials to create Amazon Managed Service for Prometheus and IAM roles programmatically.
You have an existing Amazon EKS cluster with OIDC enabled.
Your Amazon EKS clusters have the Amazon EBS CSI driver installed.
Create Amazon Managed Service for Prometheus workspace:
Step 1: Run the following command to get the information of your current EKS cluster:
The example output should be in this format:
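When the kubeconfig entry was created with `aws eks update-kubeconfig`, the context name is typically the cluster ARN. As a minimal sketch under that assumption, the helper functions below (hypothetical names, not part of Kubecost or the AWS CLI) split such an ARN into the region, account ID, and cluster name used later in this guide. The sample ARN is a made-up placeholder.

```shell
# Hypothetical helpers: split an EKS cluster ARN of the assumed form
#   arn:aws:eks:<region>:<account-id>:cluster/<cluster-name>
eks_region()  { printf '%s\n' "$1" | cut -d: -f4; }
eks_account() { printf '%s\n' "$1" | cut -d: -f5; }
eks_cluster() { printf '%s\n' "${1##*/}"; }

# Placeholder ARN for illustration only. In practice you might use:
#   CONTEXT=$(kubectl config current-context)
CONTEXT="arn:aws:eks:us-west-2:123456789012:cluster/demo-cluster"

echo "region:  $(eks_region "$CONTEXT")"
echo "account: $(eks_account "$CONTEXT")"
echo "cluster: $(eks_cluster "$CONTEXT")"
```

If your context name is not an ARN (for example, a custom alias), these helpers do not apply and the values need to be set by hand.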
Step 2: Run the following command to create a new Amazon Managed Service for Prometheus workspace:
The Amazon Managed Service for Prometheus workspace should be created in a few seconds. Run the following command to get the workspace ID:
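One possible way to script the workspace creation and ID lookup with the AWS CLI is sketched here. The function names and the alias `kubecost-amp` are assumptions made for illustration; `aws amp list-workspaces --alias` matches workspaces whose alias starts with the given value, so the sketch simply takes the first match.

```shell
# Hypothetical wrappers around the AWS CLI (names are illustrative).
create_amp_workspace() {
  # $1 = workspace alias, $2 = AWS region
  aws amp create-workspace --alias "$1" --region "$2"
}

get_amp_workspace_id() {
  # $1 = workspace alias (prefix match), $2 = AWS region
  # Prints the ID of the first matching workspace.
  aws amp list-workspaces --alias "$1" --region "$2" \
    --query 'workspaces[0].workspaceId' --output text
}

# Usage (requires AWS credentials; "kubecost-amp" is an assumed alias):
#   create_amp_workspace kubecost-amp "$AWS_REGION"
#   export AMP_WORKSPACE_ID=$(get_amp_workspace_id kubecost-amp "$AWS_REGION")
#   echo "$AMP_WORKSPACE_ID"
```

If several workspaces share the same alias prefix, pick the intended ID explicitly rather than relying on the first match.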
Setting up the environment:
Step 1: Set environment variables for integrating Kubecost with Amazon Managed Service for Prometheus
Run the following command in your workspace:
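As a rough sketch of what such variables could look like, the snippet below derives the AMP endpoint URLs from the region and workspace ID. The URL layout follows the Amazon Managed Service for Prometheus endpoint format; the helper names, the variable name `REMOTEWRITEURL`, and all default values are assumptions for illustration and should be replaced with your own.

```shell
# Hypothetical helpers that build AMP endpoint URLs from a region and workspace ID.
amp_base_url() {
  # $1 = AWS region, $2 = AMP workspace ID
  printf 'https://aps-workspaces.%s.amazonaws.com/workspaces/%s' "$1" "$2"
}
amp_remote_write_url() { printf '%s/api/v1/remote_write' "$(amp_base_url "$1" "$2")"; }
amp_query_url()        { printf '%s/api/v1/query' "$(amp_base_url "$1" "$2")"; }

# Placeholder defaults (assumptions); existing values in the environment win.
AWS_REGION="${AWS_REGION:-us-west-2}"
AMP_WORKSPACE_ID="${AMP_WORKSPACE_ID:-ws-00000000-0000-0000-0000-000000000000}"
KC_BUCKET="${KC_BUCKET:-kubecost-etl-metrics}"  # only used by the Federated architecture

export AWS_REGION AMP_WORKSPACE_ID KC_BUCKET
export REMOTEWRITEURL="$(amp_remote_write_url "$AWS_REGION" "$AMP_WORKSPACE_ID")"
echo "$REMOTEWRITEURL"
```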
Step 2: Set up S3 bucket, IAM policy and Kubernetes secret for storing Kubecost ETL files
Note: You can ignore Step 2 for the small-scale infrastructure setup.
a. Create an S3 bucket to serve as the object store for Kubecost ETL data. Run the following command in your workspace:
b. Create an IAM policy to grant access to the S3 bucket. The following policy is for demo purposes only. Consult your security team and adjust the policy to meet your organization's requirements.
Run the following command in your workspace:
c. Create a Kubernetes secret to allow Kubecost to write ETL files to the S3 bucket. Run the following command in your workspace:
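For step b, a demo-only policy could be generated along these lines. This is a sketch, not the policy shipped with Kubecost: the function name, the policy name, and the default bucket name are hypothetical, and the set of S3 actions is an assumption about what reading and writing ETL files requires.

```shell
# Hypothetical helper: print a demo IAM policy for read/write access to one bucket.
etl_bucket_policy() {
  # $1 = bucket name
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::$1"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::$1/*"
    }
  ]
}
EOF
}

KC_BUCKET="${KC_BUCKET:-kubecost-etl-metrics}"  # assumed example bucket name
etl_bucket_policy "$KC_BUCKET"

# With AWS credentials, the bucket and policy could then be created with:
#   aws s3 mb "s3://${KC_BUCKET}" --region "$AWS_REGION"
#   etl_bucket_policy "$KC_BUCKET" > policy.json
#   aws iam create-policy --policy-name kubecost-s3-federated-policy \
#     --policy-document file://policy.json
```

Scoping the object-level actions to `arn:aws:s3:::<bucket>/*` and the list action to the bucket itself keeps the grant limited to the single ETL bucket.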
Step 3: Set up IRSA to allow Kubecost and Prometheus to write metrics to and read metrics from Amazon Managed Service for Prometheus
The following commands automate these tasks:
Create an IAM role with the AWS-managed IAM policy and trust policy for the following service accounts: kubecost-cost-analyzer-amp, kubecost-prometheus-server-amp.
Modify the current Kubernetes service accounts with an annotation to attach the new IAM role.
Run the following command in your workspace:
For more information, see the AWS documentation on IAM roles for service accounts, and learn more about the Amazon Managed Service for Prometheus managed policies in Identity-based policy examples for Amazon Managed Service for Prometheus.
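The IRSA tasks listed above could be scripted with `eksctl` roughly as follows. This is a sketch under assumptions: the wrapper function is hypothetical, the `kubecost` namespace is assumed, and the two AWS-managed policies shown (`AmazonPrometheusQueryAccess` and `AmazonPrometheusRemoteWriteAccess`) are chosen here as the likely fit for reading and writing metrics.

```shell
# Hypothetical wrapper: create (or update) one IRSA-enabled service account.
create_amp_service_account() {
  # $1 = service account name, $2 = cluster name, $3 = AWS region, $4 = namespace
  eksctl create iamserviceaccount \
    --name "$1" \
    --namespace "$4" \
    --cluster "$2" \
    --region "$3" \
    --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusQueryAccess \
    --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \
    --override-existing-serviceaccounts \
    --approve
}

# Usage (requires eksctl and AWS credentials; namespace "kubecost" is assumed):
#   for sa in kubecost-cost-analyzer-amp kubecost-prometheus-server-amp; do
#     create_amp_service_account "$sa" "$YOUR_CLUSTER_NAME" "$AWS_REGION" kubecost
#   done
```

`--override-existing-serviceaccounts` lets eksctl annotate a service account that already exists instead of failing.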
Integrating Kubecost with Amazon Managed Service for Prometheus
Preparing the configuration file
Run the following command to create a file called config-values.yaml, which contains the defaults that Kubecost will use for connecting to your Amazon Managed Service for Prometheus workspace.
Primary cluster
Run this command to install Kubecost and integrate it with the Amazon Managed Service for Prometheus workspace as the primary cluster:
Additional clusters
These installation steps are similar to those for a primary cluster setup, except that you do not need to follow the steps in the section "Create Amazon Managed Service for Prometheus workspace", and you need to update the environment variables below to match your additional clusters. Note that AMP_WORKSPACE_ID and KC_BUCKET are the same as for the primary cluster.
Run this command to install Kubecost and integrate it with the Amazon Managed Service for Prometheus workspace as an additional cluster:
Your Kubecost setup is now writing data to and collecting data from AMP. Data should be ready for viewing within 15 minutes.
To verify that the integration is set up, go to Settings in the Kubecost UI, and check the Prometheus Status section.
Read our Custom Prometheus integration troubleshooting guide if you run into any errors while setting up the integration. For support from AWS, you can submit a support request through your existing AWS support contract.
Add recording rules (optional)
You can add these recording rules to improve performance. Recording rules allow you to precompute frequently needed or computationally expensive expressions and save their results as a new set of time series. Querying the precomputed result is often much faster than running the original expression every time it is needed. Follow these instructions to add the following rules:
Troubleshooting
The queries below must return data for Kubecost to calculate costs correctly.
For the queries below to work, set the environment variables:
Verify the connection to AMP and that the metric container_memory_working_set_bytes is available:
If you have set kubecostModel.promClusterIDLabel, you will need to change the query (CLUSTER_ID) to match the label (typically cluster or alpha_eksctl_io_cluster_name).
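To illustrate the label substitution, the sketch below builds a PromQL expression for a given cluster name and label. The `max(...)` shape of the query and the function name are assumptions for illustration; only the metric name and the label names come from this page.

```shell
# Hypothetical helper: build a PromQL check for a given cluster name and label.
working_set_query() {
  # $1 = cluster name, $2 = label name (defaults to cluster_id)
  printf 'max(container_memory_working_set_bytes{%s="%s"})\n' "${2:-cluster_id}" "$1"
}

# Default label:
working_set_query demo-cluster
# Custom label, for example when kubecostModel.promClusterIDLabel is set to "cluster":
working_set_query demo-cluster cluster
```

The resulting expression still needs to be URL-encoded before it is passed as a `query` parameter to a Prometheus-compatible HTTP API.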
The output should contain a JSON entry similar to the following.
The value of cluster_id should match the value of kubecostProductConfigs.clusterName.
Verify Kubecost metrics are available in AMP:
The output should contain a JSON entry similar to:
If the above queries fail, check the following:
Check the logs of the sigv4proxy container (it may be in the Kubecost deployment or the Prometheus Server deployment, depending on your setup):
In a working sigv4proxy, there will be very few logs.
Correctly working log output:
Check the logs of the cost-model container for Prometheus connection issues:
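A small wrapper such as the following could be used for both log checks. The function is hypothetical, and the deployment names and the `kubecost` namespace in the usage lines are assumptions that depend on your Helm release name.

```shell
# Hypothetical wrapper: tail the logs of one container in a deployment.
container_logs() {
  # $1 = deployment name, $2 = container name, $3 = namespace (defaults to kubecost)
  kubectl logs "deployment/$1" -c "$2" -n "${3:-kubecost}" --tail=100
}

# Usage (deployment names are assumed; adjust to your release):
#   container_logs kubecost-cost-analyzer sigv4proxy
#   container_logs kubecost-prometheus-server sigv4proxy
#   container_logs kubecost-cost-analyzer cost-model
```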
Example errors: