[Metrics] Reduce Cardinality

Apr 21, 2025 by ADMIN

Objective

The primary objective of this project is to reduce the cardinality of the time-series metrics produced by PATH. This involves identifying and addressing the root causes of high-cardinality label noise in observability metrics.

Problem Statement

The current metrics generation process produces an excessive number of time series, which increases storage costs, limits scalability, and degrades observability (for example, slower queries and dashboards). Because every distinct combination of label values is stored as a separate time series, we need to focus on labels whose values change over time, are random, or have too many possible values, since these multiply the number of series a metric produces.
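To see why a few labels dominate, note that the worst-case number of series for one metric is the product of the number of distinct values of each of its labels. The sketch below uses hypothetical counts chosen for illustration, not measurements from PATH; the real series count is the number of label combinations actually observed, which this product bounds from above.

```python
from math import prod


def worst_case_series(label_value_counts: dict[str, int]) -> int:
    """Upper bound on the number of time series for one metric:
    the product of the distinct-value counts of its labels."""
    return prod(label_value_counts.values())


# Hypothetical distinct-value counts, for illustration only (not measured).
counts = {
    "service_id": 50,
    "endpoint_addr": 1_000,
    "app_address": 200,
    "session_height": 500,
}

# Keep only service_id, as proposed for production metrics.
kept = {name: n for name, n in counts.items() if name == "service_id"}

print(worst_case_series(counts))  # 5000000000
print(worst_case_series(kept))    # 50
```

Even with modest per-label counts, the product grows multiplicatively, which is why removing or gating a handful of labels has an outsized effect.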

Identifying High-Cardinality Labels

The following labels have been identified as having the most distinct values:

endpoint_addr
app_address
session_height

These labels are contributing to the high-cardinality label noise in observability metrics. To reduce this noise, we suggest removing these labels from the metrics generation process.
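A minimal sketch of the effect of removing these labels, assuming a metric's series can be represented as plain label dictionaries. The label values (`svc-a`, `e1`, and so on) are hypothetical. Once a label is no longer emitted, series that differed only in that label collapse into one.

```python
# Hypothetical label sets for four series of one metric (illustrative values).
series = [
    {"service_id": "svc-a", "endpoint_addr": "e1", "app_address": "a1", "session_height": "100"},
    {"service_id": "svc-a", "endpoint_addr": "e2", "app_address": "a1", "session_height": "100"},
    {"service_id": "svc-a", "endpoint_addr": "e1", "app_address": "a2", "session_height": "104"},
    {"service_id": "svc-b", "endpoint_addr": "e3", "app_address": "a3", "session_height": "104"},
]

HIGH_CARDINALITY = {"endpoint_addr", "app_address", "session_height"}


def distinct_series(series: list[dict[str, str]], dropped: set[str]) -> set[frozenset]:
    """Distinct label sets that remain once the `dropped` labels are removed."""
    return {
        frozenset((k, v) for k, v in labels.items() if k not in dropped)
        for labels in series
    }


print(len(distinct_series(series, set())))             # 4
print(len(distinct_series(series, HIGH_CARDINALITY)))  # 2
```

In practice nothing is merged after the fact: when the instrumentation stops attaching these labels, it simply records into the single collapsed series (for example, incrementing one counter for `svc-a` instead of three).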

Keeping Valuable Labels Configurable

However, not all labels are created equal. The service_id label is valuable despite its cardinality, so rather than removing it we can make it configurable. This lets us keep service_id in production metrics while still removing the labels that drive most of the cardinality.

Configurable Label Gating

Alternatively, instead of removing the high-cardinality labels outright, we can gate them behind a config flag. This keeps them available for development, canary, or debugging purposes while leaving them out of production metrics.
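The gating described above can be sketched as a per-environment config plus a filter applied to label sets. `LabelGatingConfig`, `gate_labels`, the `PROD`/`DEBUG` presets, and the non-gated `status` label are all hypothetical names for illustration; PATH's actual config format is not specified in this issue.

```python
from dataclasses import dataclass

# Labels that are only emitted when explicitly enabled (names from this issue).
GATED_LABELS = frozenset({"service_id", "endpoint_addr", "app_address", "session_height"})


@dataclass(frozen=True)
class LabelGatingConfig:
    """Hypothetical config: which gated labels to emit in this environment."""
    enabled_labels: frozenset[str] = frozenset({"service_id"})


def gate_labels(labels: dict[str, str], cfg: LabelGatingConfig) -> dict[str, str]:
    """Drop gated labels that are not enabled; always keep non-gated labels."""
    return {
        name: value
        for name, value in labels.items()
        if name not in GATED_LABELS or name in cfg.enabled_labels
    }


PROD = LabelGatingConfig()                              # only service_id
DEBUG = LabelGatingConfig(enabled_labels=GATED_LABELS)  # all gated labels, for dev/canary/debugging

labels = {"service_id": "svc-a", "endpoint_addr": "e1", "session_height": "100", "status": "ok"}
print(gate_labels(labels, PROD))             # {'service_id': 'svc-a', 'status': 'ok'}
print(gate_labels(labels, DEBUG) == labels)  # True
```

With a Prometheus-style client, label names are fixed when a metric is registered, so in a real implementation the same config would most likely select the label set at registration time rather than per observation.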

Goals

The primary goals of this project are to:

Reduce high-cardinality label noise in observability metrics
Keep key labels like service_id configurable for different environments
Ensure production metrics are cost-efficient and scalable
Maintain observability functionality without depending on unbounded label values

Deliverables

The following deliverables are required to complete this project:

Audit metrics in Shannon to verify they don’t include removed labels
Implement config gating for endpoint_addr, app_address, and session_height
Update metrics generation logic to reflect label changes
Validate metrics output across environments (prod/dev/canary)
Communicate changes to internal consumers of the metrics (e.g. dashboards, alerting)
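For the audit and cross-environment validation deliverables, one option is to scan a metrics dump for labels that should no longer appear. This sketch assumes the metrics are exposed in the Prometheus text exposition format (an assumption; the issue does not name the backend), and the metric names in the sample are hypothetical. The parser is deliberately simplified and does not handle escaped quotes or `}` inside label values.

```python
import re

# `name{labels} value` sample lines; simplified, no escaped quotes in values.
SAMPLE_RE = re.compile(r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+\S+")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

FORBIDDEN_IN_PROD = {"endpoint_addr", "app_address", "session_height"}


def forbidden_labels_in(exposition: str, forbidden: set[str]) -> dict[str, set[str]]:
    """Map metric name -> forbidden label names found in a text exposition dump."""
    found: dict[str, set[str]] = {}
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        m = SAMPLE_RE.match(line)
        if not m or not m.group("labels"):
            continue
        names = {k for k, _ in LABEL_RE.findall(m.group("labels"))}
        bad = names & forbidden
        if bad:
            found.setdefault(m.group("name"), set()).update(bad)
    return found


sample = """
# TYPE relay_requests_total counter
relay_requests_total{service_id="svc-a",endpoint_addr="e1"} 3
relay_requests_total{service_id="svc-b"} 7
relay_latency_seconds_count{service_id="svc-a",session_height="100"} 2
"""

print(forbidden_labels_in(sample, FORBIDDEN_IN_PROD))
```

Run against a scrape from each environment, an empty result for prod (and a non-empty one for a debug-enabled canary) would confirm the gating behaves as intended.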

Non-goals / Non-deliverables

The following are not part of this project:

Full refactor of the observability pipeline
Migration of historical metric data or logs
Changes to unrelated label sets or metrics

General Deliverables

In addition to the project-specific deliverables, the following general deliverables are required:

Comments: Add/update TODOs and comments alongside the source code so it is easier to follow
Testing: Add new tests (unit and/or E2E) to the test suite
Makefile: Add new targets to the Makefile to make the new functionality easier to use
Documentation: Update architectural or development READMEs; use mermaid diagrams where appropriate

Origin Document

The origin document for this project is located at:

https://discord.com/channels/824324475256438814/1003704237324259418/1360277335294607361

Creator

This project was created by [GitHub handle of issue owner]

Co-Owners

[OPTIONAL - GitHub handle of co-owner(s)]

FAQ

Q: What is cardinality in observability metrics?

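A: Cardinality refers to the number of unique values in a label or dimension. Because each unique combination of label values on a metric is stored as a separate time series, a single high-cardinality label multiplies the total series count. High-cardinality labels therefore increase storage costs, reduce scalability, and degrade observability functionality.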

Q: Why is reducing cardinality important?

A: Reducing cardinality keeps observability metrics cost-efficient, scalable, and maintainable. Every additional unique label value creates more time series to store and query, so unbounded labels drive up storage costs and slow down queries and dashboards.

Q: What are the common causes of high-cardinality labels?

A: The common causes of high-cardinality labels include:

Labels that change over time
Labels that are random
Labels that have too many possible values, creating very deep dimensions for time-series metrics

Q: How can I identify high-cardinality labels?

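A: To identify high-cardinality labels, count the distinct values each label takes across the series a metric emits, either by querying the metrics backend or by auditing the metrics generated in Shannon, and focus on the labels with the most values.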
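A minimal sketch of that count, assuming the emitted series are available as label dictionaries (the sample values are hypothetical). If the metrics are stored in Prometheus (an assumption; the issue does not name the backend), a PromQL query such as `count(count by (endpoint_addr) (metric_name))`, with `metric_name` as a placeholder, returns the same per-label number directly.

```python
from collections import defaultdict


def distinct_values_per_label(series: list[dict[str, str]]) -> list[tuple[str, int]]:
    """Rank labels by how many distinct values they take, highest first."""
    values: dict[str, set[str]] = defaultdict(set)
    for labels in series:
        for name, value in labels.items():
            values[name].add(value)
    # Sort by descending count, then by name for a stable, readable order.
    return sorted(((name, len(vals)) for name, vals in values.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical series for one metric.
series = [
    {"service_id": "svc-a", "endpoint_addr": "e1", "session_height": "100"},
    {"service_id": "svc-a", "endpoint_addr": "e2", "session_height": "101"},
    {"service_id": "svc-b", "endpoint_addr": "e3", "session_height": "102"},
    {"service_id": "svc-b", "endpoint_addr": "e4", "session_height": "102"},
]

print(distinct_values_per_label(series))
# [('endpoint_addr', 4), ('session_height', 3), ('service_id', 2)]
```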

Q: What are the benefits of removing high-cardinality labels?

A: The benefits of removing high-cardinality labels include:

Reduced storage costs
Increased scalability
Improved observability functionality

Q: Can I keep valuable labels configurable?

A: Yes. Valuable labels such as service_id can be kept behind a config flag, so they remain in production metrics while the labels that drive most of the cardinality are removed or disabled.

Q: What are the general deliverables for reducing cardinality in observability metrics?

A: The general deliverables for reducing cardinality in observability metrics include:

Adding/updating TODOs and comments alongside the source code
Adding new tests (unit and/or E2E) to the test suite
Adding new targets to the Makefile to make the new functionality easier to use
Updating architectural or development READMEs; using mermaid diagrams where appropriate

Q: What are the non-goals / non-deliverables for reducing cardinality in observability metrics?

A: The non-goals / non-deliverables for reducing cardinality in observability metrics include:

Full refactor of the observability pipeline
Migration of historical metric data or logs
Changes to unrelated label sets or metrics

Q: Where can I find more information about reducing cardinality in observability metrics?

A: You can find more information about reducing cardinality in observability metrics in the origin document located at:

https://discord.com/channels/824324475256438814/1003704237324259418/1360277335294607361

Q: Who is responsible for creating this project?

A: This project was created by [GitHub handle of issue owner]

Q: Who are the co-owners of this project?

A: [OPTIONAL - GitHub handle of co-owner(s)]