[Metrics] Reduce Cardinality
Objective
The primary objective of this project is to reduce the cardinality of time-series metrics produced on path. This involves identifying and addressing the root causes of high-cardinality label noise in observability metrics.
Problem Statement
The current metrics generation process produces an excessive number of time series, which increases storage costs, limits scalability, and makes the metrics less useful for observability. To address this issue, we need to focus on labels that change over time, are random, or have too many possible values, because every distinct label value adds another time series.
Identifying High-Cardinality Labels
The following labels have been identified as having the most distinct values:
endpoint_addr
app_address
session_height
These labels are contributing to the high-cardinality label noise in observability metrics. To reduce this noise, we suggest removing these labels from the metrics generation process.
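As a rough illustration of how such a ranking could be produced, the sketch below counts the distinct values observed for each label across a set of series. The function name and the sample values are hypothetical and exist only for this example; real input would come from the actual metrics output.

```python
from collections import defaultdict


def distinct_values_per_label(series):
    """Count distinct values seen for each label across a set of series.

    `series` is an iterable of dicts mapping label name -> label value,
    one dict per time series.
    """
    values = defaultdict(set)
    for labels in series:
        for name, value in labels.items():
            values[name].add(value)
    # Highest-cardinality labels first; ties are broken by label name
    # so the output order is stable.
    return sorted(
        ((name, len(seen)) for name, seen in values.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical sample data; the values are made up for illustration.
sample_series = [
    {"service_id": "svc-a", "endpoint_addr": "addr-1", "session_height": "100"},
    {"service_id": "svc-a", "endpoint_addr": "addr-2", "session_height": "101"},
    {"service_id": "svc-b", "endpoint_addr": "addr-3", "session_height": "101"},
]
ranking = distinct_values_per_label(sample_series)
```

In this made-up sample, endpoint_addr ranks first because every series carries a different value for it.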
Keeping Valuable Labels Configurable
However, not all labels are created equal. The service_id label appears to be valuable despite its cardinality. To maintain its value while reducing cardinality, we can make this label configurable. This will allow us to keep the label in production metrics while still reducing the overall cardinality of the metrics.
Configurable Label Gating
Alternatively, instead of removing the high-cardinality labels outright, we can gate them behind a config flag. This will allow us to keep the labels in the metrics generation process for development, canary, or debugging purposes.
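A minimal sketch of what such gating could look like is shown below. The config class, its field names, and the helper function are assumptions made for illustration and do not reflect the existing codebase; only the label names come from this document.

```python
from dataclasses import dataclass

# Labels named in this document as high-cardinality.
GATED_LABELS = ("endpoint_addr", "app_address", "session_height")


@dataclass(frozen=True)
class MetricsLabelConfig:
    """Hypothetical config; the field names are assumptions."""

    # Off by default so production stays low-cardinality.
    enable_high_cardinality_labels: bool = False
    # service_id is kept by default but can still be switched off.
    enable_service_id: bool = True


def allowed_labels(labels, config):
    """Return only the labels that the given config allows to be emitted."""
    out = {}
    for name, value in labels.items():
        if name in GATED_LABELS and not config.enable_high_cardinality_labels:
            continue
        if name == "service_id" and not config.enable_service_id:
            continue
        out[name] = value
    return out


# Hypothetical label values for illustration.
example = {
    "service_id": "svc-a",
    "endpoint_addr": "addr-1",
    "app_address": "app-1",
    "session_height": "101",
}
prod_labels = allowed_labels(example, MetricsLabelConfig())
debug_labels = allowed_labels(
    example, MetricsLabelConfig(enable_high_cardinality_labels=True)
)
```

With the defaults, only service_id survives; enabling the flag restores all four labels for development, canary, or debugging runs. If the metrics are exposed through a Prometheus-style client, label names are usually fixed when a metric is declared, so a flag like this would most likely be read once at startup rather than per observation.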
Goals
The primary goals of this project are to:
- Reduce high-cardinality label noise in observability metrics
- Keep key labels like service_id configurable for different environments
- Ensure production metrics are cost-efficient and scalable
- Maintain observability functionality without depending on unbounded label values
Deliverables
The following deliverables are required to complete this project:
- Audit metrics in Shannon to verify they don’t include removed labels
- Implement config gating for endpoint_addr, app_address, and session_height
- Update metrics generation logic to reflect label changes
- Validate metrics output across environments (prod/dev/canary)
- Communicate changes to internal consumers of the metrics (e.g. dashboards, alerting)
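The audit and validation deliverables above could be partly automated. The sketch below assumes the metrics are available in the Prometheus text exposition format, which this document does not confirm; the metric names and values in the sample are hypothetical.

```python
import re

# Labels this project proposes to remove or gate.
REMOVED_LABELS = {"endpoint_addr", "app_address", "session_height"}

# Matches one name="value" pair, allowing escaped characters in the value.
_LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def find_removed_labels(exposition_text, removed=REMOVED_LABELS):
    """Return sorted (metric_name, label_name) pairs still using a removed label."""
    offenders = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blanks, HELP/TYPE comments, and samples without labels.
        if not line or line.startswith("#") or "{" not in line:
            continue
        metric_name, _, rest = line.partition("{")
        label_block = rest.rpartition("}")[0]
        for label_name, _value in _LABEL_PAIR.findall(label_block):
            if label_name in removed:
                offenders.add((metric_name, label_name))
    return sorted(offenders)


# Hypothetical scrape output for illustration.
sample_output = """\
# HELP relay_requests_total Hypothetical counter.
# TYPE relay_requests_total counter
relay_requests_total{service_id="svc-a",endpoint_addr="addr-1"} 12
relay_requests_total{service_id="svc-b"} 7
relay_latency_seconds_sum{service_id="svc-a",session_height="101"} 0.42
"""
offenders = find_removed_labels(sample_output)
```

An empty result for a production scrape would indicate that none of the gated labels leaked through; running the same check against dev and canary output would show where the labels are still enabled.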
Non-goals / Non-deliverables
The following are not part of this project:
- Full refactor of the observability pipeline
- Migration of historical metric data or logs
- Changes to unrelated label sets or metrics
General Deliverables
In addition to the project-specific deliverables, the following general deliverables are required:
- Comments: Add/update TODOs and comments alongside the source code so it is easier to follow
- Testing: Add new tests (unit and/or E2E) to the test suite
- Makefile: Add new targets to the Makefile to make the new functionality easier to use
- Documentation: Update architectural or development READMEs; use mermaid diagrams where appropriate
Origin Document
The origin document for this project is located at:
https://discord.com/channels/824324475256438814/1003704237324259418/1360277335294607361
Creator
This project was created by [GitHub handle of issue owner]
Co-Owners
Q: What is cardinality in observability metrics?
A: Cardinality refers to the number of unique values in a label or dimension. High-cardinality labels can lead to increased storage costs, reduced scalability, and decreased observability functionality.
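To make the effect concrete, here is a small worked example. The upper bound on the number of time series for one metric is the product of the distinct value counts of its labels. The counts below are invented for illustration and are not measurements from this project.

```python
import math


def max_series(distinct_values):
    """Upper bound on series count: product of distinct values per label."""
    return math.prod(distinct_values.values())


# Hypothetical distinct-value counts per label.
before = {"service_id": 50, "endpoint_addr": 2000, "session_height": 500}
after = {"service_id": 50}

bound_before = max_series(before)  # 50 * 2000 * 500
bound_after = max_series(after)    # 50
```

The real series count is the number of label combinations actually observed, which is at most this product, but the example shows why dropping even one unbounded label can shrink a metric by orders of magnitude.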
Q: Why is reducing cardinality important?
A: Every distinct combination of label values is stored as its own time series, so reducing cardinality is essential to keep observability metrics cost-efficient, scalable, and maintainable.
Q: What are the common causes of high-cardinality labels?
A: The common causes of high-cardinality labels include:
- Labels that change over time
- Labels that are random
- Labels that have too many possible values, creating very deep dimensions for time-series metrics
Q: How can I identify high-cardinality labels?
A: To identify high-cardinality labels, audit the metrics emitted in Shannon and look for the labels with the most distinct values.
Q: What are the benefits of removing high-cardinality labels?
A: The benefits of removing high-cardinality labels include:
- Reduced storage costs
- Increased scalability
- Improved observability functionality
Q: Can I keep valuable labels configurable?
A: Yes, you can keep valuable labels configurable by implementing a config flag. This will allow you to keep the label in production metrics while still reducing the overall cardinality of the metrics.
Q: What are the general deliverables for reducing cardinality in observability metrics?
A: The general deliverables for reducing cardinality in observability metrics include:
- Adding/updating TODOs and comments alongside the source code
- Adding new tests (unit and/or E2E) to the test suite
- Adding new targets to the Makefile to make the new functionality easier to use
- Updating architectural or development READMEs; using mermaid diagrams where appropriate
Q: What are the non-goals / non-deliverables for reducing cardinality in observability metrics?
A: The non-goals / non-deliverables for reducing cardinality in observability metrics include:
- Full refactor of the observability pipeline
- Migration of historical metric data or logs
- Changes to unrelated label sets or metrics
Q: Where can I find more information about reducing cardinality in observability metrics?
A: You can find more information about reducing cardinality in observability metrics in the origin document located at:
https://discord.com/channels/824324475256438814/1003704237324259418/1360277335294607361
Q: Who is responsible for creating this project?
A: This project was created by [GitHub handle of issue owner]
Q: Who are the co-owners of this project?
A: [OPTIONAL - GitHub handle of co-owner(s)]