27 commits
e57a291
Add Important Labels subsection, with job and instance called out
conallob Jul 15, 2025
a40367a
Add a new section about labels
conallob Jul 15, 2025
ccf009a
Fix typo
conallob Aug 14, 2025
0e87e09
Update docs/practices/naming.md
conallob Aug 14, 2025
a79aa9b
Update docs/practices/naming.md
conallob Aug 15, 2025
a0ebfa9
Iterate on the job label description
conallob Aug 17, 2025
7694523
Update docs/practices/naming.md
conallob Oct 7, 2025
6b6ceb0
Merge branch 'prometheus:main' into main
conallob Dec 31, 2025
938a3a8
Merge branch 'prometheus:main' into main
conallob Jan 22, 2026
e42cd96
Merge branch 'main' into main
conallob Apr 24, 2026
1cb09cd
Split labels from naming.md into a new labels.md
conallob Apr 24, 2026
62bf3d7
Manually apply suggested edits from @SuperQ
conallob Apr 24, 2026
675d41f
Set sort_rank to 2
conallob Apr 25, 2026
092cc89
Rename title to just Labels
conallob Apr 25, 2026
0a7611f
Wordsmith the IMPORTANT block to describe job as a scoping label, not…
conallob Apr 27, 2026
4da0a33
Integrate PR review suggestion
conallob Apr 27, 2026
9b7793c
Merge branch 'main' into main
conallob Apr 29, 2026
519fedc
Merge branch 'main' into main
conallob May 8, 2026
2c94ccf
Merge branch 'main' into main
conallob May 15, 2026
8a013e4
Merge branch 'main' into main
conallob May 22, 2026
6c95581
Merge branch 'main' into main
conallob May 28, 2026
3fecc71
Update docs/practices/labels.md
conallob Jun 1, 2026
c530444
Update docs/practices/labels.md
conallob Jun 1, 2026
9ac82a7
Update docs/practices/labels.md
conallob Jun 1, 2026
9b1f734
Update docs/practices/rules.md
conallob Jun 1, 2026
c13630c
Make it clear stripping job is valid in explicit use cases; Extend WA…
conallob Jun 1, 2026
f1b517f
Move label stripping guidance from rules.md
conallob Jun 1, 2026
53 changes: 53 additions & 0 deletions docs/practices/labels.md
@@ -0,0 +1,53 @@
---
title: Labels
sort_rank: 2
---

The label conventions presented in this document are not required
for using Prometheus, but can serve as both a style-guide and a collection of
best practices. Individual organizations may want to approach some of these
practices, e.g. naming conventions, differently.

## Labels

Prometheus labels can come from both the target itself and from
[relabeling in discovery](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).

By default, Prometheus attaches two target labels to every scraped time series:

- `job`
  - The `job` label is a default target label set by the scrape config and is used
    to identify metrics scraped by the same scrape config.
  - Stripping the `job` label is valid in certain explicit aggregation use
    cases (e.g. aggregating a metric across multiple `job` values).
  - If `job` is not specified in a PromQL expression, the expression will also match
    unrelated metrics with the same name. This is especially likely in a multi-system
    or multi-tenant installation.

- `instance`
  - The `instance` label contains the `host:port` that was scraped, identifying
    the target instance.
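
As a rough illustration of the scoping behaviour described above, the two selectors below differ only in the `job` matcher. The metric `process_cpu_seconds_total` is exposed by many client libraries; the job name `api-server` is an assumption made for this sketch and is not defined anywhere in this document.

```promql
# Unscoped: matches every target exposing this metric, whatever its job.
rate(process_cpu_seconds_total[5m])

# Scoped: matches only targets scraped under the hypothetical "api-server" job.
rate(process_cpu_seconds_total{job="api-server"}[5m])
```

On a server that scrapes several services, the unscoped form mixes series from all of them, which is the mismatch the `job` guidance is meant to avoid.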

WARNING: Stripping the `instance` label does not prevent PromQL expressions from being evaluated, but it makes metric scrape issues harder to debug.


WARNING: When using `without`, be careful not to strip out the `job` or `instance` labels unintentionally.

Member

Don't you need a similar warning for `instance`, depending on usage?

Author

While `instance` is one of the standard scrape-time labels, like `job`, stripping it doesn't have the same blast radius. Stripping `instance` will make metrics hard to debug, but they should still work.

For certain use cases that require multiple layers of rules (e.g. in a multi-region, multi-layered tree of Prometheus servers), you may want to strip out `instance` at the higher aggregation layers to manage label cardinality (e.g. `instance` labels make sense for the per-region aggregation, but can be problematic if aggregated at the global level).

I've added a warning that stripping instance can make it harder to debug scrape time issues with a metric though.
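
A sketch of the layered setup described in this thread, assuming `api_http_requests_total` is scraped with the default `job` and `instance` labels. The split into a regional and a global layer is hypothetical, and a real deployment would more likely aggregate over recording rule outputs than over the raw metric.

```promql
# Regional layer (assumed): keep `instance` so individual targets stay visible.
rate(api_http_requests_total[5m])

# Global layer (assumed): drop `instance` to limit cardinality; `job` is preserved.
sum without (instance) (rate(api_http_requests_total[5m]))
```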

### General Labelling Advice

Use labels to differentiate the characteristics of the thing that is being measured:

- `api_http_requests_total` - differentiate request types: `operation="create|update|delete"`
- `api_request_duration_seconds` - differentiate request stages: `stage="extract|transform|load"`

Do not put the label names in the metric name, as this introduces redundancy
and will cause confusion if the respective labels are aggregated away.

CAUTION: Remember that every unique combination of key-value label
pairs represents a new time series, which can dramatically increase the amount
of data stored. Do not use labels to store dimensions with high cardinality
(many different label values), such as user IDs, email addresses, or other
unbounded sets of values.


Always specify a `without` clause with the labels you are aggregating away.
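
A minimal sketch of why `without` is suggested, reusing the `api_http_requests_total` metric and `operation` label from the examples above; it assumes the series also carry the default `job` and `instance` labels.

```promql
# `without` names the labels to remove; `job`, `instance` and any other labels survive.
sum without (operation) (rate(api_http_requests_total[5m]))

# For contrast, `by` keeps only the listed labels, so `job` and `instance` are dropped.
sum by (operation) (rate(api_http_requests_total[5m]))
```

The `by` form is not wrong in itself, but it strips `job` and `instance` implicitly, which is the unintentional stripping the warnings above describe.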
20 changes: 2 additions & 18 deletions docs/practices/naming.md
@@ -1,9 +1,9 @@
---
title: Metric and label naming
title: Metric naming
sort_rank: 1
---

The metric and label conventions presented in this document are not required
The metric conventions presented in this document are not required
for using Prometheus, but can serve as both a style-guide and a collection of
best practices. Individual organizations may want to approach some of these
practices, e.g. naming conventions, differently.
@@ -80,22 +80,6 @@ the underlying metric type and unit you work with.
* **Metric collisions**: With growing adoption and metric changes over time, there are cases where lack
of unit and type information in the metric name will cause certain series to collide (e.g. `process_cpu` for seconds and milliseconds).

## Labels

Use labels to differentiate the characteristics of the thing that is being measured:

* `api_http_requests_total` - differentiate request types: `operation="create|update|delete"`
* `api_request_duration_seconds` - differentiate request stages: `stage="extract|transform|load"`

Do not put the label names in the metric name, as this introduces redundancy
and will cause confusion if the respective labels are aggregated away.

CAUTION: Remember that every unique combination of key-value label
pairs represents a new time series, which can dramatically increase the amount
of data stored. Do not use labels to store dimensions with high cardinality
(many different label values), such as user IDs, email addresses, or other
unbounded sets of values.

## Base Units

Prometheus does not have any units hard coded. For better compatibility, base
19 changes: 15 additions & 4 deletions docs/practices/rules.md
@@ -19,6 +19,9 @@ This page documents proper naming conventions and aggregation for recording rule
Keeping the metric name unchanged makes it easy to know what a metric is and
easy to find in the codebase.

IMPORTANT: The `job` label is used to scope a PromQL expression to a specific service/exporter. It is **strongly** recommended that you
always set it, in order to scope your PromQL expressions to the system you are monitoring.

To keep the operations clean, `_sum` is omitted if there are other operations,
as `sum()`. Associative operations can be merged (for example `min_min` is the
same as `min`).
@@ -27,6 +30,18 @@ If there is no obvious operation to use, use `sum`. When taking a ratio by
doing division, separate the metrics using `_per_` and call the operation
`ratio`.

## Labels

NOTE: Omitting a label in a PromQL expression is the functional equivalent of specifying `label=~".*"`.
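
To make the note above concrete, here is a small sketch using the `api_http_requests_total` metric from the labels guidance; the metric is only an illustration. Both selectors return the same set of series, because `.*` also matches an empty (absent) label value.

```promql
# No `job` matcher at all.
api_http_requests_total

# An explicit matcher that accepts every `job` value, including none.
api_http_requests_total{job=~".*"}
```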

* In both recording rules and alerting expressions, always specify a `job` label to prevent expression mismatches from occurring.

Member

I think the need to specify `job` is very circumstantial, so again I think it needs to be conditional on what you want to achieve. Also, "specify job" is very vague in itself.

Author

Can you elaborate on why job is circumstantial?

Afaik, job will always be set on metrics unless it is explicitly stripped away.

This is especially important in multi-tenant systems where the same metric names may be exported by different jobs or the

Member

I don't think multi-tenant has anything to do with job and instance labels.

Author

As above, "multi-tenant systems" may not be the best term, but I'm referring to a Prometheus run as a platform for multiple teams (e.g. by a DevEx or Platform Engineering team), to prevent every team running their own siloed Prometheus stack. In such a setup, all PromQL expressions should be scoped with a `job` label, to ensure the metrics are from the expected exporters.

Or framed another way, in such a centralised stack, always write `up{job="bla"}`, never `up{}`.

same job (e.g. `node_exporter`) in multiple, distinct deployments.

* Always specify a `without` clause with the labels you are aggregating away.
This is to preserve all the other labels such as `job`, which will avoid
conflicts and give you more useful metrics and alerts.

## Aggregation

* When aggregating up ratios, aggregate up the numerator and denominator
@@ -40,10 +55,6 @@ Instead keep the metric name without the `_count` or `_sum` suffix and replace
the `rate` in the operation with `mean`. This represents the average
observation size over that time period.

* Always specify a `without` clause with the labels you are aggregating away.
Comment thread
conallob marked this conversation as resolved.
This is to preserve all the other labels such as `job`, which will avoid
conflicts and give you more useful metrics and alerts.

## Examples

_Note the indentation style with outdented operators on their own line between