Skip to content

[fm] ereporter restart ordering dingus#10132

Merged
hawkw merged 17 commits intomainfrom
eliza/ereporter-restart-order
Mar 25, 2026
Merged

[fm] ereporter restart ordering dingus#10132
hawkw merged 17 commits intomainfrom
eliza/ereporter-restart-order

Conversation

@hawkw
Copy link
Copy Markdown
Member

@hawkw hawkw commented Mar 23, 2026

Ereports are uniquely identified by the tuple of their reporter's restart ID and the ereport's ENA, as described in RFD 520. ENAs are ordered within the restart of the reporter, but reporter IDs are randomly-generated UUIDs and do not have a temporal ordering. As I described in this comment on issue #10073, the approach I'd like to take for loading new ereports to use as inputs to fault management analysis relies on having an ordering of not only the ENA component of the ereport ID but also the ordering of reporter restarts (i.e. "is this restart newer than or older than the one I currently believe to be the latest?"). In order to do this, we need to track the sequence of restart IDs per reporter location. This is done on a location-by-location basis because, well, the laws of physics prevent two sleds/switches/PSCs from occupying the same physical location in the rack at the same time, making their history of reporter restarts inherently orderable in a way that other things, like serial numbers, may be less orderable.

This PR adds a table for constructing that ordering and modifies the ereport insert query to actually use it. There's some additional tidiness and refactoring I'd like to do here later (and an OMDB command for printing the ordering table would be nice), but for now, I'd like to be able to actually use this.

Comment on lines +23 to +26
ROW_NUMBER() OVER (
PARTITION BY reporter, slot_type, slot
ORDER BY time_first_seen
) - 1 AS generation,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume ROW_NUMBER is CRDB "enumerate"?

(Is this particularly different than "COUNT DISTINCT"?)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is pretty much exactly "CRDB for Iterator::enumerate()": https://www.cockroachlabs.com/docs/stable/window-functions#add-row-numbers-to-query-output

I guess the way you would use COUNT DISTINCT to produce the generation numbers in order for each row is to COUNT all the DISTINCT records for a slot prior to each row you're about to insert? I wasn't totally sure how to do that, and this thing was basically just an example I could copy out of the CRDB docs :)

Also, I would kinda guess that using COUNT DISTINCT in that way would actually make the database go back and actually COUNT all the rows each time, which feels very O(n2), while this is (I think) just adding numbers to the result set of a query it's already gone and done (AFAICT?).

Comment on lines +222 to +224
/// List restarts of an ereporter at a given physical location, paginated by
/// restart generation.
pub async fn ereporter_restart_list(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kinda curious; is this going to be for debugging?

(seems like in the "common case" we'll only care about "what's the latest one", right?)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking I would add a "select latest one" query...but I kinda think that in practice, this table is going to end up being used mostly for filtering other queries so I'm not actually sure if we will end up using a separate Rust function that does just a "latest ID from this reporter" or doing a select from this table to get the latest inside the database.

At present this method is just being used for the tests, but I was planning to plumb this through into OMDB, which is why it's pub currently.

Comment thread nexus/db-queries/src/db/datastore/ereport.rs Outdated
Comment thread nexus/db-queries/src/db/datastore/ereport.rs
Comment thread nexus/db-queries/src/db/datastore/ereport.rs
Comment thread nexus/db-queries/src/db/datastore/ereport.rs
@hawkw hawkw added the fault-management Everything related to the fault-management initiative (RFD480 and others) label Mar 24, 2026
@hawkw hawkw marked this pull request as ready for review March 24, 2026 23:36
@hawkw hawkw changed the title [WIP][fm] ereporter restart ordering dingus [fm] ereporter restart ordering dingus Mar 24, 2026
@hawkw hawkw requested a review from mergeconflict March 24, 2026 23:36
@hawkw hawkw enabled auto-merge (squash) March 24, 2026 23:37
@hawkw hawkw merged commit eb0bb82 into main Mar 25, 2026
16 checks passed
@hawkw hawkw deleted the eliza/ereporter-restart-order branch March 25, 2026 17:39
Comment on lines +453 to +463
let next_generation = coalesce(
restart_dsl::ereporter_restart
.filter(restart_dsl::reporter_type.eq(reporter_type))
.filter(restart_dsl::slot_type.eq(slot_type))
.filter(restart_dsl::slot.eq(slot))
.select(max(restart_dsl::generation))
.limit(1)
.single_value()
+ 1,
0,
);
Copy link
Copy Markdown
Contributor

@mergeconflict mergeconflict Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that two concurrent transactions for different restart IDs sharing the same (reporter_type, slot_type, slot) could both read the same max(restart_dsl::generation), compute the same next_generation, and collide?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, in the event of two concurrent inserts with different restart IDs, the one that loses the race will be rejected by the unique index here:

omicron/schema/crdb/dbinit.sql

Lines 7312 to 7319 in ccb9a3a

CREATE UNIQUE INDEX IF NOT EXISTS
lookup_ereporter_restart_generations_by_location
ON omicron.public.ereporter_restart (
reporter_type,
slot_type,
slot,
generation
);

This is good, but I think the rust code should probably be retrying if that constraint is violated, so that the loser of the race is inserted at the next generation.

hawkw added a commit that referenced this pull request Mar 25, 2026
As discussed in [this comment][1], it turns out that the approach to
sitrep input loading which required the ereporter restart table that I
added in #10132 will actually just straight up not work. Unfortunately,
we didn't figure that out until after the PR adding that table had
merged. We've now found a different way to do this that doesn't require
the ordering table. Additionally, the ordering table orders restarts by
the order in which ereports were _added to the database_, not the order
in which the restarts were actually first observed, which is of
debatable usefulness at best. Therefore, we should just get rid of it.

Unfortunately, because there was a database migration, we cannot simply
push a commit that reverts eb0bb82 and
removes all evidence of my shame. Instead, we must have a new migration
that just kinda undoes the migration I wrote yesterday, in case anyone
has installed a build of the control plane containing
eb0bb82 some time in approximately the
last two hours. So, undoing the change requires somewhat more ceremony
than it would if the database schema has not been touched. This PR does
that.

[1]:
#10073 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fault-management Everything related to the fault-management initiative (RFD480 and others)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants