Skip to content

expose PMBus device info from build-i2c codegen; set IS_PMBUS#2519

Open
hawkw wants to merge 6 commits into
masterfrom
eliza/hi-im-a-pmbus
Open

expose PMBus device info from build-i2c codegen; set IS_PMBUS#2519
hawkw wants to merge 6 commits into
masterfrom
eliza/hi-im-a-pmbus

Conversation

@hawkw
Copy link
Copy Markdown
Member

@hawkw hawkw commented May 18, 2026

This branch consists of three separate commits which build upon each other, and which I would like to rebase-merge as three separate commits to master. Overall, the primary goal here is to start populating the IS_PMBUS bit in the control-plane-agent inventory's DeviceCapabilities, which was added in oxidecomputer/management-gateway-service#488. In addition, I also tried to do some prep for future changes that want to generate more code for all PMBus devices, such as what @jamesmunns is working on in #2463.

In particular, the three commits do the following:

  1. Update the management-gateway-service dep to pick up the new DeviceCapabilities flags

  2. Add code in build-i2c's public API that is exposed to other build scripts (the device_descriptions() API) to indicate which devices are PMBus devices. In particular, I2cDeviceDescription now includes an optional PmbusDeviceDescription field, which is Some for PMBus devices. The presence of this field can be used to determine when a device is a PMBus device, such as when determining if the IS_PMBUS capability should be advertised.

    I did this by adding a whole new optional struct rather than just sticking an is_pmbus bool on I2cDeviceDescription, because I felt like future code, such as what @jamesmunns is working on for want to read PMBus status registers via MGS #2463, will almost certainly want a way to get a PMBus device's rail and phase names in codegen that lives outside of build-i2c (i.e. in
    task-validate-api's build script). Currently, there isn't a nice way
    to get this stuff in codegen that lives outside of build-i2c, so this
    should help make that nicer.

    Then, I've modified the codegen in task-validate-api, which is what control-plane-agent ultimately consumes for its codegen inventory devices, to include an is_pmbus field so that we can use this to populate the DeviceCapabilities flag.

  3. Finally, actually set the flag in control-plane-agent. :)

hawkw added 3 commits May 20, 2026 09:32
This commit updates our Git dep on the `management-gateway-service`
repo to
oxidecomputer/management-gateway-service@745a508.

This is primarily in order to pick up
oxidecomputer/management-gateway-service#488, which adds new
`DeviceCapabilities` bits for PMBus devices and for devices which have
VPD.
This commit adds a new `PmbusDeviceDescription` to the `build-i2c`
crate. `I2cDeviceDescription` now includes an optional
`PmbusDeviceDescription` field which is `Some` for PMBus devices.
The presence of this field can be used to determine when a device
is a PMBus device, such as when determining if the `IS_PMBUS`
capability should be advertised.

I did this by adding a whole new optional struct rather than just
sticking an `is_pmbus` bool on `I2cDeviceDescription`, because I felt
like future code, such as what @jamesmunns is working on for #2463,
will almost certainly want a way to get a PMBus device's rail and phase
names in codegen that lives outside of `build-i2c` (i.e. in
`task-validate-api`'s build script). Currently, there isn't a nice way
to get this stuff in codegen that lives outside of `build-i2c`, so this
should help make that nicer.
Now that we have the appropriate information available at codegen-time
to know if an I2C device is a PMbus device, we can actually set this
flag. Yay! Now the control plane can know which devices are PMBus!
@hawkw hawkw force-pushed the eliza/hi-im-a-pmbus branch from 732ab05 to 9526186 Compare May 20, 2026 16:47
@hawkw hawkw changed the title Eliza/hi im a pmbus expose PMBus device info from build-i2c codegen; set IS_PMBUS May 20, 2026
@hawkw hawkw marked this pull request as ready for review May 20, 2026 16:53
@hawkw hawkw added service processor Related to the service processor. fault-management Everything related to the Oxide's Fault Management architecture implementation control-plane-agent labels May 20, 2026
Comment thread build/i2c/src/lib.rs Outdated
.flatten()
.enumerate()
.map(|(i, rail)| {
let phases = if let Some(ref phases_list) = power.phases
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Might be worth hoisting this up to the top level, and doing one arm for "has phases" and one for "doesn't have phases"? Then we could do a single assert for "lengths match", and then zip together rails and phases_list, instead of doing .get/indexing by i.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, so that ends up looking like this:

diff --git a/build/i2c/src/lib.rs b/build/i2c/src/lib.rs
index 3ca2e1487..a4201fe57 100644
--- a/build/i2c/src/lib.rs
+++ b/build/i2c/src/lib.rs
@@ -1827,35 +1827,29 @@ pub fn device_descriptions() -> impl Iterator<Item = I2cDeviceDescription> {
                     return None;
                 }

-                let rails = power
-                    .rails
-                    .iter()
-                    .flatten()
-                    .enumerate()
-                    .map(|(i, rail)| {
-                        let phases = if let Some(ref phases_list) = power.phases
-                        {
-                            match phases_list.get(i) {
-                                Some(phases) => phases.clone(),
-                                None => {
-                                    // The rails and phases arrays are expected to
-                                    // be the same length; if this is not the case,
-                                    // the config file is malformed!
-                                    panic!(
-                                        "PMBus device {device_id:?} is missing
-                                         phases for its {i}th rail ({rail})",
-                                    );
-                                }
-                            }
-                        } else {
-                            Vec::new()
-                        };
-                        PmbusRailDescription {
-                            name: rail.clone(),
-                            phases,
-                        }
-                    })
-                    .collect::<Vec<_>>();
+                let rails =  power.rails.as_ref().map(|rails| {
+                    if let Some(ref phases) = power.phases {
+                        assert_eq!(rails.len(), phases.len(),
+                            "PMBus device {device_id:?}'s `power.rails` and \
+                             `power.phases` lists are not the same length"
+                        );
+                        rails.iter()
+                            .cloned()
+                            .zip(phases.iter().cloned())
+                            .map(|(name, phases)| PmbusRailDescription {
+                                name,
+                                phases,
+                            })
+                            .collect()
+                    } else {
+                        rails.iter().cloned()
+                            .map(|name| PmbusRailDescription {
+                                name,
+                                phases: Vec::new(),
+                            })
+                            .collect()
+                    }
+                }).unwrap_or_default();
                 Some(PmbusDeviceDescription { rails })
             });

which, incidentally, rustfmt decided to kinda give up on lol. Do you think that's nicer? because I'm happy to change it if you do...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my take on it, and I actually think this makes one thing a bit more clear: what we do when EITHER rails or power vecs are missing (right now unless we have both we return Some(vec![]):

diff --git a/build/i2c/src/lib.rs b/build/i2c/src/lib.rs
index 3ca2e1487..f6dcd0b43 100644
--- a/build/i2c/src/lib.rs
+++ b/build/i2c/src/lib.rs
@@ -1827,35 +1827,31 @@ pub fn device_descriptions() -> impl Iterator<Item = I2cDeviceDescription> {
                     return None;
                 }
 
-                let rails = power
-                    .rails
+                // If we're missing rails OR phases, we have no rails.
+                //
+                // TODO: Do we want to match on this? Are we actually fine with 3/4 match arms just
+                // giving up and returning `Some(PmbusDeviceDescription { rails: vec![] })`?
+                let (Some(rails), Some(phases)) = (power.rails.as_ref(), power.phases.as_ref()) else {
+                    return Some(PmbusDeviceDescription { rails: vec![] })
+                };
+
+                // If we have rails and phases, they MUST match each other's lengths
+                assert_eq!(
+                    rails.len(),
+                    phases.len(),
+                    "PMBus device {device_id:?} has mismatched rails and phases",
+                );
+
+                // A phase for every rail, and every rail in its phase
+                let rails = rails
                     .iter()
-                    .flatten()
-                    .enumerate()
-                    .map(|(i, rail)| {
-                        let phases = if let Some(ref phases_list) = power.phases
-                        {
-                            match phases_list.get(i) {
-                                Some(phases) => phases.clone(),
-                                None => {
-                                    // The rails and phases arrays are expected to
-                                    // be the same length; if this is not the case,
-                                    // the config file is malformed!
-                                    panic!(
-                                        "PMBus device {device_id:?} is missing
-                                         phases for its {i}th rail ({rail})",
-                                    );
-                                }
-                            }
-                        } else {
-                            Vec::new()
-                        };
-                        PmbusRailDescription {
-                            name: rail.clone(),
-                            phases,
-                        }
+                    .zip(phases.iter())
+                    .map(|(r, p)| PmbusRailDescription {
+                        name: r.clone(),
+                        phases: p.clone(),
                     })
-                    .collect::<Vec<_>>();
+                    .collect();
+
                 Some(PmbusDeviceDescription { rails })
             });

Less nested:

// If we're missing rails OR phases, we have no rails.
//
// TODO: Do we want to match on this? Are we actually fine with 3/4 match arms just
// giving up and returning `Some(PmbusDeviceDescription { rails: vec![] })`?
let (Some(rails), Some(phases)) = (power.rails.as_ref(), power.phases.as_ref()) else {
    return Some(PmbusDeviceDescription { rails: vec![] })
};

// If we have rails and phases, they MUST match each other's lengths
assert_eq!(
    rails.len(),
    phases.len(),
    "PMBus device {device_id:?} has mismatched rails and phases",
);

// A phase for every rail, and every rail in its phase
let rails = rails
    .iter()
    .zip(phases.iter())
    .map(|(r, p)| PmbusRailDescription {
        name: r.clone(),
        phases: p.clone(),
    })
    .collect();

Some(PmbusDeviceDescription { rails })

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, your code snippet implements different (and in this case, incorrect) behavior than what mine does. Not all devices have multiple phases, some devices have a list of rails and no corresponding list of phases. The requirement is that if there is a list of phases, it must be as long as the list of rails.

@mkeeter
Copy link
Copy Markdown
Collaborator

mkeeter commented May 21, 2026

@hawkw This generally looks good; can you do a smoke test to make sure this didn't break humility validate or faux-mgs inventory?

@hawkw
Copy link
Copy Markdown
Member Author

hawkw commented May 21, 2026

@jamesmunns
Copy link
Copy Markdown
Contributor

what the heck happened here: https://github.com/oxidecomputer/hubris/actions/runs/26239754200/job/77222643916?pr=2519#step:4:288

I've been seeing these pretty often in other PRs. I'm not sure if it's some kind of cache failure, a build getting cancelled when it shouldn't, or just GitHub jank. It's starting to happen often enough to worry about imo.

@hawkw
Copy link
Copy Markdown
Member Author

hawkw commented May 21, 2026

sigh...

@hawkw
Copy link
Copy Markdown
Member Author

hawkw commented May 21, 2026

@mkeeter well, humility validate works fine, at least:

eliza@alfred ~ $ pfexec humility flash
humility: WARNING: archive in environment variable overriding archive in environment file
humility: attached to archive
humility: attaching with chip set to "STM32H753ZITx"
humility: attached to 1fc9:0143:XJACFXSEKTQJS via CMSIS-DAP
humility: flash/archive mismatch; reflashing
humility: skipping measurement token handoff
humility: erasing slot 12
humility: done
humility: skipping measurement token handoff
humility: resetting the SP, please wait...
humility: done with auxiliary flash
humility: flashing done
eliza@alfred ~ $ pfexec humility validate
humility: WARNING: archive in environment variable overriding archive in environment file
humility: attached to 1fc9:0143:XJACFXSEKTQJS via CMSIS-DAP
ID VALIDATION   C P  MUX ADDR DEVICE        DESCRIPTION
 0 present      1 B  -   0x70 oximux16      Front FPGA virtual mux
 1 validated    1 B  1:1 0x50 at24csw080    U.2 Sharkfin A VPD
 2 validated    1 B  1:1 0x38 max5970       U.2 Sharkfin A hot swap controller
 3 removed      1 B  1:1 0x6a nvme_bmc      U.2 A NVMe Basic Management Command
 4 validated    1 B  1:2 0x50 at24csw080    U.2 Sharkfin B VPD
 5 validated    1 B  1:2 0x38 max5970       U.2 Sharkfin B hot swap controller
 6 removed      1 B  1:2 0x6a nvme_bmc      U.2 B NVMe Basic Management Control
 7 validated    1 B  1:3 0x50 at24csw080    U.2 Sharkfin C VPD
 8 validated    1 B  1:3 0x38 max5970       U.2 Sharkfin C hot swap controller
 9 removed      1 B  1:3 0x6a nvme_bmc      U.2 C NVMe Basic Management Control
10 validated    1 B  1:4 0x50 at24csw080    U.2 Sharkfin D VPD
11 validated    1 B  1:4 0x38 max5970       U.2 Sharkfin D hot swap controller
12 removed      1 B  1:4 0x6a nvme_bmc      U.2 D NVMe Basic Management Control
13 validated    1 B  1:5 0x50 at24csw080    U.2 Sharkfin E VPD
14 validated    1 B  1:5 0x38 max5970       U.2 Sharkfin E hot swap controller
15 removed      1 B  1:5 0x6a nvme_bmc      U.2 E NVMe Basic Management Control
16 validated    1 B  1:6 0x50 at24csw080    U.2 Sharkfin F VPD
17 validated    1 B  1:6 0x38 max5970       U.2 Sharkfin F hot swap controller
18 removed      1 B  1:6 0x6a nvme_bmc      U.2 F NVMe Basic Management Control
19 validated    1 B  1:9 0x50 at24csw080    U.2 Sharkfin G VPD
20 validated    1 B  1:9 0x38 max5970       U.2 Sharkfin G hot swap controller
21 removed      1 B  1:9 0x6a nvme_bmc      U.2 G NVMe Basic Management Control
22 validated    1 B  1:10 0x50 at24csw080    U.2 Sharkfin H VPD
23 validated    1 B  1:10 0x38 max5970       U.2 Sharkfin H hot swap controller
24 removed      1 B  1:10 0x6a nvme_bmc      U.2 H NVMe Basic Management Control
25 validated    1 B  1:11 0x50 at24csw080    U.2 Sharkfin I VPD
26 validated    1 B  1:11 0x38 max5970       U.2 Sharkfin I hot swap controller
27 removed      1 B  1:11 0x6a nvme_bmc      U.2 I NVMe Basic Management Control
28 validated    1 B  1:12 0x50 at24csw080    U.2 Sharkfin J VPD
29 validated    1 B  1:12 0x38 max5970       U.2 Sharkfin J hot swap controller
30 removed      1 B  1:12 0x6a nvme_bmc      U.2 J NVMe Basic Management Control
31 validated    1 B  1:13 0x48 tmp117        Southwest temperature sensor
32 validated    1 B  1:13 0x49 tmp117        South temperature sensor
33 validated    1 B  1:13 0x4a tmp117        Southeast temperature sensor
34 present      2 F  -   0x70 oximux16      Main FPGA virtual mux
35 removed      2 F  1:1 0x6a nvme_bmc      M.2 A NVMe Basic Management Command
36 removed      2 F  1:2 0x6a nvme_bmc      M.2 B NVMe Basic Management Command
37 validated    2 F  1:4 0x3c sbrmi         CPU via SB-RMI
38 validated    2 F  1:4 0x4c sbtsi         CPU temperature sensor
39 validated    2 F  1:7 0x50 at24csw080    Fan VPD
40 validated    2 F  1:8 0x4c tmp451        T6 temperature sensor
41 validated    3 H  -   0x24 tps546b24a    A2 3.3V rail
42 validated    3 H  -   0x27 tps546b24a    A2 5V rail
43 validated    3 H  -   0x29 tps546b24a    A2 1.8V rail
44 validated    3 H  -   0x3a max5970       M.2 hot plug controller
45 validated    3 H  -   0x54 ltc4282       12V MCIO hot plug controller
46 validated    3 H  -   0x56 ltc4282       DIMM GHIJKL hot plug controller
47 validated    3 H  -   0x55 ltc4282       DIMM ABCDEF hot plug controller
48 validated    3 H  -   0x75 raa229620a    South power controller (Core 0, SOC)
49 validated    3 H  -   0x76 raa229620a    North power controller (Core 1, VDDIO)
50 validated    3 H  -   0x5c isl68224      SP5 power controller (V1P1, V1P8, V3P3)
51 validated    4 F  -   0x39 max5970       NIC hot swap
52 validated    4 F  -   0x25 tps546b24a    T6 power controller
53 validated    4 F  -   0x48 tmp117        Northwest temperature sensor
54 validated    4 F  -   0x49 tmp117        North temperature sensor
55 validated    4 F  -   0x4a tmp117        Northeast temperature sensor
56 validated    4 F  -   0x20 max31790      Fan controller
57 validated    4 F  -   0x67 bmr491        Intermediate bus converter
58 validated    4 F  -   0x50 at24csw080    Cosmo VPD
59 validated    4 F  -   0x11 lm5066i       Fan hot swap controller (east)
60 validated    4 F  -   0x12 lm5066i       Fan hot swap controller (central)
61 validated    4 F  -   0x13 lm5066i       Fan hot swap controller (west)
62 validated    4 F  -   0x14 adm127x       Sled hot swap controller
eliza@alfred ~ $

I haven't yet gotten as far as testing that MGS doesn't get sad upon discovering the new bits (which I sure hope it wouldn't), since the only lab systems that have management network are currently busy doing other things again (cf oxidecomputer/meta#984). I'll see if I can get ahold of something to test against...

hawkw added 2 commits May 22, 2026 14:13
As suggested by @jamesmunns in [this comment][1]. I think this is a bit
easier to follow now, thanks James!

[1]:
#2519 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

control-plane-agent fault-management Everything related to the Oxide's Fault Management architecture implementation service processor Related to the service processor.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants