[GSD-12906] zesPowerGetEnergyCounter() returns near-zero energy values periodically on idle Intel Arc Pro B70/B60 — possible counter wrap #938

@sjlealru

Description

Environment

| Node | GPU SKU | Kernel | compute-runtime (libze-intel-gpu1) | L0 loader (libze1) | TW spikes |
|---|---|---|---|---|---|
| node-A | B70 (0x8086:E223) | 6.17.0-35-generic | 26.09.37435.12 | 1.28.0 | 49 |
| node-C | B70 (0x8086:E223) | 6.17.0-1009-intel | 26.09.37435.12 | 1.28.0 | 2 |
| node-D | B60 | 6.14.0-1011-intel | 25.40.35563.10 | 1.26.2 | 17 |
| node-B | B70 (0x8086:E223) | 6.17.0-1009-intel | 26.18.38308.1 | 1.28.2 | 0 |

OS: Ubuntu 24.04 | xpumanager: 1.3.5-20251216 | Monitoring: 54 hrs, all nodes idle (0% GPU util)

Observed behavior

zesPowerGetEnergyCounter() periodically returns energy values of ~100–300 J on GPUs that have been continuously idle, far below the ~22 MJ accumulated before each drop. The sysfs hwmon energy*_input counter for the same card at the same moment continues accumulating normally, which indicates that the hardware energy counter itself is intact.

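xpumd reads the L0 counter every ~5 s and computes power as ediff / tdiff. When the counter drops from ~22 MJ to ~200 J, the unsigned 64-bit subtraction underflows and ediff wraps to just under 2^64 µJ (the L0 counter is reported in microjoules). Dividing by the ~5 s interval gives roughly 3.69 × 10^12 W, which is the ~3.69 TW value reported via hw_power_watts.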
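
To make the arithmetic concrete, here is a minimal Python sketch of the underflow. It is not xpumd's actual code: the function names are hypothetical, the previous L0 reading is not given in this report so the sysfs value from the first captured row is used as a stand-in, and the interval is assumed to be exactly 5 s. It relies only on the Level Zero convention that the energy counter is in microjoules and its timestamp in microseconds.

```python
U64 = 1 << 64  # modulus of unsigned 64-bit arithmetic


def naive_power_watts(prev_uj, curr_uj, dt_us):
    """ediff / tdiff with the subtraction wrapped to uint64, mimicking the failure."""
    ediff_uj = (curr_uj - prev_uj) % U64  # wraps when curr_uj < prev_uj
    return ediff_uj / dt_us  # microjoules per microsecond equals watts


def guarded_power_watts(prev_uj, curr_uj, dt_us):
    """Hypothetical guard: drop the sample when the counter moved backward."""
    if curr_uj < prev_uj or dt_us <= 0:
        return None
    return (curr_uj - prev_uj) / dt_us


# Assumed inputs: ~22.5 MJ before the drop (stand-in), 209.8 J after, 5 s apart.
prev_uj = 22_543_283_900_000
curr_uj = 209_800_000
dt_us = 5_000_000

print(f"naive:   {naive_power_watts(prev_uj, curr_uj, dt_us):.3e} W")
print(f"guarded: {guarded_power_watts(prev_uj, curr_uj, dt_us)}")
```

The naive result lands in the same ~3.69 × 10^12 W range as the reported spikes, while the guarded variant simply skips the bad sample. Whether a consumer should skip, clamp, or re-baseline after a backward step is a design decision this sketch does not settle.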

Captured values at spike moment

| Time (UTC) | GPU (BDF) | zesPowerGetEnergyCounter (L0, power-1) | sysfs energy1_input | hw_power_watts |
|---|---|---|---|---|
| 2026-06-09 19:44:10 | node-A 0001:6b:00.0 | 209.8 J | 22,543,283.9 J | 3,688,785,853,643 W |
| 2026-06-09 22:02:11 | node-A 0001:91:00.0 | 173.4 J | 23,329,654.5 J | 3,690,483,216,906 W |

Statistics (54-hour window)

  • L0 counter resets: 345
  • TW spike events: 68
  • sysfs counter drops (node-C only): 67
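
This report does not say how the resets were counted. One plausible approach, sketched here in Python, is to count every sample that is lower than its predecessor in a per-GPU time series. The function name and the sample values are hypothetical and are not taken from the captured data.

```python
def count_resets(samples_uj):
    """Count samples that are lower than the sample immediately before them."""
    return sum(1 for prev, curr in zip(samples_uj, samples_uj[1:]) if curr < prev)


# Hypothetical series in microjoules: steady growth, one drop, then growth again.
series = [22_000_000_000_000, 22_000_050_000_000, 200_000_000, 250_000_000]
print(count_resets(series))
```

A monotonic energy counter should never trigger this check on an idle card, so any nonzero count points at the reader or the driver rather than at real energy use.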

Related: intel/xpumanager#130
