cgroup v1 blkio parser aborts on "(unknown)" device rows, causing all container stats to fail #393

@begna112

Summary

On cgroup v1 hosts with Linux native NVMe multipath enabled and multi-port / multi-controller NVMe SSDs, blkio cgroup files can contain rows where the device field is (unknown) instead of major:minor.

containerd/cgroups appears to treat that as a fatal parse error in the blkio stats parser. In Docker/containerd, the user-visible result is that docker stats --no-stream returns an all-zero stats row for the container, including unrelated metrics such as network, memory, and PIDs.

I think containerd/cgroups should tolerate these blkio rows by skipping them or preserving them as unknown-device blkio entries, rather than failing the entire container stats collection.

Environment

Observed on multiple affected hosts:

  • Docker Engine: 28.5.2, 29.4.2
  • containerd: 1.7.29, 2.2.3
  • runc: 1.3.3, 1.3.5
  • Kernel: Ubuntu generic 6.8.0-111 and 6.8.0-117
  • Cgroups: v1 / cgroupfs
  • nvme_core.multipath=Y

Upstream check:

  • Current containerd/cgroups main was checked at commit 360fd8b.
  • The cgroup v1 blkio code currently lives under cgroup1/blkio.go.
  • A local scratch unit test against that commit confirmed that a row beginning with (unknown) still returns strconv.ParseUint: parsing "(unknown)": invalid syntax.

Affected storage examples:

  • SNM5C-R3R8NC, observed with cmic=0x3, hidden nvme*c*n1 path disks
  • Samsung MZWLJ3T8HBLS / PM1733-family drive, observed with cmic=0x3; Samsung PM1733 documentation describes dual-port functionality

Healthy comparison host:

  • Dell/Kioxia CD8 U.2 3.84TB, observed with cmic=0x0, no hidden NVMe path disks, and normal Docker stats; Kioxia documents CD8-R as a PCIe Gen4 x4 NVMe data-center SSD

For CMIC meaning, libnvme documents NVME_CTRL_CMIC_MULTI_CTRL as indicating that the NVM subsystem may contain two or more controllers and may provide multiple paths for a single host: Ubuntu manpage.

Actual Behavior

For affected running containers:

$ docker stats --no-stream --format 'stats={{.Name}} net={{.NetIO}} block={{.BlockIO}} mem={{.MemUsage}} pids={{.PIDs}}' <container>

stats=<container> net=0B / 0B block=0B / 0B mem=0B / 0B pids=0

containerd logs show:

strconv.ParseUint: parsing "(unknown)": invalid syntax

The container is not actually idle. Raw cgroup data, runc stats, and interface counters show nonzero activity. The stats failure appears to be triggered by blkio parsing.

Example blkio rows:

(unknown) Read 0
(unknown) Write 0
(unknown) Sync 0
(unknown) Async 0

On one affected host, blkio.throttle.io_serviced had 24 (unknown) rows for the container, and related blkio files had the same pattern.
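
To check a host for the same pattern, the rows can be counted with a short program. This is a minimal sketch, assuming only the row format shown above; countUnknownRows is a hypothetical helper name, and the file paths are passed on the command line rather than discovered.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// countUnknownRows returns how many rows in the content of a blkio stat file
// have "(unknown)" as their device field. Hypothetical helper for diagnosis.
func countUnknownRows(content string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) > 0 && fields[0] == "(unknown)" {
			n++
		}
	}
	return n
}

func main() {
	// Each argument is a path to a blkio stat file, for example a container's
	// blkio.throttle.io_serviced under /sys/fs/cgroup/blkio.
	for _, path := range os.Args[1:] {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s: %d (unknown) rows\n", path, countUnknownRows(string(data)))
	}
}
```

Any nonzero count on a cgroup v1 host would be enough to hit the parse error described in this issue.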

Expected Behavior

A malformed or unknown blkio device row should not cause the whole container stats response to fail.

At minimum, containerd/cgroups should skip (unknown) blkio rows and still return CPU, memory, PIDs, network, and any parseable blkio metrics.

Why This Looks Like a containerd/cgroups Parser Issue

The cgroup v1 blkio documentation describes these files as normally using major:minor operation value format: Linux blkio controller docs.

The current blkio parser in containerd/cgroups splits each row on spaces and colons, then unconditionally parses fields[0] and fields[1] as numeric major/minor values: cgroup1/blkio.go.

For a row like:

(unknown) Read 0

that means:

fields[0] = "(unknown)"
fields[1] = "Read"
fields[2] = "0"

So strconv.ParseUint(fields[0], 10, 64) fails and the parser returns an error.
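
The failure can be reproduced outside the library with a few lines of Go. This is an illustrative sketch, not the code in cgroup1/blkio.go: splitRow and parseMajor are hypothetical names, and the split rule (spaces and colons) is taken from the description above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitRow approximates the tokenization described above: a blkio row is
// split on spaces and colons. Illustration only, not the library's code.
func splitRow(line string) []string {
	return strings.FieldsFunc(line, func(r rune) bool {
		return r == ' ' || r == ':'
	})
}

// parseMajor parses the first field as a numeric device major number, with
// no check for non-numeric device fields.
func parseMajor(line string) (uint64, error) {
	fields := splitRow(line)
	if len(fields) == 0 {
		return 0, fmt.Errorf("empty row")
	}
	return strconv.ParseUint(fields[0], 10, 64)
}

func main() {
	for _, line := range []string{"259:0 Read 42", "(unknown) Read 0"} {
		major, err := parseMajor(line)
		fmt.Printf("%q fields=%q major=%d err=%v\n", line, splitRow(line), major, err)
	}
}
```

The second row produces the same strconv.ParseUint error text that appears in the containerd logs.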

Minimal Reproducer Against Current Main

This test can be placed in package cgroup1 against current main. It fails today with strconv.ParseUint: parsing "(unknown)": invalid syntax.

package cgroup1

import (
	"os"
	"path/filepath"
	"testing"

	v1 "github.com/containerd/cgroups/v3/cgroup1/stats"
)

func TestReadEntryUnknownDeviceShouldNotFail(t *testing.T) {
	root := t.TempDir()
	cgroupPath := filepath.Join(root, "blkio", "test")
	if err := os.MkdirAll(cgroupPath, 0o755); err != nil {
		t.Fatal(err)
	}
	if err := os.WriteFile(
		filepath.Join(cgroupPath, "blkio.throttle.io_serviced"),
		[]byte("(unknown) Read 0\nTotal 0\n"),
		0o644,
	); err != nil {
		t.Fatal(err)
	}

	ctrl := NewBlkio(root)
	var entries []*v1.BlkIOEntry
	if err := ctrl.readEntry(map[deviceKey]string{}, "test", "throttle.io_serviced", &entries); err != nil {
		t.Fatalf("unknown blkio device rows should not fail all stats: %v", err)
	}
}

Why (unknown) Can Come From the Kernel

This seems to be triggered by Linux native NVMe multipath hidden path disks.

Linux NVMe multipath is documented as integrating namespaces with the same identifier into a single block device: Linux NVMe multipath docs.

The kernel source shows the nvme_core multipath module parameter: drivers/nvme/host/multipath.c. For multipath namespace paths, the NVMe driver names path disks like nvme%dc%dn%d and marks them hidden with GENHD_FL_HIDDEN: drivers/nvme/host/core.c.

GENHD_FL_HIDDEN is documented in kernel source as a hidden block device used for underlying components of multipath devices: include/linux/blkdev.h.

Hidden disks skip normal BDI registration: block/genhd.c. The BDI fallback name is intentionally (unknown) when there is no BDI device: mm/backing-dev.c.

So the kernel appears able to produce (unknown) blkio device names for hidden multipath path devices. Even if that kernel behavior is imperfect, the container stats collector should not fail all metrics because of it.
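
To see whether a host has such path disks, something like the following could be used. It is a rough sketch under two assumptions: that path disks follow the nvme%dc%dn%d naming quoted above, and that hidden disks expose a hidden attribute under /sys/block/<name>/. isNVMePathDisk is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
	"strings"
)

// pathDiskName matches names of the form nvme<ctrl-subsys>c<ctrl>n<ns>,
// the naming the NVMe driver uses for multipath path disks.
var pathDiskName = regexp.MustCompile(`^nvme\d+c\d+n\d+$`)

// isNVMePathDisk reports whether a block device name looks like a
// per-controller NVMe path disk rather than a namespace head such as nvme0n1.
func isNVMePathDisk(name string) bool {
	return pathDiskName.MatchString(name)
}

func main() {
	entries, err := os.ReadDir("/sys/block")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for _, e := range entries {
		if !isNVMePathDisk(e.Name()) {
			continue
		}
		// Assumption: the sysfs "hidden" attribute reads 1 for hidden disks.
		hidden, _ := os.ReadFile(filepath.Join("/sys/block", e.Name(), "hidden"))
		fmt.Printf("%s hidden=%s\n", e.Name(), strings.TrimSpace(string(hidden)))
	}
}
```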

Workaround Validation: Disabling Native NVMe Multipath Removed the Bad Rows

On one validation host with the same hidden NVMe path disk signature, disabling Linux native NVMe multipath removed the (unknown) blkio rows and restored docker stats output.

Before the change:

  • Docker Engine: 29.4.2
  • containerd: 2.2.3
  • Kernel: Ubuntu generic 6.8.0-111
  • Cgroups: v1 / cgroupfs
  • nvme_core.multipath=Y
  • Four hidden nvme*c*n1 path disks were present for the data drives.
  • blkio cgroup files contained (unknown) rows.
  • containerd logs included strconv.ParseUint: parsing "(unknown)": invalid syntax during metrics collection.

The host was not using device-mapper multipath for the data volume. Its data filesystem was on an mdadm RAID10 array built directly from the NVMe namespace devices. The workaround was:

# added to GRUB_CMDLINE_LINUX_DEFAULT
nvme_core.multipath=N

sudo update-grub
sudo systemctl reboot

Docker and containerd were not upgraded as part of this test.

After reboot:

/proc/cmdline included nvme_core.multipath=N
/sys/module/nvme_core/parameters/multipath = N
hidden nvme*c*n* path disks = none
mdadm RAID10 array = active, 4/4 members [UUUU]
/var/lib/docker = mounted on the same XFS filesystem
unknown blkio rows under /sys/fs/cgroup/blkio = 0

A disposable local smoke container then returned nonzero Docker stats:

$ docker stats --no-stream --format 'name={{.Name}} net={{.NetIO}} block={{.BlockIO}} mem={{.MemUsage}} cpu={{.CPUPerc}}' <container>
name=<container> net=736B / 126B block=5.59MB / 0B mem=2.918MiB / 377.3GiB cpu=0.00%

There were no (unknown) blkio rows while that test container existed, and no new containerd metrics parse errors appeared.

This does not remove the need for a parser fix. Disabling native NVMe multipath is only safe for hosts that are not actually relying on NVMe multipath. The broader issue is still that one unknown blkio device row can make containerd/Docker lose all container stats.

Related Reports

This looks closely related to containerd/cgroups#300, which reports strconv.ParseUint: parsing "(unknown)": invalid syntax.

A downstream symptom also appears in the Kepler discussion sustainable-computing-io/kepler#750, where metrics are reported as all zero and the linked containerd/cgroups issue is referenced.

Possible Fix

In the cgroup v1 blkio parser, tolerate rows whose device field is not numeric major:minor: cgroup1/blkio.go.

For example:

  • Continue skipping Total as today.
  • If fields[0] == "(unknown)", skip the row or add an unknown-device blkio entry.
  • More generally, if major/minor parse fails for a blkio stat row, do not fail the whole stats collection.
  • Preserve the parse error only if strict parsing is explicitly desired.

Losing blkio accounting for an unknown hidden device is much less harmful than returning zero for unrelated container metrics such as network, memory, and PIDs.
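
A tolerant parser along these lines could look like the sketch below. It is standalone and not a patch against cgroup1/blkio.go: the entry type and parseTolerant are hypothetical stand-ins, and it takes the "skip the row" option rather than emitting an unknown-device entry.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// entry is a hypothetical stand-in for the library's blkio entry type.
type entry struct {
	Major, Minor uint64
	Op           string
	Value        uint64
}

// parseTolerant parses rows of the form "major:minor [op] value". Rows whose
// device field is not numeric major:minor, such as "(unknown) Read 0", are
// counted and skipped instead of aborting the whole parse.
func parseTolerant(content string) ([]entry, int, error) {
	var entries []entry
	skipped := 0
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.FieldsFunc(sc.Text(), func(r rune) bool {
			return r == ' ' || r == ':'
		})
		if len(fields) < 3 {
			// Short rows such as "Total 0" carry no per-device data;
			// this sketch skips all of them.
			continue
		}
		major, errMajor := strconv.ParseUint(fields[0], 10, 64)
		minor, errMinor := strconv.ParseUint(fields[1], 10, 64)
		if errMajor != nil || errMinor != nil {
			skipped++
			continue
		}
		op, raw := "", fields[2]
		if len(fields) > 3 {
			op, raw = fields[2], fields[3]
		}
		value, err := strconv.ParseUint(raw, 10, 64)
		if err != nil {
			return nil, skipped, fmt.Errorf("parsing value in row %q: %w", sc.Text(), err)
		}
		entries = append(entries, entry{Major: major, Minor: minor, Op: op, Value: value})
	}
	return entries, skipped, sc.Err()
}

func main() {
	entries, skipped, err := parseTolerant("(unknown) Read 0\n8:0 Read 5\n8:0 Write 7\nTotal 12\n")
	fmt.Println(entries, skipped, err)
}
```

Returning the skipped count lets a caller log the anomaly once without failing unrelated metrics. A strict mode could turn a nonzero count back into an error.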
