
Validate VL datatype type during decode and check file pointer in H5T_set_loc #6395

Open
tbeu wants to merge 1 commit into HDFGroup:develop from tbeu:fix/vlen-invalid-type-decode


@tbeu tbeu commented May 4, 2026

Description

This fixes the crash reported in #6378 and #6385 by addressing the root cause at the decode level, as requested by reviewers.

H5O__dtype_decode_helper() reads vlen.type from the file without validation. With corrupted HDF5 files (e.g. from fuzzing), this field can have an invalid value that is neither H5T_VLEN_SEQUENCE nor H5T_VLEN_STRING, which later triggers assert(0) in H5T__vlen_set_loc() (debug builds) or a NULL pointer dereference / SEGV in release builds.

Changes

  1. src/H5Odtype.c (H5O__dtype_decode_helper): Added validation immediately after reading vlen.type from the file. If the value is neither H5T_VLEN_SEQUENCE nor H5T_VLEN_STRING, decoding fails via HGOTO_ERROR, so the corrupted type is rejected at the point where the bad value enters the system.

  2. src/H5T.c (H5T_set_loc): Added a NULL file pointer check before calling H5T__vlen_set_loc() when loc == H5T_LOC_DISK, so the low-level assert(file) invariant is never violated even if a higher-level caller fails to provide a valid file.
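
The decode-time check in change 1 can be illustrated with a minimal standalone sketch. It is not the patch itself: every `sketch_*` / `SKETCH_*` name is hypothetical, the enum values are illustrative, and reading the VL type from the low 4 bits of a flags byte is an assumption made only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the library's VL type enum; names and values
 * are illustrative only and not taken from the HDF5 sources. */
typedef enum {
    SKETCH_VLEN_SEQUENCE = 0,
    SKETCH_VLEN_STRING   = 1
} sketch_vlen_type_t;

typedef struct {
    sketch_vlen_type_t type;
} sketch_vlen_t;

/* Decode the VL type from the low 4 bits of a flags byte (assumed layout).
 * Returns 0 on success, -1 if the stored value is not a known VL type.
 * On failure, *out is left untouched. */
static int
sketch_decode_vlen_type(uint8_t flags, sketch_vlen_t *out)
{
    unsigned raw = flags & 0x0fu;

    /* Reject the bad value at the point where it enters the system */
    if (raw != (unsigned)SKETCH_VLEN_SEQUENCE && raw != (unsigned)SKETCH_VLEN_STRING)
        return -1;

    out->type = (sketch_vlen_type_t)raw;
    return 0;
}

/* Downstream consumer: it can keep its assert for internal programming
 * errors, because decode has already filtered out corrupted input. */
static const char *
sketch_vlen_kind(const sketch_vlen_t *vl)
{
    switch (vl->type) {
        case SKETCH_VLEN_SEQUENCE:
            return "sequence";
        case SKETCH_VLEN_STRING:
            return "string";
        default:
            assert(0 && "invalid VL type");
            return NULL;
    }
}
```

With this shape, a corrupted flags byte such as `0x07` makes `sketch_decode_vlen_type()` return `-1`, so the `default` branch in the consumer is never reached with file-supplied data.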

Rationale

Per the review feedback on #6378 and #6385: the assert() statements in H5T__vlen_set_loc() and the vlen disk operations are intended to catch internal programming errors and should remain. The proper fix is to validate inputs at the higher level where corrupted data is first ingested. This PR does exactly that.
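
The layering described here, where the low-level assert stays and a higher-level caller returns an error before the assert can fire, might look roughly like the sketch below. It is a simplified model under stated assumptions, not the code in src/H5T.c: all `sketch_*` / `SKETCH_*` types and functions are hypothetical, and the real functions take different arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the library's location enum,
 * file handle, and datatype structures. */
typedef enum { SKETCH_LOC_MEMORY, SKETCH_LOC_DISK } sketch_loc_t;

typedef struct {
    int unused;
} sketch_file_t;

typedef struct {
    sketch_loc_t         loc;
    const sketch_file_t *file;
} sketch_type_t;

/* Low-level routine: the assert documents an internal invariant
 * (a disk location always comes with a file) and is kept as-is. */
static void
sketch_vlen_set_loc(sketch_type_t *dt, const sketch_file_t *file, sketch_loc_t loc)
{
    if (loc == SKETCH_LOC_DISK)
        assert(file != NULL);

    dt->loc  = loc;
    dt->file = file;
}

/* Higher-level routine: checks the input and reports an error instead of
 * letting a missing file reach the assert. Returns 0 on success, -1 on error. */
static int
sketch_set_loc(sketch_type_t *dt, const sketch_file_t *file, sketch_loc_t loc)
{
    if (loc == SKETCH_LOC_DISK && file == NULL)
        return -1; /* dt is left unchanged */

    sketch_vlen_set_loc(dt, file, loc);
    return 0;
}
```

The design choice is that the assert continues to catch programming mistakes in debug builds, while release builds get a recoverable error instead of a NULL dereference.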

How was this found

Found by OSS-Fuzz via the matio fuzzer (ClusterFuzz testcase 5366895365914624).

OSS-Fuzz issue: https://issues.oss-fuzz.com/issues/472641758

Reproducer file: clusterfuzz-testcase-minimized-matio_fuzzer-5366895365914624.zip

Supersedes #6378 and #6385.

@tbeu tbeu force-pushed the fix/vlen-invalid-type-decode branch 2 times, most recently from 1d856f9 to 54495aa Compare May 7, 2026 19:16
@tbeu tbeu force-pushed the fix/vlen-invalid-type-decode branch from 54495aa to ff21ec4 Compare May 14, 2026 18:24
…_set_loc

H5O__dtype_decode_helper() reads vlen.type from the file without
validation. With corrupted HDF5 files (e.g. from fuzzing), this field
can have an invalid value that is neither H5T_VLEN_SEQUENCE nor
H5T_VLEN_STRING, which later triggers assert(0) in H5T__vlen_set_loc()
(debug builds) or a NULL pointer dereference / SEGV in release builds.

Fix by:
1. Adding a validation check in H5O__dtype_decode_helper() immediately
   after reading the vlen.type field, returning an error if the value
   is invalid.
2. Adding a NULL file pointer check in H5T_set_loc() before calling
   H5T__vlen_set_loc() when loc == H5T_LOC_DISK, so the low-level
   assert(file) invariant is never violated.

This fixes the root cause at the decode level where the bad value
enters the system, as requested in review of HDFGroup#6378 and HDFGroup#6385.

Closes HDFGroup#6378
Closes HDFGroup#6385

Found by OSS-Fuzz via the matio fuzzer (ClusterFuzz testcase
5366895365914624).
@tbeu tbeu force-pushed the fix/vlen-invalid-type-decode branch from ff21ec4 to d495a3d Compare May 18, 2026 21:15
