Skip to content

Add exclude attribute by name option to h5diff#6372

Open
mattjala wants to merge 20 commits into
HDFGroup:developfrom
mattjala:h5diff_exclude_attrs
Open

Add exclude attribute by name option to h5diff#6372
mattjala wants to merge 20 commits into
HDFGroup:developfrom
mattjala:h5diff_exclude_attrs

Conversation

@mattjala
Copy link
Copy Markdown
Contributor

Added new h5diff option --exclude-attr-name (short form -B) which allows users to exclude specific attribute names from the comparison. It can be specified multiple times to skip multiple attribute names. If multiple attributes share the provided name(s), then every one will be skipped.

This PR also includes new tests (h5diff_488-491), test file generation in h5diffgentest, and modifies the pre-existing test files which involve usage text to include the new usage text for the new option.

Resolves #6364

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new h5diff CLI option to exclude attributes by name (across all objects), along with generated test files and new/updated tests to validate behavior and usage text.

Changes:

  • Add --exclude-attr-name (-B) option parsing and propagate exclude-by-name list into diff options.
  • Apply exclude-by-name filtering during attribute comparison.
  • Add generator support + new h5diff tests and update usage-text golden files.

Reviewed changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tools/test/h5gentest.c Registers the new test-file generator for exclude-by-attribute-name cases
tools/test/h5diff/h5diffgentest.h Adds filenames + function prototype for new generated test files
tools/test/h5diff/h5diffgentest.c Implements generation of two HDF5 test files with differing attribute values
tools/test/h5diff/CMakeTests.cmake Adds generated files + new test cases (488–491) validating -B/--exclude-attr-name
tools/src/h5diff/h5diff_main.h Updates public CLI documentation to include the new option
tools/src/h5diff/h5diff_common.c Adds option parsing, stores exclude-name list, and updates usage() output
tools/lib/h5diff.h Extends diff options struct with exclude-by-name flag + linked list
tools/lib/h5diff_attr.c Skips attribute comparison when attribute name matches exclude list
tools/lib/h5diff.c Frees new exclude-name list at end of diff run
tools/test/h5diff/expected/h5diff_10.txt Updates expected usage text to include --exclude-attr-name
tools/test/h5diff/expected/h5diff_600.txt Updates expected usage text to include --exclude-attr-name
tools/test/h5diff/expected/h5diff_603.txt Updates expected usage text to include --exclude-attr-name
tools/test/h5diff/expected/h5diff_606.txt Updates expected usage text to include --exclude-attr-name
tools/test/h5diff/expected/h5diff_612.txt Updates expected usage text to include --exclude-attr-name
tools/test/h5diff/expected/h5diff_615.txt Updates expected usage text to include --exclude-attr-name
tools/test/h5diff/expected/h5diff_621.txt Updates expected usage text to include --exclude-attr-name
tools/test/h5diff/expected/h5diff_622.txt Updates expected usage text to include --exclude-attr-name
tools/test/h5diff/expected/h5diff_623.txt Updates expected usage text to include --exclude-attr-name
tools/test/h5diff/expected/h5diff_624.txt Updates expected usage text to include --exclude-attr-name

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +3883 to +3890
if (fid1 > 0)
H5Fclose(fid1);
if (fid2 > 0)
H5Fclose(fid2);
if (gid1 > 0)
H5Gclose(gid1);
if (gid2 > 0)
H5Gclose(gid2);
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cleanup order closes the files before closing the groups created within them. HDF5 expects child objects (groups/datasets) to be closed before the file to avoid close errors and ID leaks. Close gid1/gid2 before fid1/fid2 (and ideally set IDs to H5I_INVALID_HID after successful close) to ensure deterministic cleanup.

Suggested change
if (fid1 > 0)
H5Fclose(fid1);
if (fid2 > 0)
H5Fclose(fid2);
if (gid1 > 0)
H5Gclose(gid1);
if (gid2 > 0)
H5Gclose(gid2);
if (gid1 > 0) {
H5Gclose(gid1);
gid1 = H5I_INVALID_HID;
}
if (gid2 > 0) {
H5Gclose(gid2);
gid2 = H5I_INVALID_HID;
}
if (fid1 > 0) {
H5Fclose(fid1);
fid1 = H5I_INVALID_HID;
}
if (fid2 > 0) {
H5Fclose(fid2);
fid2 = H5I_INVALID_HID;
}

Copilot uses AI. Check for mistakes.
Comment thread tools/lib/h5diff.c
Comment thread tools/test/h5diff/h5diffgentest.c Outdated
Comment thread tools/test/h5diff/h5diffgentest.c
Comment thread tools/lib/h5diff_attr.c
@mattjala mattjala marked this pull request as draft April 17, 2026 16:03
@mattjala mattjala force-pushed the h5diff_exclude_attrs branch from bb3c795 to ce1c0d0 Compare April 17, 2026 22:39
@mattjala
Copy link
Copy Markdown
Contributor Author

There is an issue in ph5diff where pointer fields (exclude, exclude_attr, exclude_attr_names) were broadcast directly into worker processes with where the addresses pointed to were undefined memory. This most recent commit adds a couple helper functions to pack these lists for the transfer and unpack them afterwards in each worker process.

There may be other fields in diff_opts_t which require a similar fix in order to be safe for use by worker processes, but I think that's out of scope for this PR.

@mattjala mattjala force-pushed the h5diff_exclude_attrs branch from d2e5ffc to 3ce0e76 Compare April 20, 2026 15:42
Copy link
Copy Markdown
Collaborator

@jhendersonHDF jhendersonHDF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting a couple things while I'm looking over this

Comment thread tools/src/h5diff/ph5diff_main.c Outdated
Comment thread tools/src/h5diff/ph5diff_main.c Outdated
* (rather than MPI_Bcast) means workers stay in their probe loop
* and can still receive MPI_TAG_END if we exit early. */
for (i = 1; i < g_nTasks; i++)
send_opts_to_worker(&opts, i);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't going to scale particularly well and seems to point to a design issue. There's no reason you should need to use point-to-point operations rather than a broadcast, but you will need to do some extra work to get things in the right form for a broadcast. This is probably a case for https://www.open-mpi.org/doc/v3.1/man3/MPI_Pack.3.php, but I think more discussion is needed on the design for this feature.

That said, I think this can all be handled with MPI_TAG_ARGS which would fix the problem with error handling in the first place. See the other comment about this.

Comment thread tools/src/h5diff/ph5diff_main.c Outdated
MPI_Probe(0, MPI_ANY_TAG, MPI_COMM_WORLD, &Status);

/* Receive exclude list options from manager (sent before MPI_TAG_PARALLEL) */
if (Status.MPI_TAG == MPI_TAG_OPTS) {
Copy link
Copy Markdown
Collaborator

@jhendersonHDF jhendersonHDF Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not introduce a separate tag just for this one option, as others may be tempted to do the same in the future for each new option. This can be handled with MPI_TAG_ARGS, it's just going to take a bit of extra work.

Currently, ph5diff can get away with broadcasting the arguments for the most part because the object names use fixed-length fields in diff_mpi_args. However, the subsetting and attribute exclusion options are problematic due to their use of dynamic allocation, where an address on one MPI rank will almost certainly be invalid on another. This was probably overlooked at the time because ph5diff tries to share code with h5diff and the dynamically-allocated fields weren't specifically thought about for parallel in the shared diff() function.

This will take a decent bit of refactoring to handle "correctly", but I'd suggest creating a packed MPI type from everything currently sent in diff_mpi_args, along with the dynamically-allocated fields and then have the parallel workers unpack that type after the MPI_Recv in ph5diff_worker(), ideally setting the pointers in the opts struct after doing so. The benefit would be that we could expand to new dynamically-allocated options in the future and also could have arbitrary object name lengths if needed, as opposed to failing in ph5diff for object names > 255 characters.

I'd say some sort of refactoring needs to happen anyway, as it can be unclear whether a field from the opts struct is safe to use in parallel or not without having an overall understanding of that code.

Comment thread tools/src/h5diff/h5diff_common.c Outdated
PRINTVALSTREAM(rawoutstream, " excluded.\n");
PRINTVALSTREAM(rawoutstream, " This option can be used repeatedly to exclude multiple paths.\n");
PRINTVALSTREAM(rawoutstream, "\n");
PRINTVALSTREAM(rawoutstream, " -B \"attr_name\", --exclude-attr-name \"attr_name\"\n");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matching all the attributes with the specified name is almost certain to cause frustration. This seems like it would benefit more from a path-style specification like h5dump has, perhaps with the ability to specify "all attributes matching this name".

@github-project-automation github-project-automation Bot moved this from To be triaged to In progress in HDF5 - TRIAGE & TRACK Apr 20, 2026
@mattjala mattjala force-pushed the h5diff_exclude_attrs branch 2 times, most recently from 4e1be02 to 0f58c5f Compare April 27, 2026 14:03
@mattjala mattjala marked this pull request as ready for review April 27, 2026 17:04
@mattjala mattjala force-pushed the h5diff_exclude_attrs branch from 0f58c5f to df9ba9b Compare May 1, 2026 16:01
@mattjala mattjala force-pushed the h5diff_exclude_attrs branch from df9ba9b to 92c81d9 Compare May 19, 2026 16:16
@mattjala mattjala force-pushed the h5diff_exclude_attrs branch from 92c81d9 to ffc411d Compare May 20, 2026 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

[h5diff] add abilty to exclude specific attributes

3 participants