Add exclude attribute by name option to h5diff#6372
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new h5diff CLI option to exclude attributes by name (across all objects), along with generated test files and new/updated tests to validate behavior and usage text.
Changes:
- Add
--exclude-attr-name(-B) option parsing and propagate exclude-by-name list into diff options. - Apply exclude-by-name filtering during attribute comparison.
- Add generator support + new h5diff tests and update usage-text golden files.
Reviewed changes
Copilot reviewed 19 out of 21 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/test/h5gentest.c | Registers the new test-file generator for exclude-by-attribute-name cases |
| tools/test/h5diff/h5diffgentest.h | Adds filenames + function prototype for new generated test files |
| tools/test/h5diff/h5diffgentest.c | Implements generation of two HDF5 test files with differing attribute values |
| tools/test/h5diff/CMakeTests.cmake | Adds generated files + new test cases (488–491) validating -B/--exclude-attr-name |
| tools/src/h5diff/h5diff_main.h | Updates public CLI documentation to include the new option |
| tools/src/h5diff/h5diff_common.c | Adds option parsing, stores exclude-name list, and updates usage() output |
| tools/lib/h5diff.h | Extends diff options struct with exclude-by-name flag + linked list |
| tools/lib/h5diff_attr.c | Skips attribute comparison when attribute name matches exclude list |
| tools/lib/h5diff.c | Frees new exclude-name list at end of diff run |
| tools/test/h5diff/expected/h5diff_10.txt | Updates expected usage text to include --exclude-attr-name |
| tools/test/h5diff/expected/h5diff_600.txt | Updates expected usage text to include --exclude-attr-name |
| tools/test/h5diff/expected/h5diff_603.txt | Updates expected usage text to include --exclude-attr-name |
| tools/test/h5diff/expected/h5diff_606.txt | Updates expected usage text to include --exclude-attr-name |
| tools/test/h5diff/expected/h5diff_612.txt | Updates expected usage text to include --exclude-attr-name |
| tools/test/h5diff/expected/h5diff_615.txt | Updates expected usage text to include --exclude-attr-name |
| tools/test/h5diff/expected/h5diff_621.txt | Updates expected usage text to include --exclude-attr-name |
| tools/test/h5diff/expected/h5diff_622.txt | Updates expected usage text to include --exclude-attr-name |
| tools/test/h5diff/expected/h5diff_623.txt | Updates expected usage text to include --exclude-attr-name |
| tools/test/h5diff/expected/h5diff_624.txt | Updates expected usage text to include --exclude-attr-name |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (fid1 > 0) | ||
| H5Fclose(fid1); | ||
| if (fid2 > 0) | ||
| H5Fclose(fid2); | ||
| if (gid1 > 0) | ||
| H5Gclose(gid1); | ||
| if (gid2 > 0) | ||
| H5Gclose(gid2); |
There was a problem hiding this comment.
The cleanup order closes the files before closing the groups created within them. HDF5 expects child objects (groups/datasets) to be closed before the file to avoid close errors and ID leaks. Close gid1/gid2 before fid1/fid2 (and ideally set IDs to H5I_INVALID_HID after successful close) to ensure deterministic cleanup.
| if (fid1 > 0) | |
| H5Fclose(fid1); | |
| if (fid2 > 0) | |
| H5Fclose(fid2); | |
| if (gid1 > 0) | |
| H5Gclose(gid1); | |
| if (gid2 > 0) | |
| H5Gclose(gid2); | |
| if (gid1 > 0) { | |
| H5Gclose(gid1); | |
| gid1 = H5I_INVALID_HID; | |
| } | |
| if (gid2 > 0) { | |
| H5Gclose(gid2); | |
| gid2 = H5I_INVALID_HID; | |
| } | |
| if (fid1 > 0) { | |
| H5Fclose(fid1); | |
| fid1 = H5I_INVALID_HID; | |
| } | |
| if (fid2 > 0) { | |
| H5Fclose(fid2); | |
| fid2 = H5I_INVALID_HID; | |
| } |
bb3c795 to
ce1c0d0
Compare
|
There is an issue in ph5diff where pointer fields ( There may be other fields in |
d2e5ffc to
3ce0e76
Compare
jhendersonHDF
left a comment
There was a problem hiding this comment.
Just noting a couple things while I'm looking over this
| * (rather than MPI_Bcast) means workers stay in their probe loop | ||
| * and can still receive MPI_TAG_END if we exit early. */ | ||
| for (i = 1; i < g_nTasks; i++) | ||
| send_opts_to_worker(&opts, i); |
There was a problem hiding this comment.
This isn't going to scale particularly well and seems to point to a design issue. There's no reason you should need to use point-to-point operations rather than a broadcast, but you will need to do some extra work to get things in the right form for a broadcast. This is probably a case for https://www.open-mpi.org/doc/v3.1/man3/MPI_Pack.3.php, but I think more discussion is needed on the design for this feature.
That said, I think this can all be handled with MPI_TAG_ARGS which would fix the problem with error handling in the first place. See the other comment about this.
| MPI_Probe(0, MPI_ANY_TAG, MPI_COMM_WORLD, &Status); | ||
|
|
||
| /* Receive exclude list options from manager (sent before MPI_TAG_PARALLEL) */ | ||
| if (Status.MPI_TAG == MPI_TAG_OPTS) { |
There was a problem hiding this comment.
Let's not introduce a separate tag just for this one option, as others may be tempted to do the same in the future for each new option. This can be handled with MPI_TAG_ARGS, it's just going to take a bit of extra work.
Currently, ph5diff can get away with broadcasting the arguments for the most part because the object names use fixed-length fields in diff_mpi_args. However, the subsetting and attribute exclusion options are problematic due to their use of dynamic allocation, where an address on one MPI rank will almost certainly be invalid on another. This was probably overlooked at the time because ph5diff tries to share code with h5diff and the dynamically-allocated fields weren't specifically thought about for parallel in the shared diff() function.
This will take a decent bit of refactoring to handle "correctly", but I'd suggest creating a packed MPI type from everything currently sent in diff_mpi_args, along with the dynamically-allocated fields and then have the parallel workers unpack that type after the MPI_Recv in ph5diff_worker(), ideally setting the pointers in the opts struct after doing so. The benefit would be that we could expand to new dynamically-allocated options in the future and also could have arbitrary object name lengths if needed, as opposed to failing in ph5diff for object names > 255 characters.
I'd say some sort of refactoring needs to happen anyway, as it can be unclear whether a field from the opts struct is safe to use in parallel or not without having an overall understanding of that code.
| PRINTVALSTREAM(rawoutstream, " excluded.\n"); | ||
| PRINTVALSTREAM(rawoutstream, " This option can be used repeatedly to exclude multiple paths.\n"); | ||
| PRINTVALSTREAM(rawoutstream, "\n"); | ||
| PRINTVALSTREAM(rawoutstream, " -B \"attr_name\", --exclude-attr-name \"attr_name\"\n"); |
There was a problem hiding this comment.
Matching all the attributes with the specified name is almost certain to cause frustration. This seems like it would benefit more from a path-style specification like h5dump has, perhaps with the ability to specify "all attributes matching this name".
4e1be02 to
0f58c5f
Compare
0f58c5f to
df9ba9b
Compare
df9ba9b to
92c81d9
Compare
92c81d9 to
ffc411d
Compare
Added new h5diff option
--exclude-attr-name(short form-B) which allows users to exclude specific attribute names from the comparison. It can be specified multiple times to skip multiple attribute names. If multiple attributes share the provided name(s), then every one will be skipped.This PR also includes new tests (h5diff_488-491), test file generation in h5diffgentest, and modifies the pre-existing test files which involve usage text to include the new usage text for the new option.
Resolves #6364