Improve performance of h5trav interfaces for links to objects #6400

Open
jhendersonHDF wants to merge 3 commits into HDFGroup:develop from jhendersonHDF:h5trav_perf_improvement
@jhendersonHDF jhendersonHDF commented May 6, 2026

For objects with multiple hard links, use a hash table to map between object tokens and pathnames during traversal. This avoids a linear scan over all previously visited objects for each hard link seen.

Use a separate hash table for the h5trav "table" interface to map between object tokens and an index into the table of visited objects. This facilitates quick lookups of objects when adding hard link name aliases for h5repack processing.

See the linked issue for context.
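
The token-to-pathname lookup described above can be sketched in plain C as follows. This is only an illustration under stated assumptions, not the code in this PR: `demo_token_t` is a hypothetical stand-in for `H5O_token_t`, the `seen_*` names are made up, a small chained hash table replaces uthash, and tokens are compared with `memcmp` rather than `H5Otoken_cmp()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for H5O_token_t: an opaque 16-byte object token */
typedef struct {
    unsigned char data[16];
} demo_token_t;

/* One chained entry: object token -> first pathname seen for that object */
typedef struct seen_entry_t {
    demo_token_t         token;
    char                *path;
    struct seen_entry_t *next;
} seen_entry_t;

#define SEEN_NBUCKETS 1024u

typedef struct {
    seen_entry_t *buckets[SEEN_NBUCKETS];
} seen_map_t;

/* FNV-1a hash over the token bytes, reduced to a bucket index */
static uint32_t
seen_hash(const demo_token_t *token)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < sizeof(token->data); i++) {
        h ^= token->data[i];
        h *= 16777619u;
    }
    return h % SEEN_NBUCKETS;
}

/* Return the first-seen path for a token, or NULL if the object is new.
 * Only one bucket chain is walked, not the full list of visited objects. */
static const char *
seen_find(const seen_map_t *map, const demo_token_t *token)
{
    assert(map && token);

    for (const seen_entry_t *e = map->buckets[seen_hash(token)]; e; e = e->next)
        if (memcmp(e->token.data, token->data, sizeof(token->data)) == 0)
            return e->path;
    return NULL;
}

/* Record the path under which an object was first seen.
 * Returns 0 on success, -1 on allocation failure. */
static int
seen_add(seen_map_t *map, const demo_token_t *token, const char *path)
{
    size_t        len = strlen(path) + 1;
    seen_entry_t *e   = malloc(sizeof(*e));
    uint32_t      b;

    if (!e)
        return -1;
    if (!(e->path = malloc(len))) {
        free(e);
        return -1;
    }
    memcpy(e->path, path, len);
    e->token = *token;

    b               = seen_hash(token);
    e->next         = map->buckets[b];
    map->buckets[b] = e;
    return 0;
}
```

During traversal, a caller would invoke `seen_find()` for each object whose reference count is greater than one and fall back to `seen_add()` when it returns NULL. Freeing the entries is omitted for brevity.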

jhendersonHDF added the "Component - Tools" label (command-line tools like h5dump, includes high-level tools) on May 6, 2026
github-project-automation (bot) moved this to "To be triaged" in HDF5 - TRIAGE & TRACK on May 6, 2026
jhendersonHDF linked an issue on May 6, 2026 that may be closed by this pull request
Comment thread tools/lib/h5trav.h
#include "hdf5.h"

/* Typedefs for visiting objects */
typedef herr_t (*h5trav_obj_func_t)(const char *path_name, const H5O_info2_t *oinfo, const char *first_seen,

These typedefs had to be moved down so that the trav_seen_t type used by the new parameter is declared before them.

Comment thread tools/lib/h5trav.h
trav_obj_t *objs;

/* Private data for this trav_table_t */
void *priv_data;

This is really a pointer to a structure containing a UT_hash_handle, but exposing the uthash API at the h5trav.h level is problematic because uthash is header-only and is already used in H5private.h. This is just a quick hack to hide the implementation details.
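
The general pattern of hiding an implementation structure behind a `void *` member might look like the sketch below. All `demo_*` names are hypothetical, and the private structure holds a simple counter where the real one would hold the hash table state.

```c
#include <assert.h>
#include <stdlib.h>

/* --- What a public header could expose (hypothetical, simplified) --- */
typedef struct demo_table_t {
    size_t nobjs;
    void  *priv_data; /* opaque to callers; no hash table types leak out */
} demo_table_t;

/* --- Visible only inside the implementation file --- */
typedef struct demo_table_priv_t {
    size_t nlookups; /* stand-in for the hash table head pointer */
} demo_table_priv_t;

/* Allocate the private data. Returns 0 on success, -1 on failure. */
static int
demo_table_init(demo_table_t *table)
{
    demo_table_priv_t *priv;

    assert(table);
    if (!(priv = calloc(1, sizeof(*priv))))
        return -1;

    table->nobjs     = 0;
    table->priv_data = priv;
    return 0;
}

/* Implementation code converts the opaque pointer back before use */
static size_t
demo_table_lookup(demo_table_t *table)
{
    demo_table_priv_t *priv = table->priv_data;

    return ++priv->nlookups;
}

/* Release the private data and clear the pointer */
static void
demo_table_free(demo_table_t *table)
{
    free(table->priv_data);
    table->priv_data = NULL;
}
```

The cost of this approach is the loss of type checking on `priv_data`; a forward-declared struct pointer would keep the type opaque while still being checked by the compiler.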

Comment thread tools/lib/h5trav.c
* where a visited object was placed to facilitate quicker
* lookups when adding path aliases
*/
typedef struct trav_table_hash_t {

This structure is mostly for h5repack, which now has its own hash table separate from the main one used by the other h5trav interfaces. I considered modifying the h5trav interfaces so that h5repack could share a single hash table, but that would have been fairly awkward. It would also have imposed unnecessary memory overhead on other tools that don't need the extra stored information. Instead, I used a separate hash table specifically for the h5trav "table" interface.
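
As a rough illustration of mapping a token to an index in the table of visited objects, the sketch below uses a fixed-capacity open-addressing table in place of uthash. The `demo_*` names, the capacities, and the 16-byte token stand-in are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_OBJS  64u  /* fixed capacity to keep the sketch short */
#define DEMO_NSLOTS    128u /* power of two, larger than DEMO_MAX_OBJS */
#define DEMO_MAX_NAMES 4u

/* Hypothetical stand-in for H5O_token_t */
typedef struct {
    unsigned char data[16];
} demo_token_t;

/* A visited object and every path (hard link name) that reaches it */
typedef struct {
    demo_token_t token;
    const char  *names[DEMO_MAX_NAMES];
    unsigned     nnames;
} demo_obj_t;

typedef struct {
    demo_obj_t objs[DEMO_MAX_OBJS];
    unsigned   nobjs;
    unsigned   slots[DEMO_NSLOTS]; /* 0 = empty, otherwise object index + 1 */
} demo_table_t;

/* FNV-1a hash over the token bytes, reduced to a slot number */
static unsigned
demo_slot(const demo_token_t *token)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < sizeof(token->data); i++) {
        h ^= token->data[i];
        h *= 16777619u;
    }
    return (unsigned)(h & (DEMO_NSLOTS - 1));
}

/* Linear probing: return the slot holding this token, or the first empty
 * slot if the token is absent. Terminates because the table is never full. */
static unsigned *
demo_probe(demo_table_t *t, const demo_token_t *token)
{
    unsigned s = demo_slot(token);

    while (t->slots[s] != 0 &&
           memcmp(t->objs[t->slots[s] - 1].token.data, token->data, sizeof(token->data)) != 0)
        s = (s + 1) & (DEMO_NSLOTS - 1);
    return &t->slots[s];
}

/* Add a path for a token: a new object if unseen, otherwise an alias on the
 * existing object. Returns the object's index, or -1 if capacity is exhausted. */
static int
demo_table_add(demo_table_t *t, const demo_token_t *token, const char *path)
{
    unsigned   *slot;
    demo_obj_t *obj;

    assert(t && token && path);

    slot = demo_probe(t, token);
    if (*slot == 0) {
        if (t->nobjs == DEMO_MAX_OBJS)
            return -1;
        t->objs[t->nobjs].token  = *token;
        t->objs[t->nobjs].nnames = 0;
        *slot                    = ++t->nobjs;
    }

    obj = &t->objs[*slot - 1];
    if (obj->nnames == DEMO_MAX_NAMES)
        return -1;
    obj->names[obj->nnames++] = path; /* caller keeps ownership of the string */
    return (int)(*slot - 1);
}
```

Storing an index rather than a pointer keeps the mapping valid even if the object array is later reallocated, which is one plausible reason to prefer it in a growable table.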

Comment thread tools/lib/h5trav.c
*-------------------------------------------------------------------------
*/
static int
trav_token_visited_cmp(hid_t loc_id, const H5O_token_t *token1, const H5O_token_t *token2)

This function is called by HASH_FIND. It simply wraps H5Otoken_cmp() so that keys are compared through the library rather than with a plain memcmp of the object token bytes.
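
A wrapper of this shape could look roughly like the sketch below. `demo_token_cmp()` is a hypothetical stand-in that mimics the out-parameter style of `H5Otoken_cmp()`, `int` replaces `hid_t`, and treating a failed comparison as a mismatch is an assumption of this example rather than something the PR states.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for H5O_token_t */
typedef struct {
    unsigned char data[16];
} demo_token_t;

/* Stand-in for H5Otoken_cmp(): reports the ordering through an out
 * parameter and returns a negative value on failure */
static int
demo_token_cmp(int loc_id, const demo_token_t *t1, const demo_token_t *t2, int *cmp_value)
{
    if (loc_id < 0 || !cmp_value)
        return -1;
    *cmp_value = memcmp(t1->data, t2->data, sizeof(t1->data));
    return 0;
}

/* Adapter with a memcmp-like return value (0 means equal), suitable for use
 * as a hash table key comparison. A failed comparison is reported as
 * "not equal" so a lookup never matches on an error (assumption). */
static int
demo_token_visited_cmp(int loc_id, const demo_token_t *t1, const demo_token_t *t2)
{
    int cmp_value = 0;

    assert(t1 && t2);

    if (demo_token_cmp(loc_id, t1, t2, &cmp_value) < 0)
        return 1;
    return cmp_value;
}
```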

Comment thread tools/lib/h5trav.c Outdated
udata.fields = fields;

/* Check for multiple links to top group */
if (oinfo.rc > 1)

Moved this down below the udata initialization in case the hash table gets initialized by the call to trav_token_add().

mattjala previously approved these changes May 8, 2026

@mattjala mattjala left a comment

Only a couple of minor issues.

Development

Successfully merging this pull request may close these issues.

HDF5 tools performance issue for multiply-linked objects