Extend problem cache with hardware provenance metadata #4835
danieyan-amd wants to merge 5 commits into
Two changes to problem_cache.cpp:
1. load(): Project deserialized keys to only {name, problem} so that
extra metadata fields in the JSON don't break cache key matching.
Previously, the full JSON object (all fields) was used as the map
key, causing 100% cache misses when metadata was present.
2. save(): Enrich each key with hardware provenance before writing:
gpu_arch, cu_count, graphics_clock_mhz, memory_clock_mhz,
memory_bus_bits, vram_bytes, wavefront_size, regs_per_block,
max_threads_per_cu. Queried once via hipGetDeviceProperties at
session end — negligible performance cost.
The in-memory map always uses {name, problem} keys for O(1) lookups.
The on-disk JSON carries additional hardware context for traceability.
On load, the extra fields are projected away, preserving fast matching.
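The load-side projection described above can be sketched with standard containers. This is a minimal illustration under stated assumptions, not the code in problem_cache.cpp: `record`, `cache_key`, and `project_keys` are hypothetical names, plain strings stand in for the library's JSON `value` type, and `std::map` is used instead of a hash map only to avoid writing a hash for the key.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one on-disk key: arbitrary string fields,
// of which only "name" and "problem" identify the cache entry.
using record = std::map<std::string, std::string>;

// In-memory key: just {name, problem}.
using cache_key = std::pair<std::string, std::string>;

// Project each on-disk key down to {name, problem}, dropping any extra
// metadata fields (gpu_arch, cu_count, ...) so they cannot affect lookups.
// Entries missing either field are skipped (an assumption of this sketch).
inline std::map<cache_key, std::string>
project_keys(const std::vector<std::pair<record, std::string>>& raw)
{
    std::map<cache_key, std::string> cache;
    for(const auto& [key, solution] : raw)
    {
        auto name    = key.find("name");
        auto problem = key.find("problem");
        if(name == key.end() || problem == key.end())
            continue;
        cache[cache_key{name->second, problem->second}] = solution;
    }
    return cache;
}
```

With this shape, a file written with extra metadata and a file written without it produce the same in-memory keys.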
Sorry Chris, I didn't mean to hit ready for review.
Pull request overview
This PR updates the GPU problem cache persistence format to remain resilient to extra on-disk metadata while also recording hardware provenance for traceability.
Changes:
- In load(), deserialize into a temporary map and project keys down to {name, problem} to prevent metadata fields from breaking cache-key matching.
- In save(), enrich persisted keys with HIP device properties (e.g., arch, CU count, clocks, VRAM) before writing the JSON file.
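The save-side enrichment can be sketched in the same spirit. In the PR the values come from hipGetDeviceProperties; here a plain struct stands in so the example compiles without HIP. `device_info` and `enrich_key` are hypothetical names, and only a subset of the listed fields is shown.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for an on-disk key with string fields.
using record = std::map<std::string, std::string>;

// Hypothetical stand-in for the device properties queried at save time.
struct device_info
{
    std::string gpu_arch;
    int cu_count;
    int wavefront_size;
    std::size_t vram_bytes;
};

// Return a copy of the {name, problem} key with provenance fields added.
// The caller's key is taken by value, so the in-memory key is untouched
// and only the persisted copy carries the extra fields.
inline record enrich_key(record key, const device_info& dev)
{
    key["gpu_arch"]       = dev.gpu_arch;
    key["cu_count"]       = std::to_string(dev.cu_count);
    key["wavefront_size"] = std::to_string(dev.wavefront_size);
    key["vram_bytes"]     = std::to_string(dev.vram_bytes);
    return key;
}
```

Because the device is queried once and the same `device_info` is applied to every key, the added cost is one small map update per entry at write time.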
// Enrich keys with hardware provenance metadata on write.
// This runs once at session end, so the cost is negligible.
hipDeviceProp_t props{};
auto status = hipGetDeviceProperties(&props, get_device_id());
std::unordered_map<value, value> raw;
from_value(from_json_string(read_string(pc_path)), raw);
for(auto& [k, v] : raw)
{
    auto projected = create_key(k.at("name").to<std::string>(), k.at("problem"));
    cache[projected] = v;
}
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##           develop    #4835      +/-   ##
===========================================
+ Coverage    92.86%   92.88%   +0.01%
===========================================
  Files          586      587       +1
  Lines        30287    30331      +44
===========================================
+ Hits         28126    28170      +44
  Misses        2161     2161
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
{
    auto projected = create_key(k.at("name").to<std::string>(), k.at("problem"));
    cache[projected] = v;
}
Making an extra copy can get slow with larger problem caches.
I think the metadata should be managed externally. In the future, we may use sqlite dbs to manage problem caches, where inserting metadata like this may not be efficient.
Do you mind if we move forward with the code as it is and optimize if it becomes a problem? We just want to incrementally improve the code and build up the database and caching logic. If it becomes a performance issue, we will optimize it since we are focusing on time-to-first-inference right now.
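If the extra copy raised above did become a bottleneck, one low-effort mitigation would be to move values out of the temporary map rather than copy them. The sketch below is an assumption about how that could look, not something proposed in the thread: `rekey_by_move` is a hypothetical helper, strings stand in for the cache's `value` type, and trimming the key at `'#'` is only a placeholder for the {name, problem} projection.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical helper: re-key entries from `raw` into `cache`, moving each
// stored value instead of copying it. `raw` is emptied afterwards because
// its values are left in a moved-from state.
inline void rekey_by_move(std::unordered_map<std::string, std::string>& raw,
                          std::unordered_map<std::string, std::string>& cache)
{
    cache.reserve(cache.size() + raw.size());
    for(auto& [key, value] : raw)
    {
        // Placeholder projection: keep only the text before '#'.
        // If there is no '#', the whole key is kept.
        cache[key.substr(0, key.find('#'))] = std::move(value);
    }
    raw.clear();
}
```

This still builds the temporary map during deserialization; it only avoids duplicating the stored values.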
I don't understand why this is needed. I would like to keep the code as simple as possible so it is easier to update with newer features in the future such as multi-targets, multi-file, sqlite, etc.
It's needed so that we can create problem cache databases that span different GPUs and GPU configurations. It will start off simple, selecting a single matching problem when multiple hardware configurations match, but we plan to give the algorithm better selection logic in the future as our data improves. This lets us start collecting the data and work on building the database. Our next update will actually be adding multi-targets and sqlite support. Daniel has also been profiling different backend support and will be providing an abstraction interface to allow interchangeable backends for logging problem cache data. We need this change so that we don't have to redo all the data collection work later, when we want to select a better solution for a hardware configuration that has changed but has the same gfx arch.
Well this is not the way we would approach this. We would make the device a key. So instead of it being
Make sure to use type erasure instead of inheritance for this.
I can look into this and make the necessary changes.
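For readers unfamiliar with the request, type erasure here means a wrapper that can hold any backend exposing the right member functions, without the backends deriving from a shared base class. The sketch below shows one common way to do this with `std::function`; it is an illustration only, and `cache_backend`, `in_memory_backend`, and the `load`/`save` signatures are hypothetical rather than taken from the PR.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type-erased handle. Any type with load(key) and
// save(key, value) members can be wrapped; no base class is required.
// Copies of the handle share the same underlying backend object.
class cache_backend
{
    public:
    template <class Backend>
    cache_backend(Backend b)
    {
        auto shared = std::make_shared<Backend>(std::move(b));
        load_fn     = [shared](const std::string& k) { return shared->load(k); };
        save_fn     = [shared](const std::string& k, const std::string& v) {
            shared->save(k, v);
        };
    }

    std::string load(const std::string& key) const { return load_fn(key); }
    void save(const std::string& key, const std::string& value) { save_fn(key, value); }

    private:
    std::function<std::string(const std::string&)> load_fn;
    std::function<void(const std::string&, const std::string&)> save_fn;
};

// Example backend: a plain struct, unrelated to cache_backend by inheritance.
struct in_memory_backend
{
    std::map<std::string, std::string> data;

    // Returns an empty string when the key is absent.
    std::string load(const std::string& key) const
    {
        auto it = data.find(key);
        return it == data.end() ? std::string{} : it->second;
    }
    void save(const std::string& key, const std::string& value) { data[key] = value; }
};
```

A JSON or sqlite backend would plug in the same way, as long as it offers the same two member functions.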
Addresses PR review feedback:
- Device (gpu_arch|cu_count|wavefront_size) used as composite cache key
- Type-erased problem_cache_backend wrapper (no virtual inheritance)
- JSON backend as default implementation
- load()/save() in problem_cache rewritten to use backend abstraction
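The composite device key mentioned in this commit can be pictured as a two-level lookup: first by device, then by problem. The sketch below assumes the three fields are joined with `'|'` into a single string, which is one reading of the commit message and not confirmed by the source; `device_id`, `make_device_key`, and `find_solution` are hypothetical names.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical description of the device fields used in the key.
struct device_id
{
    std::string gpu_arch;
    int cu_count;
    int wavefront_size;
};

// Join the fields as "gpu_arch|cu_count|wavefront_size" (assumed format).
inline std::string make_device_key(const device_id& d)
{
    return d.gpu_arch + "|" + std::to_string(d.cu_count) + "|" +
           std::to_string(d.wavefront_size);
}

// Outer map: device key -> (problem key -> solution).
using problem_map  = std::map<std::string, std::string>;
using device_cache = std::map<std::string, problem_map>;

// Look up a solution for a problem on a specific device configuration.
// Returns std::nullopt when either the device or the problem is unknown.
inline std::optional<std::string>
find_solution(const device_cache& cache, const device_id& dev, const std::string& problem_key)
{
    auto dev_it = cache.find(make_device_key(dev));
    if(dev_it == cache.end())
        return std::nullopt;
    auto sol_it = dev_it->second.find(problem_key);
    if(sol_it == dev_it->second.end())
        return std::nullopt;
    return sol_it->second;
}
```

Keying by device first keeps entries for different configurations of the same gfx arch separate, which is the property the discussion above asks for.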
4b65154 to 3a47ddc
Motivation
Add hardware info to the problem cache, and handle the hardware data when doing cache lookups for solutions.
Technical Details
Changelog Category
Add a CHANGELOG.md entry for any option other than Not Applicable