[tensilelite] Fix --use-cache SIGSEGV from cached library-path drift#7735
Open
Alex-Vasile wants to merge 4 commits into
Open
[tensilelite] Fix --use-cache SIGSEGV from cached library-path drift#7735Alex-Vasile wants to merge 4 commits into
Alex-Vasile wants to merge 4 commits into
Conversation
…ryFile When llvm::MemoryBuffer::getFileAsStream fails (e.g. library-file in ClientParameters.ini points at a missing path), the returned ErrorOr wraps a null unique_ptr. The previous code dereferenced it unconditionally, producing a SIGSEGV with no diagnostic. This is defense-in-depth on top of the cache.yaml LibraryFile fix: the principled fix makes the wrong-path scenario unreachable from the Python side, this fix makes any future wrong-path bug a clean error with the filename in it, not a crash.
Root cause of the gfx1201 SIGSEGV: the --use-cache path reconstructed
the library file location via libraryDir(sourcePath, cmdLineArchs) and
assumed it matched whatever writeBenchmarkFiles had actually written.
When the layouts diverged (single-arch builds use per-arch subdirs via
libraryDir(); the cached read assumed the flat library/ layout), the
.ini pointed at a missing file and the C++ client crashed loading it.
Fix: writeBenchmarkFiles now returns the relative path it wrote, the
build phase persists it as LibraryFile in cache.yaml alongside
CodeObjectFiles, and the --use-cache path reads it back instead of
recomputing. No reconstruction = no chance of cached read diverging
from what the build wrote.
_readCacheIfValid now returns a dict {CodeObjectFiles, LibraryFile}
rather than a bare list. Legacy cache.yaml files written before
LibraryFile was persisted are caught by the existing KeyError handler,
treated as invalid, and trigger a clean recompile (one-time cost, no
harm). The defensive "cachedLibraryFile is None" branch the earlier
draft carried at the call site is unreachable through this path and
omitted — if the invariant ever breaks via hand-edited cache.yaml,
os.path.join raises TypeError at the bug site with a clean trace.
The on-disk existence check at the call site stays — it guards a real,
user-reproducible scenario (cache dir partially deleted between runs)
and produces a friendlier message than an LLVM open failure.
…ry-file Two tangential bugs in the standalone bin/TensileClientConfig entry point, both surfaced while testing the cache.yaml fix: 1. Imports of globalParameters, assignGlobalParameters, restoreDefault- GlobalParameters, and __version__ from Tensile.Common were stale — those names now live in Tensile.Common.GlobalParameters (and __version__ in Tensile/__init__.py). The module wouldn't even import. Updated to the canonical paths used by every other module in the package. 2. The call to writeClientConfigIni passed a None positional for libraryFile, which emitted a literal `library-file=` line (empty value) into the generated .ini once the inner function's None- fallback was removed — an unusable .ini. Synthesize a libraryFile path anchored to the directory of args.OutputConfig (required arg, so always available). Mirrors the prior implicit default of "library/TensileLibrary.yaml" but pointed at a real directory.
Removes the libraryFile=None default from writeClientConfigIni (inner)
and the libraryFile=None default from writeClientConfig (wrapper, one
frame up), making the parameter required at both. The wrapper variant
is made keyword-only since all current callers already pass it as a
kwarg and the parameter sits behind configBase in the positional list.
Why both: the inner-function default was the silent-wrong-path failure
mode that produced the gfx1201 SIGSEGV — when libraryFile was omitted,
it fell back to library/TensileLibrary.yaml (flat layout), which is
wrong whenever the build used a per-arch subdir. Removing the default
at the wrapper too closes the parallel hole one frame up: a future
caller that omits the kwarg there would propagate None positionally
into writeClientConfigIni and produce library-file=None in the .ini —
the same bug class.
Also updates CreateBenchmarkClientParametersForSizes to pass an
explicit libraryFile= built from the libraryPath it already computed.
Tests added:
- libraryFile is a required parameter on both writeClientConfigIni
and writeClientConfig (signature-level invariant)
- every writeClientConfigIni call site in the codebase mentions
libraryFile= in its arglist (static check via inspect.getsource +
a hand-rolled paren matcher; the matcher's limitations are
documented in its docstring)
1 task
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
--use-cacheSIGSEGVs on single-arch builds (gfx1201). The crash is inLLVMLoadLibraryFile—getFileAsStreamreturns an error, the code derefs the nullunique_ptranyway.Root cause is Python-side.
_benchmarkProblemTyperecomputed the library-file path vialibraryDir()and assumed it matched whatwriteBenchmarkFileshad written. They diverged: single-arch builds use per-arch subdirs, the cached read assumed flatlibrary/. The.inipointed at a missing file.Technical Details
Four commits:
fix(loader)— null-checkErrorOrinLLVMLoadLibraryFile. Defense-in-depth: turns any future wrong-path bug into a one-line error instead of a crash.fix(cache)— principal fix.writeBenchmarkFilesreturns the relative library-file path it wrote, the build persists it incache.yaml,--use-cachereads it back. No recomputation, no drift. Legacycache.yamlfiles hit the existingKeyErrorhandler and trigger a clean recompile.fix(TensileClientConfig)— two tangential bugs found while testing. Stale imports (Tensile.Commonno longer re-exports those names) and a positionalNoneforlibraryFilethat emittedlibrary-file=into the.ini.refactor(ClientWriter)—libraryFileis now required on bothwriteClientConfigIniand thewriteClientConfigwrapper (kw-only). The default was the failure mode behind this bug; closing it at both frames stops a future caller from re-introducing it. Updates the remaining call site and adds signature + static call-site tests.(2) is the load-bearing change; (1)(3)(4) stand on their own.
Test Plan
writeClientConfigInicall site passeslibraryFile=.--use-cacheagainst a populated cache: no SIGSEGV, same results as a from-scratch run.Test Result
Submission Checklist