[DSC] Add a Strings table to the shared cache triage view #8277
The table displays strings from every image in the shared cache. Double-clicking on a string loads its corresponding image or region and navigates to it. The entire shared cache is scanned asynchronously on the worker pool to discover strings when the Strings tab is first selected. Table filtering and sorting are performed on background threads because of the sheer amount of data this table contains (over 15 million rows). Data is discarded shortly after the strings table is hidden.
This gives the table asynchronous loading, filtering, and sorting, along with lazy loading, and releases the table data after the view is hidden.
fuzyll left a comment
Lots of questions and a few potential bugs. Ironically, the two things I am most concerned about aren't even in these code changes and are just things I saw when scrolling around and reading. 🙃
Good work regardless, thanks for taking this on. 🙂
size_t comparisons = 0;
try
{
    std::sort(display.begin() + chunk.first, display.begin() + chunk.second, guarded(comparisons));
Would it be possible to have non-deterministic ordering here? If I'm following this correctly, results will arrive in the order in which workers complete. Since there's no tie-break on address/length/something else, wouldn't this have ties ordered by worker completion?
(Even if I'm right, I don't know how often this would occur, whether this would be noticeable, etc. Just an observation.)
Yeah, it is possible to get non-deterministic results if the comparator doesn't break ties itself. And the comparators the two table models were using didn't do that.
I decided to address this by revamping how sorting works in TriageTableRowsModel. Rather than sorting by a single column, it tracks the columns that the user has sorted by and builds a comparator based on multiple columns. The address is always used as a tiebreaker. This gives the effect of using a stable sort when re-sorting by a different column. For example, if the user sorts by symbol name then re-sorts by image name, the rows will be ordered by image name and within each image they'll be sorted by symbol name. This seems strictly more useful than only ever sorting by a single column.
Using a stable sort by itself wouldn't be sufficient here as any filtered rows would be excluded from the sort, so widening the filter criteria would result in inconsistent sorting for the newly-matched rows. Hence building a compound comparator instead.
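The compound comparator described in this reply can be sketched roughly as follows. This is only an illustration under stated assumptions: `Row`, `Column`, `CompareColumn`, and `SortKeys` are hypothetical names, not the actual TriageTableRowsModel code, and the address only makes the ordering fully deterministic if addresses are unique across rows.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical row type; the real model's row layout is not shown in this thread.
struct Row
{
    uint64_t address;
    std::string symbol;
    std::string image;
};

enum class Column { Symbol, Image };

// Three-way comparison of two rows on a single column.
inline int CompareColumn(const Row& a, const Row& b, Column column)
{
    switch (column)
    {
    case Column::Symbol: return a.symbol.compare(b.symbol);
    case Column::Image: return a.image.compare(b.image);
    }
    return 0;
}

// Remembers the columns the user has sorted by, most recent first,
// and acts as a comparator over all of them.
class SortKeys
{
    std::vector<Column> m_columns;

public:
    // Move `column` to the front so it becomes the primary sort key.
    void SortBy(Column column)
    {
        m_columns.erase(std::remove(m_columns.begin(), m_columns.end(), column), m_columns.end());
        m_columns.insert(m_columns.begin(), column);
    }

    // Compare column by column in priority order; the address breaks any remaining tie.
    bool operator()(const Row& a, const Row& b) const
    {
        for (Column column : m_columns)
        {
            int result = CompareColumn(a, b, column);
            if (result != 0)
                return result < 0;
        }
        return a.address < b.address;
    }
};
```

Because the ordering depends only on row contents and not on which rows are currently visible, re-running `std::sort` with the same `SortKeys` after the filter widens places newly matched rows consistently, which a stable sort over the previously filtered subset would not guarantee.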
raw_length: int
text: str
region_start: int
image_start: int
In sharedcache.cpp, you state that "A zeroed imageStart means the region is not associated with an image". Should this detail be surfaced to Python as well in some capacity (even if just a doc comment)?
I've updated these so non-image regions have an image_start of None
case Utf16String:
{
    char* converted = BNUnicodeUTF16ToUTF8(data, std::min(ref.length, maxLength * 2));
    std::string text(converted);
If we fail to convert here, we'd try to construct a std::string from a nullptr. This seems like it might only happen if we fail to allocate the string, which is probably not likely to happen, but we still might want to check.
This caused me to feel like a fraud because it turns out I have no idea what actually happens if you try and construct a string from nullptr. Fortunately, it appears C++ also had no idea what should happen, but C++23 appears to be fixing this.
I've decided to address this and the other unicode-related issue you mentioned by adding some Unicode helpers to bn::base, and using them here. I'll send the change that adds those helpers out as its own PR and rebase this on top of it.
This change introduces a BackgroundSortFilterRows helper class that handles sorting and filtering of flat UI models on background threads, and a TriageTablePanel that encapsulates a lot of the logic that is common across DSC's different panes. These are both used by the new strings table, and I updated the DSC's symbols table to use them as well.
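As a rough illustration of the general idea of filtering and sorting off the UI thread, here is a minimal sketch. It is not the BackgroundSortFilterRows implementation: `BackgroundFilter`, `FilterResult`, and the generation counter are assumptions made for this example, and it uses `std::async` rather than the worker pool the PR describes.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: indices of matching rows in display order,
// tagged with the request generation that produced them.
struct FilterResult
{
    unsigned generation;
    std::vector<size_t> display;
};

// Hypothetical helper: filters and sorts row indices on a background thread.
// The rows must outlive every future returned by Request().
class BackgroundFilter
{
    const std::vector<std::string>& m_rows;
    std::atomic<unsigned> m_generation{0};

public:
    explicit BackgroundFilter(const std::vector<std::string>& rows) : m_rows(rows) {}

    std::future<FilterResult> Request(std::string needle)
    {
        unsigned generation = ++m_generation;
        return std::async(std::launch::async, [this, generation, needle = std::move(needle)] {
            FilterResult result{generation, {}};
            for (size_t i = 0; i < m_rows.size(); ++i)
            {
                // A newer request supersedes this one; stop early with a partial result.
                if (m_generation.load() != generation)
                    return result;
                if (m_rows[i].find(needle) != std::string::npos)
                    result.display.push_back(i);
            }
            const auto& rows = m_rows;
            // Sort by text, breaking ties by row index so the order is deterministic.
            std::sort(result.display.begin(), result.display.end(), [&rows](size_t a, size_t b) {
                int order = rows[a].compare(rows[b]);
                return order != 0 ? order < 0 : a < b;
            });
            return result;
        });
    }

    // The UI would only apply results that belong to the latest request.
    bool IsCurrent(const FilterResult& result) const
    {
        return result.generation == m_generation.load();
    }
};
```

The generation check is one simple way to discard stale work when the user keeps typing in the filter box; a real implementation over millions of rows would likely also split the work into chunks.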