HBASE-30225 Performance degradation observed on ycsb reads benchmark after HBASE-29727#8353
HBASE-30225 Performance degradation observed on ycsb reads benchmark after HBASE-29727#8353wchevreuil wants to merge 2 commits into
Conversation
Apache9
left a comment
There was a problem hiding this comment.
Mind explaining more about the changes here? I do not fully get what you are trying to fix here...
| public BlockCacheKey(Path hfilePath, long offset, boolean isPrimaryReplica, BlockType blockType) { | ||
| this(hfilePath.getName(), hfilePath.getParent().getName(), | ||
| hfilePath.getParent().getParent().getName(), offset, isPrimaryReplica, blockType, | ||
| HFileArchiveUtil.isHFileArchived(hfilePath)); |
There was a problem hiding this comment.
Why here we can use false directly?
There was a problem hiding this comment.
We are actually only using this constructor on tests now. Because the HFileUtils.isArchived call seems too expensive to be performed on every block cache key creation, that check should be delegated to the reader, which can do it once during it's initialisation. Writers don't even need it, since a being written file is guaranteed to be "non-archived", so writers can benefit from constructors that alwas set archived as false. Let me add javadoc to explain this.
There was a problem hiding this comment.
Then why not just remoev this flag in BlockCacheKey if it does not change?
There was a problem hiding this comment.
It might change when it's created by a Reader (see HFileReaderImpl#1352).
There was a problem hiding this comment.
Then can we just remove this constructor? Or leave the code as is since it only calls from test code? It does not affect real performance if it only calls in test code right?
| this.hfileName = HFILE_NAME_POOL.intern(hfileName); | ||
| this.regionName = (regionName != null) ? REGION_NAME_POOL.intern(regionName) : null; | ||
| this.cfName = (cfName != null) ? CF_NAME_POOL.intern(cfName) : null; | ||
| this.hfileName = hfileName; |
There was a problem hiding this comment.
If we do not do intern here, why we still put the above 3 string pools in this class?
There was a problem hiding this comment.
We are doing intern from readers and writers, we need those to access a single pool per file/region/cf. Since both reader and writer classes already reference BlockCacheKey, and the string re-use is relevant in the BlockCacheKey usage, I found it makes sense to keep the pools in the BlockCacheKey.
There was a problem hiding this comment.
Why not introduce a new class to do this?
As explained in the jira description: When executing ycsb read workloads, we observed a ~30% latency degradation (please see flamegrahs attached to the jira). The problem was that we added logic for parsing the file Path into region name, family name, as well checks for archiving all on the BlockCacheKey constructor used by HFileReaderImpl on the beginning of each block read. As seen on the flame graphs attached covering a five minutes window on one of the RSes, around 30% of the CPU time was spent on the BlockCacheKey constructor, either calling Path.getParent() or HFileUtils.isHFileArchived(). |
So the intention here, is to not always call intern when creating BlockCacheKey, and move intern to other places? |
Sort of. The main problem is not the intern call itself, but the parsing of file name, region and CF name from the path. Doing it inside the BlockCacheKey constructor is too costly, we saw 10% degradation on latency and throughput of ycsb workloads. Moving this parsing to HFileWriterImpl/HFileReaderImpl initialisation means we only need to do it once, as all block cache keys on each writer/reader instance will refer to same values. We could maybe leave the pools and intern calls private to BlockCacheKey, but we would miss the original parsed strings from getting pooled. Less efficient, but I think it's not a big deal, as we would only be doing it at the reader/writer level. Code-wise, I think it would be cleaner and less confusing. Let me do it. |
VladRodionov
left a comment
There was a problem hiding this comment.
The direction makes sense to me, but I think HFileReaderImpl should also cache the HFile name once.
| BlockCacheKey cacheKey = | ||
| new BlockCacheKey(path, dataBlockOffset, this.isPrimaryReplicaReader(), expectedBlockType); | ||
|
|
||
| BlockCacheKey cacheKey = new BlockCacheKey(path.getName(), family, region, dataBlockOffset, |
There was a problem hiding this comment.
A bit off top, I am really surprised that that parsing qualified file name can be expensive at all, but you see that in your benchmarks. Q: why did not you cache file name in this case? path.getName()?
There was a problem hiding this comment.
The culprits, as seem by the async profiler flamegraphs, are Path.getParent() and HFileArchiveUtil.isFileArchived(). I didn't think of doing the same for the file name because that didn't seem to cause problems, but yeah, could do it for the sake of code consistency.
| * Construct a new BlockCacheKey using a file path. File, column family and region information | ||
| * will be extracted from the passed path. This should be used when inserting keys into the cache, | ||
| * so that region cache metrics are recorded properly. | ||
| * will be extracted from the passed path, so that region cache metrics are recorded properly/ |
Change-Id: I213cab8f75df3586241e81750e840f568bfe14c7
|
Current compile failures were introduced by HBASE-30221. There's already an addendum PR for fixing it: #8361 |
Apache9
left a comment
There was a problem hiding this comment.
Where are the intern calls now?
| BlockCacheKey cacheKey = | ||
| new BlockCacheKey(path, metaBlockOffset, this.isPrimaryReplicaReader(), BlockType.META); | ||
| BlockCacheKey cacheKey = new BlockCacheKey(fileName, family, region, metaBlockOffset, | ||
| this.isPrimaryReplicaReader(), BlockType.META, archived); |
There was a problem hiding this comment.
So this is one of the optimization right? We calculate archived once when constructing the HFileReaderImpl, and do not need to calcuate it every time when constructing a BlockCacheKey?
There was a problem hiding this comment.
Correct. Same for region and CF name.
Back to the "BlockCacheKey(String hfileName, String cfName, String regionName, long offset, boolean isPrimaryReplica, BlockType blockType, boolean archived)" (lines 99 to 101). |
No description provided.