Very interesting work!
I noticed that the recent query states are cached before compress_kv() is called:
    self.kv_cluster.cache_recent(query_states)
As a result, the latest query states seem to appear twice when the selectors are built:
    selectors = torch.cat([self.cached_recent, query_states], dim=-2)
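To make the concern concrete, here is a toy sketch of the ordering. It uses plain Python lists in place of tensors, with list concatenation standing in for torch.cat along dim=-2. `ToyCluster`, `build_selectors`, and the `window` parameter are hypothetical names of mine, and I am assuming that cache_recent() appends the new query states to `cached_recent`; the real implementation may differ.

```python
class ToyCluster:
    """Hypothetical stand-in for the kv_cluster object; not the repository's class."""

    def __init__(self, window):
        self.window = window
        self.cached_recent = []

    def cache_recent(self, query_states):
        # Assumed behaviour: append the new queries, keep only the last `window`.
        self.cached_recent = (self.cached_recent + query_states)[-self.window:]

    def selectors(self, query_states):
        # Mirrors torch.cat([self.cached_recent, query_states], dim=-2).
        return self.cached_recent + query_states


def build_selectors(cache_first):
    cluster = ToyCluster(window=4)
    cluster.cache_recent(["q0", "q1"])  # queries from earlier decoding steps
    latest = ["q2"]                     # query of the current step
    if cache_first:
        # Order described in this issue: cache, then build the selectors.
        cluster.cache_recent(latest)
        return cluster.selectors(latest)
    # Alternative order: build the selectors, then cache.
    result = cluster.selectors(latest)
    cluster.cache_recent(latest)
    return result


cache_then_select = build_selectors(cache_first=True)
select_then_cache = build_selectors(cache_first=False)
print(cache_then_select)
print(select_then_cache)
```

Under these assumptions, the latest query is counted twice only when it is cached before the selectors are built. If the repetition is intentional (for example, to weight the newest query more heavily), please ignore this.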
The cache_recent() call is at ReasoningPathCompression/rpc/qwen2_custom.py, line 97 in 00e64b7.
The selectors concatenation is at ReasoningPathCompression/rpc/rpc_utils.py, line 97 in 00e64b7.