tests: Fix NVMe device timing issues with intelligent polling mechanism#1124
Conversation
| decoded = decoder.decode(out) | ||
| if not decoded or 'Devices' not in decoded: | ||
| return [] | ||
| # Poll for devices with exponential backoff |
There was a problem hiding this comment.
Sorry, but this so unnecessarily overengineered, especially for tests. Simple while with few re-tries and time.sleep(1) is enough.
| if ret != 0: | ||
| return [] | ||
|
|
||
| try: |
There was a problem hiding this comment.
I know this is out of scope for this change, but if you are already changing these function, it would be worth removing the code duplication here.
|
The best approach is to wait for a |
b19969e to
951643e
Compare
I have made the changes according to your suggestions. Please help review it. Thank you. |
951643e to
2cc8e63
Compare
tbzatek
left a comment
There was a problem hiding this comment.
Apart from @vojtechtrefny's suggestions the logic looks good to me. There are multiple possible points of failure, but let's see if it helps.
| pass | ||
|
|
||
| time.sleep(wait_time) | ||
| wait_time = min(wait_time * 1.2, max_wait) # Slower exponential backoff |
There was a problem hiding this comment.
My original comment about this being unnecessarily complicated is still valid here.
There was a problem hiding this comment.
OK, I will update and submit a new version of the code based on your comments.
| wait_time = min(wait_time * 1.2, max_wait) # Slower exponential backoff | ||
|
|
||
| os.system("udevadm settle") | ||
| return False |
There was a problem hiding this comment.
Why this function returns boolean? It's not checked anywhere where it is called.
There was a problem hiding this comment.
You are right, there is no need to return a boolean type.
Fixes storaged-project#1080 - Remove complex exponential backoff in favor of simple 1-second sleep - Remove unnecessary boolean return value - Keep subsystem NQN filtering for precise controller monitoring - Use reasonable 3-second timeout for test efficiency
2cc8e63 to
0b42cbd
Compare
|
Jenkins, ok to test. |
Fixes #1080
This PR addresses the timing issue where NVMe namespace devices may not be immediately available after controller creation, causing intermittent test failures.
Problem
The current implementation performs immediate device lookup without considering that NVMe devices may need time to appear in the system after target creation and connection.
Solution
Instead of the simple
sleep(1)approach proposed in #1081, this implements an intelligent polling mechanism with exponential backoff:Key Features
udevadm settleto ensure proper device initialization