Skip to content

[host]: ping syncSignature per-host to prevent queue serialization#3948

Open
ZStack-Robot wants to merge 3 commits into
5.5.22from
sync/songtao.liu/fix/ZSTAC-80543-v2
Open

[host]: ping syncSignature per-host to prevent queue serialization#3948
ZStack-Robot wants to merge 3 commits into
5.5.22from
sync/songtao.liu/fix/ZSTAC-80543-v2

Conversation

@ZStack-Robot
Copy link
Copy Markdown
Collaborator

Resolves: ZSTAC-80543

sync from gitlab !9839

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 13, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 9d81a7bf-955b-4708-beed-65fcf17d4f8e

📥 Commits

Reviewing files that changed from the base of the PR and between b996d6c and f56f00d.

📒 Files selected for processing (1)
  • test/src/test/java/org/zstack/test/core/thread/DispatchQueuePerHostSignatureTest.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/src/test/java/org/zstack/test/core/thread/DispatchQueuePerHostSignatureTest.java

Walkthrough

本PR修改了Ping主机任务的同步签名,从固定常量改为基于目标主机UUID的动态签名,并添加测试验证新的同步行为。

变更

每主机Ping任务同步

层级 / 文件 摘要
Ping任务每主机同步签名
compute/src/main/java/org/zstack/compute/host/HostBase.java
在Ping消息处理中,ChainTask的同步签名从固定常量"do-ping-host"修改为String.format("do-ping-host-%s", msg.getHostUuid()),使同步队列按主机UUID进行区分。
同步签名行为测试套件
test/src/test/java/org/zstack/test/core/thread/DispatchQueuePerHostSignatureTest.java
新增测试类,定义OrderedTask实现SyncTask接口,通过构造参数的签名返回值和同步闩锁控制任务执行;sameSignatureSerializes验证相同签名的任务按序列执行;differentSignatureConcurrent验证不同签名的任务可并发执行。

🎯 2 (Simple) | ⏱️ ~12 minutes

🐰 主机签名各自舞,ping请不再混杂辅。
同签序列按顺序,异签并发齐出发。
同步队列分明白,主机级别任务佳。✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed 标题完全相关且准确地总结了主要变更:将 ping 的同步签名从全局常量改为基于主机的签名。
Description check ✅ Passed 描述清晰地与变更集相关,包含问题编号和来源参考,说明了修改的目的和范围。
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/songtao.liu/fix/ZSTAC-80543-v2

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/src/test/java/org/zstack/test/core/thread/TestDispatchQueuePerHostSignature.java (1)

20-20: ⚡ Quick win

测试类命名不符合仓库规范。

Line 20 的类名 TestDispatchQueuePerHostSignature 未以 TestCase 结尾,建议改为 DispatchQueuePerHostSignatureTest,并同步文件名。
As per coding guidelines “测试类需要以 Test 或 Case 结尾。”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@test/src/test/java/org/zstack/test/core/thread/TestDispatchQueuePerHostSignature.java`
at line 20, Rename the test class TestDispatchQueuePerHostSignature to follow
the repository convention by changing the class name to
DispatchQueuePerHostSignatureTest and update the containing Java file name
accordingly; ensure all references to TestDispatchQueuePerHostSignature in
imports, test suites, or build configs are updated to the new class name to
avoid compilation failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In
`@test/src/test/java/org/zstack/test/core/thread/TestDispatchQueuePerHostSignature.java`:
- Line 20: Rename the test class TestDispatchQueuePerHostSignature to follow the
repository convention by changing the class name to
DispatchQueuePerHostSignatureTest and update the containing Java file name
accordingly; ensure all references to TestDispatchQueuePerHostSignature in
imports, test suites, or build configs are updated to the new class name to
avoid compilation failures.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 0aab5719-69dc-41f5-992e-122f5f52f1dc

📥 Commits

Reviewing files that changed from the base of the PR and between 57a496a and ad52e6f.

📒 Files selected for processing (2)
  • compute/src/main/java/org/zstack/compute/host/HostBase.java
  • test/src/test/java/org/zstack/test/core/thread/TestDispatchQueuePerHostSignature.java

@zstack-robot-2
Copy link
Copy Markdown
Collaborator

Comment from yaohua.wu:

Review: MR !9839 — ZSTAC-80543

P3 — Low

# File:Line 问题 修复建议
1 test/.../TestDispatchQueuePerHostSignature.java:20 测试类名以 Test 开头而非结尾,不符合仓库规范(规范要求以 TestCase 结尾)。CodeRabbit 也已指出同一问题。 重命名为 DispatchQueuePerHostSignatureTest,同步修改文件名。

总结

本次修复将 HostBase 中 ping 任务的同步签名从全局常量 "do-ping-host" 改为包含目标主机 UUID 的动态值 "do-ping-host-{hostUuid}",使不同主机的 ping 队列相互独立。修复直接命中 Jira 根因:CBD 存储断管理网场景下,zbs/primarystorage/check/host/connection 请求因 1800 秒超时占用全局 ping 队列,导致其他主机 ping 任务被阻塞长达 30 分钟;改为 per-host 签名后,各主机队列互不影响,失联检测延迟恢复正常。配套的单元测试验证了同签名串行、异签名并发两个关键行为,覆盖充分。

Verdict: APPROVED


🤖 Robot Reviewer

zstack added 2 commits May 19, 2026 01:03
…serialization

getSyncSignature() changed from fixed "do-ping-host" to "do-ping-host-<hostUuid>",
preventing all physical host pings from queuing on the same SyncTaskChain.

Resolves: ZSTAC-83192, ZSTAC-79389
Verify after getSyncSignature() returns per-host key:
- same signature serializes execution (same queue)
- different signatures run in parallel (different host = different queue)

Related: ZSTAC-83192
@MatheMatrix MatheMatrix force-pushed the sync/songtao.liu/fix/ZSTAC-80543-v2 branch from 3db5a92 to b996d6c Compare May 19, 2026 08:04
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
test/src/test/java/org/zstack/test/core/thread/DispatchQueuePerHostSignatureTest.java (3)

25-57: ⚡ Quick win

建议将魔法值提取为常量以提升可读性。

OrderedTask 中存在多处魔法值:

  • 第 45 行:10 秒的超时时间
  • 第 56 行:getSyncLevel() 返回的 2 没有说明其含义

建议将这些值提取为有意义的常量,便于理解和维护。根据编码指南,应避免使用魔法值。

♻️ 建议的优化
 public class DispatchQueuePerHostSignatureTest {
     private static final CLogger logger = Utils.getLogger(DispatchQueuePerHostSignatureTest.class);
+    private static final int DEFAULT_TIMEOUT_SECONDS = 10;
+    private static final int SYNC_LEVEL_DEFAULT = 2;
     ComponentLoader loader;
     ThreadFacade thdf;

     static class OrderedTask implements SyncTask<Integer> {
         // ... fields ...

         `@Override`
         public Integer call() throws Exception {
             executionOrder.add(id);
             started.countDown();
-            letFinish.await(10, TimeUnit.SECONDS);
+            letFinish.await(DEFAULT_TIMEOUT_SECONDS, TimeUnit.SECONDS);
             return id;
         }

         `@Override`
-        public int getSyncLevel() { return 2; }
+        public int getSyncLevel() { return SYNC_LEVEL_DEFAULT; }
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@test/src/test/java/org/zstack/test/core/thread/DispatchQueuePerHostSignatureTest.java`
around lines 25 - 57, Extract the magic numbers in OrderedTask into named
constants: replace the hardcoded 10 in letFinish.await(10, TimeUnit.SECONDS)
with a timeout constant (e.g., DEFAULT_WAIT_SECONDS or TASK_AWAIT_SECONDS) and
replace the literal 2 returned by getSyncLevel() with a named
SYNCHRONIZATION_LEVEL (or similar) so the meaning is clear; update the
OrderedTask constructor/class to declare these private static final constants
and use them in call() and getSyncLevel() to improve readability and
maintainability.

66-91: ⚡ Quick win

建议提取重复的超时常量。

测试方法中多次使用相同的超时值(如 10 秒和 500 毫秒),建议提取为常量以提升可维护性。这样可以统一调整测试超时配置,也使代码意图更清晰。

♻️ 建议的优化
 public class DispatchQueuePerHostSignatureTest {
     private static final CLogger logger = Utils.getLogger(DispatchQueuePerHostSignatureTest.class);
+    private static final int DEFAULT_TIMEOUT_SECONDS = 10;
+    private static final int BLOCKING_CHECK_TIMEOUT_MS = 500;
     ComponentLoader loader;
     ThreadFacade thdf;

     `@Test`
     public void sameSignatureSerializes() throws Exception {
         List<Integer> executionOrder = Collections.synchronizedList(new ArrayList<>());
         CountDownLatch started1 = new CountDownLatch(1);
         CountDownLatch letFinish1 = new CountDownLatch(1);

         Future<Integer> f1 = thdf.syncSubmit(new OrderedTask(1, "same-sig", executionOrder, started1, letFinish1));
-        started1.await(10, TimeUnit.SECONDS);
+        started1.await(DEFAULT_TIMEOUT_SECONDS, TimeUnit.SECONDS);

         CountDownLatch started2 = new CountDownLatch(1);
         CountDownLatch letFinish2 = new CountDownLatch(1);
         Future<Integer> f2 = thdf.syncSubmit(new OrderedTask(2, "same-sig", executionOrder, started2, letFinish2));

         // Task 2 should NOT start while Task 1 is still running (same signature = serial)
-        Assert.assertFalse("task 2 should be blocked by same signature", started2.await(500, TimeUnit.MILLISECONDS));
+        Assert.assertFalse("task 2 should be blocked by same signature", started2.await(BLOCKING_CHECK_TIMEOUT_MS, TimeUnit.MILLISECONDS));

         letFinish1.countDown();
-        Assert.assertEquals(1, (int) f1.get(10, TimeUnit.SECONDS));
+        Assert.assertEquals(1, (int) f1.get(DEFAULT_TIMEOUT_SECONDS, TimeUnit.SECONDS));

-        started2.await(10, TimeUnit.SECONDS);
+        started2.await(DEFAULT_TIMEOUT_SECONDS, TimeUnit.SECONDS);
         Assert.assertEquals("task 1 must execute before task 2", (Integer) 1, executionOrder.get(0));
         Assert.assertEquals("task 2 must execute after task 1", (Integer) 2, executionOrder.get(1));

         letFinish2.countDown();
-        Assert.assertEquals(2, (int) f2.get(10, TimeUnit.SECONDS));
+        Assert.assertEquals(2, (int) f2.get(DEFAULT_TIMEOUT_SECONDS, TimeUnit.SECONDS));
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@test/src/test/java/org/zstack/test/core/thread/DispatchQueuePerHostSignatureTest.java`
around lines 66 - 91, Extract repeated timeout literals into named constants and
replace the inline numbers in sameSignatureSerializes: create constants such as
DEFAULT_WAIT_SECONDS (used for started1.await(..., TimeUnit.SECONDS),
started2.await(..., TimeUnit.SECONDS), f1.get(..., TimeUnit.SECONDS),
f2.get(..., TimeUnit.SECONDS)) and SHORT_WAIT_MILLIS (used for
started2.await(500, TimeUnit.MILLISECONDS)); update the calls in OrderedTask
submissions and assertions to use these constants so timeouts are centralized
and maintainable.

3-3: ⚡ Quick win

使用 JUnit 4 的 Assert 类以保持一致性。

代码中使用了 JUnit 4 的注解(@Before@Test),但导入的是 JUnit 3 的 junit.framework.Assert。建议统一使用 org.junit.Assert 以保持版本一致性。

♻️ 建议的修复
-import junit.framework.Assert;
+import org.junit.Assert;
 import org.junit.Before;
 import org.junit.Test;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@test/src/test/java/org/zstack/test/core/thread/DispatchQueuePerHostSignatureTest.java`
at line 3, Replace the JUnit 3 import with the JUnit 4 Assert import: remove the
existing import of junit.framework.Assert and import org.junit.Assert instead so
the test class DispatchQueuePerHostSignatureTest (which uses `@Before/`@Test
annotations) consistently uses JUnit 4; ensure any references to Assert methods
remain valid (or switch to static imports from org.junit.Assert if preferred).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In
`@test/src/test/java/org/zstack/test/core/thread/DispatchQueuePerHostSignatureTest.java`:
- Around line 25-57: Extract the magic numbers in OrderedTask into named
constants: replace the hardcoded 10 in letFinish.await(10, TimeUnit.SECONDS)
with a timeout constant (e.g., DEFAULT_WAIT_SECONDS or TASK_AWAIT_SECONDS) and
replace the literal 2 returned by getSyncLevel() with a named
SYNCHRONIZATION_LEVEL (or similar) so the meaning is clear; update the
OrderedTask constructor/class to declare these private static final constants
and use them in call() and getSyncLevel() to improve readability and
maintainability.
- Around line 66-91: Extract repeated timeout literals into named constants and
replace the inline numbers in sameSignatureSerializes: create constants such as
DEFAULT_WAIT_SECONDS (used for started1.await(..., TimeUnit.SECONDS),
started2.await(..., TimeUnit.SECONDS), f1.get(..., TimeUnit.SECONDS),
f2.get(..., TimeUnit.SECONDS)) and SHORT_WAIT_MILLIS (used for
started2.await(500, TimeUnit.MILLISECONDS)); update the calls in OrderedTask
submissions and assertions to use these constants so timeouts are centralized
and maintainable.
- Line 3: Replace the JUnit 3 import with the JUnit 4 Assert import: remove the
existing import of junit.framework.Assert and import org.junit.Assert instead so
the test class DispatchQueuePerHostSignatureTest (which uses `@Before/`@Test
annotations) consistently uses JUnit 4; ensure any references to Assert methods
remain valid (or switch to static imports from org.junit.Assert if preferred).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: a88df3a4-68ef-4c43-8283-e638a2a6c1a1

📥 Commits

Reviewing files that changed from the base of the PR and between ad52e6f and b996d6c.

📒 Files selected for processing (2)
  • compute/src/main/java/org/zstack/compute/host/HostBase.java
  • test/src/test/java/org/zstack/test/core/thread/DispatchQueuePerHostSignatureTest.java
✅ Files skipped from review due to trivial changes (1)
  • compute/src/main/java/org/zstack/compute/host/HostBase.java

Renamed from TestDispatchQueuePerHostSignature to follow project
naming convention (Test suffix, not prefix).

Resolves: ZSTAC-80543
@MatheMatrix MatheMatrix force-pushed the sync/songtao.liu/fix/ZSTAC-80543-v2 branch from b996d6c to f56f00d Compare May 19, 2026 11:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants