test: add deepseek v4 decode cases #693

Merged
zhangstevenunity merged 1 commit into main from codex/deepseek-v4-decode-cases-v2
May 22, 2026

Conversation

@HecreReed
Collaborator

@HecreReed HecreReed commented May 22, 2026

Summary

  • vendor DeepSeek V4 decode raw .pto cases for both A3 and A5 from hw-native-sys/pypto-lib models/deepseek/v4 at be3c7942420b48fbab4ab1150edbc4ca8a125b94
  • wire runop.sh so these direct .pto cases compile with the same A3/A5 arch and --pto-level=level3 defaults as the existing Qwen decode cases
  • add custom standalone golden scripts for the imported rope-pack kernels and fix testcase generation so these cases use the real full-buffer sizes plus block_idx=0 / block_num=32 defaults

Notes

  • the checked-in .pto files are copied directly from PyPTO raw PTO output and are not hand-edited
  • all seven imported DeepSeek cases lower to the same standalone rope-pack/writeback kernel shape, so the custom golden is shared across the wrappers in each sample directory

Validation

  • PTOAS_BIN=/Users/laoda/pto/PTOAS-deepseek-v4/build/tools/ptoas/ptoas PTOAS_OUT_DIR=/tmp/deepseek_v4_runop_a3 bash test/samples/runop.sh -t DeepseekV4DecodeA3
  • PTOAS_BIN=/Users/laoda/pto/PTOAS-deepseek-v4/build/tools/ptoas/ptoas PTOAS_OUT_DIR=/tmp/deepseek_v4_runop_a5 bash test/samples/runop.sh -t DeepseekV4DecodeA5
  • regenerated all A3/A5 validation cases with test/npu_validation/scripts/generate_testcase.py and executed each generated golden.py
  • python3 -m py_compile test/npu_validation/scripts/generate_testcase.py test/samples/DeepseekV4DecodeA3/*.py test/samples/DeepseekV4DecodeA5/*.py
  • verified every checked-in .pto matches the corresponding PyPTO-exported source file byte-for-byte

@reedhecre

reedhecre commented May 22, 2026

Codex Review

This comment is updated automatically by the review bot.

  • PR: #693 test: add deepseek v4 decode cases
  • Author: HecreReed
  • Base/Head: main / codex/deepseek-v4-decode-cases-v2
  • Head SHA: 28a4bf23f94a
  • Trigger: new commit pushed to the PR
  • Generated At: 2026-05-22T06:43:43Z
  • Previous Head SHA: db398dcb3105
  • Status: completed

Summary

Found 1 compatibility issue: the newly added DeepseekV4Decode A3/A5 samples are routed incorrectly under A5 SOC identifiers of the form Ascend910_95*.

Findings

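  1. P2 The A3/A5 SOC check for DeepseekV4Decode misses `Ascend910_95*` (test/samples/runop.sh:265)

The new directories are folded into the Qwen directory check here, but the condition still only recognizes *a5*/*950*. Elsewhere the repository already treats Ascend910_95* as A5 (for example, is_a5_soc() in test/samples/validation_runtime.py). As a result, in an environment such as SOC_VERSION=Ascend910_9599, DeepseekV4DecodeA5 is incorrectly skipped with "requires A5 target SOC", while DeepseekV4DecodeA3 is not blocked. The newly added DeepSeek V4 samples therefore cannot run correctly under a supported A5 naming scheme.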
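
The routing gap in this finding can be illustrated with a small sketch. It assumes that the existing check matches the globs `*a5*` and `*950*` case-insensitively (the finding only names the patterns, not the case handling), and it adds the `Ascend910_95*` family on top. The name `looks_like_a5_soc` is hypothetical and is not the repository's `is_a5_soc()`.

```python
import fnmatch

# Patterns assumed from the review finding: the existing check recognizes
# *a5* and *950*; Ascend910_95* is the family the finding says is missed.
LEGACY_A5_PATTERNS = ("*a5*", "*950*")
EXTENDED_A5_PATTERNS = LEGACY_A5_PATTERNS + ("ascend910_95*",)


def looks_like_a5_soc(soc_version: str, patterns=EXTENDED_A5_PATTERNS) -> bool:
    """Return True if the SOC name matches any A5 glob (case-insensitive)."""
    name = soc_version.strip().lower()
    return any(fnmatch.fnmatchcase(name, pat) for pat in patterns)


# The legacy patterns miss the SOC name quoted in the finding;
# the extended pattern set accepts it.
print(looks_like_a5_soc("Ascend910_9599", LEGACY_A5_PATTERNS))  # False
print(looks_like_a5_soc("Ascend910_9599"))                      # True
```

Keeping the pattern list in one place is one way to stop the shell script and the Python helper from drifting apart, although the actual fix in runop.sh may look different.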

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds DeepSeek V4 decode PTO kernels for A3 and A5 architectures, including the associated .pto files and documentation. The test generation infrastructure is updated to support boolean scalar defaults, and the runop.sh script is modified to incorporate these new test directories. Feedback focuses on simplifying conditional logic in both the Python and shell scripts, specifically recommending the use of wildcard pattern matching in runop.sh to reduce repetition and reusing existing variables instead of re-calculating them.

  if t == "bool":
-     value = "true"
+     bool_override = _bool_scalar_default_value(testcase, p["name"])
+     value = "true" if bool_override is None else ("true" if bool_override else "false")

medium

The boolean string assignment can be simplified. Since bool_override is Optional[bool], and the default behavior (when None) is true, you only need to check for the False case explicitly.

Suggested change
- value = "true" if bool_override is None else ("true" if bool_override else "false")
+ value = "false" if bool_override is False else "true"
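
As a quick check of this suggestion, the sketch below compares the two expressions over the three values an `Optional[bool]` can take. The wrapper names `original_form` and `suggested_form` are hypothetical; only the expressions themselves come from the diff and the suggestion.

```python
from typing import Optional


def original_form(bool_override: Optional[bool]) -> str:
    # Expression from the PR diff.
    return "true" if bool_override is None else ("true" if bool_override else "false")


def suggested_form(bool_override: Optional[bool]) -> str:
    # Expression from the review suggestion.
    return "false" if bool_override is False else "true"


# The two forms agree for every Optional[bool] value.
for candidate in (None, True, False):
    assert original_form(candidate) == suggested_form(candidate)
```

The equivalence relies on the helper returning a strict `Optional[bool]`. For a falsy non-bool such as `0`, the original expression yields "false" while the suggested one yields "true", because `0 is False` does not hold.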

Comment thread test/samples/runop.sh
  use_ptobc_roundtrip=1
  fi
- if [[ "$A" == "Qwen3DecodeA3" || "$A" == "Qwen3DecodeA5" ]]; then
+ if [[ "$A" == "Qwen3DecodeA3" || "$A" == "Qwen3DecodeA5" || "$A" == "DeepseekV4DecodeA3" || "$A" == "DeepseekV4DecodeA5" ]]; then

medium

The list of model directories is becoming repetitive and hard to maintain. Using wildcard pattern matching would simplify this and future additions.

Suggested change
- if [[ "$A" == "Qwen3DecodeA3" || "$A" == "Qwen3DecodeA5" || "$A" == "DeepseekV4DecodeA3" || "$A" == "DeepseekV4DecodeA5" ]]; then
+ if [[ "$A" == *DecodeA[35] ]]; then
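
To see what the wildcard would accept, the sketch below uses Python's `fnmatch` as a stand-in for the bash `[[ ... == pattern ]]` match; the two share the `*` and `[...]` glob syntax used here, but this is an approximation and not the shell itself. The last two directory names are hypothetical and are included only for contrast.

```python
import fnmatch

PATTERN = "*DecodeA[35]"  # glob from the suggested change

# The first four names come from the PR; the last two are hypothetical.
names = [
    "Qwen3DecodeA3",
    "Qwen3DecodeA5",
    "DeepseekV4DecodeA3",
    "DeepseekV4DecodeA5",
    "FutureModelDecodeA5",  # hypothetical: would also match the glob
    "Qwen3DecodeA5Extra",   # hypothetical: trailing text breaks the match
]

matched = [n for n in names if fnmatch.fnmatchcase(n, PATTERN)]
print(matched)
```

The trade-off is that the glob silently opts in any future directory whose name ends in `DecodeA3` or `DecodeA5`, which is convenient only if every such directory should get the same defaults.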

Comment thread test/samples/runop.sh
  done
  fi
- if [[ "$A" == "Qwen3DecodeA5" ]]; then
+ if [[ "$A" == "Qwen3DecodeA5" || "$A" == "DeepseekV4DecodeA5" ]]; then

medium

This condition can be simplified using pattern matching to avoid explicitly listing every model directory.

Suggested change
- if [[ "$A" == "Qwen3DecodeA5" || "$A" == "DeepseekV4DecodeA5" ]]; then
+ if [[ "$A" == *DecodeA5 ]]; then

Comment thread test/samples/runop.sh
  ptoas_flags+=(--pto-level=level3)
  fi
- elif [[ "$A" == "Qwen3DecodeA3" ]]; then
+ elif [[ "$A" == "Qwen3DecodeA3" || "$A" == "DeepseekV4DecodeA3" ]]; then

medium

This condition can be simplified using pattern matching.

Suggested change
- elif [[ "$A" == "Qwen3DecodeA3" || "$A" == "DeepseekV4DecodeA3" ]]; then
+ elif [[ "$A" == *DecodeA3 ]]; then

Comment thread test/samples/runop.sh
for qwen_case in "$dir"/*.pto; do
[[ -f "$qwen_case" ]] || continue
case "$qwen_case" in
if [[ ( "$A" == "Qwen3DecodeA5" || "$A" == "DeepseekV4DecodeA5" ) && "$(printf '%s' "$target_arch" | tr '[:upper:]' '[:lower:]')" != "a5" ]]; then

medium

In addition to using pattern matching for the directory name, you should use the target_arch_lc variable defined on line 224 instead of re-calculating it with printf and tr.

Suggested change
- if [[ ( "$A" == "Qwen3DecodeA5" || "$A" == "DeepseekV4DecodeA5" ) && "$(printf '%s' "$target_arch" | tr '[:upper:]' '[:lower:]')" != "a5" ]]; then
+ if [[ "$A" == *DecodeA5 && "${target_arch_lc}" != "a5" ]]; then

Comment thread test/samples/runop.sh
  decoded_pto="${out_subdir}/${base}-roundtrip.pto"
  cpp="${out_subdir}/${base}.cpp"
- if [[ "$A" == "Qwen3DecodeA3" || "$A" == "Qwen3DecodeA5" ]]; then
+ if [[ "$A" == "Qwen3DecodeA3" || "$A" == "Qwen3DecodeA5" || "$A" == "DeepseekV4DecodeA3" || "$A" == "DeepseekV4DecodeA5" ]]; then

medium

Simplify this condition using pattern matching to align with the rest of the script.

Suggested change
- if [[ "$A" == "Qwen3DecodeA3" || "$A" == "Qwen3DecodeA5" || "$A" == "DeepseekV4DecodeA3" || "$A" == "DeepseekV4DecodeA5" ]]; then
+ if [[ "$A" == *DecodeA[35] ]]; then

@HecreReed HecreReed force-pushed the codex/deepseek-v4-decode-cases-v2 branch from db398dc to 28a4bf2 Compare May 22, 2026 06:34
@HecreReed
Collaborator Author

/review

@reedhecre

Manual Codex Review

This comment was triggered manually by /review.

Summary

Received /review; running the Codex review.

Findings

Review in progress.

@HecreReed
Collaborator Author

/run a5 attention_csa_test_refresh_incore_81,attention_hca_test_incore_54,attention_swa_test_incore_40,decode_csa_test_incore_81,decode_hca_test_incore_54,decode_swa_test_incore_40,sparse_attn_test_incore_7

@reedhecre

Received /run a5 attention_csa_test_refresh_incore_81 attention_hca_test_incore_54 attention_swa_test_incore_40 decode_csa_test_incore_81 decode_hca_test_incore_54 decode_swa_test_incore_40 sparse_attn_test_incore_7; the A5 board tester will handle this request.

  • Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a5
  • Current status: the board tester is idle; this request will start in the current polling round.
  • Requested cases: attention_csa_test_refresh_incore_81,attention_hca_test_incore_54,attention_swa_test_incore_40,decode_csa_test_incore_81,decode_hca_test_incore_54,decode_swa_test_incore_40,sparse_attn_test_incore_7

The page refreshes automatically and shows the current stage, the queue, and recent results.

@reedhecre

A5 board test passed

  • Trigger: manual
  • Source commit: 87bb4421500b
  • Result summary: OK 7 / FAIL 0 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260522_151805_manual_pr693.log
  • Result TSV: /root/ptoas-board-monitor-a5/logs/20260522_151805_manual_pr693.tsv
  • Manual command: /run a5 attention_csa_test_refresh_incore_81 attention_hca_test_incore_54 attention_swa_test_incore_40 decode_csa_test_incore_81 decode_hca_test_incore_54 decode_swa_test_incore_40 sparse_attn_test_incore_7
  • Triggered by: HecreReed
  • Requested cases: attention_csa_test_refresh_incore_81,attention_hca_test_incore_54,attention_swa_test_incore_40,decode_csa_test_incore_81,decode_hca_test_incore_54,decode_swa_test_incore_40,sparse_attn_test_incore_7
  • Triggering comment: test: add deepseek v4 decode cases #693 (comment)

@HecreReed HecreReed marked this pull request as ready for review May 22, 2026 07:23
@HecreReed
Collaborator Author

/run a3 attention_csa_test_refresh_incore_81,attention_hca_test_incore_54,attention_swa_test_incore_40,decode_csa_test_incore_81,decode_hca_test_incore_54,decode_swa_test_incore_40,sparse_attn_test_incore_7

@reedhecre

Received /run a3 attention_csa_test_refresh_incore_81 attention_hca_test_incore_54 attention_swa_test_incore_40 decode_csa_test_incore_81 decode_hca_test_incore_54 decode_swa_test_incore_40 sparse_attn_test_incore_7; the A3 board tester will handle this request.

  • Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a3
  • Current status: the board tester is idle; this request will start in the current polling round.
  • Requested cases: attention_csa_test_refresh_incore_81,attention_hca_test_incore_54,attention_swa_test_incore_40,decode_csa_test_incore_81,decode_hca_test_incore_54,decode_swa_test_incore_40,sparse_attn_test_incore_7

The page refreshes automatically and shows the current stage, the queue, and recent results.

@reedhecre

Manual Codex Review

This comment was triggered manually by /review.

Summary

No issues found in PR #693.

Findings

No issues found.

@reedhecre

A3 board test passed

  • Trigger: manual
  • Source commit: 87bb4421500b
  • Result summary: OK 7 / FAIL 0 / SKIP 0
  • Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260522_152505_manual_pr693.log
  • Result TSV: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260522_152505_manual_pr693.tsv
  • Manual command: /run a3 attention_csa_test_refresh_incore_81 attention_hca_test_incore_54 attention_swa_test_incore_40 decode_csa_test_incore_81 decode_hca_test_incore_54 decode_swa_test_incore_40 sparse_attn_test_incore_7
  • Triggered by: HecreReed
  • Requested cases: attention_csa_test_refresh_incore_81,attention_hca_test_incore_54,attention_swa_test_incore_40,decode_csa_test_incore_81,decode_hca_test_incore_54,decode_swa_test_incore_40,sparse_attn_test_incore_7
  • Triggering comment: test: add deepseek v4 decode cases #693 (comment)

@zhangstevenunity zhangstevenunity merged commit 4c8cf0e into main May 22, 2026
14 checks passed
@reedhecre

A5 board test failed

  • Trigger: merged
  • Source commit: 4c8cf0e92db1
  • Result summary: OK 13 / FAIL 1 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260522_160506_merged_pr693.log
  • Failed stage: board-validation-qwen / exit=1

Failed cases

  • qwen3_decode_incore_6 (run, exit=139)

@reedhecre

A3 board test failed

  • Trigger: merged
  • Source commit: 4c8cf0e92db1
  • Result summary: OK 216 / FAIL 2 / SKIP 2
  • Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260522_160405_merged_pr693.log
  • Failed stage: board-validation / exit=1

Failed cases

  • syncall_binding (run, exit=2)
  • tprefetch_async_binding (run, exit=2)

@reedhecre

A3 board test failure details: PR #693

syncall_binding

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260522_160405_merged_pr693/npu_validation/SyncAll/syncall_binding/syncall_binding_kernel.cpp:87:11: error: use of undeclared identifier 'SyncAllMode'
  SYNCALL<SyncAllMode::Soft, SyncCoreType::AIVOnly>(v8, v9, v2);
          ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260522_160405_merged_pr693/npu_validation/SyncAll/syncall_binding/syncall_binding_kernel.cpp:88:11: error: use of undeclared identifier 'SyncAllMode'
  SYNCALL<SyncAllMode::Soft, SyncCoreType::Mix>(v8, v9, v10, v2);
          ^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260522_160405_merged_pr693/npu_validation/SyncAll/syncall_binding/syncall_binding_kernel.cpp:89:11: error: use of undeclared identifier 'SyncCoreType'
  SYNCALL<SyncCoreType::Mix>();
          ^
3 errors generated.
gmake[2]: *** [CMakeFiles/syncall_binding_kernel.dir/build.make:76: CMakeFiles/syncall_binding_kernel.dir/syncall_binding_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/syncall_binding_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-05-22 16:29:14] ERROR: testcase failed (exit 2): syncall_binding

tprefetch_async_binding

stage=run info=exit=2

/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260522_160405_merged_pr693/npu_validation/TPrefetchAsync/tprefetch_async_binding/tprefetch_async_binding_kernel.cpp:88:8: error: no type named 'PrefetchAsyncContext' in namespace 'pto'
  pto::PrefetchAsyncContext v8 = pto::PrefetchAsyncContext((__gm__ uint8_t*) v2);
  ~~~~~^
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260522_160405_merged_pr693/npu_validation/TPrefetchAsync/tprefetch_async_binding/tprefetch_async_binding_kernel.cpp:88:39: error: no member named 'PrefetchAsyncContext' in namespace 'pto'
  pto::PrefetchAsyncContext v8 = pto::PrefetchAsyncContext((__gm__ uint8_t*) v2);
                                 ~~~~~^
2 errors generated.
gmake[2]: *** [CMakeFiles/tprefetch_async_binding_kernel.dir/build.make:76: CMakeFiles/tprefetch_async_binding_kernel.dir/tprefetch_async_binding_kernel.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/tprefetch_async_binding_kernel.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
[2026-05-22 16:29:47] ERROR: testcase failed (exit 2): tprefetch_async_binding
