test: add deepseek v4 decode cases #693
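Codex Review (this comment is updated automatically by the review bot)
Summary: found 1 compatibility issue: the newly added DeepseekV4Decode A3/A5 samples in
Findings
This folds the new directories into the Qwen directory check, but the check condition still only recognizes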
Code Review
This pull request adds DeepSeek V4 decode PTO kernels for A3 and A5 architectures, including the associated .pto files and documentation. The test generation infrastructure is updated to support boolean scalar defaults, and the runop.sh script is modified to incorporate these new test directories. Feedback focuses on simplifying conditional logic in both the Python and shell scripts, specifically recommending the use of wildcard pattern matching in runop.sh to reduce repetition and reusing existing variables instead of re-calculating them.
  if t == "bool":
-     value = "true"
+     bool_override = _bool_scalar_default_value(testcase, p["name"])
+     value = "true" if bool_override is None else ("true" if bool_override else "false")
The boolean string assignment can be simplified. Since bool_override is Optional[bool], and the default behavior (when None) is true, you only need to check for the False case explicitly.
- value = "true" if bool_override is None else ("true" if bool_override else "false")
+ value = "false" if bool_override is False else "true"
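The equivalence behind this suggestion can be checked exhaustively, because an Optional[bool] has only three possible values. The following is a minimal sketch; the helper names original_expr and simplified_expr are hypothetical and exist only to compare the two expressions side by side.

```python
from typing import Optional


def original_expr(bool_override: Optional[bool]) -> str:
    # Expression as written in the PR.
    return "true" if bool_override is None else ("true" if bool_override else "false")


def simplified_expr(bool_override: Optional[bool]) -> str:
    # Expression proposed in the review suggestion.
    return "false" if bool_override is False else "true"


# Optional[bool] admits exactly three values, so this loop is exhaustive.
for candidate in (None, True, False):
    assert original_expr(candidate) == simplified_expr(candidate)
```

The two forms diverge only for a falsy value that is not a bool (for example 0): the original yields "false", while the `is False` identity check yields "true". Whether _bool_scalar_default_value can return such a value is not visible in this hunk, so the simplification is safe only under the stated Optional[bool] assumption.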
  use_ptobc_roundtrip=1
  fi
- if [[ "$A" == "Qwen3DecodeA3" || "$A" == "Qwen3DecodeA5" ]]; then
+ if [[ "$A" == "Qwen3DecodeA3" || "$A" == "Qwen3DecodeA5" || "$A" == "DeepseekV4DecodeA3" || "$A" == "DeepseekV4DecodeA5" ]]; then
The list of model directories is becoming repetitive and hard to maintain. Using wildcard pattern matching would simplify this and future additions.
- if [[ "$A" == "Qwen3DecodeA3" || "$A" == "Qwen3DecodeA5" || "$A" == "DeepseekV4DecodeA3" || "$A" == "DeepseekV4DecodeA5" ]]; then
+ if [[ "$A" == *DecodeA[35] ]]; then
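To see which directory names the proposed glob would accept, the pattern can be tried against a list of names. The sketch below uses Python's fnmatch.fnmatchcase as a stand-in for bash's [[ ... == pattern ]] matching; for a simple pattern like this one, `*` and `[35]` mean the same thing in both. The last two directory names are hypothetical and are included only to show how far the pattern reaches.

```python
from fnmatch import fnmatchcase

# Glob taken from the review suggestion.
PATTERN = "*DecodeA[35]"

names = [
    "Qwen3DecodeA3",
    "Qwen3DecodeA5",
    "DeepseekV4DecodeA3",
    "DeepseekV4DecodeA5",
    "FutureModelDecodeA3",  # hypothetical future directory
    "Qwen3PrefillA3",       # hypothetical non-decode directory
]

# Case-sensitive match, like bash pattern matching without nocasematch.
matched = [name for name in names if fnmatchcase(name, PATTERN)]

for name in names:
    status = "match" if name in matched else "no match"
    print(f"{name}: {status}")
```

The wildcard covers all four directories named in the PR, and it also picks up any future directory whose name ends in DecodeA3 or DecodeA5. That is convenient if every such directory should share this code path, and a risk if a later decode directory needs different handling.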
  done
  fi
- if [[ "$A" == "Qwen3DecodeA5" ]]; then
+ if [[ "$A" == "Qwen3DecodeA5" || "$A" == "DeepseekV4DecodeA5" ]]; then
  ptoas_flags+=(--pto-level=level3)
  fi
- elif [[ "$A" == "Qwen3DecodeA3" ]]; then
+ elif [[ "$A" == "Qwen3DecodeA3" || "$A" == "DeepseekV4DecodeA3" ]]; then
  for qwen_case in "$dir"/*.pto; do
  [[ -f "$qwen_case" ]] || continue
  case "$qwen_case" in
+ if [[ ( "$A" == "Qwen3DecodeA5" || "$A" == "DeepseekV4DecodeA5" ) && "$(printf '%s' "$target_arch" | tr '[:upper:]' '[:lower:]')" != "a5" ]]; then
In addition to using pattern matching for the directory name, you should use the target_arch_lc variable defined on line 224 instead of re-calculating it with printf and tr.
- if [[ ( "$A" == "Qwen3DecodeA5" || "$A" == "DeepseekV4DecodeA5" ) && "$(printf '%s' "$target_arch" | tr '[:upper:]' '[:lower:]')" != "a5" ]]; then
+ if [[ "$A" == *DecodeA5 && "${target_arch_lc}" != "a5" ]]; then
  decoded_pto="${out_subdir}/${base}-roundtrip.pto"
  cpp="${out_subdir}/${base}.cpp"
- if [[ "$A" == "Qwen3DecodeA3" || "$A" == "Qwen3DecodeA5" ]]; then
+ if [[ "$A" == "Qwen3DecodeA3" || "$A" == "Qwen3DecodeA5" || "$A" == "DeepseekV4DecodeA3" || "$A" == "DeepseekV4DecodeA5" ]]; then
Force-pushed from db398dc to 28a4bf2
/review
Manual Codex Review
Summary: received.
Findings: review in progress.
/run a5 attention_csa_test_refresh_incore_81,attention_hca_test_incore_54,attention_swa_test_incore_40,decode_csa_test_incore_81,decode_hca_test_incore_54,decode_swa_test_incore_40,sparse_attn_test_incore_7
Received.
The page refreshes automatically, so you can check the current stage, queue status, and latest results directly.
A5 board test passed
/run a3 attention_csa_test_refresh_incore_81,attention_hca_test_incore_54,attention_swa_test_incore_40,decode_csa_test_incore_81,decode_hca_test_incore_54,decode_swa_test_incore_40,sparse_attn_test_incore_7
Received.
The page refreshes automatically, so you can check the current stage, queue status, and latest results directly.
Manual Codex Review
Summary: no issues detected in PR #693.
Findings: No issues found.
A3 board test passed
A5 board test failed
Failed cases
A3 board test failed
Failed cases
A3 board test failure details: PR #693
syncall_binding
tprefetch_async_binding
Summary
.pto cases for both A3 and A5 from hw-native-sys/pypto-lib models/deepseek/v4 at be3c7942420b48fbab4ab1150edbc4ca8a125b94
runop.sh so these direct .pto cases compile with the same A3/A5 arch and --pto-level=level3 defaults as the existing Qwen decode cases
block_idx=0 / block_num=32 defaults
Notes
.pto files are copied directly from PyPTO raw PTO output and are not hand-edited
Validation
PTOAS_BIN=/Users/laoda/pto/PTOAS-deepseek-v4/build/tools/ptoas/ptoas PTOAS_OUT_DIR=/tmp/deepseek_v4_runop_a3 bash test/samples/runop.sh -t DeepseekV4DecodeA3
PTOAS_BIN=/Users/laoda/pto/PTOAS-deepseek-v4/build/tools/ptoas/ptoas PTOAS_OUT_DIR=/tmp/deepseek_v4_runop_a5 bash test/samples/runop.sh -t DeepseekV4DecodeA5
test/npu_validation/scripts/generate_testcase.py and executed each generated golden.py
python3 -m py_compile test/npu_validation/scripts/generate_testcase.py test/samples/DeepseekV4DecodeA3/*.py test/samples/DeepseekV4DecodeA5/*.py
.pto matches the corresponding PyPTO-exported source file byte-for-byte