
fix: 9-16B struct return via XMM registers (P1, macOS Intel)#45

Merged: kolkov merged 8 commits into main from fix/sse-struct-return on May 13, 2026
@kolkov kolkov commented May 13, 2026

Summary

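Fixes a P1 critical bug: on macOS Intel the window was invisible because NSPoint/NSSize return values were corrupted. Per the System V AMD64 ABI, a 9-16B struct is returned in registers chosen per eightbyte, so an all-float struct such as NSPoint ({double, double}) comes back in XMM0:XMM1. goffi was misclassifying these structs as sret and reading RAX:RDX.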

  • asm: save XMM1 to f2 after CALL (+1 line)
  • syscall: CallNFloat returns f2 float64 (XMM1)
  • types: 4 new return constants: ReturnStRaxRdx, ReturnStRaxXmm0, ReturnStXmm0Rax, ReturnStXmm0Xmm1
  • classification: classifyReturnAMD64 — per-eightbyte SSE/INTEGER for 9-16B structs
  • handleReturn: dispatch on 4 return modes using fret/fret2
  • e2e tests: 4 C functions returning {f64,f64}, {i64,f64}, {f64,i64}, {i64,i64} — all verified
  • docs: CHANGELOG, README, ARCHITECTURE (4-mode table), PERFORMANCE, ROADMAP updated

ADR: ADR-007 | Research: SSE_STRUCT_RETURN_ANALYSIS.md

Test plan

  • go build ./... — all 6 platforms
  • go test ./ffi ./types ./internal/... — no regressions
  • E2E struct return tests — 4/4 pass (WSL Linux, gcc)
  • Unit handleReturn SSE tests — 4/4 pass
  • CGO_ENABLED=1 go test -race — all pass (WSL Linux)
  • go fmt / golangci-lint — clean
  • CI: lint, tests (3 OS × CGO=0/1), cross-compile, quality gate

kolkov added 6 commits May 13, 2026 22:19
Per SysV AMD64 ABI, structs {SSE, SSE} (e.g. NSPoint) return
in XMM0:XMM1. The assembly already saved XMM0 at offset 128 (f1)
but discarded XMM1. Save XMM1 at offset 136 (f2) so CallNFloat
can expose it as a second float return register.
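
A minimal sketch of the layout this commit message describes, for readers who want to picture the two float slots. Only the offsets 128 and 136 and the names f1/f2 come from the message above; the `callFrame` type and the opaque 128-byte prefix are assumptions made for illustration, not goffi's actual definition.

```go
package main

import (
	"fmt"
	"unsafe"
)

// callFrame is a hypothetical stand-in for the block the assembly stub
// reads and writes. Everything before offset 128 is treated as opaque.
type callFrame struct {
	_  [128]byte // assumption: arguments and integer results live in bytes 0-127
	f1 uint64    // XMM0 bit pattern, offset 128
	f2 uint64    // XMM1 bit pattern, offset 136 (the slot this commit starts filling)
}

// floatSlotOffsets returns the byte offsets of the two float return slots.
func floatSlotOffsets() (uintptr, uintptr) {
	var f callFrame
	return unsafe.Offsetof(f.f1), unsafe.Offsetof(f.f2)
}

func main() {
	o1, o2 := floatSlotOffsets()
	fmt.Println(o1, o2) // 128 136
}
```
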
CallNFloat gains a fourth return value f2 (XMM1 bit pattern) to
support {SSE, SSE} 9-16B struct returns (e.g. NSPoint on macOS).

handleReturn gains fret/fret2 parameters so the Execute path can
supply both XMM return registers; call_windows.go passes zeros
since syscall.SyscallN does not capture XMM0/XMM1 (known gap).

All existing handleReturn unit tests updated to pass 0,0 for the
new float parameters.
…0Xmm1

Four new return-flag constants for SysV AMD64 9-16B struct returns.
The two-eightbyte classification (INTEGER vs SSE) determines which
register pair the callee uses; handleReturn will switch on these
flags to reconstruct the struct from the correct register slots.

Values 10-13 sit between the scalar flags (0-9) and the bit-field
flags (1<<10+) so they cannot collide with either group.
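
The numbering argument can be sketched as follows. The four `ReturnSt*` names and the 10-13 range come from this PR; the order of the names within that range, `modeMask`, and `flagHypothetical` are assumptions introduced only to show why a small mode value and a bit flag at `1<<10` or above can share one integer.

```go
package main

import "fmt"

const (
	// Struct return modes. The order within 10-13 is assumed.
	ReturnStRaxRdx   = 10 // {INTEGER, INTEGER}
	ReturnStRaxXmm0  = 11 // {INTEGER, SSE}
	ReturnStXmm0Rax  = 12 // {SSE, INTEGER}
	ReturnStXmm0Xmm1 = 13 // {SSE, SSE}

	modeMask         = 1<<10 - 1 // hypothetical: low 10 bits carry the mode
	flagHypothetical = 1 << 10   // hypothetical bit-field flag
)

// returnMode strips any bit-field flags and yields the mode value.
func returnMode(flags int) int { return flags & modeMask }

func main() {
	flags := ReturnStXmm0Xmm1 | flagHypothetical
	fmt.Println(returnMode(flags) == ReturnStXmm0Xmm1) // true
	fmt.Println(flags&flagHypothetical != 0)           // true
}
```
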
Previously all structs sized 9-16B fell through to ReturnViaPointer
which is wrong — those sizes are returned in register pairs per
SysV AMD64 ABI §3.2.3.

classifyReturnAMD64 now calls classifyEightbyte() for each of the
two eightbytes and selects one of four modes:
  ReturnStXmm0Xmm1 — {SSE, SSE}     (e.g. NSPoint: double+double)
  ReturnStXmm0Rax  — {SSE, INTEGER}
  ReturnStRaxXmm0  — {INTEGER, SSE}
  ReturnStRaxRdx   — {INTEGER, INTEGER}

Structs > 16B continue to use ReturnViaPointer (sret).

TestClassifyReturnAMD64 extended with four 9-16B struct cases.
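
The per-eightbyte selection can be sketched roughly like this. It is a simplified illustration, not the PR's `classifyReturnAMD64`: the `field` type, `eightbyteIsSSE`, `classify`, and the value given to `ReturnViaPointer` are all hypothetical, and the sketch assumes the caller has already handled structs of 8 bytes or fewer.

```go
package main

import "fmt"

type returnMode int

const (
	ReturnViaPointer returnMode = 1 // placeholder value, an assumption
	ReturnStRaxRdx   returnMode = 10
	ReturnStRaxXmm0  returnMode = 11
	ReturnStXmm0Rax  returnMode = 12
	ReturnStXmm0Xmm1 returnMode = 13
)

// field is a hypothetical flattened description of one scalar struct member.
type field struct {
	offset, size uintptr
	isFloat      bool
}

// eightbyteIsSSE reports whether every field overlapping eightbyte idx is a
// float or double. An eightbyte with no fields is treated as INTEGER here,
// which is a simplification.
func eightbyteIsSSE(fields []field, idx uintptr) bool {
	lo, hi := idx*8, idx*8+8
	seen := false
	for _, f := range fields {
		if f.offset < hi && f.offset+f.size > lo {
			if !f.isFloat {
				return false
			}
			seen = true
		}
	}
	return seen
}

// classify picks a return mode for a struct larger than 8 bytes.
func classify(size uintptr, fields []field) returnMode {
	if size > 16 {
		return ReturnViaPointer // sret
	}
	first, second := eightbyteIsSSE(fields, 0), eightbyteIsSSE(fields, 1)
	switch {
	case first && second:
		return ReturnStXmm0Xmm1
	case first:
		return ReturnStXmm0Rax
	case second:
		return ReturnStRaxXmm0
	default:
		return ReturnStRaxRdx
	}
}

func main() {
	nsPoint := []field{{0, 8, true}, {8, 8, true}}             // {double, double}
	mixed := []field{{0, 4, true}, {4, 4, false}, {8, 8, true}} // {float, int32, double}
	fmt.Println(classify(16, nsPoint)) // 13
	fmt.Println(classify(16, mixed))   // 11
	fmt.Println(classify(24, nil))     // 1
}
```

The `mixed` case shows why classification is per eightbyte rather than per struct: one integer field in the first eightbyte makes that half INTEGER even though it also contains a float.
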
Replace the single RAX:RDX path with a switch on cif.Flags that
dispatches to the correct register pair:

  ReturnStXmm0Xmm1 — both eightbytes from XMM0:XMM1 (NSPoint fix)
  ReturnStXmm0Rax  — first from XMM0, second from RAX
  ReturnStRaxXmm0  — first from RAX, second from XMM0
  ReturnStRaxRdx   — both from RAX:RDX (default / legacy)

The default branch preserves backward compatibility for callers that
do not set Flags (e.g. call_windows.go which passes 0,0 for floats).

TestHandleReturnSSEStructs added: four sub-tests covering each mode
with concrete double / int64 combinations.
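
A rough sketch of the dispatch described above, assuming the four registers arrive as raw 64-bit patterns. The `assemble` helper and its signature are hypothetical and do not mirror the real `handleReturn`; the mode names come from this PR and their numeric order is assumed.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

type returnMode int

const (
	ReturnStRaxRdx returnMode = iota + 10
	ReturnStRaxXmm0
	ReturnStXmm0Rax
	ReturnStXmm0Xmm1
)

// assemble writes the two eightbytes of a 9-16B struct into dst, choosing the
// source register for each half from the mode. rax/rdx are the integer return
// registers; xmm0/xmm1 are the raw bit patterns of the SSE return registers.
func assemble(dst *[16]byte, mode returnMode, rax, rdx, xmm0, xmm1 uint64) {
	lo, hi := rax, rdx // default: {INTEGER, INTEGER}
	switch mode {
	case ReturnStXmm0Xmm1:
		lo, hi = xmm0, xmm1
	case ReturnStXmm0Rax:
		lo, hi = xmm0, rax
	case ReturnStRaxXmm0:
		lo, hi = rax, xmm0
	}
	binary.LittleEndian.PutUint64(dst[0:8], lo)
	binary.LittleEndian.PutUint64(dst[8:16], hi)
}

func main() {
	var buf [16]byte
	// Simulate a callee returning {1.5, 2.5} in XMM0:XMM1 with junk in RAX:RDX.
	assemble(&buf, ReturnStXmm0Xmm1, 0xdead, 0xbeef,
		math.Float64bits(1.5), math.Float64bits(2.5))
	x := math.Float64frombits(binary.LittleEndian.Uint64(buf[0:8]))
	y := math.Float64frombits(binary.LittleEndian.Uint64(buf[8:16]))
	fmt.Println(x, y) // 1.5 2.5
}
```

Reading the same call through the RAX:RDX branch would yield the junk values instead, which is the kind of corruption the NSPoint fix addresses.
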
Add four C functions to testdata/structtest.c that return 16-byte structs
covering all four SysV AMD64 eightbyte register combinations:
  - return_struct_2doubles: {double,double} → XMM0:XMM1
  - return_struct_int_float: {int64,double} → RAX:XMM0
  - return_struct_float_int: {double,int64} → XMM0:RAX
  - return_struct_2ints: {int64,int64} → RAX:RDX

Add four E2E tests in struct_e2e_test.go that verify CIF flag assignment
and correct field reconstruction from the assembled register pairs.
XMM-return tests skip on Windows (syscall.SyscallN limitation).

Also apply gofmt alignment to return flag constants in types/types.go.
@kolkov kolkov force-pushed the fix/sse-struct-return branch from b63d673 to 39aa138 Compare May 13, 2026 20:02
…r SSE struct return (TASK-045)

- CHANGELOG: 9-16B XMM return modes, 4-way classification
- README: struct return feature table updated
- ARCHITECTURE: 4-mode return table with flag names
- PERFORMANCE: struct return + callback struct args comparison rows
- ROADMAP: v0.5.0 released, v0.5.1 pending with all recent features
@kolkov kolkov force-pushed the fix/sse-struct-return branch from 39aa138 to a1465b4 Compare May 13, 2026 20:03

codecov Bot commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


classifyReturnAMD64 now checks runtime.GOOS == "windows" and returns
ReturnViaPointer for all structs >8B on Windows. Unit tests and e2e
tests updated with platform-aware expectations.
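
The platform split can be sketched as a size threshold. This is an illustration under simplifying assumptions: `usesHiddenPointer` is a hypothetical helper, it takes the OS name as a parameter only so both branches can be exercised on any machine, and it ignores other cases in which the SysV ABI also returns a struct in memory.

```go
package main

import (
	"fmt"
	"runtime"
)

// usesHiddenPointer reports whether a struct of the given size is returned
// through a caller-provided buffer rather than in registers.
func usesHiddenPointer(goos string, size uintptr) bool {
	if goos == "windows" {
		return size > 8 // per this commit: all structs >8B on Windows
	}
	return size > 16 // SysV AMD64: 9-16B structs use register pairs
}

func main() {
	fmt.Println(usesHiddenPointer("windows", 16))    // true
	fmt.Println(usesHiddenPointer("darwin", 16))     // false
	fmt.Println(usesHiddenPointer(runtime.GOOS, 24)) // true
}
```
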
@kolkov kolkov merged commit 4b6a3fc into main May 13, 2026
13 checks passed
@kolkov kolkov deleted the fix/sse-struct-return branch May 13, 2026 20:19