fix: 9-16B struct return via XMM registers (P1, macOS Intel)#45
Merged
Conversation
Per SysV AMD64 ABI, structs {SSE, SSE} (e.g. NSPoint) return
in XMM0:XMM1. The assembly already saved XMM0 at offset 128 (f1)
but discarded XMM1. Save XMM1 at offset 136 (f2) so CallNFloat
can expose it as a second float return register.
CallNFloat gains a fourth return value f2 (XMM1 bit pattern) to
support {SSE, SSE} 9-16B struct returns (e.g. NSPoint on macOS).
handleReturn gains fret/fret2 parameters so the Execute path can
supply both XMM return registers; call_windows.go passes zeros
since syscall.SyscallN does not capture XMM0/XMM1 (known gap).
All existing handleReturn unit tests updated to pass 0,0 for the
new float parameters.
…0Xmm1 Four new return-flag constants for SysV AMD64 9-16B struct returns. The two-eightbyte classification (INTEGER vs SSE) determines which register pair the callee uses; handleReturn will switch on these flags to reconstruct the struct from the correct register slots. Values 10-13 sit between the scalar flags (0-9) and the bit-field flags (1<<10+) so they cannot collide with either group.
Previously all structs sized 9-16B fell through to ReturnViaPointer
which is wrong — those sizes are returned in register pairs per
SysV AMD64 ABI §3.2.3.
classifyReturnAMD64 now calls classifyEightbyte() for each of the
two eightbytes and selects one of four modes:
ReturnStXmm0Xmm1 — {SSE, SSE} (e.g. NSPoint: double+double)
ReturnStXmm0Rax — {SSE, INTEGER}
ReturnStRaxXmm0 — {INTEGER, SSE}
ReturnStRaxRdx — {INTEGER, INTEGER}
Structs > 16B continue to use ReturnViaPointer (sret).
TestClassifyReturnAMD64 extended with four 9-16B struct cases.
Replace the single RAX:RDX path with a switch on cif.Flags that dispatches to the correct register pair: ReturnStXmm0Xmm1 — both eightbytes from XMM0:XMM1 (NSPoint fix) ReturnStXmm0Rax — first from XMM0, second from RAX ReturnStRaxXmm0 — first from RAX, second from XMM0 ReturnStRaxRdx — both from RAX:RDX (default / legacy) The default branch preserves backward compatibility for callers that do not set Flags (e.g. call_windows.go which passes 0,0 for floats). TestHandleReturnSSEStructs added: four sub-tests covering each mode with concrete double / int64 combinations.
Add four C functions to testdata/structtest.c that return 16-byte structs
covering all four SysV AMD64 eightbyte register combinations:
- return_struct_2doubles: {double,double} → XMM0:XMM1
- return_struct_int_float: {int64,double} → RAX:XMM0
- return_struct_float_int: {double,int64} → XMM0:RAX
- return_struct_2ints: {int64,int64} → RAX:RDX
Add four E2E tests in struct_e2e_test.go that verify CIF flag assignment
and correct field reconstruction from the assembled register pairs.
XMM-return tests skip on Windows (syscall.SyscallN limitation).
Also apply gofmt alignment to return flag constants in types/types.go.
b63d673 to
39aa138
Compare
…r SSE struct return (TASK-045) - CHANGELOG: 9-16B XMM return modes, 4-way classification - README: struct return feature table updated - ARCHITECTURE: 4-mode return table with flag names - PERFORMANCE: struct return + callback struct args comparison rows - ROADMAP: v0.5.0 released, v0.5.1 pending with all recent features
39aa138 to
a1465b4
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
classifyReturnAMD64 now checks runtime.GOOS == windows and returns ReturnViaPointer for all structs >8B on Windows. Unit tests and e2e tests updated with platform-aware expectations.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes P1 critical: macOS Intel invisible window caused by corrupted NSPoint/NSSize returns. 9-16B structs with float fields return in XMM0:XMM1 per System V ABI — goffi was misclassifying them as sret and reading RAX:RDX.
f2after CALL (+1 line)CallNFloatreturnsf2 float64(XMM1)ReturnStRaxRdx,ReturnStRaxXmm0,ReturnStXmm0Rax,ReturnStXmm0Xmm1classifyReturnAMD64— per-eightbyte SSE/INTEGER for 9-16B structsfret/fret2{f64,f64},{i64,f64},{f64,i64},{i64,i64}— all verifiedADR: ADR-007 | Research: SSE_STRUCT_RETURN_ANALYSIS.md
Test plan
go build ./...— all 6 platformsgo test ./ffi ./types ./internal/...— no regressionsCGO_ENABLED=1 go test -race— all pass (WSL Linux)go fmt/golangci-lint— clean