1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -14,6 +14,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **Classification: >16B structs** — `classifyArgumentAMD64` now correctly returns zero register usage for MEMORY class structs (previously claimed GP registers)
- **Classification: mixed eightbyte** — per-eightbyte SSE/INTEGER classification now walks all members with INTEGER-wins merge rule per System V ABI
- **Deprecated `reflect.Ptr`** — replaced with `reflect.Pointer` in callback validation (PR [#38](https://github.com/go-webgpu/goffi/pull/38), flagged by golangci-lint v2.12.1)
+- **AMD64: 9-16B struct return via XMM registers** — structs like `{float64, float64}` (NSPoint, NSSize, CGPoint, CGSize) now correctly return via XMM0:XMM1 per System V ABI. Previously misclassified as sret (hidden pointer), producing corrupted values on macOS Intel. Four return modes now supported: RAX:RDX, RAX:XMM0, XMM0:RAX, XMM0:XMM1 (TASK-045)

### Added
- `CGO_ENABLED=1` support ([#13](https://github.com/go-webgpu/goffi/issues/13), PR [#37](https://github.com/go-webgpu/goffi/pull/37) by [@jiyeyuran](https://github.com/jiyeyuran)) — goffi now builds and tests under both `CGO_ENABLED=0` (fakecgo) and `CGO_ENABLED=1` (real `runtime/cgo`). This enables the race detector and coexistence with CGO libraries (gocv, database drivers, etc.), and resolves the duplicate symbol conflict in [#22](https://github.com/go-webgpu/goffi/issues/22) by providing an alternative workaround
2 changes: 1 addition & 1 deletion README.md
@@ -35,7 +35,7 @@ ffi.CallFunction(cif, sym, unsafe.Pointer(&result), args)
| **Cross-platform** | 7 targets | Windows, Linux, macOS, FreeBSD × AMD64 + ARM64 |
| **Callbacks** | C→Go safe | `crosscall2` integration, struct args, works from any C thread |
| **Type-safe** | Runtime validation | 5 typed error types with `errors.As()` support |
-| **Struct pass/return** | Full ABI | Args: INTEGER/SSE classification. Returns: ≤8B (RAX), 9–16B (RAX+RDX), >16B (sret) |
+| **Struct pass/return** | Full ABI | Args: INTEGER/SSE classification. Returns: ≤8B (RAX/XMM0), 9–16B (4 modes: RAX/XMM × RAX/XMM), >16B (sret) |
| **Context** | Timeouts | `CallFunctionContext(ctx, ...)` cancellation |
| **Race detector** | `-race` compatible | `CGO_ENABLED=1 go test -race` works cleanly |
| **Tested** | 89% coverage | CI on Linux, Windows, macOS (CGO=0 and CGO=1) |
34 changes: 26 additions & 8 deletions ROADMAP.md
@@ -3,7 +3,7 @@
> **Strategic Approach**: Build production-ready Zero-CGO FFI with benchmarked performance
> **Philosophy**: Performance first, usability second, platform coverage third

-**Last Updated**: 2026-03-02 | **Current Version**: v0.4.1 | **Strategy**: Benchmarks → Callbacks → ARM64 → Runtime → API → v1.0 LTS | **Milestone**: v0.4.1 (ABI compliance) → v0.5.0 Usability → v1.0.0 LTS
+**Last Updated**: 2026-05-13 | **Current Version**: v0.5.0 (v0.5.1 pending) | **Strategy**: Benchmarks → Callbacks → ARM64 → Runtime → ABI → v1.0 LTS | **Milestone**: v0.5.1 (struct ABI + CGO=1) → v0.6.0 Variadic/Builder → v1.0.0 LTS

---

@@ -56,11 +56,15 @@ v0.3.9 (CALLBACK FIXES) ✅ RELEASED 2026-02-18
↓ (runtime integration)
v0.4.0 (CROSSCALL2 INTEGRATION) ✅ RELEASED 2026-02-27
↓ (usability)
-v0.5.0 (USABILITY + VARIADIC) → 2026 Q2-Q3
+v0.5.0 (PLATFORM COVERAGE) ✅ RELEASED 2026-03-29
+↓ (struct ABI + CGO support)
+v0.5.1 (STRUCT ABI + CGO_ENABLED=1) → 2026-05 (pending tag)
+↓ (variadic + builder API)
+v0.6.0 (VARIADIC + BUILDER API) → 2026 Q3
↓ (advanced features)
v0.8.0 (ADVANCED FEATURES) → 2026 Q3-Q4
↓ (community adoption + validation)
-v1.0.0 LTS → Long-term support release (Q1 2026)
+v1.0.0 LTS → Long-term support release (2027 Q1)
```

### Critical Milestones
@@ -119,22 +123,36 @@ v1.0.0 LTS → Long-term support release (Q1 2026)
- Struct return 9-16 bytes, sret hidden pointer, HFA stack spill
- Overflow detection, `runtime.KeepAlive` safety

-**v0.5.0** = Usability + Variadic (2026 Q2-Q3)
+**v0.5.0** = Platform coverage ✅ RELEASED (2026-03-29)
+- **Windows ARM64** support (Snapdragon X Elite, tested by @SideFx)
+- **FreeBSD amd64** support (cross-compile verified)
+- 7 platform targets (Linux/Windows/macOS/FreeBSD × amd64 + ARM64)
+
+**v0.5.1** = Struct ABI + CGO_ENABLED=1 (2026-05, pending tag)
+- **CGO_ENABLED=1 support** (PR #37 by @jiyeyuran) — dual-mode build, race detector compatible
+- **Struct by-value argument passing** (PR #39, closes #33) — ≤8B/9-16B/>16B, INTEGER/SSE classification
+- **Callback struct arguments** (PR #42 by @pekim, closes #41) — C→Go callbacks with struct args
+- **9-16B struct return via XMM** (TASK-045) — 4 return modes: RAX:RDX, RAX:XMM0, XMM0:RAX, XMM0:XMM1
+- **Race detector** — checkptr double-indirection fix (Go #58625), `-race` clean
+- **E2E test infrastructure** — gcc-compiled C test library for struct passing verification
+- Contributors: @jiyeyuran (CGO path maintainer), @pekim (callback structs)
+
+**v0.6.0** = Variadic + Builder API (2026 Q3)
- Builder pattern API
- Platform-specific struct handling
- **Variadic function support** (printf, sprintf, etc.)
- RegisterFunc convenience API

-**v1.0.0** = Long-term support release (Q1 2026)
+**v1.0.0** = Long-term support release (2027 Q1)
- API stability guarantee
- Security audit
- Published benchmarks vs CGO/purego
- 3+ years LTS support

---

-## 📊 Current Status (v0.4.1)
+## 📊 Current Status (v0.5.1)

-**Phase**: ABI compliance audit complete, forward call path fully verified
+**Phase**: Struct ABI complete, CGO_ENABLED=1 supported, SSE struct return fixed

**What Works**:
- ✅ Dynamic library loading (`LoadLibrary`, `GetSymbol`, `FreeLibrary`)
13 changes: 11 additions & 2 deletions docs/ARCHITECTURE.md
@@ -191,8 +191,17 @@ Classification uses `reflect.Type` (not `types.TypeDescriptor`) since callback s

ABI rules for returning structs depend on size:

-- **≤ 8 bytes**: returned in RAX (AMD64) or X0 (ARM64)
-- **9-16 bytes** (AMD64): split across RAX (low 8) + RDX (high 8)
+- **≤ 8 bytes**: returned in RAX (INTEGER) or XMM0 (SSE) on AMD64, X0 or D0 on ARM64
+- **9-16 bytes** (AMD64): two eightbytes, each returned in GP or XMM per classification. Four modes:
+
+| Struct layout | Eightbyte 0 | Eightbyte 1 | Registers | Flag |
+|---|---|---|---|---|
+| `{int64, int64}` | INTEGER | INTEGER | RAX + RDX | `ReturnStRaxRdx` |
+| `{int64, float64}` | INTEGER | SSE | RAX + XMM0 | `ReturnStRaxXmm0` |
+| `{float64, int64}` | SSE | INTEGER | XMM0 + RAX | `ReturnStXmm0Rax` |
+| `{float64, float64}` | SSE | SSE | XMM0 + XMM1 | `ReturnStXmm0Xmm1` |
+
+Classification is computed at CIF-prepare time (`classifyReturnAMD64` using `classifyEightbyte`), stored in `cif.Flags`, and dispatched in `handleReturn`. This matches libffi's `UNIX64_RET_ST_*` pattern.
- **> 16 bytes**: caller passes a hidden pointer as the first argument (sret)
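
The mapping from eightbyte classes to return registers in the table above can be sketched in Go. This is a minimal illustration, not goffi's implementation: the `Class`, `Field`, `classifyEightbytes`, and `returnRegisters` names are hypothetical, and the sketch assumes naturally aligned scalar members in a struct of at most 16 bytes.

```go
package main

import "fmt"

// Class is a simplified System V eightbyte class.
// Hypothetical type for illustration; not goffi's internal representation.
type Class int

const (
	NoClass Class = iota
	Integer
	SSE
)

// Field describes one scalar struct member (hypothetical helper type).
type Field struct {
	Offset  uintptr // byte offset of the member within the struct
	IsFloat bool    // true for float32/float64 members
}

// classifyEightbytes assigns a class to each of the two eightbytes.
// When members sharing an eightbyte disagree, INTEGER wins over SSE.
// Assumes naturally aligned scalars in a struct of at most 16 bytes.
func classifyEightbytes(fields []Field) [2]Class {
	var classes [2]Class
	for _, f := range fields {
		i := f.Offset / 8
		if i > 1 {
			continue // beyond 16 bytes: MEMORY class, out of scope here
		}
		c := Integer
		if f.IsFloat {
			c = SSE
		}
		if classes[i] == NoClass || c == Integer {
			classes[i] = c
		}
	}
	return classes
}

// returnRegisters names the register pair used for a 9-16 byte struct return.
func returnRegisters(c [2]Class) string {
	switch {
	case c[0] == Integer && c[1] == Integer:
		return "RAX:RDX"
	case c[0] == Integer && c[1] == SSE:
		return "RAX:XMM0"
	case c[0] == SSE && c[1] == Integer:
		return "XMM0:RAX"
	case c[0] == SSE && c[1] == SSE:
		return "XMM0:XMM1"
	}
	return "unclassified"
}

func main() {
	// {float64, float64}: the NSPoint / CGPoint layout.
	point := []Field{{Offset: 0, IsFloat: true}, {Offset: 8, IsFloat: true}}
	fmt.Println(returnRegisters(classifyEightbytes(point)))

	// {int32, float32, float64}: INTEGER wins the merge in eightbyte 0.
	mixed := []Field{{Offset: 0}, {Offset: 4, IsFloat: true}, {Offset: 8, IsFloat: true}}
	fmt.Println(returnRegisters(classifyEightbytes(mixed)))
}
```

Each SSE eightbyte takes the next free XMM register and each INTEGER eightbyte takes the next free register of RAX, RDX, which is why the mixed cases use RAX with XMM0 rather than RDX or XMM1.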

Implementation in `internal/arch/amd64/implementation.go`:
2 changes: 2 additions & 0 deletions docs/PERFORMANCE.md
@@ -262,6 +262,8 @@ FFI overhead: 0.0001ms = 0.001% ✅
| **Type Safety** | ✅ TypeDescriptor validation | Go reflect.Type |
| **Error Handling** | ✅ 5 typed errors | Generic errors |
| **Callback float returns** | ✅ XMM0 in asm | ❌ panic |
+| **Struct return 9-16B** | ✅ 4 modes (RAX/XMM × RAX/XMM) | ✅ 4 modes (f1/f2 + a1/a2) |
+| **Callback struct args** | ✅ ≤8B, 9-16B, >16B | ❌ panic |
| **ARM64 HFA** | Recursive struct walk | Partial recursive (bug in nested path) |
| **Context support** | ✅ Timeouts/cancellation | ❌ |
| **Platforms** | 5 (quality focus) | 9+ (breadth focus) |
207 changes: 207 additions & 0 deletions ffi/struct_e2e_test.go
@@ -531,3 +531,210 @@ func TestCallbackStructArgWithScalar(t *testing.T) {
t.Errorf("expected %#v %d, received %#v %d", expected, extra, receivedArg1, receivedArg2)
}
}

// TestStructReturn16B_TwoDoubles verifies that {double, double} is returned in XMM0:XMM1.
// This is the NSPoint / NSSize case on macOS Intel — the primary motivation for TASK-045.
// SysV AMD64 ABI: both eightbytes are SSE class → ReturnStXmm0Xmm1.
func TestStructReturn16B_TwoDoubles(t *testing.T) {
if runtime.GOOS == "windows" {
t.Skip("Windows: XMM struct returns not captured by syscall.SyscallN")
}
requireStructLib(t)

sym, err := GetSymbol(structTestLib, "return_struct_2doubles")
if err != nil {
t.Fatal(err)
}

// {double, double} — both SSE → ReturnStXmm0Xmm1
structType := &types.TypeDescriptor{
Kind: types.StructType,
Size: 16,
Alignment: 8,
Members: []*types.TypeDescriptor{
types.DoubleTypeDescriptor,
types.DoubleTypeDescriptor,
},
}

var cif types.CallInterface
if err := PrepareCallInterface(&cif, types.DefaultCall, structType,
[]*types.TypeDescriptor{types.DoubleTypeDescriptor, types.DoubleTypeDescriptor}); err != nil {
t.Fatal(err)
}

if cif.Flags != types.ReturnStXmm0Xmm1 {
t.Fatalf("expected cif.Flags = ReturnStXmm0Xmm1 (%d), got %d", types.ReturnStXmm0Xmm1, cif.Flags)
}

type PairF64 struct{ A, B float64 }

a := 1.5
b := 2.5
args := []unsafe.Pointer{unsafe.Pointer(&a), unsafe.Pointer(&b)}
var result PairF64
if err := CallFunction(&cif, sym, unsafe.Pointer(&result), args); err != nil {
t.Fatal(err)
}

if result.A != a || result.B != b {
t.Errorf("return_struct_2doubles(%f, %f) = {%f, %f}, want {%f, %f}",
a, b, result.A, result.B, a, b)
}
}

// TestStructReturn16B_IntFloat verifies that {int64, double} returns in RAX:XMM0.
// SysV AMD64 ABI: eightbyte0 INTEGER (RAX), eightbyte1 SSE (XMM0) → ReturnStRaxXmm0.
func TestStructReturn16B_IntFloat(t *testing.T) {
if runtime.GOOS == "windows" {
t.Skip("Windows: XMM struct returns not captured by syscall.SyscallN")
}
requireStructLib(t)

sym, err := GetSymbol(structTestLib, "return_struct_int_float")
if err != nil {
t.Fatal(err)
}

// {int64, double} — INTEGER + SSE → ReturnStRaxXmm0
structType := &types.TypeDescriptor{
Kind: types.StructType,
Size: 16,
Alignment: 8,
Members: []*types.TypeDescriptor{
types.SInt64TypeDescriptor,
types.DoubleTypeDescriptor,
},
}

var cif types.CallInterface
if err := PrepareCallInterface(&cif, types.DefaultCall, structType,
[]*types.TypeDescriptor{types.SInt64TypeDescriptor, types.DoubleTypeDescriptor}); err != nil {
t.Fatal(err)
}

if cif.Flags != types.ReturnStRaxXmm0 {
t.Fatalf("expected cif.Flags = ReturnStRaxXmm0 (%d), got %d", types.ReturnStRaxXmm0, cif.Flags)
}

type MixedIntFloat struct {
A int64
B float64
}

a := int64(42)
b := 3.14
args := []unsafe.Pointer{unsafe.Pointer(&a), unsafe.Pointer(&b)}
var result MixedIntFloat
if err := CallFunction(&cif, sym, unsafe.Pointer(&result), args); err != nil {
t.Fatal(err)
}

if result.A != a || result.B != b {
t.Errorf("return_struct_int_float(%d, %f) = {%d, %f}, want {%d, %f}",
a, b, result.A, result.B, a, b)
}
}

// TestStructReturn16B_FloatInt verifies that {double, int64} returns in XMM0:RAX.
// SysV AMD64 ABI: eightbyte0 SSE (XMM0), eightbyte1 INTEGER (RAX) → ReturnStXmm0Rax.
func TestStructReturn16B_FloatInt(t *testing.T) {
if runtime.GOOS == "windows" {
t.Skip("Windows: XMM struct returns not captured by syscall.SyscallN")
}
requireStructLib(t)

sym, err := GetSymbol(structTestLib, "return_struct_float_int")
if err != nil {
t.Fatal(err)
}

// {double, int64} — SSE + INTEGER → ReturnStXmm0Rax
structType := &types.TypeDescriptor{
Kind: types.StructType,
Size: 16,
Alignment: 8,
Members: []*types.TypeDescriptor{
types.DoubleTypeDescriptor,
types.SInt64TypeDescriptor,
},
}

var cif types.CallInterface
if err := PrepareCallInterface(&cif, types.DefaultCall, structType,
[]*types.TypeDescriptor{types.DoubleTypeDescriptor, types.SInt64TypeDescriptor}); err != nil {
t.Fatal(err)
}

if cif.Flags != types.ReturnStXmm0Rax {
t.Fatalf("expected cif.Flags = ReturnStXmm0Rax (%d), got %d", types.ReturnStXmm0Rax, cif.Flags)
}

type MixedFloatInt struct {
A float64
B int64
}

a := 2.71828
b := int64(100)
args := []unsafe.Pointer{unsafe.Pointer(&a), unsafe.Pointer(&b)}
var result MixedFloatInt
if err := CallFunction(&cif, sym, unsafe.Pointer(&result), args); err != nil {
t.Fatal(err)
}

if result.A != a || result.B != b {
t.Errorf("return_struct_float_int(%f, %d) = {%f, %d}, want {%f, %d}",
a, b, result.A, result.B, a, b)
}
}

// TestStructReturn16B_TwoInts verifies that {int64, int64} returns in RAX:RDX.
// SysV AMD64 ABI: both eightbytes INTEGER → ReturnStRaxRdx.
func TestStructReturn16B_TwoInts(t *testing.T) {
if runtime.GOOS == "windows" {
t.Skip("Windows: 16B struct returns use sret, not RAX:RDX (Win64 ABI)")
}
requireStructLib(t)

sym, err := GetSymbol(structTestLib, "return_struct_2ints")
if err != nil {
t.Fatal(err)
}

// {int64, int64} — both INTEGER → ReturnStRaxRdx
structType := &types.TypeDescriptor{
Kind: types.StructType,
Size: 16,
Alignment: 8,
Members: []*types.TypeDescriptor{
types.SInt64TypeDescriptor,
types.SInt64TypeDescriptor,
},
}

var cif types.CallInterface
if err := PrepareCallInterface(&cif, types.DefaultCall, structType,
[]*types.TypeDescriptor{types.SInt64TypeDescriptor, types.SInt64TypeDescriptor}); err != nil {
t.Fatal(err)
}

if cif.Flags != types.ReturnStRaxRdx {
t.Fatalf("expected cif.Flags = ReturnStRaxRdx (%d), got %d", types.ReturnStRaxRdx, cif.Flags)
}

type PairI64 struct{ A, B int64 }

a := int64(1000000)
b := int64(2000000)
args := []unsafe.Pointer{unsafe.Pointer(&a), unsafe.Pointer(&b)}
var result PairI64
if err := CallFunction(&cif, sym, unsafe.Pointer(&result), args); err != nil {
t.Fatal(err)
}

if result.A != a || result.B != b {
t.Errorf("return_struct_2ints(%d, %d) = {%d, %d}, want {%d, %d}",
a, b, result.A, result.B, a, b)
}
}
30 changes: 30 additions & 0 deletions ffi/testdata/structtest.c
@@ -54,3 +54,33 @@ void callback_struct_and_int(int32_t a, uint32_t b, int64_t extra,
struct pair_i32_u32 s = {.a = a, .b = b};
callback(s, extra);
}

// Struct RETURN functions — test XMM0:XMM1 / RAX:RDX register pair selection.
// {double, double}: SysV AMD64 ABI returns this in XMM0:XMM1 (SSE, SSE).
// Models NSPoint / NSSize on macOS Intel.
struct pair_f64 { double a; double b; };
struct pair_f64 return_struct_2doubles(double a, double b) {
struct pair_f64 s = {.a = a, .b = b};
return s;
}

// {int64, double}: eightbyte0 INTEGER (RAX), eightbyte1 SSE (XMM0).
struct mixed_int_float { int64_t a; double b; };
struct mixed_int_float return_struct_int_float(int64_t a, double b) {
struct mixed_int_float s = {.a = a, .b = b};
return s;
}

// {double, int64}: eightbyte0 SSE (XMM0), eightbyte1 INTEGER (RAX).
struct mixed_float_int { double a; int64_t b; };
struct mixed_float_int return_struct_float_int(double a, int64_t b) {
struct mixed_float_int s = {.a = a, .b = b};
return s;
}

// {int64, int64}: both INTEGER, returned in RAX:RDX.
struct return_pair_i64 { int64_t a; int64_t b; };
struct return_pair_i64 return_struct_2ints(int64_t a, int64_t b) {
struct return_pair_i64 s = {.a = a, .b = b};
return s;
}