feat: support callback struct arguments on AMD64 #42
Conversation
Add support for callback struct by-value arguments on AMD64 Unix (System V ABI). It's the inverse of the struct type arg support in internal/arch/amd64/call_unix.go. Instead of marshalling from a struct to registers and the stack, data from registers and the stack are marshalled in to a struct.
Writing to valPtr, for example
*(*float64)(valPtr) = getFloat()
would result in a data race
checkptr: converted pointer straddles multiple allocations
Using a []byte to represent the struct value, and then copying the
slice in to a struct reflect.Value avoids the data race.
I'm a bit more confident now that I've added some end-to-end tests. |
kolkov
left a comment
There was a problem hiding this comment.
@pekim Thank you for implementing this — callback struct args are a significant feature, and your approach (all three size buckets + e2e tests with C library) is exactly right. purego doesn't support this, so goffi would be the first pure Go FFI library with callback struct args.
The getInt()/getFloat() refactoring is clean and the reflect.Type-based classification is well done. Four issues to fix before merge:
1. MEMORY class: double stackIdx increment (bug)
In the > 16B loop, the partial last chunk (bytesLeft < 8) calls getInt() which increments stackIdx, then the loop's stackIdx++ runs again — advancing by 2 instead of 1. Any argument following a >16B struct reads from the wrong slot.
// Current (line ~336):
chunk := frame[stackIdx]
if bytesLeft >= 8 {
*(*uintptr)(chunkPtr) = chunk
} else {
writePartial(chunkPtr, bytesLeft, getInt()) // getInt() does stackIdx++
}
stackIdx++ // second increment for partial chunkFix — use the pre-read chunk in both branches:
chunk := frame[stackIdx]
if bytesLeft >= 8 {
*(*uintptr)(chunkPtr) = chunk
} else {
writePartial(chunkPtr, bytesLeft, chunk) // use pre-read value
}
stackIdx++ // single incrementTo verify: add a test with MEMORY-class struct followed by a scalar argument (e.g., func(s TripleI64, extra int64)).
2. Buffer overrun in structData allocation (bug)
make([]byte, sz) allocates exactly sz bytes, but *(*float64)(valPtr) = getFloat() writes 8 bytes and *(*uintptr)(valPtr) writes 8 bytes. For a 4-byte struct (e.g., struct{float32}), this overwrites 4 bytes past the allocation.
Fix:
structData := make([]byte, max(sz, 8))This provides safe headroom. The final copy(valByteSlice, structData) already uses sz, so only the correct bytes are copied to the reflect.Value.
3. ARM64 build tag (crash)
callback_struct_args_test.go has //go:build (linux || darwin || freebsd) && (amd64 || arm64), but callback_arm64.go does not support reflect.Struct — NewCallback would panic on ARM64.
Fix: change to amd64 only:
//go:build (linux || darwin || freebsd) && amd644. Windows e2e tests (crash)
struct_e2e_test.go includes Windows in the build tag. The new TestCallbackStruct* tests call NewCallback with struct-accepting functions, but callback_windows.go doesn't support reflect.Struct → panic.
Fix: add skip at the top of each TestCallbackStruct* function:
if runtime.GOOS == "windows" {
t.Skip("callback struct args not supported on Windows")
}Nice to have (non-blocking)
-
classifyEightbytesimplification — consider usingfield.Offsetfromreflect.StructFieldinstead of manually recomputing alignment. Simpler and guaranteed to match Go's actual layout. -
VoidTypeDescriptor—TestCallbackStructArg8B_FloatPairand other callback tests useSInt64TypeDescriptoras return type forvoidC functions.VoidTypeDescriptorwould be more accurate. -
Nested struct classification —
isStructAllFloatsandclassifyEightbytedon't recurse into nested struct fields. This is fine for the initial implementation but worth a comment noting the limitation.
Overall: excellent contribution. The four fixes are straightforward — happy to help if you have questions on any of them.
It was unnecessay to call getInt for the partial chunk, as the chunk is already in the chunk variable. And the getInt function would increment stackIdx an extra time. Fix the tests that test MEMORY class with a final partial chunk. The structs used were all a multiple of 8 bytes rather than the intended size.
|
@kolkov , thank you for the clear and comprehensive review. It was very helpful
Suggested change implemented.
I don't think that such a test would be affected by this, as the extra scalar argument would be in an integer register not the stack. However I did notice that my tests intended to exercise the partial chunk code were wrong. Their structs' sizes were all multiples of 8 because of Go's struct field alignment and padding. So I've fixed those tests.
Suggested change implemented.
fixed
fixed
Nice suggestion, implemented.
Oops, the perils of copy and pasted. Fixed.
I added support for nested structs to When it came to commits I've left all of the fixes as separate commits for ease of reviewing. I can squash all of the PR's commits once you're happy with everything. |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Fixes #41
Adds support for callback struct by-value arguments on AMD64 Unix (System V ABI).
It's the inverse of the struct type arg support in internal/arch/amd64/call_unix.go.
Instead of marshalling from a struct to registers and the stack, data from
registers and the stack are marshalled in to a struct.
Pull Request Requirements
go fmt ./...)golangci-lint run)go test -race ./...)main, so I guess that my hardware is just slow. In any case it doesn't look like there's any regression.maincoverage of thegithub.com/go-webgpu/goffi/ffipackage is 87.3%. With this PR it's 91.0%.I've tried to broadly follow the practices of the project. But I'm still getting familiar with it, so it's quite possible that I may have deviated too far in places.
I've tried to follow the instructions in https://github.com/go-webgpu/goffi/blob/main/CONTRIBUTING.md where applicable. But I've ignored the references to the
developbranch, as it appears to be quite out of date. So I've targetedmain.tests
I'm also somewhat new to the intricacies of the function calling ABI. So while all of the tests that I've added pass, it wouldn't surprise me if I have any of the tests wrong. I have tried the implementation out with the use of clang_visitChildren that was my original use case that prompted #41. And it appears to work, so I have some confidence that the implementation might be broadly on the right lines.
data race
There was a data race when writing to the
reflect.Valueused for a new struct. For example at https://github.com/pekim/goffi/blob/bugfix/fix-issue-41/ffi/callback.go#L306.Commit pekim@50eb215 works around this. But perhaps there's a better approach?
overhead
The use of a
[]byteand the subsequentcopy(to address the data race) will introduce an overhead. But I've not yet managed to find a better solution.