Currently, Metal and Vulkan in particular allocate memory in a way that is incompatible with persistently mapped CPU pointers, so buffers cannot be accessed conveniently without extra calls.
This may have been done for performance reasons, but after #1113, reading or writing requires careful use of the map() and unmap() semantics. Specifically, Metal requires an unmap() to make CPU writes visible to the GPU (MTLBuffer::didModifyRange()), whereas Vulkan only supports map()ing readback buffers after the GPU fence (with a dst=HOST_READ barrier) has been waited on, so that map() can perform the vkInvalidateMappedMemoryRanges() call. This split in semantics becomes especially cumbersome when developing cross-platform tests, as it is easy to miss the mismatch in flushing semantics between the two backends. I have not yet taken Dx12 into account.
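To make the mismatch concrete, here is a minimal Rust sketch that models only the two rules described above. Everything in it (`MockBuffer`, `Backend`, `portable_upload`, `portable_readback`) is hypothetical and not part of any real API. It also assumes, purely for illustration, that Vulkan upload memory is coherent for CPU writes, and it does not model Metal's readback path.

```rust
/// Hypothetical model: a buffer with separate CPU and GPU views of its bytes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Backend {
    Metal,
    Vulkan,
}

#[derive(Debug, PartialEq, Eq)]
pub enum MapError {
    /// map() for readback was attempted before the GPU fence was waited on.
    FenceNotWaited,
}

pub struct MockBuffer {
    backend: Backend,
    cpu_view: Vec<u8>,
    gpu_view: Vec<u8>,
    gpu_work_pending: bool,
}

impl MockBuffer {
    pub fn new(backend: Backend, size: usize) -> Self {
        Self {
            backend,
            cpu_view: vec![0; size],
            gpu_view: vec![0; size],
            gpu_work_pending: false,
        }
    }

    /// CPU write through the mapped pointer.
    /// Assumption: Vulkan memory is coherent here, Metal holds writes back until unmap().
    pub fn cpu_write(&mut self, data: &[u8]) {
        self.cpu_view[..data.len()].copy_from_slice(data);
        if self.backend == Backend::Vulkan {
            self.gpu_view[..data.len()].copy_from_slice(data);
        }
    }

    /// On Metal this stands in for the didModifyRange() call that publishes CPU writes.
    pub fn unmap(&mut self) {
        if self.backend == Backend::Metal {
            self.gpu_view.copy_from_slice(&self.cpu_view);
        }
    }

    /// Simulates the GPU writing results; its fence is unsignalled until wait_fence().
    pub fn gpu_write(&mut self, data: &[u8]) {
        self.gpu_view[..data.len()].copy_from_slice(data);
        self.gpu_work_pending = true;
    }

    pub fn wait_fence(&mut self) {
        self.gpu_work_pending = false;
    }

    /// On Vulkan this refuses to map before the fence wait; the copy stands in
    /// for vkInvalidateMappedMemoryRanges(). Metal readback is not modeled further.
    pub fn map_for_read(&mut self) -> Result<&[u8], MapError> {
        if self.backend == Backend::Vulkan && self.gpu_work_pending {
            return Err(MapError::FenceNotWaited);
        }
        self.cpu_view.copy_from_slice(&self.gpu_view);
        Ok(&self.cpu_view)
    }

    pub fn gpu_read(&self) -> &[u8] {
        &self.gpu_view
    }
}

/// Portable upload: always unmap after writing, which satisfies the Metal rule.
pub fn portable_upload(buf: &mut MockBuffer, data: &[u8]) {
    buf.cpu_write(data);
    buf.unmap();
}

/// Portable readback: always wait on the fence before mapping, which satisfies the Vulkan rule.
pub fn portable_readback(buf: &mut MockBuffer) -> Result<Vec<u8>, MapError> {
    buf.wait_fence();
    buf.map_for_read().map(|bytes| bytes.to_vec())
}
```

A cross-platform test that goes through the two `portable_*` helpers follows the stricter rule of each backend; a test that skips `unmap()` would pass on the modeled Vulkan path and silently upload nothing on the modeled Metal path.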
If this sort of performance split is desired, it simply means the existing CpuToGpu/GpuToCpu abstraction isn't expressive enough, given that it requires such platform divergence. It also doesn't work very well for UMA, for example, where oftentimes most if not all buffers are DEVICE_LOCAL | HOST_VISIBLE, and likely also cached and coherent, with no reduced aperture.
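As a rough illustration of what a more expressive abstraction could carry, the sketch below derives access requirements from Vulkan-style memory property flags instead of a transfer direction. The constant values match the `VK_MEMORY_PROPERTY_*` bits, but `AccessTraits` and `access_traits` are hypothetical names invented for this example, not an existing or proposed API.

```rust
// Bit values of the corresponding VK_MEMORY_PROPERTY_*_BIT flags.
pub const DEVICE_LOCAL: u32 = 0x1;
pub const HOST_VISIBLE: u32 = 0x2;
pub const HOST_COHERENT: u32 = 0x4;
pub const HOST_CACHED: u32 = 0x8;

/// Hypothetical description of how the CPU may touch a buffer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AccessTraits {
    /// The CPU can keep a pointer mapped for the buffer's lifetime.
    pub persistently_mappable: bool,
    /// CPU writes must be flushed before the GPU can see them.
    pub needs_flush_after_write: bool,
    /// GPU writes must be invalidated before the CPU reads them.
    pub needs_invalidate_before_read: bool,
    /// CPU reads go through the cache and are reasonably fast.
    pub fast_cpu_reads: bool,
}

/// Derives the traits from memory property flags rather than from CpuToGpu/GpuToCpu.
pub fn access_traits(flags: u32) -> AccessTraits {
    let host_visible = (flags & HOST_VISIBLE) != 0;
    let coherent = (flags & HOST_COHERENT) != 0;
    let cached = (flags & HOST_CACHED) != 0;
    AccessTraits {
        persistently_mappable: host_visible,
        // Non-coherent host-visible memory needs explicit flush/invalidate calls.
        needs_flush_after_write: host_visible && !coherent,
        needs_invalidate_before_read: host_visible && !coherent,
        fast_cpu_reads: host_visible && cached,
    }
}
```

With flags like the UMA case above (DEVICE_LOCAL | HOST_VISIBLE plus cached and coherent), this reports a buffer that can stay mapped and needs no flush or invalidate, which is information a direction-only enum cannot express.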