
Poorly, in my experience.

CUDA is compiled into PTX, an intermediate language. PTX is then compiled into a device-specific NVidia assembly language (often called SASS, though the SASS for each generation of cards is different). This way, NVidia can make huge changes to the underlying assembly code from generation to generation but still keep portability.
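You can see that forward-compatibility mechanism directly in the CUDA driver API: an application can ship PTX text and let whatever driver is installed lower it to the local card's SASS at load time. A minimal sketch, assuming a CUDA-capable machine (error checks mostly elided; the embedded PTX is a hypothetical no-op kernel, not from any real project):

```c
#include <stdio.h>
#include <cuda.h>   /* CUDA driver API; link with -lcuda */

/* PTX shipped inside the application as text. The installed driver,
 * not the application, translates it to the local GPU's SASS. */
static const char *ptx =
    ".version 7.0\n"
    ".target sm_50\n"
    ".address_size 64\n"
    ".visible .entry noop() { ret; }\n";

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* The PTX -> SASS translation for this specific card happens
     * here, inside the driver, at module-load time. */
    if (cuModuleLoadData(&mod, ptx) == CUDA_SUCCESS)
        puts("driver JIT-compiled the PTX for whatever GPU is installed");
    return 0;
}
```

The same binary containing only PTX keeps working on GPU generations that didn't exist when it shipped, because stage two is always redone by the current driver.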

OpenCL, especially OpenCL 1.2 (the version that works on the widest set of cards), does not have an intermediate language. SPIR is an OpenCL 2.x concept.

This means that, in practice, OpenCL 1.2 code is distributed as source and recompiled on the user's machine. But that means compiler errors can kill your code before it even runs. This is especially annoying because the OpenCL 1.2 compiler is part of the device driver: if the end user updates the driver, the compiler may have a new bug (or an old one back) that changes the behavior of your code.
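Concretely, the host code looks like the sketch below: the kernel ships as a string, clBuildProgram invokes the driver's own compiler at runtime, and any compiler error (or driver-compiler bug) only surfaces then, on the user's machine. A minimal sketch, assuming an OpenCL 1.2 runtime is present (error handling mostly elided; the kernel is a toy example):

```c
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>   /* link with -lOpenCL */

/* The kernel is distributed to end users as *source*, not a binary. */
static const char *src =
    "__kernel void add(__global const float *a,\n"
    "                  __global const float *b,\n"
    "                  __global float *c) {\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void) {
    cl_platform_id plat;
    cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);

    /* The driver's compiler runs *here*, on the user's machine.
     * A driver update swaps this compiler out from under you. */
    if (clBuildProgram(prog, 1, &dev, "", NULL, NULL) != CL_SUCCESS) {
        size_t len;
        clGetProgramBuildInfo(prog, dev, CL_PROGRAM_BUILD_LOG,
                              0, NULL, &len);
        char *log = malloc(len);
        clGetProgramBuildInfo(prog, dev, CL_PROGRAM_BUILD_LOG,
                              len, log, NULL);
        fprintf(stderr, "runtime compile failed:\n%s\n", log);
        free(log);
        return 1;
    }
    puts("kernel compiled by whatever driver the user has installed");
    return 0;
}
```

Note the contrast with the PTX model: here there is no stable intermediate form at all, so every driver version is effectively a different compiler your shipped source has to survive.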

-------------

This doesn't matter for DirectX because, like CUDA, Microsoft compiles DirectX shaders into DXIL (the DirectX Intermediate Language), and device drivers then compile that intermediate language into the final assembly code on a per-device basis.

-------------

It is this intermediate layer that AMD is missing, and IMO is the key to their problems in practice.

SPIR (OpenCL's standard intermediate layer) has spotty support across cards. I'm guessing NVidia knows that the PTX intermediate language is their golden goose and doesn't want to offer good SPIR support. Microsoft probably prefers people to use DirectX / DXIL as well. So that leaves AMD and Intel as the only groups who could possibly push SPIR and align together. SPIR is a good idea, but I'm not sure the politics will allow it to happen.



It's really difficult to tell whether the PTX layer approach is something AMD _should_ adopt. That's roughly what the (I think now abandoned) HSAIL thing was.

It's one where packaging concerns and compiler dev concerns are probably in tension. Compiling for N different GPUs is really annoying for library distribution and probably a factor in the shortish list of officially supported ROCm cards.

However, translating between IRs is usually lossy, so LLVM to PTX to SASS makes me nervous as a pipeline. Intel are doing LLVM to SPIR-V to LLVM to machine code, which can't be ideal. Maybe that's a workaround for LLVM's IR being unstable, but equally, stability in an IR comes at a development cost.

I think amdgpu should use a single LLVM IR representation for multiple hardware revisions and specialise in the backend. That doesn't solve the binary-stability hazards, but it would take the edge off the packaging challenge. That seems to be most of the win SPIR-V markets itself on, at much lower engineering cost.


OpenCL also gets compiled to PTX on Nvidia GPUs.


But as an OpenCL programmer, you don't distribute PTX intermediate code. You distribute the OpenCL kernel source and recompile it every time. That's more or less the practice.


True.

And the resulting PTX is worse when it's generated from OpenCL C instead of CUDA C. I tested this recently with a toy FFT kernel, and the CUDA pipeline produced far more efficient FMA instructions.



