
Poorly, in my experience.

CUDA is compiled into PTX, an intermediate language. PTX is then compiled into a device-specific NVidia assembly language (often called SASS, though the SASS for each generation of cards is different). This way, NVidia can make huge changes to the underlying assembly code from generation to generation but still keep portability.
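You can see that forward-compatibility mechanism directly in the CUDA driver API: an application can ship PTX text and let whatever driver is installed lower it to the local card's SASS at load time. A minimal sketch, assuming a CUDA-capable machine (error checks mostly elided; the embedded PTX is a hypothetical no-op kernel, not from any real project):

```c
#include <stdio.h>
#include <cuda.h>   /* CUDA driver API; link with -lcuda */

/* PTX shipped inside the application as text. The installed driver,
 * not the application, translates it to the local GPU's SASS. */
static const char *ptx =
    ".version 7.0\n"
    ".target sm_50\n"
    ".address_size 64\n"
    ".visible .entry noop() { ret; }\n";

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* The PTX -> SASS translation for this specific card happens
     * here, inside the driver, at module-load time. */
    if (cuModuleLoadData(&mod, ptx) == CUDA_SUCCESS)
        puts("driver JIT-compiled the PTX for whatever GPU is installed");
    return 0;
}
```

The same binary containing only PTX keeps working on GPU generations that didn't exist when it shipped, because stage two is always redone by the current driver.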

OpenCL, especially OpenCL 1.2 (the version that works on the widest set of cards), does not have an intermediate language. SPIR is an OpenCL 2.x concept.

This means that, in practice, OpenCL 1.2 code is distributed as source and recompiled on the user's machine. But that means compiler errors can kill your code before it even runs. This is especially annoying because the OpenCL 1.2 compiler is part of the device driver: if the end user updates the driver, the compiler may have a new bug (or an old one back) that changes the behavior of your code.
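Concretely, the host code looks like the sketch below: the kernel ships as a string, clBuildProgram invokes the driver's own compiler at runtime, and any compiler error (or driver-compiler bug) only surfaces then, on the user's machine. A minimal sketch, assuming an OpenCL 1.2 runtime is present (error handling mostly elided; the kernel is a toy example):

```c
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>   /* link with -lOpenCL */

/* The kernel is distributed to end users as *source*, not a binary. */
static const char *src =
    "__kernel void add(__global const float *a,\n"
    "                  __global const float *b,\n"
    "                  __global float *c) {\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void) {
    cl_platform_id plat;
    cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);

    /* The driver's compiler runs *here*, on the user's machine.
     * A driver update swaps this compiler out from under you. */
    if (clBuildProgram(prog, 1, &dev, "", NULL, NULL) != CL_SUCCESS) {
        size_t len;
        clGetProgramBuildInfo(prog, dev, CL_PROGRAM_BUILD_LOG,
                              0, NULL, &len);
        char *log = malloc(len);
        clGetProgramBuildInfo(prog, dev, CL_PROGRAM_BUILD_LOG,
                              len, log, NULL);
        fprintf(stderr, "runtime compile failed:\n%s\n", log);
        free(log);
        return 1;
    }
    puts("kernel compiled by whatever driver the user has installed");
    return 0;
}
```

Note the contrast with the PTX model: here there is no stable intermediate form at all, so every driver version is effectively a different compiler your shipped source has to survive.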

-------------

This doesn't matter for DirectX because, like CUDA, Microsoft compiles DirectX shaders into DXIL (the DirectX Intermediate Language), and device drivers then compile that intermediate language into the final assembly code on a per-device basis.

-------------

It is this intermediate layer that AMD is missing, and IMO is the key to their problems in practice.

SPIR (OpenCL's standard intermediate layer) has spotty support across cards. I'm guessing NVidia knows that the PTX intermediate language is their golden goose and doesn't want to offer good SPIR support. Microsoft probably prefers people to use DirectX / DXIL as well. So that leaves AMD and Intel as the only groups who could possibly push SPIR and align together. SPIR is a good idea, but I'm not sure the politics will allow it to happen.



It's really difficult to tell whether the PTX layer approach is something AMD _should_ adopt. That's roughly what the (I think now abandoned) HSAIL thing was.

It's one where packaging concerns and compiler dev concerns are probably in tension. Compiling for N different GPUs is really annoying for library distribution and probably a factor in the shortish list of officially supported ROCm cards.

However, translating between IRs is usually lossy, so LLVM to PTX to SASS makes me nervous as a pipeline. Intel are doing LLVM to SPIR-V to LLVM to machine code, which can't be ideal. Maybe that's a workaround for LLVM's IR being unstable, but equally, stability in an IR comes at a development cost.

I think amdgpu should use a single LLVM IR representation for multiple hardware revisions and specialise in the backend. That doesn't solve the binary-stability hazards, but it would take the edge off the packaging challenge. That seems to be most of the win SPIR-V markets itself on, at much lower engineering cost.


OpenCL also gets compiled to PTX on Nvidia GPUs.


But as an OpenCL programmer, you don't distribute PTX intermediate code. You distribute the OpenCL kernel source and recompile it every time. That's more or less the practice.


True.

And the resulting PTX is worse when it's generated from OpenCL C instead of CUDA C. I tested this recently with a toy FFT kernel, and the CUDA pipeline produced far more efficient FMA instructions.



