I've been learning about computer architecture [1] and I'm comfortable with my understanding of how a processor communicates with main memory - whether directly, through caches, or via virtual memory - and with I/O peripherals.
But something that seems oddly absent from the courses I took and from what I've found online is how the CPU communicates with other processing units, such as GPUs - and, beyond that, an in-depth description of how different systems are interconnected with buses (by in-depth I mean an RTL-level example/description).
I understand that as you add more hardware to a machine, complexity increases and software must intervene - so there won't be a fully general answer, and the answer will depend on the implementation in question. That's fine by me.
What I'm looking for is a description of how a CPU tells a GPU to start executing a program. Through what means do they communicate - a bus? What does such a communication look like?
I'd love to get pointers to resources such as books and lectures that are more hands-on/implementation-aware.
[1] Just so that my background knowledge is clear: I've completed NAND2TETRIS, watched all of Berkeley's 2020 CS61C, and have read a good chunk of H&P (both Computer Architecture: A Quantitative Approach and Computer Organization and Design: RISC-V Edition); I'm now moving on to Onur Mutlu's lectures on advanced computer architecture.
The GPU also has access to system memory through the PCIe bus. Typically, the CPU will construct buffers in memory containing data (textures, vertices), commands, and GPU code. It will then store the buffer address in a GPU register and ring some sort of "doorbell" by writing to another GPU register. The GPU (specifically, the GPU command processor) will then read the buffers from system memory and start executing the commands. Those commands can include, for example, loading GPU shader programs into shader memory and then triggering the shader cores to execute them.