Not used routinely, but it is possible to shadow the BASIC ROM on the Amstrad CPC 464 (1984). That ROM is normally located at C000h, shared with bank-switched RAM storing the display at the same address. You could copy the ROM to the RAM and permanently switch in the RAM. (You had to do another trick to move the display RAM elsewhere too, but it was all possible by writing to the correct gate array registers[1])
Which allowed you to do stuff like modifying the error messages in BASIC to say rude words, not that we would have ever done that.
> it is possible to shadow the BASIC ROM on the Amstrad CPC 464 (1984).
The ROM and RAM run at the same speed on that machine, so speed gain is not a motivation.
> You could copy the ROM to the RAM and permanently switch in the RAM.
Permanently? In principle you might be able to do that, but it's certainly not easy. The firmware switches ROMs/RAM all the time (lower ROM 300 times per second, higher ROM often when BASIC runs). You would need to patch the routines that perform ROM switching to make them switch only lower ROM. Fortunately those routines are already copied in central RAM because they need to operate whatever the ROM configuration, so it is possible to patch them. Still, the patched version would probably be more complex, so you'd need to borrow some more bytes elsewhere. Also make sure that no change is needed to the first 64 bytes (address range 0x0000-0x003F) because when ROM is enabled there they don't have your RAM-based changes.
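For illustration, here is a minimal sketch of the byte such a patched routine would end up writing to the gate array. It assumes the commonly documented layout of the mode/ROM register (bits 7-6 = 10 select the register, bit 3 set disables upper ROM, bit 2 set disables lower ROM, bits 1-0 are the screen mode). The function name is made up, and the layout should be checked against the gate array documentation before relying on it.

```python
def rom_config_byte(screen_mode, lower_rom_enabled, upper_rom_enabled):
    """Build the value written to the gate array (port 0x7Fxx) to set ROM enables.

    Assumed layout (verify against the hardware documentation):
      bits 7-6 = 0b10  selects the mode/ROM configuration register
      bit 3    = 1     disables upper ROM (RAM is read at 0xC000)
      bit 2    = 1     disables lower ROM (RAM is read at 0x0000)
      bits 1-0         screen mode
    Bit 4 (interrupt counter reset) is left at 0 here.
    """
    if screen_mode not in (0, 1, 2):
        raise ValueError("screen mode must be 0, 1 or 2")
    value = 0x80 | screen_mode
    if not lower_rom_enabled:
        value |= 0x04
    if not upper_rom_enabled:
        value |= 0x08
    return value


# A "shadowed BASIC" setup would keep upper ROM disabled so reads hit the RAM copy.
shadowed = rom_config_byte(1, lower_rom_enabled=True, upper_rom_enabled=False)
```

The point of the patch described above is that every firmware path that would normally clear bit 3 for the BASIC ROM has to leave it set instead.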
> That ROM is normally located at C000h, shared with bank-switched RAM storing the display at the same address.
Beware, the optional external disk drive on the CPC464 adds another ROM at the same address. So, if you want to keep compatibility, patch the routines carefully, so that they still bank-switch that ROM, but re-enable RAM when the target is the BASIC ROM (not the same hardware registers as explained in the reference you posted).
> Which allowed you to do stuff like modifying the error messages in BASIC to say rude words,
After all that hard work is done, yeah you could replace some strings with other strings of the same length, keeping the most significant bit set as separator, why not. But most strings aren't even stored as is, they are encoded in various ways to save bytes (Basic instruction names, for example), typically only portions of words are in plain 7-bit ASCII.
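A minimal sketch of the same-length replacement, assuming a message stored as plain 7-bit ASCII with bit 7 set on its last character. The table contents and function names are made up for the example and are not taken from the real ROM.

```python
def encode_message(text):
    """Encode text as 7-bit ASCII with bit 7 set on the final character."""
    raw = text.encode("ascii")
    if not raw:
        raise ValueError("message must not be empty")
    return raw[:-1] + bytes([raw[-1] | 0x80])


def patch_message(image, old, new):
    """Return a copy of image with the first occurrence of old replaced by new.

    Both messages must have the same length so nothing after them moves.
    """
    if len(old) != len(new):
        raise ValueError("replacement must have the same length")
    old_bytes = encode_message(old)
    pos = image.find(old_bytes)
    if pos < 0:
        raise ValueError("message not found in plain form")
    patched = bytearray(image)
    patched[pos:pos + len(old_bytes)] = encode_message(new)
    return bytes(patched)


# Made-up stand-in for a message table; not real ROM contents.
table = encode_message("Syntax error") + encode_message("Overflow")
patched = patch_message(table, "Syntax error", "Silly error!")
```

Strings that are tokenised or split into fragments would not be found by a plain search like this, which is the limitation mentioned above.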
> You had to do another trick to move the display RAM elsewhere too
More than one trick as shown above.
The welcome message is in lower ROM, so hey, let's go crazy like "I don't need a reason, let's shadow all the ROM". The simplest bet would be to start with a CPC464 without a disk drive, copy both ROMs to RAM, tell firmware to use screen at 0x4000 instead of 0xC000 (easy compared to the rest), patch all areas that define low addresses to shift these by 0x8000 more (much less easy). Then you'd have a system running, with your free RAM for Basic programs around 10k instead of 42k. But hey, the welcome message on system "reset" would be yours.
Can someone explain why ROM was slower? I would have guessed it was faster.
Is there some technical limitation? Or was it more of a “we can shadow the ROM, so save the extra few cents on faster ROM” thing? Just that no one was pushing as hard as for faster RAM.
It depends on the individual system, and as you expect there are systems where accessing ROM is equivalent to RAM speed wise.
You mainly first began to see a speed disparity with full 32 bit systems which had a 32 bit data path to memory. It'd be common to still only have a 16 bit datapath to ROM, necessitating the hardware to split 32 bit transfers in two when accessing ROM.
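As a back-of-the-envelope sketch of the data path effect alone, the following counts bus cycles needed to read a block over a given bus width. It deliberately ignores wait states and access times, which varied by system, and the function name is just for illustration.

```python
def bus_cycles(num_bytes, bus_width_bits):
    """Number of bus cycles needed to transfer num_bytes over the given width."""
    bytes_per_cycle = bus_width_bits // 8
    return -(-num_bytes // bytes_per_cycle)  # ceiling division


rom_image = 64 * 1024                 # a 64 KiB ROM, as an example size
cycles_16 = bus_cycles(rom_image, 16)  # 16-bit path to ROM
cycles_32 = bus_cycles(rom_image, 32)  # 32-bit path to RAM
```

With equal cycle times the 16-bit path needs twice as many cycles; any extra wait states on the ROM side would add to that.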
This disparity widened further as RAM accesses only had to reach the north bridge (or, more recently, the memory controller in the uncore on the CPU die), while ROM accesses practically had to go all the way out to some of the slowest legacy interfaces, like LPC.
The last paragraph is the result of shadow rom working and cost savings. Of course firmware is now in SPI rom, because it's read once and SPI roms cost less than parallel roms and are easier to fit on the motherboard, and performance almost doesn't matter since they're not that big and again, only read once.
> The other reason is that the Deskpro 386 was almost certainly the first PC where ROM shadowing made a real difference. The Deskpro 386 had 32-bit RAM, significantly faster than 16-bit ROMs. That was not the case with 16-bit PC/AT compatibles with slower CPUs, slower RAM, and a 16-bit data path to both RAM and ROM. On the Deskpro 386, the system RAM was likely several times faster than ROM, and that’s what made ROM shadowing desirable.
Also it doesn't make sense to push for a faster ROM if it wouldn't be utilized or benefit anything.
Let's suppose your PC system ROM was 64 KiB F000:0000 to F000:FFFF (F0000 to FFFFF linear). If it wasn't a 286 or higher, then it couldn't simply use protected<->real-mode virtual memory tricks to copy ROMs and remap memory pages. Instead, there would need to be a motherboard-assisted hardware mechanism that redirects addresses whose top 4 bits are 1111 from the ROM to some stolen RAM. It would probably be easiest to do this at boot (during POST) by temporarily mapping the RAM that will become the shadow at an ordinary RAM address, copying ROM->RAM, and then remapping that RAM over the ROM. Let's borrow A000:0000 since we're in text mode 0x3 using 4 KiB of the EGA/VGA frame buffer at B800:0000 and assume there's no other adapter configured to use this area.
; copy 64 KiB from F000:0000 -> A000:0000
; registers not saved: AX CX SI DI
; registers preserved: CS DS ES FLAGS
; 6 bytes of stack required (three 16-bit pushes)
PUSH DS
PUSH ES
PUSHF
CLD ; technically, this isn't needed
; we're copying all 64 KiB and
; could wrap backwards
MOV AX, 0F000h ; leading 0 so the assembler reads a number, not a symbol
XOR SI, SI
MOV DS, AX
MOV AX, 0A000h
XOR DI, DI
MOV ES, AX
MOV CX, 8000h ; 64 KiB / (sizeof(word) == 2)
REP MOVSW ; DS:[SI] -> ES:[DI]
; 16 bits at a time until CX == 0
POPF
POP ES
POP DS
segment:offset addressing in real-mode
linear address = segment * 16 + offset
The exception is that some memory areas are mapped by hardware that decodes only the segment and not the offset, so reads beyond offset FFF0 wrap around instead of pointing to adjacent memory.
AC00:FFFF (BBFFFh linearly) != B000:BFFF
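To make the arithmetic concrete, here is a small sketch of the formula and of 16-bit offset wraparound. Under the plain formula both pairs above give BBFFFh; as I read the comment, the difference only shows up once the offset is advanced, since the offset wraps within its segment. The function names are made up.

```python
def linear_address(segment, offset):
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + (offset & 0xFFFF)) & 0xFFFFF


def advance(segment, offset, step):
    """Linear address after adding step to the offset, which wraps at 16 bits."""
    return linear_address(segment, (offset + step) & 0xFFFF)


same_byte = linear_address(0xAC00, 0xFFFF) == linear_address(0xB000, 0xBFFF)
next_from_ac00 = advance(0xAC00, 0xFFFF, 1)   # offset wraps to 0000
next_from_b000 = advance(0xB000, 0xBFFF, 1)   # offset becomes C000
```

So a `REP MOVSW` style walk starting from the two pairs ends up in different places even though the starting byte is the same.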
There are variations of real mode on 286+ that use linear flat addressing outside of protected mode (it would be 16 MiB for 286 and 386SX, and 4 GiB max for 386+) by messing with the hidden limits of the segment registers (the descriptor caches) by switching to protected mode temporarily. The downside is that all of the system-provided (ROM) real-mode interrupt handlers, OS, and drivers would need to be rewritten for this different addressing scheme. It's called unreal mode. Hypothetically, you could algorithmically transpile ROM code by parsing instructions and rewriting them. Real-mode interrupt handlers are stored in a table of 256 far pointers at 0000:0000 to 0000:03FF.
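A short sketch of how that table is laid out: each of the 256 entries is a 4-byte far pointer, offset word first, then segment word, so the table occupies the first 1024 bytes. The memory contents below are made up for the example.

```python
IVT_ENTRY_SIZE = 4  # 2-byte offset followed by 2-byte segment


def ivt_entry_address(vector):
    """Linear address of the far pointer for an interrupt vector (0-255)."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must be in 0..255")
    return vector * IVT_ENTRY_SIZE


def read_vector(memory, vector):
    """Return (segment, offset) of the handler stored for the given vector."""
    addr = ivt_entry_address(vector)
    offset = int.from_bytes(memory[addr:addr + 2], "little")
    segment = int.from_bytes(memory[addr + 2:addr + 4], "little")
    return segment, offset


# Made-up low memory with one handler installed at F000:1234 for vector 10h.
low_memory = bytearray(256 * IVT_ENTRY_SIZE)
low_memory[0x40:0x44] = bytes([0x34, 0x12, 0x00, 0xF0])
handler = read_vector(low_memory, 0x10)
```

Shadowing the ROM does not change these pointers, because the RAM copy answers at the same segment:offset addresses the table already holds.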
The term Shadow RAM comes from the fact that it is at the same address as ROM, "shadowing" it.
Compaq's implementation was somewhat complicated. The more typical solution, for example with the C&T chipset, was to have all RAM be contiguous, leaving some parts of it unusable since it would overlap with the area reserved for ROMs, video memory and possibly other hardware.
If you only had 1 MB in total and wanted extended memory, there was the option to remap the extra 384K high, but then it could not be used to speed up ROM access. With more than 1 MB, that option wasn't provided by C&T since it made the memory mapping hardware simpler.
A control register in the chipset is used to direct only write accesses to RAM, while reads would still go to the ROM - so the BIOS code can copy itself by reading and then writing the same address range (DS:SI = ES:DI). Setting a different value in the control register would then enable read access to the RAM, while also write-protecting it.
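The two register settings can be sketched as a tiny simulation. This is only a model of the behaviour described above, not of any particular chipset: the class, mode names, and sample bytes are invented for the example.

```python
class ShadowedRegion:
    """Model of an address range backed by both ROM and shadow RAM."""

    COPY = "reads from ROM, writes to RAM"
    SHADOWED = "reads from RAM, writes ignored"

    def __init__(self, rom):
        self.rom = bytes(rom)
        self.ram = bytearray(len(rom))
        self.mode = self.COPY

    def read(self, offset):
        if self.mode == self.COPY:
            return self.rom[offset]
        return self.ram[offset]

    def write(self, offset, value):
        if self.mode == self.COPY:
            self.ram[offset] = value & 0xFF
        # In SHADOWED mode the RAM is write-protected, so the write is dropped.


def shadow_rom(region):
    """Copy the ROM onto itself (source == destination), then switch modes."""
    for offset in range(len(region.rom)):
        region.write(offset, region.read(offset))
    region.mode = ShadowedRegion.SHADOWED


region = ShadowedRegion(b"\x12\x34\x56")
shadow_rom(region)
```

Because source and destination addresses are identical, the copy loop needs no scratch area, which is the appeal of the write-only-to-RAM setting.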
Also, unreal mode requires a 386+. You couldn't switch the 286 back to real mode except via reset(*), and it wouldn't be able to use linear addresses anyway since the registers were still 16 bit :)
What was possible is to use LOADALL (0F 05) in order to set up the segment descriptor caches to address high memory. This is similar to how after reset, the code segment has a base address of FF0000, at the top of the 286's 16 MiB address space. When a segment register is reloaded, the base address is again set to the 8086 compatible value (segment * 16).
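A rough model of that behaviour, assuming a 24-bit address bus and treating the hidden base as a separate field that is only recomputed when the register is reloaded. The class is invented for illustration and ignores limits and access rights.

```python
class SegmentRegister:
    """Visible selector plus hidden (cached) base address, 286-style."""

    def __init__(self, selector, base):
        self.selector = selector & 0xFFFF
        self.base = base & 0xFFFFFF  # hidden part; settable via reset or LOADALL

    def load(self, selector):
        """Ordinary real-mode reload: base becomes selector * 16."""
        self.selector = selector & 0xFFFF
        self.base = self.selector << 4

    def physical(self, offset):
        """Physical address of base + 16-bit offset on a 24-bit bus."""
        return (self.base + (offset & 0xFFFF)) & 0xFFFFFF


# State right after reset: selector F000 but hidden base FF0000.
cs = SegmentRegister(0xF000, 0xFF0000)
reset_fetch = cs.physical(0xFFF0)

# A far jump reloads CS, and the base falls back to the 8086-compatible value.
cs.load(0xF000)
after_reload = cs.physical(0xFFF0)
```

The same selector value maps to two different physical addresses, depending only on what the hidden base currently holds.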
(*) there is an undocumented instruction (STOREALL, 0F 04) that is capable of leaving protected mode. However it also enters a special "real mode" that would later become SMM, in which the CPU uses alternate bus control pins. On the chips that you could buy from Intel/AMD, these only existed as unbonded test pads on the die.
What was called Shadow RAM on the BBC Micro was a bit different to what's described here - it was a technique to have display memory to be paged outside the normal memory map to allow more space for programs.
I think what you're describing might actually be what the Acorn 8-bit world called "sideways ROM/RAM" - the 16K between 0x8000 and 0xbfff was paged and different ROMs (or RAM, if you had the right add-on hardware) could be selected to select different pieces of utility/system software (eg filing systems and BASIC).
However, at least one peripheral I'm aware of for the BBC Micro did do the "copy ROM code to RAM then page out the ROM" trick this is talking about. The 6502 second processor had a small ROM containing startup and basic OS code. The hardware passes writes through to the RAM either way but there's a latch which initially puts reads through to ROM. The first thing that ROM does on reset is copy itself into RAM - you can see the original source code which was recently discovered in GitHub[0]. I'm not quite sure why it did this though.
Once loaded into RAM, the WORM latch is closed and the RAM becomes non-writable, and accessible at a different memory address[0], which is the same address at which other Amigas have a physical ROM mapped.
While discussing RAM, can someone clear up another minor conundrum for me: How does a current x86 chipset map RAM modules to linear addresses? They can differ in size, so the mapping is probably dynamic? But I suppose an adder causes latency? Or is everything just mapped with big holes between modules, with CPU paging fixing up the mess?
[1] https://www.cpcwiki.eu/imgs/f/f6/S968se02.pdf