First ROM Shadowing

rwmj · on April 23, 2023

Not used routinely, but it is possible to shadow the BASIC ROM on the Amstrad CPC 464 (1984). That ROM is normally located at C000h, shared with bank-switched RAM storing the display at the same address. You could copy the ROM to the RAM and permanently switch in the RAM. (You had to do another trick to move the display RAM elsewhere too, but it was all possible by writing to the correct gate array registers[1])

Which allowed you to do stuff like modifying the error messages in BASIC to say rude words, not that we would have ever done that.

[1] https://www.cpcwiki.eu/imgs/f/f6/S968se02.pdf

s_gourichon · on April 24, 2023

With a nod to https://en.wikipedia.org/wiki/Ward_Cunningham#Cunningham's_L...

> it is possible to shadow the BASIC ROM on the Amstrad CPC 464 (1984).

The ROM and RAM run at the same speed on that machine, so speed gain is not a motivation.

> You could copy the ROM to the RAM and permanently switch in the RAM.

Permanently? In principle you might be able to do that, but it's certainly not easy. The firmware switches ROMs/RAM all the time (lower ROM 300 times per second, higher ROM often when BASIC runs). You would need to patch the routines that perform ROM switching to make them switch only lower ROM. Fortunately those routines are already copied in central RAM because they need to operate whatever the ROM configuration, so it is possible to patch them. Still, the patched version would probably be more complex, so you'd need to borrow some more bytes elsewhere. Also make sure that no change is needed to the first 64 bytes (address range 0x0000-0x003F) because when ROM is enabled there they don't have your RAM-based changes.

> That ROM is normally located at C000h, shared with bank-switched RAM storing the display at the same address.

Beware, the optional external disk drive on the CPC464 adds another ROM at the same address. So, if you want to keep compatibility, do patch the routines right, so that they do bank-switch that ROM, but when targeting BASIC ROM they actually re-enable RAM (not the same hardware registers as explained on the reference you posted).

> Which allowed you to do stuff like modifying the error messages in BASIC to say rude words,

After all that hard work is done, yeah you could replace some strings with other strings of the same length, keeping the most significant bit set as separator, why not. But most strings aren't even stored as is, they are encoded in various ways to save bytes (Basic instruction names, for example), typically only portions of words are in plain 7-bit ASCII.

> You had to do another trick to move the display RAM elsewhere too

More than one trick as shown above.

The welcome message is in lower ROM, so hey, let's go crazy like "I don't need a reason, let's shadow all the ROM". The simplest bet would be to start with a CPC464 without a disk drive, copy both ROMs to RAM, tell firmware to use screen at 0x4000 instead of 0xC000 (easy compared to the rest), patch all areas that define low addresses to shift these by 0x8000 more (much less easy). Then you'd have a system running, with your free RAM for Basic programs around 10k instead of 42k. But hey, the welcome message on system "reset" would be yours.

> not that we would have ever done that.

Definitely. Thanks for the ride.

MBCook · on April 22, 2023

Can someone explain why ROM was slower? I would have guessed it was faster.

Is there some technical limitation? Or was it more of a “can shadow the ROM so save the extra few cents on faster ROM” thing? Just that no one was pushing as hard as for faster RAM.

monocasa · on April 22, 2023

It depends on the individual system, and as you expect there are systems where accessing ROM is equivalent to RAM speed wise.

You mainly first began to see a speed disparity with full 32 bit systems which had a 32 bit data path to memory. It'd be common to still only have a 16 bit datapath to ROM, necessitating the hardware to split 32 bit transfers in two when accessing ROM.

This disparity ended up diverging further as the CPU only had to hit the north bridge (or more recently the memory controller in uncore on the CPU die) to reach RAM, but practically had to go all the way out to some of the slowest, legacy interfaces like LPC to hit ROM.

toast0 · on April 23, 2023

The last paragraph is the result of shadow rom working and cost savings. Of course firmware is now in SPI rom, because it's read once and SPI roms cost less than parallel roms and are easier to fit on the motherboard, and performance almost doesn't matter since they're not that big and again, only read once.

justsomehnguy · on April 22, 2023

> The other reason is that the Deskpro 386 was almost certainly the first PC where ROM shadowing made a real difference. The Deskpro 386 had 32-bit RAM, significantly faster than 16-bit ROMs. That was not the case with 16-bit PC/AT compatibles with slower CPUs, slower RAM, and a 16-bit data path to both RAM and ROM. On the Deskpro 386, the system RAM was likely several times faster than ROM, and that’s what made ROM shadowing desirable.

Also it doesn't make sense to push for a faster ROM if it wouldn't be utilized or benefit anything.

1letterunixname · on April 23, 2023

Let's suppose your PC system ROM was 64 KiB, from F000:0000 to F000:FFFF (F0000 to FFFFF linear). If it wasn't a 286 or higher, then it couldn't simply use protected<->real-mode virtual memory tricks to copy ROMs and remap memory pages. Instead, there would need to be a motherboard-assisted hardware mechanism that redirects addresses whose top 4 bits are 1111 from ROM to some stolen RAM. It would probably be easiest to do this at boot (during POST) by temporarily mapping the future shadow RAM somewhere in the normal address space, copying the ROM into it, and then remapping that RAM over the ROM. Let's borrow A000:0000, since we're in text mode 0x3 using 4 KiB of the EGA/VGA frame buffer at B800:0000, and assume there's no other adapter configured to use this area.

    ; copy 64 KiB from F000:0000 -> A000:0000
    ; registers not saved: AX CX SI DI
    ; registers preserved: CS DS ES FLAGS
    ; 6 bytes of stack required (three 16-bit pushes)

    PUSH DS
    PUSH ES
    PUSHF

    CLD           ; technically, this isn't needed
                  ; we're copying all 64 KiB and
                  ; could wrap backwards

    MOV AX, 0F000h ; leading 0 so the assembler reads a number, not a symbol
    XOR SI, SI
    MOV DS, AX

    MOV AX, 0A000h ; leading 0 so the assembler reads a number, not a symbol
    XOR DI, DI
    MOV ES, AX

    MOV CX, 8000h  ; 64 KiB / (sizeof(word) == 2)

    REP MOVSW      ; DS:[SI] -> ES:[DI]
                   ; 16 bits at a time until CX == 0

    POPF
    POP ES
    POP DS

segment:offset addressing in real-mode

linear address = segment * 16 + offset

The exception is that some memory areas are mapped by hardware that decodes only the segment and not the offset, so reads beyond offset FFF0 wrap around instead of pointing to adjacent memory.

AC00:FFFF (BBFFFh linearly) != B000:BFFF
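As a rough illustration of the formula above, here is a minimal Python sketch of the segment:offset arithmetic. The function name `linear_address` and the `address_bits` parameter are made up for this example. It models only the arithmetic on a 20-bit bus, not the hardware-specific wrap-around case just mentioned; by the formula alone, both pairs in the example resolve to the same linear address.

```python
def linear_address(segment: int, offset: int, address_bits: int = 20) -> int:
    """Real-mode linear address: segment * 16 + offset, truncated to the bus width.

    address_bits=20 models an 8086-style 1 MiB address space (an assumption
    of this sketch; later CPUs can expose more address lines).
    """
    return ((segment << 4) + offset) & ((1 << address_bits) - 1)


# Two different segment:offset pairs that the formula maps to one address.
a = linear_address(0xAC00, 0xFFFF)
b = linear_address(0xB000, 0xBFFF)
print(f"{a:05X}h {b:05X}h")

# Last byte of the 64 KiB ROM at F000:0000..F000:FFFF.
top = linear_address(0xF000, 0xFFFF)

# On a 20-bit bus, an address past 1 MiB wraps around to the bottom.
wrap = linear_address(0xFFFF, 0x0010)
```

Many pairs alias the same byte because the segment is only shifted by 4 bits, so any two pairs whose segments differ by N and whose offsets differ by 16 * N in the opposite direction are equivalent under the formula.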

There are variations of real mode on 286+ that use linear flat addressing outside of protected mode (it would be 16 MiB for 286 and 386SX, and 4 GiB max for 386+) by messing with the hidden masks of the segment registers by switching to protected mode temporarily. The downside is all of the system provided (ROM) real-mode interrupt handlers, OS, and drivers would need to be rewritten for this different addressing scheme. It's called unreal mode. Hypothetically, you could algorithmically transpile ROM code by parsing instructions and rewriting them. Real-mode interrupt handlers are stored in a table of 256 far pointers at 0000:0000 to 0000:03FF.
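To make the size of that interrupt table concrete, here is a small Python sketch. With 256 entries of 4 bytes each, the table occupies 1024 bytes, so its last byte sits at offset 03FFh. The helper names are invented for illustration; the entry layout assumed here (16-bit offset first, then 16-bit segment, both little-endian) is the standard real-mode far pointer format.

```python
VECTOR_COUNT = 256
FAR_POINTER_SIZE = 4  # 2-byte offset followed by 2-byte segment


def vector_entry_address(interrupt_number: int) -> int:
    """Offset within segment 0000 where the far pointer for an interrupt lives."""
    if not 0 <= interrupt_number < VECTOR_COUNT:
        raise ValueError("interrupt number must be in 0..255")
    return interrupt_number * FAR_POINTER_SIZE


def decode_far_pointer(entry: bytes) -> tuple[int, int]:
    """Split a 4-byte table entry into (segment, offset)."""
    offset = int.from_bytes(entry[0:2], "little")
    segment = int.from_bytes(entry[2:4], "little")
    return segment, offset


table_size = VECTOR_COUNT * FAR_POINTER_SIZE  # 1024 bytes
last_byte = table_size - 1                    # 0x3FF

# Hypothetical entry pointing at F000:1234, i.e. somewhere inside the system ROM.
segment, offset = decode_far_pointer(bytes([0x34, 0x12, 0x00, 0xF0]))
```

For example, `vector_entry_address(0x10)` gives 0x40, which is where the INT 10h pointer is stored.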

rep_lodsb · on April 23, 2023

The term Shadow RAM comes from the fact that it is at the same address as ROM, "shadowing" it.

Compaq's implementation was somewhat complicated. The more typical solution, for example with the C&T chipset, was to have all RAM be contiguous, leaving some parts of it unusable since it would overlap with the area reserved for ROMs, video memory and possibly other hardware.

If you only had 1 MB in total and wanted extended memory, there was the option to remap the extra 384K high, but then it could not be used to speed up ROM access. With more than 1 MB, that option wasn't provided by C&T since it made the memory mapping hardware simpler.

A control register in the chipset is used to direct only write accesses to RAM, while reads would still go to the ROM - so the BIOS code can copy itself by reading and then writing the same address range (DS:SI = ES:DI). Setting a different value in the control register would then enable read access to the RAM, while also write-protecting it.
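The copy-in-place sequence described above can be modeled with a toy Python simulation. This is only a sketch of the idea, not the C&T register layout: the class, the three mode names and the `shadow` helper are all hypothetical, and a real chipset would expose this as bits in a control register.

```python
from enum import Enum


class ShadowMode(Enum):
    ROM_ONLY = 0   # reads come from ROM, writes are dropped
    COPY = 1       # reads come from ROM, writes land in the RAM underneath
    SHADOWED = 2   # reads come from RAM, writes are dropped (write-protected)


class ShadowedRegion:
    """Toy model of one address range backed by both ROM and RAM."""

    def __init__(self, rom: bytes):
        self.rom = bytes(rom)
        self.ram = bytearray(len(rom))
        self.mode = ShadowMode.ROM_ONLY

    def read(self, offset: int) -> int:
        if self.mode is ShadowMode.SHADOWED:
            return self.ram[offset]
        return self.rom[offset]

    def write(self, offset: int, value: int) -> None:
        if self.mode is ShadowMode.COPY:
            self.ram[offset] = value & 0xFF
        # In the other two modes the write has no effect.


def shadow(region: ShadowedRegion) -> None:
    """Copy the ROM onto itself, then switch reads over to the RAM copy."""
    region.mode = ShadowMode.COPY
    for offset in range(len(region.rom)):
        # Same address for source and destination (the DS:SI = ES:DI case).
        region.write(offset, region.read(offset))
    region.mode = ShadowMode.SHADOWED


region = ShadowedRegion(bytes(range(256)) * 4)
shadow(region)
region.write(0, 0xAA)  # ignored: the shadow copy is write-protected
```

The point of the read-ROM/write-RAM mode is that no scratch buffer or second address window is needed: the copy loop reads and writes the same addresses.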

Also, unreal mode requires a 386+. You couldn't switch the 286 back to real mode except via reset(*), and it wouldn't be able to use linear addresses anyway since the registers were still 16 bit :)

What was possible is to use LOADALL (0F 05) in order to set up the segment descriptor caches to address high memory. This is similar to how after reset, the code segment has a base address of FF0000, at the top of the 286's 16 MiB address space. When a segment register is reloaded, the base address is again set to the 8086 compatible value (segment * 16).

(*) there is an undocumented instruction (STOREALL, 0F 04) that is capable of leaving protected mode. However it also enters a special "real mode" that would later become SMM, in which the CPU uses alternate bus control pins. On the chips that you could buy from Intel/AMD, these only existed as unbonded test pads on the die.

sbisson · on April 22, 2023

I had a third-party shadow ROM kit for a BBC Micro in 1984; it let me back-up and load add-on ROMs into its memory and switch between them quickly.

tomatocracy · on April 23, 2023

What was called Shadow RAM on the BBC Micro was a bit different to what's described here - it was a technique to page display memory outside the normal memory map to allow more space for programs.

I think what you're describing might actually be what the Acorn 8-bit world called "sideways ROM/RAM" - the 16K between 0x8000 and 0xbfff was paged and different ROMs (or RAM, if you had the right add-on hardware) could be selected to select different pieces of utility/system software (eg filing systems and BASIC).

However, at least one peripheral I'm aware of for the BBC Micro did do the "copy ROM code to RAM then page out the ROM" trick this is talking about. The 6502 second processor had a small ROM containing startup and basic OS code. The hardware passes writes through to the RAM either way but there's a latch which initially puts reads through to ROM. The first thing that ROM does on reset is copy itself into RAM - you can see the original source code, which was recently discovered, on GitHub[0]. I'm not quite sure why it did this though.

0. https://github.com/stardot/Acorn6502TubeROM/blob/master/src/...

smilespray · on April 22, 2023

The Amiga 1000 had to get the Kickstart ROM from floppy into RAM in 1985.

I think that counts.

monocasa · on April 22, 2023

I could see an argument either way.

Particularly since the data on the floppy isn't directly addressable, so it's not really shadowing another region.

cout · on April 23, 2023

I don't think that does count as shadowing, unless the floppy is mapped into memory and then the RAM is later mapped at the same address.

snvzz · on April 23, 2023

Once loaded into RAM, the WORM latch is closed and the RAM becomes non-writable, and accessible at a different memory address[0], which is the same address at which other Amigas have a physical ROM mapped.

0. https://retrocomputing.stackexchange.com/questions/1140/how-...

ecpottinger · on April 23, 2023

Then you have to include the C64 where the ROM could be copied into RAM and then modified to allow extra functions.

antijava · on April 23, 2023

The TRS-80 CoCo used ROM shadowing to allow access to all 64K of RAM. The default mode was 32K RAM and 32K ROM.

FreeFull · on April 23, 2023

I'd say that's just banking rather than shadowing. The difference is that with shadowing, ROM contents are actually copied into RAM and kept there.

hyperman1 · on April 23, 2023

While discussing RAM, can someone clear up another minor conundrum for me: How does a current x86 chipset map RAM modules to linear addresses? They can differ in size, so the mapping is probably dynamic? But I suppose an adder causes latency? Or is everything just mapped with big holes between modules, with CPU paging fixing up the mess?

monocasa · on April 23, 2023

The mapping is more or less completely dynamic per channel. DRAM is already ~100 cycles away, so the latency of address decoding is pretty hidden.

gladiatr72 · on April 23, 2023

Nice write up. Hadn't thought about this since I was a kid.