If both processors are running the same process in the same state, why won't both processors hit the same error condition at the same time?
I understand that random hardware faults can happen (bits can flip, etc.), but logic errors should be bug-for-bug the same on both processors.
So, were those random faults so frequent that the redundancy was worth it? Or am I missing something?
You'd probably be interested in their slides on techniques for constructing robust software [1], which cover this issue among others. For one thing, the two processors can easily end up in different states depending on the resources each has access to, so the same code can behave differently on each. The slides also touch on running multiple implementations of the same program, with compatible inputs and outputs but written by separate teams that did not communicate with each other.
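To make the "multiple implementations" idea concrete, here is a minimal sketch in Python. It is not taken from the slides. The integer square root task, the function names, and the deliberately planted bug are all assumptions for illustration. Three independently written versions compute the same result and a majority vote masks the one that gets it wrong:

```python
from collections import Counter
import math


def isqrt_newton(n):
    """Version A (hypothetical team 1): integer Newton iteration."""
    if n < 2:
        return n
    x = n
    y = (x + 1) // 2
    while y < x:
        x = y
        y = (x + n // x) // 2
    return x


def isqrt_bisect(n):
    """Version B (hypothetical team 2): binary search for floor(sqrt(n))."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def isqrt_rounding(n):
    """Version C (hypothetical team 3): planted logic bug, rounds instead of flooring."""
    return round(math.sqrt(n))


VERSIONS = [isqrt_newton, isqrt_bisect, isqrt_rounding]


def vote(versions, n):
    """Run every version on the same input and return the strict-majority answer."""
    results = [version(n) for version in versions]
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError(f"no majority among results: {results}")
    return value, results


if __name__ == "__main__":
    for n in (15, 16):
        value, results = vote(VERSIONS, n)
        print(n, "->", value, "individual results:", results)
```

The point is that a logic bug in one version is unlikely to be reproduced bug-for-bug in another version written independently, whereas two copies of the same code would fail identically. Voting only helps when the versions' failures are not correlated.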