
A question about how NonStop worked:

If both processors are running the same process in the same state, why won't both processors hit the same error condition at the same time?

I understand there are random hardware faults that can happen, bits can flip, etc., but logic errors should be bug-for-bug the same on both processors.

So, were those random faults so frequent that the redundancy was worth it? Or am I missing something?



You'd probably be interested in their set of slides on techniques for constructing robust software [1], which covers this issue among others. For one thing, the processors can easily be in different states because of the resources each has access to, so the same code can behave differently on different processors. The slides also touch on the idea of having multiple implementations of the same program, with compatible inputs and outputs, written by different teams that did not communicate.

[1] https://www.fastonline.it/sites/default/files/2019-06/Robust... (I couldn't find a version on the Stratus website any more, but this appears to be the same as the one I downloaded from their site many years ago.)
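
To make the "multiple implementations" idea concrete, here is a rough Python sketch of how independently written versions might be combined with a majority vote. This is only my illustration, not anything taken from the slides or from how NonStop or Stratus actually work: the function names, the integer square root task, and the deliberately planted bug in the third version are all made up for the example.

```python
from collections import Counter


def isqrt_newton(n):
    """Version A (hypothetical team 1): integer Newton iteration."""
    if n < 2:
        return n
    x = n
    y = (x + 1) // 2
    while y < x:
        x = y
        y = (x + n // x) // 2
    return x


def isqrt_bisect(n):
    """Version B (hypothetical team 2): binary search on the answer."""
    lo, hi = 0, n + 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    return lo


def isqrt_buggy(n):
    """Version C (hypothetical team 3): planted logic bug, off by one
    on nonzero perfect squares. Stands in for a real-world logic error."""
    r = isqrt_bisect(n)
    return r - 1 if n > 0 and r * r == n else r


def vote(versions, arg):
    """Run every version on the same input and return the strict-majority
    result; raise if no result is produced by more than half of them."""
    results = [version(arg) for version in versions]
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError(f"no majority among results: {results}")
    return value


VERSIONS = [isqrt_newton, isqrt_bisect, isqrt_buggy]

if __name__ == "__main__":
    # Version C is wrong on 16, but the other two outvote it.
    print(vote(VERSIONS, 16))
```

The point is that a logic bug in one version is masked only because the other versions don't share it. Running three copies of the same code would reproduce the bug three times, which is exactly the concern in the question above.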


Thank you!



