These thing are so exciting, but as a regular web dev I always wonder what are s...

vajrabum · on June 6, 2021

Imagine that you are doing this at web scale. Instead of one web server you have 2000, and another 1000 middleware servers all talking to a distributed database cluster with another 500 machines. You can probably run your web servers close to stateless sharing sessions across redis. And your middleware worker machines can all run pub/sub so as a web dev you won't have this problem much. But the developers of your distributed key/value store, your distributed pub/sub queueing system and your database are going to be very different animals. Take a look at the last 10 years of Kyle Kingsbury's work with Jepsen and you'll see what I mean (https://aphyr.com/tags/jepsen). Opportunities for corruption, unavailability and lossage abound with distributed systems.

jonnadal · on June 6, 2021

Yep, Kyle has done an amazing job demonstrating how pervasive bugs are within real world distributed systems.

hwayne · on June 6, 2021

Imagine you're making a theatre booking system, where groups can reserve tickets in blocks. Lots of people are reserving tickets both individually and in groups at a time. How do you enforce that groups are mostly booked together? What about if there's a contiguous block when they start purchasing but then some other individuals want one of those seats before the group finishes purchasing? What if one person in the group needs to cancel: do you pick a seat in the middle and let an individual "split" an existing group block, or do you pick one off the edge and shuffle the seats? Does that changed if the group spans two rows? Do any of these decisions change if the tickets are "family" tickets instead of "group" tickets? What if some of the individual tickets are for "floating" tickets and not strictly assigned seats? What about wheelchair accessible seats?

Whatever things you end up deciding, does your system, where many people are reserving, cancelling, and re-reserving over long periods of time, guarantee all of the client's expectations?

politician · on June 6, 2021

Imagine that you are designing an electronic bank vault door. The product requirements state that the door should be secure when it’s closed and should only be able to be opened when authorized. Oh, in an emergency, the vault should fail closed unless there are people inside.

How would you prove to the client that your solution implements these requirements? Modeling the system using a language like TLA+ is one way of doing that.

gradschool · on June 6, 2021

How would I prove to the client that the implementation conforms to the model? Wouldn't it end up being written in some crappy programming language for which no one can agree on a formal semantics anyway, running on some grotesquely complicated processor family full of defects and bugs?

nivertech · on June 6, 2021

The model is to prove your design, not the implementation.

There is no reason to start implementing things, if your design is wrong.

Formal Specification proves your design, while Formal Verification proves your implementation.

As far as I understand this project - Stateright does both (i.e. 2-in-1). Kind of executable model?

jonnadal · on June 6, 2021

Regarding the last point — correct, Stateright aims to verify both.

It’s important to clarify that this doesn’t provide a proof of correctness, but it can dramatically improve confidence in both the design and implementation compared with fuzz testing, for example. This is done by exhaustively enumerating possible nondeterministic outcomes (e.g. due to message reordering) within specified constraints (e.g. up to S servers and C clients performing X operations…).

Examples:

SD Paxos: https://github.com/stateright/stateright/blob/master/example...

ABD (linearizable register algorithm): https://github.com/stateright/stateright/blob/master/example...

hwayne · on June 6, 2021

We've been working on some techniques for this. To my understanding, Stateright (the OP's repo) directly connects the spec with Rust code, so you can make some implementation guarantees.

If you don't have that (say you're using TLA+), one really promising technique is to generate behavioral traces and then refine those into code tests. I know a few companies have done this successfully, and I've been working on some examples of how it's done, but to my knowledge there's no battle-tested tooling that do this automatically. It's all bespoke for now.

faeyanpiraat · on June 6, 2021

Wow, these formal specification tools is something I always wanted to look into but the learning curve seems steep.

Having a specification which is guaranteed to be correct before implementation begins is a programmer’s wet dream.

hwayne · on June 6, 2021

> Wow, these formal specification tools is something I always wanted to look into but the learning curve seems steep.

One of my dreams is to flatten that learning curve. I don't think we'll ever make these tools effortless, but I think it's possible to reach a point where "these tools are too hard to learn" isn't a barrier anymore.

(Even then, I imagine a lot of people will learn some formal specification, and think "this isn't useful for me", and never touch them again. And that's fine. But I want anybody to be able to reach the point where they can make that decision, as opposed to feeling like they're locked out by default. We have a long way to go, but I think we're considerably further along than we were five years ago.)

> Having a specification which is guaranteed to be correct before implementation begins is a programmer’s wet dream.

I wouldn't say guaranteed to be correct, but at least it's, like, closer to correct? Less likely to have horrific bugs, or bizarre edge cases in the requirements that nobody notices until neck deep into the implementation. Stuff that increases founded confidence.

mullr · on June 6, 2021

As a regular web dev, you are constantly staring a distributed system in the face: it’s made of the program running on the server (back end), and the program running in the web browser (front end). How do those two programs agree on pieces of common information? How does the server deal with communications from other clients? At what cadence? How are the caches invalidated? Regular web dev is anything but simple, if you seriously consider the questions it poses. Tools like this help you think such questions.

woah · on June 7, 2021

If you’re building a CRUD app and using database transactions religiously, you can punt on a lot of these questions

yagizdegirmenci · on June 6, 2021

Imagine you are implementing a database engine and you want to make your database be able to run in run multiple nodes in order to make it fault-tolerant, therefore you need to implement some kind of consensus algorithm (Paxos, Raft) to select a leader and a followers, in case if some node fails, your entire cluster can keep working.

That said, implementing a consensus algorithm is quite hard. Modeling and testing is much harder, there is even a language just for this purpose; TLA+ [0].

[0]: https://lamport.azurewebsites.net/tla/tla.html

thesz · on June 7, 2021

Web page can be seen as a client to Paxos/Raft - a non-voting participant. It is a way to structure application for resilience, scalability and/or availability.

parhamn · on June 6, 2021

Wrong post?