Isn't this the most convenient setup though? I'm most familiar with async/await in UI programming, where you most often have a main thread for synchronization. You want to assume that most of your main thread work is synchronous and non-yielding until you explicitly yield. Seems like it would be a lot harder to use main thread synchronization in the style you're suggesting.
Maybe I just can't imagine it. What's a good language that shows off the style you're suggesting?
> What's a good language that shows off the style you're suggesting?
JavaScript.
What you describe sounds like native UI work since forever, before JavaScript. "Don't block the main thread" and all that.
JavaScript is different in that it's single-threaded with an event loop. Synchronous functions execute until they end. Asynchronous functions are handled by the event loop, which cycles through the pending tasks, running each one until it yields at an await and then switching to another, concurrently (think cooperative round robin). What happens when the runtime is running an asynchronous function and reaches a synchronous one inside it? It stops the round robin and executes that function until it ends.
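To make the "runs until it ends" part concrete, here's a rough sketch. It uses Python's asyncio rather than JavaScript, on the assumption that its single-threaded event loop behaves the same way for this purpose; `run_demo`, `sync_work`, and `task` are names made up for the illustration.

```python
import asyncio

def run_demo():
    log = []

    def sync_work(name):
        # Plain synchronous function: it contains no await, so the event
        # loop cannot switch to another task until it returns.
        for i in range(3):
            log.append(f"{name}:sync{i}")

    async def task(name):
        log.append(f"{name}:start")
        await asyncio.sleep(0)   # explicit yield point: other tasks may run
        sync_work(name)          # runs to completion, uninterrupted
        await asyncio.sleep(0)   # another explicit yield point
        log.append(f"{name}:end")

    async def main():
        # Two tasks share one thread and one event loop.
        await asyncio.gather(task("A"), task("B"))

    asyncio.run(main())
    return log

if __name__ == "__main__":
    print(run_demo())
```

In the resulting log the two tasks interleave at the awaits, but each task's three `sync` entries always sit back to back, because nothing else can run while `sync_work` is executing.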
What OP wants is a language like JavaScript, but without having to write code that distinguishes synchronous from asynchronous functions. Instead there would be some other tool to tell the runtime whether a function is synchronous or asynchronous, without having to write the function again.
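For anyone who hasn't run into the sync/async split in practice, here's a small sketch of what it looks like, again in Python's asyncio as a stand-in. All the function names (`fetch_value`, `sync_caller`, `async_caller`, `sync_entry_point`) are hypothetical.

```python
import asyncio

async def fetch_value():
    await asyncio.sleep(0)   # stands in for some real asynchronous I/O
    return 42

def sync_caller():
    # Calling an async function from synchronous code does not run it.
    # It only builds a coroutine object that something else must drive.
    coro = fetch_value()
    is_coroutine = asyncio.iscoroutine(coro)
    coro.close()             # avoid the "never awaited" warning
    return is_coroutine

async def async_caller():
    # To get the value, the caller has to be marked async as well.
    return await fetch_value() + 1

def sync_entry_point():
    # The only way back into synchronous code is to start an event loop.
    return asyncio.run(async_caller())
```

The marking spreads upward: every caller of `fetch_value` that wants the result either becomes async itself or has to spin up a loop, which is the distinction OP would like the runtime to handle.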
Yes, I realize all this. My question is how you can have such a system and still keep UI thread synchronization without having the opposite problem of marking all your synchronous methods.
In a strict language? I don't think it's possible. If you take a closer look, you'll see that it's not enough to mark functions as sync or async, since inside a function each line of code can be considered a synchronous function in its own right.
What you want is something like Haskell, which is lazy: it's not about "executing statements" but rather "evaluating expressions".
Not the OP, but Go doesn't have this problem because all I/O is async under the hood, but it exposes a sync interface. This means the entire Go ecosystem is bought into a single concurrency model and runtime, which some find irksome, but it works pretty well most of the time. Of course, Go also lacks Rust's static safety features, but I think that's orthogonal to its concurrency approach.
What you tried wasn't "this", though. It was one particular implementation of lightweight threading that has to cope with Rust's peculiarities, special requirements and compilation targets. There is absolutely nothing essential about lightweight threads that prevents them from emitting essentially the same code as the stackless-coroutine approach. It's just that in Rust it might be very hard or even not worth it, given the language's target audience.
I don't understand what your objection is. It's a given that what I wrote applies to Rust. This is a thread about Rust. I didn't say that M:N threading is always slower than 1:1.
Besides, fibers don't emit essentially the same code as async code. One has a stack, and the other doesn't. That's a significant difference.
If the stack could be sufficiently small, it's not that different from heap-allocated async state. But you probably need segmented stacks, or at least separate stacks for async-preemptible and non-async-preemptible code (has anyone tried making a system like this?).
It isn't a given that M:N threading is slower than 1:1 threading, even in Rust. A particular implementation you tried exhibited that behavior.
> One has a stack, and the other doesn't. That's a significant difference.
They both have some memory area to which they write state. Calling it "a stack" refers to the abstraction in the programmer's mind, not to how the memory is actually written/read. It is true that in order to support recursion, a thread might need to dynamically allocate memory, but so would async/await, except it'll make it more explicit.
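A rough illustration of the "some memory area" point, in Python only because generators make it compact. It says nothing about how Rust lays things out, and `countdown_gen`, `CountdownMachine`, and `walk` are invented names.

```python
def countdown_gen(n):
    # Stackless coroutine: between resumptions the interpreter keeps `n`
    # in a heap-allocated frame object, not on a dedicated call stack.
    while n > 0:
        yield n
        n -= 1

class CountdownMachine:
    """Hand-written equivalent: the saved state is just this object's field."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        value = self.n
        self.n -= 1
        return value

def walk(tree):
    # Recursion: every nested call creates a new generator object, so the
    # extra memory a deeper "stack" would need is allocated explicitly.
    value, children = tree
    yield value
    for child in children:
        yield from walk(child)
```

Both countdown versions produce the same sequence from the same stored state, and `walk` shows the recursion case: the dynamic allocation is still there, just spelled out as one object per level.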
> It isn't a given that M:N threading is slower than 1:1 threading, even in Rust. A particular implementation you tried exhibited that behavior.
I don't see any way around the problems of segmented stacks and FFI. There is no way to implement stack growth by reallocating stacks and rewriting pointers in Rust, even in theory. It would break too much code: there is a lot of unsafe (and even safe!) code out there that assumes that stack pointer addresses are stable. In fact, async/await in Rust had to introduce a new explicit pinning concept in order to solve this exact problem while remaining backwards compatible. And when calling the FFI, you have to switch to a big stack, which was an insurmountable performance problem. Rust code by its nature is FFI-heavy; it's part of the niche that Rust finds itself in.
You can make what are virtually zero-cost copies from what you call a "big stack" to a resizable stack with virtual memory tricks. You don't even need to copy the entire stack; you can instead cleverly rewrite the return address stored on the stack to do this kind of "code-switching". But it does mean doing backend manipulations in a platform-dependent way. There are several good ways to do this, none of them particularly easy. What is perhaps impossible is allowing FFI code to block the lightweight thread, but async/await doesn't solve this, either.
> You can make what are virtually zero-cost copies from what you call a "big stack" to a resizable stack with virtual memory tricks. You don't even need to copy the entire stack, but cleverly rewrite the return address stored on the stack to do this kind of "code-switching".
OK. It really is hard when you're what you call "FFI-heavy" and don't like a significant runtime. So Rust has several *self-imposed* constraints (whether they're all essential for its target domains is a separate discussion, but some of those constraints certainly are) that make this task particularly hard. But my point is that there is nothing fundamental to M:N threading that makes it slower than async/await, and async/await does fundamentally come at the significant cost of a particularly viral form of accidental complexity.
"Fibers under the magnifying glass" [1] might be a relevant paper here. Its conclusion, after surveying many different implementations, is that lightweight threads are slower than stackless coroutines.
No, its conclusion is that fibers with certain properties in C/C++ are slower -- and particularly hard to implement correctly -- than stackless coroutines in C/C++. That's because of the particular characteristics of those languages. In fact, you'll note that the only negative thing he says about Go is that it incurs an overhead when interacting with non-Go code.
Sure, but that overhead is also not essential; it's a feature of Go's particular implementation. Fibers aren't one thing, and there are many, many ways of implementing them. As I said before, implementing them well for Rust would likely have required changes to LLVM and WebAssembly, and even then it would be harder than async/await, perhaps to the point of being too hard to be worth it, and probably against aspects of Rust's philosophy (I would say that is the main difference between the two: achieving similar performance is much easier for the language implementors with async/await). But it's just not true that there is something essential about them that makes them slower. After all, you're running all of your code inside a particular implementation of threads.
> Sure, but that overhead is also not essential, but a feature of Go's particular implementation.
The only way to get around the FFI performance problem would be for all fibers to have big stacks. At that point you've thrown away their biggest selling points: high scalability and fast spawning.
> The only way to get around the FFI performance problem would be for all fibers to have big stacks.
I don't know all of Rust's specific constraints, but that is not the case in general. There are two levels of FFI support in this context, based on whether or not you want to allow FFI code to block the lightweight thread (perhaps through an upcall). Only if you want to allow that do you need "big stacks", but even then they can be "virtually big" but "physically small". If you don't, then all you need to do is temporarily run FFI code on a "big stack", knowing that all the FFI frames are gone by the time you want to block. Depending on your FFI, if you don't allow the FFI code to hold pointers into your language's stack, you're all good.