A small scheme VM, compiler, and REPL in 4k

throwaway81523 · on Nov 26, 2021

Paper is acm-walled but code repo is https://github.com/udem-dlteam/ribbit . I would like to have seen Hedgehog Lisp ( https://github.com/sbp/hedgehog ) in the comparison chart, since it has actually been used for practical projects. Its VM is around 20KB of object code. 4KB is impressively tiny, but any machine where 4k vs 20k makes a big difference is unlikely to have enough ram to run garbage collected applications. That seems to call more something more like Forth rather than Scheme.

Hedgehog is a somewhat weird functional dialect of Lisp and I've been wanting for a while to convert it to a Scheme subset (mostly just using Scheme syntax instead of defun, etc) so people would be more familiar with it.

Added: paper is here http://www.iro.umontreal.ca/~feeley/papers/YvonFeeleyVMIL21....

bsder · on Nov 26, 2021

The rib encoding looks very clever. I need to digest the full implications of that.

> 4KB is impressively tiny, but any machine where 4k vs 20k makes a big difference is unlikely to have enough ram to run garbage collected applications.

Yeah, the paper itself says that the heap was 240K. A 4K system isn't very helpful if your heap is 240K.

I'd like to see how this runs (if at all) with a 4K heap. Deceptively straightforward things like map and apply can cons up a whopping amount of intermediate garbage.

> I would like to have seen Hedgehog Lisp ( https://github.com/sbp/hedgehog ) in the comparison chart, since it has actually been used for practical projects

Links to projects?

djwatson24 · on Nov 26, 2021

I agree the optimization looks backwards to me - in most of the microcontrollers I work on, the flash program space can easily be multiple megabytes, but the ram is usually the limiter around 128-512k.

feeley · on Nov 26, 2021

The introduction in the paper explains the motivation for Ribbit:

The use case which has motivated our work is code mobility where an executable program can be embedded in a document, email, or website. In that use case the size of the program must be small to minimize the transmission or loading time or to satisfy space constraints, such as the size of a disk boot sector, the URL length limit and the UDP packet size. On the other hand, the space used while the program is executing is of secondary importance.

So the main target application is not microcontrollers, even if it can obviously be used for some microcontrollers (for example a 1-2$ ESP32-C3 has 400 KB of RAM).

One of the possible applications of Ribbit is to implement the RVM (Ribbit VM) in the Excel spreadsheet formula language (which will soon be turing-complete with the addition of the LAMBDA form) to be able to program spreadsheets in Scheme. The execution environment has lots and lots of RAM, but you want the .xls file itself (which contains the RVM) to be as small as possible so it can easily be sent in emails, etc.

pumanoir · on Nov 26, 2021

Can Ribbit w/ the max lib run everything from SICP? It'd be really cool to have a self contained (html or other text) file of the SICP book.

feeley · on Nov 26, 2021

Ribbit's max library contains most of the R4RS predefined procedures (even call/cc). The main things missing are variadic procedures (unfortunately this includes the "list" procedure so you need to replace (list 1 2 3) by (cons 1 (cons 2 (cons 3 '())))) and also all the file opening operations. There is read-char, write-char, etc but they only do console I/O. These things could be added at the price of a larger footprint. One simple way to do this is to have the RVM run the code of a meta-circular Scheme evaluator written in Scheme.

If you are interested in building a a self-contained HTML document that embeds a full featured Scheme implementation, you should look into Gambit. Check out https://try.gambitscheme.org

pumanoir · on Nov 28, 2021

Thanks. I will check it out.

rcarmo · on Nov 26, 2021

Added to https://taoofmac.com/space/dev/lisp along with a few others, since I'm fascinated by this kind of thing.

feeley · on Nov 26, 2021

In your list you say "small and portable Scheme implementation with AOT compilation and 4K footprint". However the presence of an AOT compiler is not the most interesting part. The distinguishing feature is that in 4K there is a REPL that uses a dynamic compiler to do the "eval". In other words, the code entered at the REPL runs essentially as fast as if it was compiled by the AOT compiler.

throwaway81523 · on Nov 27, 2021

It seems to me (granted I haven't gotten it to work reliably at all yet) that the REPL is somewhat fragile, and I wonder if it might be easier as well as smaller to use a remote (tethered) compiler, as some Forth systems do.

Re the repl, I may have figured out what went wrong with user-defined functions. Instead of saying (define (sq x) (* x x)) I apparently have to say (define sq (lambda (x) (* x x))). Maybe that is documented in the paper, which I still have to read.

chubot · on Nov 26, 2021

Code looks very cool

There's even a Python implementation with 1 space indentation :)

https://github.com/udem-dlteam/ribbit/blob/main/src/host/py/...

throwaway81523 · on Nov 27, 2021

I was unable to build ribbit from the git repo under Debian. The compile-all.sh script complained about absence of "gambit", and after installing gambc (Debian's gambit package) and changing the "gambit" command in the build script to gsc (gambit compiler), the script still throws error messages. I might mess with it some more, but I'm unfamiliar with the tool chain, so at first look I don't know what to try next.

Did anyone else have better luck?

feeley · on Nov 27, 2021

You need to install Gambit HEAD, then you can:

    % cd src
    % gsi rsc.scm -m -t py -l max repl-max.scm
    % python3 repl-max.scm.py
    > (+ 1 (* 2 3))
    7

or for the C version:

    % gsi rsc.scm -m -t c -l max repl-max.scm
    % gcc -DCPROG -o repl-max.scm.exe repl-max.scm.c 
    % ./repl-max.scm.exe
    > (+ 1 (* 2 3))
    7

To install Gambit follow the instructions here: https://github.com/gambit/gambit

throwaway81523 · on Nov 27, 2021

Ok, I installed gambit from master, ran the compile-all script with "gsi" but it has been running for an hour or so. Trying your gsi command above also seems to take awfully long. I tried building rsc.scm with gsc and that made an executable that let me create repl-min.scm.c that was invalid C because of two definitions of "input". I commented out one of them and was able to compile a REPL that could do (+ 2 2) but segfaulted when I tried making a user-defined function and calling it. Something similar happened with the web demo on github.

I see that the extra "input" definition in the .c file is in an #ifndef next to some clang pragmas: does it expect to be compiled with clang? I'm using gcc 8.3.

I got rsc working with python after a few false starts, but that also crashes (with a python backtrace instead of a segfault) when I try to define and use a function.

Also have similar crashes when the scheme output of rsc, under both gsi and guile.

throwaway81523 · on Nov 27, 2021

Ah thanks, I'll try that. I made a little progress earlier and found that compilation was failing because some functions like (script-file) were not defined. Maybe those are in HEAD. It would be nice if the compilation process stayed with reasonably portable Scheme.

feeley · on Nov 27, 2021

The Ribbit compiler was developed using Gambit but most of the code is portable. I rewrote a few parts with cond-expand to port rsc.scm to older versions of Gambit and also Guile and Chicken. If you pull the latest commit the Ribbit system should work with any of those Scheme systems. The README also contains usage instructions, here is a relevant part:

The Ribbit compiler is written in Scheme and can be executed with Gambit, Guile or Chicken. It has been tested with Gambit v4.7.5 and above. For the best experience install Gambit from https://github.com/gambit/gambit .

Currently Ribbit supports the target languages C, JavaScript, Python and Scheme which are selectable with the compiler's `-t` option with `c`, `js`, `py`, and `scm` respectively. The compacted RVM code can be obtained with the target `none` which is the default.

The `-m` option causes a minification of the generated program. This requires a recent version of Gambit.

The `-l` option allows selecting the Scheme runtime library (located in the `lib` subdirectory). The `min` library has the fewest procedures and a REPL that supports the core Scheme forms only. The `max` library has most of the R4RS predefined procedures, except for file I/O. The `max-tc` library is like `max` but with run time type checking. The default is the `max-tc` library.

Here are a few examples:

    Use Gambit to compile the minimal REPL to JavaScript
    and execute with nodejs:

      % cd src
      % gsi rsc.scm -t js -l min repl-min.scm
      % echo "(define f (lambda (n) (if (< n 2) n (+ (f (- n 1)) (f (- n 2))))))(f 25)" | node repl-min.scm.js
      > 0
      > 75025
      >

throwaway81523 · on Nov 27, 2021

Thanks, I did get the C, Python, and Scheme RVM's to work. I just didn't realize that (define (f x) ...) didn't work as I was used to. I didn't get JS working because it required a JS build tool that I didn't have installed. I didn't pursue it as I don't use JS much.

Can I ask how long "gsi rsc.scm..." took in your example? It ran for around an hour when I tried it with python output. I then stopped it and compiled rsc.scm with gsc and that took just a fraction of a second to build the repl's.

Do you have particular applications in mind for Ribbit? I see some general mentions of running it in browsers and email clients, but those tend to have JS already available.

Finally, I'm wondering if you think that functional-style programming (avoiding mutation) inherently uses more ram than a more mutation-heavy style. I'm asking because Micropython is fairly usable with 32k of ram and has been shoehorned into 16k (BBC micro:bit v1), but Hedgehog and apparently Ribbit both want around 200k. Hedgehog relies on bytecode and an offline AOT compiler but Micropython has a REPL.

feeley · on Nov 27, 2021

It is almost instantaneous:

    % /usr/bin/time gsi rsc.scm -t c -l min repl-min.scm
            0.06 real         0.05 user         0.00 sys
    % gcc -O3 repl-min.scm.c
    % echo "(define f (lambda (n) (if (< n 2) n (+ (f (- n 1)) (f (- n 2))))))(f 25)" | /usr/bin/time ./a.out
    > 0
    > 75025
    >
            0.10 real         0.10 user         0.00 sys

My guess is that you were using an old version of Gambit (before I added the support for older versions of Gambit).

The work on Ribbit is not driven by a specific application. It started as a challenge to write the smallest footprint implementation of Scheme after being inspired by the "sectorlisp" system (which has now reached an amazing milestone of just 0.5 KB!). However sectorlisp doesn't have a GC, continuations, tail-calls, etc so the comparison is a bit apples-to-oranges.

The main advantages of the RVM are its small size and its portability. This is particularly useful for code mobility. The exact same Scheme code can run on the desktop, in the browser, on the web server, on a microcontroller, an Excel spreadsheet, a shell script, etc etc And because continuations are a good way to represent the state of a running process I foresee applications where a running program migrates easily from one environment to the other by serializing the continuation (a game running in your desktop browser could be migrated to the browser on your phone to continue playing on the subway ride back home).

Concerning "functional-style programming (avoiding mutation) inherently uses more ram than a more mutation-heavy style" it greatly depends on how smart your compiler is. A smart one can in some cases generate the same code. An example that comes to mind is a tail-recursive function that adds numbers from a list. The functional program has no mutations but a compiler doing TCO can generate the same code as an imperative program that uses mutations.

throwaway81523 · on Nov 27, 2021

Ah, thanks for all that, and I will try the new version of gambit. I don't remember for certain whether the looping I encountered was with the old version. It will be cool if Ribbit can become completely self-hosting.

I do know there have been some web sites (including I think HN) written in Scheme, that originally used continuations as sessions, but ended up switching away from that approach once things got more complicated. It's a cool idea though.

Ribbit reminds me of this article mentioning "Fluchtpunkt Lisps", in case you're not familiar with it: http://blog.fogus.me/2011/05/03/the-german-school-of-lisp-2/

I guess it is an example of one.

Ribbit seems more interesting to me than Sectorlisp because of the issues you mention! I'll keep playing with it. Even if I don't find direct uses for it, its implementation methods are worthy of study.

I've been wanting for a while to make a Hedgehog Lisp back end for Purescript (purescript.org). Ribbit might be another interesting target.

Added: I think the Ribbit REPL needs some kind of exception trap. Otherwise saying something like (3) crashes the whole interpreter, which is not so great for an interactive environment.

Hedgehog has built-in functional AVL trees that substitute for hash tables, arrays, and other lookup structures. They are very useful, but are an example of functional style possibly requiring more ram than imperative style where you can mutate stuff.

neilv · on Nov 26, 2021

Non-ACM-paywalled paper from the home page of Marc Feeley: http://www-labs.iro.umontreal.ca/~feeley/papers/YvonFeeleyVM...