Thursday, April 14, 2022

Using Lisp libraries from other programming languages - now with sbcl

Have you ever wanted to call into your Lisp library from C? Have you ever written your nice scientific application in Lisp, only to be asked to rewrite it in Python so that other people can use its functionality? Or maybe you've written an RPC or pipes library to coordinate different programming languages, running things in different processes and passing messages around to simulate foreign function calls.

Calling into C from other languages is much easier in comparison. A dlopen() and a foreign function interface suffice to use library functionality exported with the C ABI, without spawning separate processes. Bindings written over such a C library for language X make it seem like the underlying functions really are written in X. For cross-language interoperation, this is the most lightweight option available. Languages like Rust and C++ can mark their own functions extern "C" to give them the C ABI, exposing them in exactly this way to the foreign function interfaces of other programming languages.
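As a minimal sketch of that mechanism, assuming a POSIX system with dynamic linking, the following C function looks up the standard library's atoi at run time instead of referring to it by name at link time. The helper name call_atoi_dynamically is made up for illustration; passing NULL to dlopen() asks for a handle covering the running program and the libraries already loaded with it.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Signature of the function we expect to find behind the symbol "atoi". */
typedef int (*atoi_fn)(const char *);

/* Hypothetical helper: resolve "atoi" through the dynamic loader and call it.
   Returns -1 if the loader or the symbol lookup fails. */
int call_atoi_dynamically(const char *text) {
    /* NULL requests a handle for the main program and its loaded dependencies. */
    void *handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }
    /* POSIX requires that converting dlsym's result to a function pointer works. */
    atoi_fn fn = (atoi_fn)dlsym(handle, "atoi");
    int value = (fn != NULL) ? fn(text) : -1;
    dlclose(handle);
    return value;
}
```

A foreign function interface in another language does essentially this lookup for you and then converts arguments and return values between the two languages' representations. On older glibc versions the program may need to be linked with -ldl.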

For languages with a sizable runtime included (with, for example, automatic memory management, signal handling, etc.) such as Python or even JavaScript, having a way to expose functions to be callable with the C ABI is much less common. You quickly need to deal with issues of how objects are represented internally, how to deal with type translations, and what happens when objects in memory move for GC. Some other Lisp implementations like ECL (Embeddable Common Lisp) and some commercial Lisps do support this use case, however: they allow Lisp libraries to be packaged up as a shared library, linkable with ordinary object files, in which Lisp code is callable with the normal C calling convention.

Some of you may already know you can use ECL for this purpose; ECL is a great implementation, but it lacks some functionality and features, which makes it unattractive for some.

If you prefer using SBCL, you can now join in on the cross-language programming frenzy too.

1.1. wait, really?

Those of you who have used foreign callbacks in libraries such as CFFI will already be somewhat familiar with how the Lisp-side interface works – you write your Lisp code normally and then define C convention callable function pointers in Lisp through a fairly comprehensive translation interface. With callbacks, you just pass these function pointers as parameters to your foreign functions. What's needed to make Lisp callable as a shared library on top of this interface is to iron out runtime management and linkage issues, e.g. how do we initialize the Lisp runtime and GC, and then how does the system linker actually find the C callable function pointers defined in Lisp so that you can call those functions normally as if they were written in C?

I'll explain how to use low-level machinery now exposed in SBCL and how it works, and wrap up by talking about a convenience library that creates a high level interface for creating bindings and manages some of the details.

1.2. example scenario: a calculator

Let's take a classic example to illustrate this new functionality. Suppose you want to write C code that calls out to a library called calc. calc is written entirely in Common Lisp, full of classes, objects, and methods defining ASTs for a simple symbolic calculator. It has functions like parse, simplify, and expression-to-string, all of which work on Lisp objects.

(defclass expression ()
  ()
  (:documentation "An abstract expression."))
(defclass int-literal (expression)
  ((value :reader int-literal-value))
  (:documentation "An integer literal."))
(defclass sum-expression (expression)
  ((left-arg :reader sum-expression-left-arg)
   (right-arg :reader sum-expression-right-arg)))
(defun parse (string) ...)
(defun simplify (expr) ...)
(defun expression-to-string (expr) ...)

We'd like to expose these functions so that other programming languages can call them using the C ABI. So how does it work in the simplest case of using calc from a C program?

First, we need to define these function pointers in Lisp. The primitive SBCL now exposes for this is the macro sb-alien:define-alien-callable. You can use this macro in the following way:

(define-alien-callable calc-parse int ((source c-string) (result (* (* t))))
  (handler-case
      (progn
         ;; The following needs to use SBCL internal functions to
         ;; coerce a Lisp object into a raw pointer value. This is
         ;; unsafe and will be fixed in the next section.
         (setf (deref result) (sap-alien (int-sap (get-lisp-obj-address (parse source))) (* t)))
         0)
    (t (condition) (declare (ignore condition)) 1)))

(define-alien-callable calc-simplify int ((expr (* t)) (result (* (* t))))
  (handler-case
      (progn
         ;; The following needs to use SBCL internal functions to
         ;; coerce a raw pointer value into a Lisp object and
         ;; back. This is unsafe and will be fixed in the next
         ;; section.
         (setf (deref result)
               (sap-alien (int-sap (get-lisp-obj-address
                                    (simplify (%make-lisp-obj (sap-int (alien-sap expr))))))
                          (* t)))
         0)
    (t (condition) (declare (ignore condition)) 1)))

(define-alien-callable calc-expression-to-string int ((expr (* t)) (result (* c-string)))
  (handler-case
      (progn
         ;; The following needs to use SBCL internal functions to coerce a raw
         ;; pointer value into a Lisp object. This is unsafe and
         ;; will be fixed in the next section.
         (setf (deref result) (expression-to-string (%make-lisp-obj (sap-int (alien-sap expr)))))
         0)
    (t (condition) (declare (ignore condition)) 1)))

Notice that, for the purposes of this example, we simply translate any Lisp condition into a C-style error return code. The actual result is written through the out-pointer passed as the last argument, in typical C fashion.
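For readers less used to that convention, here is a small sketch of it in plain C, unrelated to calc itself: the function returns a status code and writes its real result through an out-pointer only on success. The function parse_long and the ERR_FAIL constant are hypothetical names chosen for this illustration; the values 0 and 1 mirror the codes returned by the callable definitions above.

```c
#include <errno.h>
#include <stdlib.h>

/* Status codes: 0 for success and 1 for failure, as in the Lisp definitions. */
enum { ERR_SUCCESS = 0, ERR_FAIL = 1 };

/* Hypothetical example: parse a decimal integer from `source`.
   The return value is only a status; the parsed number goes to *result. */
int parse_long(const char *source, long *result) {
    char *end = NULL;
    errno = 0;
    long value = strtol(source, &end, 10);
    /* Fail on overflow, on empty input, and on trailing garbage. */
    if (errno != 0 || end == source || *end != '\0')
        return ERR_FAIL;    /* *result is left untouched on failure */
    *result = value;
    return ERR_SUCCESS;
}
```

The caller checks the status first and reads the out-parameter only when the call succeeded, which is the same discipline the C code calling calc has to follow.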

Now we've defined C callable function pointers associated with names. You can get the underlying function pointers with the function sb-alien:alien-callable-function, which is useful for passing these callable functions as callbacks to C. SBCL actually didn't even have an external interface to callbacks before this, and CFFI had been using an internal system interface which is now superseded by this interface.
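From the C side, a callback is nothing more than a function pointer parameter. The sketch below uses made-up names (sum_mapped, square) and a callback written in C as a stand-in; a pointer obtained from Lisp could be passed in the same position, assuming its declared alien argument and return types correspond to int (*)(int).

```c
#include <stddef.h>

/* A C function that accepts a callback and applies it to every element,
   returning the sum of the callback's results. */
int sum_mapped(const int *values, size_t count, int (*callback)(int)) {
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += callback(values[i]);
    return total;
}

/* Stand-in callback written in C. sum_mapped cannot tell whether the pointer
   it receives refers to C code or to a function defined in another language. */
int square(int x) {
    return x * x;
}
```

Calling sum_mapped(values, 3, square) on the array {1, 2, 3} adds 1, 4, and 9.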

However, we're not too interested in callbacks right now. We want to be able to call these functions from C! You can save the Lisp image containing new callable function pointer definitions like so:

(sb-ext:save-lisp-and-die "calc.core"
  :callable-exports '("calc_parse" "calc_simplify" "calc_expression_to_string" ...))

The :callable-exports argument names the set of C symbols you want to initialize with the corresponding function pointers created with define-alien-callable. This image, when started, has only two jobs: finish starting up the Lisp system and initialize these callable exports with the proper function pointers. It then immediately passes control back to C.

Here's an example C file which uses these functions:

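#include <stdio.h>
#include "libcalc.h"

/* expr_type, ERR_SUCCESS, and the calc_* exports are assumed to come from
   libcalc.h; die() is assumed to be a helper that prints its message and exits. */
int main(int argc, char **argv) {
    if (argc < 2)
        die("expected an expression as the first argument");
    const char *source = argv[1];  /* expression text to parse */
    initialize_lisp("libcalc.core");
    expr_type expr;
    if (calc_parse(source, &expr) != ERR_SUCCESS)
        die("unable to parse expression");
    char *result;
    expr_type simplified_expr;
    if (calc_simplify(expr, &simplified_expr) != ERR_SUCCESS)
        die("unable to simplify expression");
    if (calc_expression_to_string(simplified_expr, &result) != ERR_SUCCESS)
        die("unable to print expression to string");
    printf("\n%s\n", result);
    return 0;
}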

Notice that there is a symbol, initialize_lisp, that we can use to initialize the Lisp runtime. Its arguments are the same as the arguments you can normally pass to the main sbcl executable, so once the core name is specified and the runtime initializes the function pointers we are going to use, control is returned to the program (see footnote 1).

Once the Lisp runtime has been initialized, your function pointers are ready to use, and calling them from C works just like a callback would, with the same type translation machinery used.
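To make the "function pointers initialized at startup" idea concrete, here is a toy model in plain C. Nothing in it comes from SBCL: demo_add, add_impl, and demo_initialize are hypothetical stand-ins for an exported symbol, the code behind it, and the runtime initialization step, respectively.

```c
#include <stddef.h>

/* The exported symbol is a global variable that holds a function pointer.
   A header for a real library would presumably declare it with `extern`. */
int (*demo_add)(int, int) = NULL;

/* Stand-in for code that would live inside the Lisp image. */
static int add_impl(int a, int b) {
    return a + b;
}

/* Stand-in for runtime initialization: fill in the exported pointer. */
void demo_initialize(void) {
    demo_add = add_impl;
}
```

After demo_initialize() has run, demo_add(2, 3) is written exactly like an ordinary function call, because C lets you call through a function pointer without dereferencing it explicitly. Before initialization the pointer is NULL, which is why the runtime has to be initialized before any exported function is used.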

1.3. what about those raw pointers? objects? GC???

Now comes the question of what to do with objects. There are actually several questions:

  1. How do we make sure that references living outside the Lisp heap keep the object alive?
  2. Solving that, how can we make sure objects in the Lisp heap don't stay alive forever and do end up getting garbage collected at some point?
  3. Even more conspicuously, how do we deal with objects getting moved by the GC?

In ECL, we don't really need to worry about this last aspect too much. Since ECL uses the Boehm garbage collector, objects in memory never move, so you could get away with just passing the object address as a void pointer into C and manipulating it opaquely in your non-Lisp code, as long as you have some way of telling Boehm where in your C program Lisp objects can reside.

Unfortunately, this scheme is unworkable with SBCL. The primary garbage collector used in SBCL is generational and copying. Passing an object's address directly to the world outside the managed Lisp heap therefore means that once the object moves, the pointer becomes invalid, and a segfault ensues. Luckily, the solution is not too complicated: a layer of indirection for opaque pointers is all that's needed. This can be implemented in any number of ways, but arguably the simplest scheme is a map associating integers (fixnums) with the objects that cross the foreign function boundary. If the map lives in the Lisp heap, for example as a hash table stored in a global Lisp variable, then the GC updates the map's values like everything else, and because fixnums are immediate values that never move, the keys are stable handles we can pass to the outside world. Since we usually only use object pointers opaquely, the indirection through the map can be handled entirely on the Lisp side.

This solution also solves problems 1) and 2) at the same time: the map lives in the Lisp heap, so it keeps any objects in its entries live. When we want to remove the outside world's reference to an object, we simply remove its entry from the map. If that object is no longer referenced anywhere else, the garbage collector happily cleans it up.
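The real handle map lives in Lisp, but the idea can be modeled in a few lines of C. The sketch below is only an analogy under stated assumptions: a fixed-size table stands in for the Lisp hash table, relocate_object is a hypothetical function simulating what a moving collector does, and the C function names merely echo the Lisp ones.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_HANDLES 64

/* Slot i holds the current address of the object behind handle i + 1.
   A NULL slot is free, so registered objects must be non-NULL. */
static void *handle_table[MAX_HANDLES];

/* Convert a handle back to its slot index; a NULL handle wraps around to a
   huge index and is rejected by the bounds checks below. */
static size_t handle_index(void *handle) {
    return (size_t)(uintptr_t)handle - 1;
}

/* Register an object and return a small integer disguised as a pointer,
   or NULL if the table is full. */
void *make_handle(void *object) {
    for (size_t i = 0; i < MAX_HANDLES; i++) {
        if (handle_table[i] == NULL) {
            handle_table[i] = object;
            return (void *)(uintptr_t)(i + 1);
        }
    }
    return NULL;
}

/* Look up the object's current address; NULL for invalid or released handles. */
void *dereference_handle(void *handle) {
    size_t i = handle_index(handle);
    return i < MAX_HANDLES ? handle_table[i] : NULL;
}

/* Simulate the collector moving an object: only the table entry changes,
   while the handle held by foreign code stays the same. */
void relocate_object(void *handle, void *new_address) {
    size_t i = handle_index(handle);
    if (i < MAX_HANDLES && handle_table[i] != NULL)
        handle_table[i] = new_address;
}

/* Drop the table's reference so the object can be reclaimed. */
void release_handle(void *handle) {
    size_t i = handle_index(handle);
    if (i < MAX_HANDLES)
        handle_table[i] = NULL;
}
```

The point of the model is that foreign code never sees an address that can go stale: it holds only the small integer, and every lookup goes through the table that the collector keeps up to date.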

Let's illustrate this indirection by modifying the above example to be safe and also teach the C program how to clean up memory. Assuming we have Lisp functions make-handle and dereference-handle to add this layer of indirection, we would simply change our callable definitions like so, allowing us to remove unsafe raw pointer coercions. make-handle and dereference-handle deal with Lisp fixnums coerced into opaque pointer values, so we know that they are always safe to pass around.

(define-alien-callable calc-parse int ((source c-string) (result (* (* t))))
  (handler-case
      (progn
         (setf (deref result) (make-handle (parse source)))
         0)
    (t (condition) (declare (ignore condition)) 1)))

(define-alien-callable calc-simplify int ((expr (* t)) (result (* (* t))))
  (handler-case
      (progn
         (setf (deref result) (make-handle (simplify (dereference-handle expr))))
         0)
    (t (condition) (declare (ignore condition)) 1)))

(define-alien-callable calc-expression-to-string int ((expr (* t)) (result (* c-string)))
  (handler-case
      (progn
         (setf (deref result) (expression-to-string (dereference-handle expr)))
         0)
    (t (condition) (declare (ignore condition)) 1)))

Now let's modify our C example to include this type of handling using the exported function lisp_release_handle:

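#include <stdio.h>
#include "libcalc.h"

/* expr_type, ERR_SUCCESS, lisp_release_handle, and the calc_* exports are
   assumed to come from libcalc.h; die() is assumed to be a helper that prints
   its message and exits. */
int main(int argc, char **argv) {
    if (argc < 2)
        die("expected an expression as the first argument");
    const char *source = argv[1];  /* expression text to parse */
    initialize_lisp("libcalc.core");
    expr_type expr;
    if (calc_parse(source, &expr) != ERR_SUCCESS)
        die("unable to parse expression");
    char *result;
    expr_type simplified_expr;
    if (calc_simplify(expr, &simplified_expr) != ERR_SUCCESS)
        die("unable to simplify expression");
    if (calc_expression_to_string(simplified_expr, &result) != ERR_SUCCESS)
        die("unable to print expression to string");
    printf("\n%s\n", result);
    /* Drop our references so the Lisp objects can be garbage collected. */
    lisp_release_handle(expr);
    lisp_release_handle(simplified_expr);
    return 0;
}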

As you can see, we only ever need to make and dereference handles on the Lisp callable definition side, while the foreign world only needs to pass these objects around opaquely and know how to "free" its references to them. Any time you feel like you need to peek inside the internals of a Lisp object from the foreign world, all you need to do is define an alien callable function exposing that portion of the object. That way, cross-language pointers can be kept totally opaque and safe through an indirection such as the handles illustrated above.

1.4. sharing a process

There are some more sophisticated issues with sharing a process that I haven't touched upon, including how the Lisp runtime interacts with the outside world in relation to signals, sessions, threading, or file descriptors such as standard input and output. In particular, users will need to be careful when using Lisp together with another managed runtime in the same process, such as a Python interpreter. Since many of these issues are general problems that anyone running multiple systems in one process has to deal with, I won't cover the myriad ways people share process resources. Note, though, that POSIX signal handlers can all be set and removed from Lisp itself in SBCL if the need arises. Some examples of ways Lisp implementations could use signals:

  • signals relating to handling keyboard interrupts to drop into the debugger. The debugger can of course be disabled.
  • some kind of mechanism for telling other threads a stop-the-world garbage collection is happening and they need to suspend. This can be implemented via signals, but it can also be implemented by other means such as safepoints.
  • some kind of mechanism for actually starting a garbage collection, for example through a segfault on an unallocated page.

Different builds of SBCL on different operating systems require different signals or do threading in different ways, so consult the documentation for details.
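On the host side, one defensive pattern is to record a signal's disposition before handing control to another runtime, so that it can be inspected or reinstated later. This is sketched here as an assumption about how a host program might be written, not as something SBCL requires, and the two helper names are hypothetical. Both are thin wrappers over the POSIX sigaction() call.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Record the current disposition of `signo` in *saved without changing it.
   Returns 0 on success and -1 on failure, like sigaction() itself. */
int save_signal_disposition(int signo, struct sigaction *saved) {
    return sigaction(signo, NULL, saved);
}

/* Reinstall a disposition previously recorded by save_signal_disposition.
   Returns 0 on success and -1 on failure. */
int restore_signal_disposition(int signo, const struct sigaction *saved) {
    return sigaction(signo, saved, NULL);
}
```

Whether it is safe to reinstate a particular handler depends on which signals the Lisp runtime in question relies on, so a host would only do this for signals it knows the runtime does not need.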

1.5. wrapping it up

While the functionality that SBCL exposes is fairly low-level, a high-level library called sbcl-librarian is in development (used in, for example, the Quil language stack), which defines a declarative interface for generating bindings for types and errors, as well as producing the corresponding Lisp callable definitions. It also defines some API exports for important Lisp-side functionality such as the handles described above, as well as exposing the Lisp loader and debugger for loading and debugging Lisp code at runtime from the foreign world. Much of the interface is still subject to design and change, but hopefully things will stabilize soon with use.

Well, so now you have another implementation on the block for embedding Lisp in larger systems. Go try it out!

Footnotes:

1

Note that behind the scenes, the symbols calc_parse, calc_simplify, etc. are all global variables that happen to hold function pointers. They are not function symbols in linker lingo, but from the perspective of C, you can call them just as if they were normal functions. Hence there's only one level of indirection to make the call on the C side, the same as the indirection to call any other function from a shared library via the normal PLT/GOT mechanism, which is pretty much optimal for cross-language shared library linkage, if you ask me.
