Calling the C world from the Scheme World

A brief essay, with examples, by

Brian Beckman

Updated 12 January 2000 (thanks to John D. Corbett for probing questions)

Original 19 August 1999

What are we trying to do?

We'd like to call programs written in C and C++ from programs written in Scheme. You thought I was going to say "and vice versa", but that's not entirely true. Some C programs call other C programs through explicit function pointers, or callbacks. The only case in which we need C to call Scheme is the case of callbacks. In other words, we want to call C and C++ from Scheme and we want write our callbacks from C and C++ in Scheme.

To be a bit more blunt about it, we want Scheme to be in charge, and we want C and C++ programs to be the unwitting slaves of Scheme. In other words, we'd like our Scheme to be able to call any C/++ code whether that code was ever intended to be called by some other programming system. And, we want our methods to scale to real-world C/++ programs. For example, we'd like to call kernel32.dll and oleaut32.dll from Scheme code.

We can expand our ambitions to C++ vtables and to COM. If we can call C, we ought to be able to call through a vtable, which is just an array of (non-callback) function pointers. So, our class of unwitting slaves includes all of C++ and COM, including OLE Automation Servers, because all of these are based on vtables. From now on, then, I'll write “C" and mean “C and C++ and COM and OLE Automation" because we can handle all these cases in the same way.

We require only that the target C code be packaged in a DLL, or Dynamically Linked Library, and that the DLL export its entry points symbolically. DLLs are a very general standard for symbolic linking at runtime in the C-programming world on Microsoft platforms like Windows. Instead of using the operating system's symbolic linking loader, we'll just gin up our own inside an implementation of Scheme. This is how we're going to call C functions in DLLs.

All that said, this is just a demonstration. We're not actually trying to make an industrial-strength, shippable Scheme product that can call C. Instead, we're trying to demonstrate solutions to the "hard" problems of callbacks and transformations among C types to Scheme types, and leave the problems of incorporating these solutions into a product-grade implementation to others. However, this is not mere theory: we show actual solutions to these problems in real, scalable code, and we call big, hairy DLLs like kernel32.dll and oleaut32.dll directly, with no glue code written in C, and we’ve stressed this by calling millions of iterations over the “hard” bits.

Why would you want to do that?

Up front, let me say that I'm not trying to advertise or proselytize or push my work in the slightest way. I built it for my own reasons and just thought some other Scheme and Siod users might find it amusing if not useful.

C is a wonderful language for writing libraries of special-purpose and high-performance components. It's a lousy language for algorithms, data structures, scripting, distributed command and control, experimentation, prototyping, learning by doing, and so on. That is, it’s so-so when you need lots of flexibility and elbow room or when you need to focus on the application domain. C forces you to do all your own pipe-fitting and resource management, as well as to invent data formats and write custom I/O code for your application data.

Scheme is great for flexibility and elbow room and for maximizing the impact of your precious programming hours by taking care of data formats, types, and resources for you. But Scheme is not so good for special-purpose, high-performance libraries.

So, let's use C for what it's good for and Scheme for what it's good for. Let's also decide that we want to use all kinds of C libraries that might only have been intended to be called from other C programs. We want to be able to use libraries tomorrow we haven’t thought about today without having to drop everything and invent new, abstract wrappers for them. So we need to make our Scheme smart enough to fool C into thinking it's being called by C. This isn't as hard as it sounds.

C isn't Safe; how can you call it?

Everyone knows that Scheme is safer than C. Most memory-corruption bugs are simply impossible in Scheme, and they're quite easy in C. In fact, C is no safer than assembly language, and it's that way on purpose. C is meant for hardcore programming close to the machine. While it has nice abstraction features like static type-checking and object-oriented programming, it always gives you a way to break the rules when you need to do. If we're going to call arbitrary C code from Scheme, we need to deal with the fact that C code doesn't have anything like Scheme's notion of safety.

We have to open holes in Scheme's safety nets, but we can close them all up on the Scheme side (even though we haven't done that completely in this demonstration package). This means that we can do all sorts of potentially unsafe things in some isolated bits of Scheme code, and we can stress, debug, test, and quarantine all the unsafe code on the other side of a well-defined fence. We'll show lots more on this below.

What is the C World?

This is best shown by example. Consider the following C program:

#include <stdio.h>

#include <string.h<

__declspec (dllexport) int __stdcall

TestVictim0 () {

    return 42 ;

__declspec (dllexport) char * __stdcall

TV4 (char * s) {

    printf("Got this: \"%s\"\n", s) ;

    return s ;

__declspec (dllexport) char * __cdecl

TV5cdecl (char * s, char * t) {

    return strcat (s, t) ;

typedef int (*PFII) (int) ;

__declspec (dllexport) int __stdcall

TVcallback (int i1, int i2, PFII f) {

    int i = f ( i1 * i2 ) ;

    return i ;

That is source code for a little DLL that has four exports, TestVictim0, TV4, TV5cdecl, and TVcallback. Trust me, the methods we develop here work for very big DLLs, like kernel32.dll and oleaut32.dll, and we show it below.

This DLL is completely in the C world. There is nothing in here about Scheme-like types. These exports give and take machine-word integers; strings, as machine-level addresses; and functions as machine-level addresses. All C-like stuff. Also, some of the entry points are cdecls, i.e., where caller pops arguments, and some of them are stdcalls, i.e., where callEE pops arguments. The former have the advantage of flexibility, that is, you can implement functions of variable arity, like printf, using cdecls. The latter have the advantage of brevity: call sites do not need to take up space with argument-popping instructions.

If we can call cdecls and stdcalls with ints, strings, and function pointers, we will have a sizeable fraction of the C world covered. Further, we argue, we've got all the hard cases covered and the rest is straightforward.

What is the Scheme World?

By example, once again. What should calls to the entry points above look like? A very simple, but surprisingly useful assumption is that C functions return machine words, which, on 32-bit architectures, are DWORDS or unsigned longs. The C language permits a function to "return a struct", that is, an aggregate much larger than 32 bits. But, this facility is not used very often in practice. Furthermore, compilers would be likely to implement the facility by putting the data somewhere and having the function return a pointer in a DWORD.

The upshot is that we may cover the vast majority of industrially important cases simply by assuming that C functions return DWORDs. This makes the Scheme "superglue" easier. The "superglue" is the generic C code we need to write as part of the Scheme implementation to extend it so it can call arbitrary C code. This is different from special-purpose "glue" that would extend Scheme so it could call a particular C function. If you haven't gathered it by now, we eschew any use of "glue" in favor of "superglue".

All that said, here is a real trace of some calls from Scheme, through Superglue, to the DLL shown above. The first one is easy to understand--the function returns 42, which superglue turns from a C number into a Scheme number. The next two return addresses, interepreted as Scheme numbers. To get at the data inside, we need some superglue for converting from addresses in the C world into values in the Scheme world. The last one converts the Scheme closure into a C function (by dyanmically painting code bytes in memory), calls it, and returns the answer.

(TestVictim0) --> 42

(TV4 "Hello World") --> 0x7B6D80

(TV5cdecl "yadda yadda, " "blah blah") --> 0x7B6C00

(TVcallback 3 4 (lambda (x) (* 2 x))) --> 24

To look inside the pointers returned by TV4 and TV5cdecl, we use the Scheme Superglue function cstring->string, which takes a Scheme number, which must have the value of a C pointer pointing to a C string, and conses up a fresh Scheme string. So, we get the following:

(cstring->string (TV4 "Hello World")) --> "Hello World"

(cstring->string (TV5cdecl "yadda yadda, " "blah blah")) --> "yadda yadda, blah blah"

Multiary Callbacks: WndProcs

Now, in Windows, a WndProc is a quaternary callback, that is, a

(lambda (hWnd msg wParam lParam) ...)

So, given multi-ary (or multiadic) callbacks, and given that we have a way to populate C structs (see structs.scm in the source below), then we should be able to write Windows applications entirely in Scheme, and we can; here’s “Hello, World!”:

(require "gdi32.scm")

(require "winuser.scm")

(require 'fm.scm)

(define (WndProc hWnd msg wParam lParam)

(cond

((= msg WM_DESTROY) (PostQuitMessage 0))

((= msg WM_PAINT)

(let* ((ps (new-PAINTSTRUCT))

(hDC (BeginPaint hWnd ps)))

(TextOutA hDC 10 10 (string->cstring "Hello, World!") 13)

(EndPaint hWnd ps) 0))

(#t (DefWindowProcA hWnd msg wParam lParam))) )

(define (myCreateWindow)

(let ((wc (new-WNDCLASS))

(clsNym (string->cstring "GenericAppClass"))

(appNym (string->cstring "Generic Application"))

(clsStyl (| CS_OWNDC (| CS_VREDRAW CS_HREDRAW)))

(winStyl (| WS_OVERLAPPEDWINDOW (| WS_VISIBLE

(| WS_HSCROLL WS_VSCROLL))))

(hInst #x400000)

(msg (new-MSG)) )

(WNDCLASS::set-lpszClassName wc clsNym)

(WNDCLASS::set-lpfnWndProc wc (callback WndProc ))

(WNDCLASS::set-style wc clsStyl)

(WNDCLASS::set-hInstance wc hInst)

(WNDCLASS::set-hbrBackground wc 6)

(ChkHandleReturn (RegisterClassA wc))

(ChkHandleReturn (CreateWindowExA 0 clsNym appNym winStyl

0 0 CW_USEDEFAULT CW_USEDEFAULT

0 0 hInst 0))

(while (GetMessageA msg 0 0 0)

(TranslateMessage msg)

(DispatchMessageA msg)) ))

Hostile Foreign-Function Interface in SIOD

We wrote a Hostile FFI for George Carette’s wonderful, little SIOD program (http://people.delphi.com/gjc/siod.html). The FFI is hostile because the C code doesn’t need to cooperate. We can run kernel32, COM, Ole Automation, etc. Please feel free to download the Visual Studio 6.0 Project. Build it, run it (you may have to adjust the project directories in the Debug Tab of the Project\Settings dialog box; make sure the program starts in the siodffi\scripts\scheme directory), and type

(load "regress.scm")

This will run a bunch of tests on Superglue, calling kernel32, calling raw COM, consing up callbacks like the ones documented above, and calling OLE Automation. Type

(load "windows.scm")

To see “Hello, World!” in action (wowee!!)

You can examine what’s going on and how we do it by just looking at the scheme code in “regress.scm” and backtracking through the Superglue code. Most of that code is in “Realload.cpp”, and there isn’t very much of it. Superglue consists entirely of the following new Scheme functions implemented in Realload.cpp and slib.cpp:

init_subr_1 ("GetProcAlist", GetProcAlist ) ;

init_subr_1 ("GetProcAlistCDecls", GetProcAlistCDecls ) ;

init_subr_2 ("MainDispatch", MainDispatcher ) ;

init_subr_2 ("MainDispatchCDecl", MainDispatcherCDecls ) ;

init_subr_1 ("cstring->string", StringFromCString ) ;

init_subr_1 ("string->cstring", CStringFromString ) ;

init_subr_1 ("array->rgpv", RgPvFromLispArray ) ;

init_subr_2 ("rgpv->array", LispArrayFromRgPv ) ;

init_subr_1 ("array->cbytes", CByteArrayFromArray ) ;

init_subr_1 ("array->carray", CByteArrayFromArray ) ;

init_subr_2 ("cbytes->array", ByteArrayFromCByteArray) ;

init_subr_3 ("carray->array", SiodArrayFromCByteArray) ;

init_subr_0 ("ScmGetSystemDirectory", SiodGetSystemDirectory ) ;

init_subr_0 ("sgsd", SiodGetSystemDirectory ) ;

init_subr_2 ("newVariant", newVariant ) ;

init_subr_1 ("num->lisp",             LispFromFlonum    ) ;

init_subr_1 ("lisp->num", FlonumFromLisp ) ;

init_subr_1 ("cptr->num", FlonumFromCPtr ) ;

init_subr_1 ("num->cptr", CPtrFromFlonum ) ;

init_subr_1 ("&", CPtrAddressOfCPtr ) ;

init_subr_0 ("cptr", newcptr ) ;

init_subr_1 ("val", indirect ) ;

init_subr_1 ("indirect", indirect ) ;

init_subr_1 ("peek-dw", indirect ) ;

init_subr_2 ("poke-dw", pokeDw ) ;

init_subr_1 ("peek-byte", peekByte ) ;

init_subr_2 ("poke-byte", pokeByte ) ;

All the rest of the fun stuff is written in Scheme.

Alternatives

Ok, we have a pretty comprehensive solution to this, but there’s another partial one in the world, too. The Rice University PLT group has come up with a system for calling Active X controls from Scheme when those controls are scripted in an HTML window. References are below.

DrScheme V100

This is very cool. It’s R4RS-compliant, and our gizmo is not. But, it only calls ActiveX controls in an HTML context, and we call arbitrary C/C++/COM DLLs. See http://www.cs.rice.edu/CS/PLT/packages/drscheme/

Why not write safe, high-level Scheme Wrappers?

Now, I had four reasons to opt for designing a "hardcore, unsafe" Hostile FFI over a set of sane wrappers: laziness (typeI), laziness (typeII), coverage, and uniquess. To elaborate:

1. Laziness (typeI): the Windows API is enormous: at least 5,000 APIs if one includes Multimedia, DirectX, OLE, (D)COM etc.; just enumerate the exports in *.dll in your Windows directory. To cover even a small fraction of it with sane wrappers is a huge amount of work. I once did sane wrappers for GL on SGI machines, and that was weeks of spade work for a 700-item API that is vastly simpler and more repetitive than that of Windows. I couldn't imagine actually trying to do it for Windows in the first place, let alone trying to keep up with the deluge of new DLLs hitting the fan continually.

The lazy programmer's way out is to find a generic way to suck up any and all the APIs, and that's the way I took. This approach, BTW, is not different in principle than the approch taken by Java FFI, Visual Basic's FFI, and a host of others. I just don't use those language systems because I like scheme better (subjective rationale).

2. Laziness (typeII): It's hard enough to learn the minimum, required knowledge to write Windows programs. To learn this or that sane wrapper system *before* finding out whether it can actually do what you need is burdensome. For example, I wanted to play with MIDI. I was unable to find any way to hook into the Windows MIDI API amongst the current crop of free scheme implementations, e.g., DrScheme, VScheme, and SCM. Those are all fabulous systems, but I had to devote some considerable time to learning them before realizing that I could not find out how to work MIDI with them. Note that this was not time spent on Scheme itself, but on the libraries and wrappers and interfaces and other accoutrements that come with each system. Now, I'm not an expert in those systems, and they *may* have MIDI capability, but I had only finite time to look and I couldn't find it within my time horizon. I had to trade this off against how much time it would take me to write an FFI, and that turned out to be 3 days (because I already had my own PE loader and I already knew the insides of Siod). So this was another subjective point: it was easier *for me* to do it my way.

3. Coverage: I sort of touched on this, but there is another point. I do not know, today, what corners of Windows I will need tomorrow (so this topic might be called ignorance or blindness:). I would hate to have to derail my train-of-thought to go design yet-another-sane-wrapper before I can get back to playing with Windows. So, I trade "sanity" off against cognitive throughput: if I can find away to slay the *entire* dragon (and all future dragons, to boot) with one blow, then I don't ever have to think about it again. I'll gladly take a little insanity and usafety in exchange. Subjective again.

4. Uniqueness: There are lots of sane-wrapper solutions out there with their own pros and cons each. I really didn't want to add another.

Conclusion

Have fun! Email me at brianbec@hotmail.com. I’ll be documenting this stuff as time allows.