emblemparade.com

Keeping the C ABI Without C

Originally published on LiveJournal, 9.19.07

There’s an enormous benefit in writing code in C: on virtually every platform it has a well-defined, stable ABI, and it is thus the master of interoperability. That means you can link to any C library out there, as long as it’s available for your platform, and if you write your code in C, it will be linkable on any platform you compile it for. (Another language with a standard ABI, much less used, is Pascal.) C++, by contrast, doesn’t have a standard ABI: name mangling, object layout and exception handling can differ between compilers, so if you write a class in C++, you can’t count on it being binary-linkable anywhere else. If you have access to the source and compile it all on that platform with the same compiler, you’re OK, but that adds a very complex prerequisite. For example, if you’re writing a web browser and want to include a third-party HTML renderer, it would be unreasonably difficult to have to compile that component whenever you build your browser. Thus, though C++ is almost as portable as C (in that it can compile on many platforms), it’s practically not interoperable. That’s why C still rules supreme. You wouldn’t dream of writing an operating system, for example, in anything but C.
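To make the ABI point concrete, here is a minimal C++ sketch (the function names are hypothetical). Overloaded C++ functions share a source name, so the compiler encodes their parameter types into the linker symbol ("name mangling"), and each compiler family uses its own scheme. A function declared extern "C" is instead exported under its plain name with the platform’s C calling convention, which is exactly what other languages know how to find and call.

```cpp
#include <cassert>  // assert(), used by the checks in the usage example

// C++ linkage: two overloads with the same source name. The compiler mangles
// each into a distinct symbol that encodes the parameter types; the encoding
// is compiler-specific, which is one reason C++ binaries don't link across
// toolchains.
int scale(int x) { return x * 2; }
double scale(double x) { return x * 2.0; }

// C linkage: overloading is not allowed, and the symbol is simply "scale_int"
// (plus any platform prefix, such as a leading underscore on macOS), so a C
// compiler or a foreign-function interface can locate it by name.
extern "C" int scale_int(int x) { return scale(x); }
```

With GCC or Clang on Linux, which follow the Itanium C++ ABI, running nm on the object file typically lists _Z5scalei and _Z5scaled for the two overloads but plain scale_int for the extern "C" function. Other compilers, such as MSVC, mangle the same overloads differently.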

However, C is very much a lowest common denominator, and we’re obviously much more productive in its higher-level, object-oriented (and often garbage-collected) offspring: C++, Objective C, Java and C#. But by choosing these, we lose a lot in terms of interoperability. For example, this is the underlying difference between proponents of the Qt library, based on C++, and those of the Gtk library, based on C (used by the KDE and GNOME desktops respectively). Gtk plays nicely with others because it uses C (especially since GObject, discussed in #3 below, was introduced to it). Qt needs special wrappers for each language it supports.

There are, however, new ways to keep the advantages of the standard C ABI while gaining high-level productivity.

1) C++ allows mixing in C code, so a C++ library can expose a C interface (functions declared extern "C") while doing its real work in C++. Though that interface has to be written to the lowest common denominator, doing so is quite possible and straightforward. There are also great garbage collectors for C++. Add a stringent coding standard, and C++ can be as productive as C# and Java, and fully portable. You do have the problem, though, of having to “program down” in order to get the ABI advantage. You end up, basically, writing in mixed C/C++.
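A common way to “program down” is to keep the real work in a C++ class and export only a C-shaped interface: an opaque handle plus free functions with C linkage. The sketch below assumes a hypothetical Counter class and hypothetical counter_* functions; the part inside the extern "C" block is what a header shared with C code (or with any language’s C FFI) would contain.

```cpp
#include <cassert>  // assert() for precondition checks
#include <string>
#include <utility>  // std::move

// ---- C-compatible interface ---------------------------------------------
// Only an opaque handle and free functions with C linkage, so C callers never
// need to know anything about the C++ class behind them.
#ifdef __cplusplus
extern "C" {
#endif

typedef struct counter counter;  // opaque: C callers never see the layout

counter *counter_new(const char *label);  // returns NULL on failure
void counter_increment(counter *c);
int counter_value(const counter *c);
const char *counter_label(const counter *c);
void counter_free(counter *c);

#ifdef __cplusplus
}
#endif

// ---- C++ implementation, hidden behind the handle ------------------------
class Counter {
public:
    explicit Counter(std::string label) : label_(std::move(label)) {}
    void increment() { ++value_; }
    int value() const { return value_; }
    const std::string &label() const { return label_; }

private:
    std::string label_;
    int value_ = 0;
};

// The opaque C type simply wraps the C++ object.
struct counter {
    Counter impl;
};

extern "C" counter *counter_new(const char *label) {
    try {
        return new counter{Counter(label ? label : "")};
    } catch (...) {
        return nullptr;  // never let a C++ exception escape into C callers
    }
}

extern "C" void counter_increment(counter *c) {
    assert(c != nullptr);
    c->impl.increment();
}

extern "C" int counter_value(const counter *c) {
    assert(c != nullptr);
    return c->impl.value();
}

extern "C" const char *counter_label(const counter *c) {
    assert(c != nullptr);
    return c->impl.label().c_str();  // valid until counter_free(c)
}

extern "C" void counter_free(counter *c) { delete c; }  // deleting NULL is a no-op
```

The try/catch in counter_new is the kind of discipline “programming down” requires: a C caller has no way to handle a C++ exception, so failures are reported as a NULL return instead.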

2) Compiled Java, via the GCJ project. This is not a virtual machine: GCC treats Java as yet another C-like language it can compile, in addition to C++ and Objective C. It plugs in a very good garbage collector (Boehm, which you can replace), and voilà. Interoperability is achieved by mixing in C++ code via GCJ’s CNI, which provides a kind of ABI for Java. Unfortunately, CNI is specific to GCJ and not widely used. (Java virtual machines use JNI, which is not an ABI but an API into the virtual machine.) So, in compiled Java, you end up mostly “programming down” to C in order to allow interoperability. Choosing compiled Java is thus not too different from choosing C++ in those terms. The big advantage over C++, though, is that a good coding standard and garbage collection are already part of the language, so the result ends up being safer and more productive. Also, you can mix in Java bytecode, giving you access to a whole other universe of portability and libraries. With compiled Java, you can access C, C++ and Java code.

3) Vala. This new project, from the Gtk folk, is ingenious. It’s a C#-like language whose compiler translates it into C source code, which is then built by an ordinary C compiler. Object orientation is handled automagically through GObject, which is really the main mover of Vala and its inspiration. GObject is a complete OO solution in pure C, essentially an OO ABI. Bindings exist between GObject and practically any language you have heard of. Thus, a GObject-based C library is immediately usable in binary form by, for example, Python. Read that again: you can expose an object from your code and use it, just like that, in Python, without having to write any “wrapper.” No other platform in existence allows that. The closest you get is exporting COM objects from .NET, but COM is far, far more complex than GObject. GObject is a self-contained, standard C library. You can run it anywhere. The catch is that using GObject directly from C is a pain, since C is not itself an OO language. Enter Vala: it changes all that by adding a higher-level language that compiles down to GObject-based C. The end result is that you get the advantages of a modern high-level language, with the added advantage of instant interoperability with a host of powerful platforms. I really think this is the killer solution. Vala code could run anywhere there is a C compiler and GLib (which provides GObject), and be more interoperable than any of the above solutions. I’m definitely going to be following this development very closely!
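GObject’s real API (GType registration, reference counting, signals, properties) is far too large to reproduce here, so what follows is only a hand-rolled sketch of the layout idea underneath it, with hypothetical Shape, Square and Rect names. Every instance begins with a pointer to a per-type “class” structure holding function pointers, and a derived type embeds its parent as its first member, so a pointer to it can be used as a pointer to the parent. Because this is plain structs and function pointers, any language that can call C can use it; the sketch sticks to the C subset of C++, so it also compiles as C.

```cpp
#include <assert.h>  // assert()
#include <stddef.h>  // NULL

// Illustration of the GObject-style layout, NOT GObject's API.
// All names here are hypothetical.

typedef struct Shape Shape;

// "Class" structure: one shared, read-only table per type, holding the type's
// name and its "virtual methods" as function pointers.
typedef struct {
    const char *type_name;
    double (*area)(const Shape *self);
} ShapeClass;

// Every instance starts with a pointer to its class structure.
struct Shape {
    const ShapeClass *klass;
};

// Public entry points: dispatch through the instance's class table.
double shape_area(const Shape *s) {
    assert(s != NULL && s->klass != NULL && s->klass->area != NULL);
    return s->klass->area(s);
}

const char *shape_type_name(const Shape *s) {
    assert(s != NULL && s->klass != NULL);
    return s->klass->type_name;
}

// A "subclass": the parent is the first member, so a Square* can be
// converted to a Shape* and back.
typedef struct {
    Shape parent;
    double side;
} Square;

static double square_area(const Shape *self) {
    const Square *sq = (const Square *)self;  // valid: Shape is the first member
    return sq->side * sq->side;
}

static const ShapeClass square_class = { "Square", square_area };

void square_init(Square *sq, double side) {
    sq->parent.klass = &square_class;
    sq->side = side;
}

// A second subclass with its own implementation of the same method.
typedef struct {
    Shape parent;
    double width;
    double height;
} Rect;

static double rect_area(const Shape *self) {
    const Rect *r = (const Rect *)self;
    return r->width * r->height;
}

static const ShapeClass rect_class = { "Rect", rect_area };

void rect_init(Rect *r, double width, double height) {
    r->parent.klass = &rect_class;
    r->width = width;
    r->height = height;
}
```

Writing this boilerplate by hand for every type is exactly the pain the post describes; GObject standardizes it (and adds run-time type information on top), and a Vala compiler generates the equivalent C for you.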