emblemparade.com

An Explanation of the Debian Packaging System

Originally published on LiveJournal, 3.30.08

I think I finally have an idea of how the whole Debian packaging system works. It makes a lot of sense, all in all, but could really use better tools and better explanations. Here’s my attempt at one.

The first thing to note is that it’s meant, from the ground up, for free, open source software. This makes a huge difference for three reasons: 1) for legal purposes, you want to make sure that all the binaries have their source code available; 2) for security purposes, you want to make sure that a hacker won’t be able to upload malicious source code or malicious binaries; and 3) open source projects are organized not so much as hierarchies as collaborative communities of volunteers. Rather than put all the power in one part of the system, which could mean a full-time job for a maintainer, you need a system that lets part-timers from various parts of the system do the work. At most, you’d want a simple approval system for a core maintainer, so that a fix or an update can enter the system quickly.

Let’s start from the bottom.

In the old days, there was no real packaging standard. The culture was for open source projects to be deployed in “tarballs,” a nickname for a source code archive that follows these loose guidelines:

1) It’s archived using both “tar” (Tape Archive) and GNU zip, which is why it has a .tar.gz extension, and why it’s called a “tarball.”

2) It has at its root a script called “configure”, which is expected to check the operating system environment, identify the CPU architecture (Intel 80x86, SPARC, PowerPC, etc.), find all the tools it needs for building/compiling, and prepare for using these tools. Many “configure” scripts also expect you to tell them where these tools are. And so, there’s a README file included explaining what you have to do. There’s no single standard for this at all, and different projects use different “configure” mechanisms, including home-grown systems.

3) It has a GNU “make” file which supports a few customary scenarios: “build” (=compile) the source, turning it into a binary; “install” the binary in the operating system; “uninstall” it; and possibly “clean” to get ready for rebuilding in case things go wrong. It’s quite possible that the “configure” script will modify this “make” file to customize it for the current operating system.
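To make these guidelines concrete, here is a minimal Python sketch that builds a tiny tarball in memory and checks it for the customary files. The project name “hello-1.0” and the file contents are made up for illustration, and a real tarball would of course contain actual source code.

```python
import io
import tarfile


def make_demo_tarball() -> bytes:
    """Build a tiny .tar.gz in memory (hypothetical project 'hello-1.0')."""
    files = {
        "hello-1.0/configure": b"#!/bin/sh\necho checking environment\n",
        "hello-1.0/Makefile": b"build:\n\techo building\n",
        "hello-1.0/README": b"Run ./configure, then make.\n",
    }
    buffer = io.BytesIO()
    # "w:gz" means: write a tar archive, compressed with gzip.
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def check_tarball(data: bytes) -> dict:
    """Report which of the customary top-level files are present."""
    expected = ("configure", "Makefile", "README")
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        # Strip the leading project directory from each member name.
        names = {member.name.split("/", 1)[-1] for member in archive.getmembers()}
    return {name: name in names for name in expected}


report = check_tarball(make_demo_tarball())
print(report)
```

Note that the check only tells you the files exist. It says nothing about whether “configure” will actually work on your system, which is exactly the problem.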

The culprit is step #2: there’s simply no way to “configure” for every operating system, and no standard at all for the script. Unless the tarball was specifically designed for your operating system, it’s going to be a nightmare. Worse, what happens if “configure” doesn’t find what it needs? You’re going to have to manually look for, find, and build the dependencies one at a time....

“Source based” packaging systems, such as those you’d find in Slackware and Gentoo, are really not much more than tarball quality assurance systems. The tarballs are first tested to actually work (or tweaked to work) with various architectures, and then put in a central repository. Additionally, they are tagged for their “configure” dependencies. So, you could simply type something like “install openoffice”, and all the dependent tarballs are downloaded, built, and installed first. Because all these tarballs are tested in advance, you could be sure the process will go smoothly.
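The “install the dependencies first” part is essentially a topological sort. Here is a rough Python sketch of the idea, using the standard library’s graphlib module (Python 3.9 or later). The package names and their dependency tags are invented for the example; real repositories have far larger graphs, plus version constraints that this ignores.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency tags: package -> packages it needs first.
DEPENDENCIES = {
    "openoffice": {"libxml", "python"},
    "python": {"zlib"},
    "libxml": {"zlib"},
    "zlib": set(),
}


def install_order(target: str, dependencies: dict) -> list:
    """Return the target and everything it needs, dependencies first."""
    # Walk the graph to collect only what the target actually requires.
    needed = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed[name] = dependencies[name]
            stack.extend(dependencies[name])
    # static_order() yields each package after all of its dependencies.
    return list(TopologicalSorter(needed).static_order())


order = install_order("openoffice", DEPENDENCIES)
for name in order:
    print(f"download, build, install: {name}")
```

In this toy graph “zlib” always comes out first and “openoffice” last. TopologicalSorter raises an error if the dependencies form a cycle.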

Assuring quality can be, in fact, very difficult. The list of dependencies can be enormous, and these dependencies themselves keep getting upgraded. Managing the whole repository is very hard. And of course, there’s a lot of room for security breaches, via malicious tarballs. Screwing up an important dependency can screw up large parts of the system.

Enter Debian. The point is not to create an entirely new standard, which is simply impossible. Thousands of projects release their code in tarballs, and you can’t expect them to adapt to any standard you might introduce. You’re stuck with tarballs, and that’s it. The point is to make these tarballs manageable, and easy to test.

Debian, then, introduces two kinds of packages. Source packages, identified by a .dsc file, are very similar to what you have in Slackware/Gentoo. They are tarballs wrapped in a “control” file, which describes build dependencies, supported architectures, etc. Binary packages, with the .deb extension, are pre-built. There is a separate .deb per architecture. They have a similar “control” file.
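A “control” file is plain text made of “Field: value” lines, which makes it easy to read with a few lines of code. The following Python sketch parses one stanza. The field names (Package, Version, Architecture, Depends, Description) are real Debian control fields, but the sample values are invented, and the parser is deliberately simplified: it ignores alternatives such as “a | b”, for instance.

```python
# Invented sample data using real Debian control field names.
SAMPLE_CONTROL = """\
Package: hello
Version: 1.0-1
Architecture: amd64
Depends: libc6 (>= 2.7), zlib1g
Description: example package
 Continuation lines start with a space.
"""


def parse_control(text: str) -> dict:
    """Parse one stanza of 'Field: value' lines into a dict."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            break  # A blank line ends the stanza.
        if line[0] in " \t" and current:
            # Continuation of the previous field's value.
            fields[current] += "\n" + line.strip()
        else:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


def dependency_names(fields: dict) -> list:
    """Extract bare package names from the Depends field."""
    raw = fields.get("Depends", "")
    # Drop version constraints such as "(>= 2.7)".
    return [item.split("(")[0].strip() for item in raw.split(",") if item.strip()]


control = parse_control(SAMPLE_CONTROL)
print(control["Package"], control["Architecture"], dependency_names(control))
```

These dependency fields are what let the packaging tools work out what else must be installed before a package will work.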

Not many people seem to know this, but Debian systems can be built entirely from source, just like Slackware/Gentoo. Debian is not a binary packaging system, but a dual system. Because every .deb in the repository must have a .dsc, there’s no reason why you can’t use those .dsc packages. It’s simply much faster to get a ready-made binary .deb.

Now, here comes the complicated part. Packages are usually signed with digital keys, ensuring that only approved packages get into the repository. This, however, would seem to require one maintainer who does all the testing and signing. Hard work! And so, in addition to the package specification, Debian introduced APT (Advanced Package Tool), a standard for repository management.

APT repositories are not simply file servers. They define and maintain relationships between parts. Thus, .dsc packages are tightly linked to their respective .deb packages by digital signing. Different sections of the repository can use different keys, and they can be linked for an approval process. For example, one section can be “test”. I, a lowly coder, would upload my .dsc package to it, signing it with my approved key. The grand maintainer would review my package, see that it builds, and if it’s OK, move it to the “approved” section, with her key (that I don’t have access to). Changes would thus propagate slowly to the core, and finally to the operating system distribution itself.
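Here is a toy Python model of that approval flow, just to show the shape of it. It is not how a real repository is implemented: real repositories use public-key signatures, while this sketch substitutes HMAC with shared secrets from the standard library to stay self-contained. The Repository class, the key names, and the section names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret keys; a real repository uses public-key signatures.
KEYS = {"coder": b"coder-secret", "maintainer": b"maintainer-secret"}


def sign(signer: str, payload: bytes) -> str:
    """Produce a signature for the payload using the signer's key."""
    return hmac.new(KEYS[signer], payload, hashlib.sha256).hexdigest()


def verify(signer: str, payload: bytes, signature: str) -> bool:
    """Check that the signature matches the payload and the signer's key."""
    return hmac.compare_digest(sign(signer, payload), signature)


class Repository:
    """Two sections: uploads land in 'test', reviewed ones in 'approved'."""

    def __init__(self):
        self.sections = {"test": {}, "approved": {}}

    def upload(self, name, payload, signer, signature):
        if not verify(signer, payload, signature):
            raise ValueError("bad signature, upload rejected")
        self.sections["test"][name] = (payload, signer, signature)

    def approve(self, name, signature):
        payload, _, _ = self.sections["test"][name]
        # Only a signature made with the maintainer's key is accepted here.
        if not verify("maintainer", payload, signature):
            raise ValueError("only the maintainer key can approve")
        del self.sections["test"][name]
        self.sections["approved"][name] = (payload, "maintainer", signature)


repo = Repository()
source = b"hello 1.0-1 source package"
repo.upload("hello", source, "coder", sign("coder", source))
repo.approve("hello", sign("maintainer", source))
print(sorted(repo.sections["approved"]))
```

The point of the two keys is that my signature gets a package into “test” but can never move it into “approved”.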

Note that I upload a .dsc, not a binary .deb package! The repository does the building! This makes sure that the binaries are built from the correct source, so that I can’t upload nice source and a malicious binary. Security is guaranteed.

This gets a bit more complicated... The thing is, source packages can be quite enormous. It doesn’t make sense to upload the entire .dsc for one small change, and expect the maintainers to find that change during review. And so, the APT system defines a “changes” file, which contains only the differences from the current .dsc. In fact, you can only upload “changes” to an APT repository. Even the initial upload of a .dsc is defined as “changes” from nothing.

Actually, that requirement is a huge complication, and the one that caused me most of my grief. It’s quite impossible to create this “changes” file by hand. You need tools to do the comparing and digital signing, and you need to learn how to use them. The only way is to create a complete Debian build environment, using a suite of tools called “debhelper.” These produce .dsc, .deb, .changes and a host of other files for the Debian packaging system. There are more than 20 different tools to learn. Remember, this system is supposed to work for source code in any arbitrary language on any arbitrary architecture. It’s expected to be very flexible, hence complicated. Most of this complication lives in a file called “rules”, which defines how a .dsc gets built into a .deb. The “rules” file is, in fact, a GNU “make” file. To make things easier, a project called CDBS (Common Debian Build System) created sets of “rules,” called “classes,” for most common types of build tasks. They really do much of the work for you, but you still need to learn how “rules” works before you use them.

In the old days it was enough to set a tarball loose into the wild. With Debian, in addition to creating that tarball, you need to learn how to use debhelper and possibly CDBS, in order to create that final “changes” file, which will allow you entry into the APT repository. It’s a learning curve, but the beauty of it is that, once you do that, you’re done. Every time you’re ready to make an update to your project, you run “debuild” to create a new “changes”, and that’s it. These changes will slowly propagate through the repositories, passing through various members of the community, being tested and digitally signed. Your tarball is no longer in the wild. It’s part of a tested and safe collection of free software, automatically accessible by any operating system which uses Debian packaging.

One final note: The “beauty” of this system is also its downfall, in that it can take a very long time for changes to reach users. What if you want to deliver right now? There is one solution, and one workaround.

The solution is Launchpad, a service from Canonical that allows you to create your own Personal Package Archive (PPA). The PPA is a true APT repository, entirely at your command. The nice thing about this is that you can use Launchpad to start a project, and as it matures, your PPA can be merged into the Ubuntu repositories. In fact... this is exactly how the Ubuntu operating system is developed! As you can see, Ubuntu is strongly oriented around the Debian community process, and in fact enhances it through Launchpad. Launchpad fills an important gap by making software quickly available to users, and also brings users into the testing process early on.

The quick-and-dirty workaround is simply to host a .deb file, rather than use an APT repository. You don’t need to upload a “changes” file or any of that messy nonsense. You don’t even need a .dsc package. Thus, you don’t really need to use the Debian build suite. Learning how to make a .deb is very easy. However, for new projects, I strongly recommend learning the Debian build suite, and using Launchpad, thus getting it right from the start, as well as making it automatic for users to get updates. If you’re ambitious about your project, you’re going to have to learn the Debian build suite at some point, so why not start off with it?

You can host your .deb anywhere on the net, but GetDeb is a good place for it. GetDeb was not meant for new projects, and definitely not as a replacement for APT, but for the freshest (and poorly tested) versions of software that already exists in the repositories, in cases where some users might really, really need these updates and can’t wait for the Debian community process turnaround.