This talk covers the file formats used in Debian and its derivatives for source and binary packages. A binary package contains code and/or data that's ready to install and use. A source package contains the source code and metadata used to build one or more binary packages.
We'll look at the 'hello' example package. On a Debian system and many derivatives, this is available by running:
apt source hello
This downloads the source package and unpacks it into a new directory.
The directory name will depend on the version, but for me it's
hello-2.10.
At the top level of this directory we see all the upstream source.
'hello' is actually a GNU example project, so it has all the clutter
that GNU mandates. Ignore that for now; we're interested in the
debian subdirectory, which contains the packaging control files.
These three files are absolutely required by the dpkg development tools.
debian/changelog is a human-readable list of changes in each new version, along with machine-readable metadata about it. Entries are in reverse chronological order, newest first. It's normally installed into every binary package.
There's an emacs mode for this changelog format in the dpkg-dev-el
package, and vim also recognises it.
We can extract information from the changelog using the
dpkg-parsechangelog command; for example dpkg-parsechangelog
-SVersion will output the version in the top entry.
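To give a rough idea of what dpkg-parsechangelog -SVersion is extracting, here is a minimal Python sketch that reads the header line of the newest changelog entry. The regular expression is a simplification of the real format, and the sample entry (including the maintainer name and date) is made up for illustration. In practice the real tool is the better choice.

```python
import re

# Simplified pattern for an entry header line such as:
#   hello (2.10-2) unstable; urgency=low
HEADER = re.compile(
    r"^(?P<source>[a-z0-9][a-z0-9+.-]*) "
    r"\((?P<version>[^()\s]+)\) "
    r"(?P<distributions>[^;]+); "
    r"(?P<options>.*)$"
)

def top_entry(changelog_text):
    """Return the header fields of the first (newest) changelog entry."""
    for line in changelog_text.splitlines():
        match = HEADER.match(line)
        if match:
            return match.groupdict()
    raise ValueError("no changelog entry header found")

# Illustrative entry, not copied from the real package.
SAMPLE = """\
hello (2.10-2) unstable; urgency=low

  * Illustrative entry text.

 -- A. Maintainer <maintainer@example.org>  Thu, 01 Jan 2015 00:00:00 +0000
"""

print(top_entry(SAMPLE)["version"])
```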
debian/control contains most of the other metadata for the source package, and templates for the metadata for the binary packages.
The metadata for binary packages can contain variable references which
will be substituted during the build process. For example,
${shlibs:Depends} is replaced by a list of dependencies on shared
library packages.
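The substitution itself is conceptually simple. The following Python sketch shows the idea under simplifying assumptions: the variable values are passed in as a plain dictionary, unknown variables become empty strings, and the libc6 version constraint is a hypothetical value. The real dpkg tools do more than this, such as warning about unknown variables.

```python
import re

# Matches references of the form ${name}, e.g. ${shlibs:Depends}
SUBSTVAR = re.compile(r"\$\{([A-Za-z0-9][-:A-Za-z0-9]*)\}")

def substitute(value, variables):
    """Replace each ${name} in value; unknown names become empty strings."""
    return SUBSTVAR.sub(lambda m: variables.get(m.group(1), ""), value)

template = "Depends: ${shlibs:Depends}"
variables = {"shlibs:Depends": "libc6 (>= 2.14)"}  # hypothetical value
print(substitute(template, variables))
```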
The file format used here is similar to the format of Internet mail headers, and is known as 'deb822' after RFC 822 which specified the Internet mail format. This same general format is used in many different metadata files in Debian packages.
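As an illustration of how little is needed to read this format, here is a minimal Python parser sketch. It assumes paragraphs are separated by blank lines, fields look like Name: value, and continuation lines start with a space or tab. It ignores details such as case-insensitive field names, and the sample text is invented rather than taken from the real hello package.

```python
def parse_deb822(text):
    """Parse deb822 text into a list of paragraphs (dicts of field -> value)."""
    paragraphs, current, last_field = [], {}, None
    for line in text.splitlines():
        if not line.strip():                 # blank line ends a paragraph
            if current:
                paragraphs.append(current)
            current, last_field = {}, None
        elif line.startswith("#"):           # comment line
            continue
        elif line[0] in " \t":               # continuation of previous field
            if last_field is None:
                raise ValueError("continuation line without a field")
            current[last_field] += "\n" + line.strip()
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"not a field line: {line!r}")
            last_field = name
            current[name] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs

# Invented sample in the style of a debian/control file.
SAMPLE = """\
Source: hello
Section: devel

Package: hello
Architecture: any
Depends: ${shlibs:Depends}
Description: example package
 Longer description text continues on
 indented lines.
"""

for paragraph in parse_deb822(SAMPLE):
    print(sorted(paragraph))
```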
debian/rules is an executable makefile that supports at least some standard
targets. 'Executable makefile' means it has the x permission bits
set, and its first line is:
#!/usr/bin/make -f
In principle, debian/rules can be implemented using some other
scripting language, but Debian doesn't allow this.
Looking at the makefile, the first rule is something very strange:
%:
	dh $@
This is a wildcard that defines commands for all targets that aren't
explicitly defined later. It runs the dh command with the name of
the target.
dh is part of the debhelper package, which provides commands to
assist in building packages. The vast majority of Debian packages
use debhelper, and a large proportion use the newer dh command.
dh is smart enough to recognise that the upstream build system is
autotools, so for example it can build using ./configure && make.
So instead of writing more or less the same rules as for thousands of
other packages, the maintainer only needed to write explicit rules for
a few exceptions where dh didn't automatically do the right thing.
debhelper is pretty well documented in manual pages, so it's generally quite easy to write the rules file this way.
As this is an example package, there's also debian/rules-old
which uses the older debhelper commands but not dh.
Almost all source packages should also contain these files.
debian/copyright contains the copyright statements and licence texts for the package, to be installed into binary packages. Debian specifies a machine-readable format for this file, but it's not mandatory: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
This can be omitted if the package adds copyright information to its binary packages in another way.
Debian supports multiple variants of the source package format, and
the debian/source/format file specifies which variant we're using. Here it's '3.0
(quilt)' which is the most common variant. The 'quilt' part means
that any changes to the upstream source code are stored as a patch
series under debian/patches. (There aren't any in this package.)
It could also be '3.0 (native)' which means that there is no
separation between upstream and Debian parts, or '1.0' which is the
original and largely obsolete source format.
This can be omitted by packages using the 1.0 format.
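A tool that needs to know the format can follow the same rule. This is a minimal Python sketch, assuming the file lives at debian/source/format inside the unpacked tree and that a missing file means '1.0', as described above. The function name is my own.

```python
from pathlib import Path

def source_format(source_dir):
    """Return the declared source format, defaulting to '1.0' if absent."""
    format_file = Path(source_dir) / "debian" / "source" / "format"
    try:
        return format_file.read_text().strip()
    except FileNotFoundError:
        return "1.0"

# Assumes the tree unpacked earlier is in the current directory;
# if it isn't, this falls back to '1.0'.
print(source_format("hello-2.10"))
```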
Most source packages will also contain these files.
debhelper is continually being improved, but some of those
improvements aren't totally backward compatible so they are opt-in. A
source package using debhelper can specify which version of the
debhelper API it wants through the debian/compat file. However, the recommended
way to do this is now through the debian/control file; you can read
the details in the debhelper(7) manual page.
debian/watch defines how to poll for new upstream versions. The implementation (the uscan command) and the documentation for the file format are in the 'devscripts' package.
There are many other files that dpkg, debhelper and other tools look
for under the debian subdirectory. These are documented in the
respective manual pages.
Unlike source RPMs, Debian source packages are made of multiple files even when packed. Part of the reason for this is to separate upstream code from Debian-specific changes, which some licences require.
For the 'hello' example package, APT downloaded these files:
hello_2.10-2.dsc - Debian source control
hello_2.10.orig.tar.gz - upstream source tarball
hello_2.10-2.debian.tar.xz - Debian-specific tarball
The first of these is the Debian source control (dsc) file. APT
locates and verifies this file through the source package index in the
archive. It contains metadata that was generated from the
debian/changelog and debian/control files, plus the names, sizes
and checksums of all the other files. APT uses this to locate and
verify those other files.
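The verification step can be sketched in a few lines of Python. This assumes we have already extracted the value of the Checksums-Sha256 field from the dsc file, where each line has the form 'checksum size filename'. The function names are my own, and the sketch skips checking the signature on the dsc file itself.

```python
import hashlib
from pathlib import Path

def parse_checksums(field_value):
    """Parse 'checksum size filename' lines into {filename: (checksum, size)}."""
    entries = {}
    for line in field_value.strip().splitlines():
        checksum, size, filename = line.split()
        entries[filename] = (checksum, int(size))
    return entries

def verify(directory, entries):
    """Return the names of files that are missing or do not match."""
    bad = []
    for filename, (checksum, size) in entries.items():
        path = Path(directory) / filename
        if not path.is_file():
            bad.append(filename)
            continue
        data = path.read_bytes()
        if len(data) != size or hashlib.sha256(data).hexdigest() != checksum:
            bad.append(filename)
    return bad
```

Calling verify with the download directory and the parsed entries returns an empty list when every listed file is present with the expected size and checksum.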
We can unpack this by running dpkg-source -x hello_2.10-2.dsc,
but APT already did that for us.
dpkg-source -b hello-2.10 will do (roughly) the inverse: it will
create the dsc and Debian tarball files. It will never create a new
upstream tarball; that's typically downloaded or otherwise created by
uscan. In case anything was built in the source directory, this won't
remove the build products. So normally a source package is built
using the higher-level command dpkg-buildpackage.
The main command used to build binary (and source) packages and to
prepare an upload is dpkg-buildpackage. It has many options for
which packages it builds, whether to sign them, whether to include the
upstream source in the upload, and so on. It invokes various other commands
including dpkg-checkbuilddeps which checks that build-dependencies
are satisfied, dpkg-source, debian/rules, and dpkg-genchanges
which generates a list of files to be uploaded to a package archive
server.
Unlike some other build tools, dpkg-buildpackage always builds in
the current directory and the current OS installation. To build in a
more controlled environment, we would need to use a higher level tool
such as sbuild or pbuilder.
You're probably familiar with binary package files. Their filenames follow the convention:
name_version_architecture.deb
so for hello we got:
hello_2.10-2_amd64.deb
hello-dbgsym_2.10-2_amd64.deb
The format of these files is documented in the deb(5) manual page, but
in short it's an 'ar' archive (a format originally designed for static
libraries of code). It contains a debian-binary file which specifies
the format version, and tarballs for 'control' and 'data'.
(The second binary package is an automatically generated debug symbol
package, which isn't listed in debian/control. There's some hackery
in debhelper and the Debian archive software that allows this.)
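The 'ar' container is simple enough to read by hand. The Python sketch below walks the members of an archive in the common ar layout: an 8-byte magic string, then for each member a 60-byte header followed by the data, padded to an even length. It is a sketch only and doesn't handle extended (long) member names. The function name is my own.

```python
AR_MAGIC = b"!<arch>\n"

def ar_members(stream):
    """Yield (name, data) for each member of a common-format ar archive."""
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    while True:
        header = stream.read(60)
        if not header:
            return
        if len(header) != 60 or header[58:60] != b"`\n":
            raise ValueError("bad member header")
        # Name is space-padded; some ar variants also append a '/'.
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii"))
        data = stream.read(size)
        if size % 2:                # members are padded to an even offset
            stream.read(1)
        yield name, data
```

Run over hello_2.10-2_amd64.deb opened in binary mode, this should yield the debian-binary member followed by the control and data tarballs.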
We can unpack (rather than install) a binary package using the
ar and tar commands, but it's easier to use dpkg-deb -R.
This unpacks the data tarball into the specified directory and
the control tarball into the DEBIAN subdirectory.
Looking at DEBIAN/control, we see the metadata for this binary
package generated from the debian/control file in the source
package. The ${shlibs:Depends} variable has been replaced by a
dependency on libc6. This metadata can also be displayed without
unpacking the package, using dpkg-deb -I.
The DEBIAN/md5sums file contains checksums for all the files in the
data tarball. This can be useful for checking for accidental changes
or corruption.
We can install a binary package with dpkg -i or a higher level tool
like APT. By default, dpkg checks dependencies and will abort an
installation if they're not met, but it doesn't know how to resolve
them. APT knows how to resolve dependencies and to fetch and install
packages in the right order.
When a package is installed, all the files in the data tarball are
extracted into the root of the filesystem. The files in the control
tarball are incorporated into dpkg's status files under
/var/lib/dpkg. This is private to dpkg, so you shouldn't access
it directly.
We can use dpkg to list all the 'data' files for an installed package:
dpkg -L hello
We can check the installed 'data' files against the package's
checksums with the debsums command:
debsums hello # show status of each (non-config) file
debsums -c hello # only show changed (non-config) files
We can show the metadata for the package - mostly copied from the
control file:
dpkg -s hello
We can list and show all the other files from the control tarball:
dpkg-query --control-list hello
dpkg-query --control-show hello md5sums
For more information about any of these commands, run man command
or see manpages.debian.org.