This talk covers the file formats used in Debian and its derivatives for source and binary packages. A binary package contains code and/or data that's ready to install and use. A source package contains the source code and metadata used to build one or more binary packages.
We'll look at the 'hello' example package. On a Debian system and many derivatives, this is available by running:
apt source hello
This downloads the source package and unpacks it into a new directory.
The directory name depends on the version, but for me it's hello-2.10.
At the top level of this directory we see all the upstream source. 'hello' is actually a GNU example project, so it has all the clutter that GNU mandates. Ignore that for now; we're interested in the debian subdirectory, which contains the packaging control files.
These three files are absolutely required by the dpkg development tools.
debian/changelog is a human-readable list of changes in each new version, along with machine-readable metadata about it. Entries are in reverse chronological order, newest first. It's normally installed into every binary package.
There's an emacs mode for this changelog format in the dpkg-dev-el
package, and vim also recognises it.
We can extract information from the changelog using the dpkg-parsechangelog command; for example, dpkg-parsechangelog -SVersion will output the version in the top entry.
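To give a feel for what dpkg-parsechangelog reads, here's a minimal Python sketch that pulls the fields out of the first line of a changelog entry. The sample line and the name parse_changelog_header are illustrative assumptions, and the pattern is deliberately simplified; the real tool handles the full format.

```python
import re

# Simplified pattern for the first line of an entry:
#   package (version) distribution(s); key=value[, key=value...]
HEADER_RE = re.compile(
    r"^(?P<source>[a-z0-9][a-z0-9+.-]*) "
    r"\((?P<version>[^()\s]+)\) "
    r"(?P<distributions>[^;]+); "
    r"(?P<metadata>.+)$"
)


def parse_changelog_header(line):
    """Split a changelog entry's first line into its fields."""
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError(f"not a changelog header line: {line!r}")
    metadata = {}
    for item in match.group("metadata").split(","):
        key, _, value = item.strip().partition("=")
        metadata[key] = value
    return {
        "Source": match.group("source"),
        "Version": match.group("version"),
        "Distribution": match.group("distributions").split(),
        "Metadata": metadata,
    }


# Illustrative header line, not copied from the real package.
header = parse_changelog_header("hello (2.10-2) unstable; urgency=low")
print(header["Version"])
```

In practice dpkg-parsechangelog is the right tool; the sketch only shows that the machine-readable part of each entry lives in that one line.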
debian/control contains most of the other metadata for the source package, and templates for the metadata for the binary packages.
The metadata for binary packages can contain variable references which will be substituted during the build process. For example, ${shlibs:Depends} is replaced by a list of dependencies on shared library packages.
The file format used here is similar to the format of Internet mail headers, and is known as 'deb822' after RFC 822 which specified the Internet mail format. This same general format is used in many different metadata files in Debian packages.
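As a rough illustration of the deb822 layout, here's a minimal Python sketch that splits such a file into paragraphs of fields. The sample text is made up for the example (it is not the real control file for 'hello'), parse_deb822 is a hypothetical name, and the parser skips details that the real tools handle.

```python
def parse_deb822(text):
    """Parse deb822 text into a list of paragraphs (dicts of field -> value)."""
    paragraphs = []
    current = {}
    last_field = None
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line
        if not line.strip():
            # A blank line ends the current paragraph.
            if current:
                paragraphs.append(current)
            current, last_field = {}, None
        elif line[0] in " \t":
            # A line starting with whitespace continues the previous field.
            if last_field is None:
                raise ValueError("continuation line without a field")
            current[last_field] += "\n" + line.strip()
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"missing ':' in line: {line!r}")
            last_field = name.strip()
            current[last_field] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs


# Made-up sample: one source paragraph, then one binary package paragraph.
SAMPLE = """\
Source: hello
Section: devel
Build-Depends: debhelper-compat (= 13),
 gettext

Package: hello
Architecture: any
Depends: ${shlibs:Depends}, ${misc:Depends}
"""

paragraphs = parse_deb822(SAMPLE)
print(paragraphs[0]["Source"], paragraphs[1]["Package"])
```

The same paragraph-and-field shape turns up in the other metadata files mentioned in this talk, which is why one small parser goes a long way.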
debian/rules is an executable makefile that supports at least some standard targets. 'Executable makefile' means it has the x permission bits set, and its first line is:
#!/usr/bin/make -f
In principle, debian/rules could be implemented using some other scripting language, but Debian policy requires it to be a makefile.
Looking at the makefile, the first rule is something very strange:
%:
	dh $@
This is a wildcard that defines commands for all targets that aren't explicitly defined later. It runs the dh command with the name of the target.
dh is part of the debhelper package, which provides commands to assist in building packages. The vast majority of Debian packages use debhelper, and a large proportion use the newer dh command.
dh is smart enough to recognise that the upstream build system is autotools, so for example it can build using ./configure && make.
So instead of writing more or less the same rules as for thousands of other packages, the maintainer only needed to write explicit rules for a few exceptions where dh didn't automatically do the right thing.
debhelper is pretty well documented in manual pages, so it's generally quite easy to write the rules file this way.
As this is an example package, there's also debian/rules-old, which uses the older debhelper commands but not dh.
Almost all source packages should also contain these files.
debian/copyright contains the copyright statements and licence texts for the package, to be installed into binary packages. Debian specifies a machine-readable format for this file, but it's not mandatory: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
This can be omitted if the package adds copyright information to its binary packages in another way.
Debian supports multiple variants of the source package format, and debian/source/format specifies which variant we're using. Here it's '3.0 (quilt)', which is the most common variant. The 'quilt' part means that any changes to the upstream source code are stored as a patch series under debian/patches. (There aren't any in this package.)
It could also be '3.0 (native)' which means that there is no
separation between upstream and Debian parts, or '1.0' which is the
original and largely obsolete source format.
This can be omitted by packages using the 1.0 format.
Most source packages will also contain these files.
debhelper is continually being improved, but some of those improvements aren't totally backward compatible, so they are opt-in. A source package using debhelper can specify which version of the debhelper API it wants through the debian/compat file. However, the recommended way to do this is now through the debian/control file; you can read the details in the debhelper(7) manual page.
debian/watch defines how to poll for new upstream versions. The implementation and the documentation for the file format are in the devscripts package, which provides the uscan command.
There are many other files that dpkg, debhelper and other tools look for under the debian subdirectory. These are documented in the respective manual pages.
Unlike source RPMs, Debian source packages are made of multiple files even when packed. Part of the reason for this is to separate upstream code from Debian-specific changes, which some licences require.
For the 'hello' example package, APT downloaded these files:
hello_2.10-2.dsc - Debian source control
hello_2.10.orig.tar.gz - upstream source tarball
hello_2.10-2.debian.tar.xz - Debian-specific tarball
The first of these is the Debian source control (dsc) file. APT locates and verifies this file through the source package index in the archive. It contains metadata that was generated from the debian/changelog and debian/control files, plus the names, sizes and checksums of all the other files. APT uses this to locate and verify those other files.
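The size-and-checksum check can be sketched in a few lines of Python. This assumes the layout of the Checksums-Sha256 field in a dsc file (one "digest size filename" entry per line); the function names and the stand-in file are made up for the example, and this is not how APT itself is implemented.

```python
import hashlib
import os
import tempfile


def parse_checksums(field_value):
    """Map each filename to (hex digest, size) from a Checksums-Sha256 value."""
    entries = {}
    for line in field_value.strip().splitlines():
        digest, size, name = line.split()
        entries[name] = (digest, int(size))
    return entries


def verify_file(path, digest, size):
    """Return True if the file has the expected size and SHA-256 digest."""
    if os.path.getsize(path) != size:
        return False
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha256.update(chunk)
    return sha256.hexdigest() == digest


# Demonstration with a stand-in file; the name and contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    name = "example_1.0.orig.tar.gz"
    data = b"not really a tarball\n"
    path = os.path.join(tmp, name)
    with open(path, "wb") as f:
        f.write(data)
    field = f"\n {hashlib.sha256(data).hexdigest()} {len(data)} {name}"
    digest, size = parse_checksums(field)[name]
    intact = verify_file(path, digest, size)
    wrong_size = verify_file(path, digest, size + 1)
    print(intact, wrong_size)
```

Checking the size first is cheap and catches truncated downloads before any hashing is done.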
We can unpack this by running dpkg-source -x hello_2.10-2.dsc, but APT already did that for us.
dpkg-source -b hello-2.10 will do (roughly) the inverse: it will create the dsc and Debian tarball files. It will never create a new upstream tarball; that's typically downloaded or otherwise created by uscan. If anything was built in the source directory, dpkg-source won't remove the build products, so normally a source package is built using the higher-level command dpkg-buildpackage.
The main command used to build binary (and source) packages and to prepare an upload is dpkg-buildpackage. It has many options for which packages it builds, whether to sign them, whether to include the upstream source in the upload, and so on. It invokes various other commands, including dpkg-checkbuilddeps, which checks that build-dependencies are satisfied; dpkg-source; debian/rules; and dpkg-genchanges, which generates a list of files to be uploaded to a package archive server.
Unlike some other build tools, dpkg-buildpackage always builds in the current directory and the current OS installation. To build in a more controlled environment, we would need to use a higher-level tool such as sbuild or pbuilder.
You're probably familiar with binary packages (.deb files). Their filenames follow the convention:
name_version_architecture.deb
so for hello we got:
hello_2.10-2_amd64.deb
hello-dbgsym_2.10-2_amd64.deb
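Because the three parts are separated by underscores, the convention is easy to take apart. Here's a small Python sketch; split_deb_filename is a hypothetical helper, not part of any Debian tool.

```python
def split_deb_filename(filename):
    """Split name_version_architecture.deb into its three parts."""
    suffix = ".deb"
    if not filename.endswith(suffix):
        raise ValueError(f"not a .deb filename: {filename!r}")
    parts = filename[: -len(suffix)].split("_")
    if len(parts) != 3:
        raise ValueError(f"expected name_version_architecture: {filename!r}")
    name, version, architecture = parts
    return name, version, architecture


for filename in ["hello_2.10-2_amd64.deb", "hello-dbgsym_2.10-2_amd64.deb"]:
    print(split_deb_filename(filename))
```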
The format of these files is documented in the deb(5) manual page, but in short it's an 'ar' archive (a format originally designed for static libraries of code). It contains a debian-binary file which specifies the format version, and tarballs for 'control' and 'data'.
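To show how simple the 'ar' container is, here's a Python sketch that lists the members of an archive. It relies on the common ar layout (an 8-byte magic string, then a 60-byte fixed-width ASCII header before each member); the function names are made up, and the demonstration archive is built in memory with placeholder contents rather than read from a real package. The compression suffix on the tarball names varies between packages.

```python
import io

AR_MAGIC = b"!<arch>\n"


def list_ar_members(stream):
    """Return (name, size) for each member of an ar archive."""
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    members = []
    while True:
        header = stream.read(60)
        if not header:
            break  # clean end of archive
        if len(header) < 60 or header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        # Fixed-width ASCII fields: name is bytes 0-15, size is bytes 48-57.
        # Some ar variants append '/' to the name, so tolerate that.
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii"))
        members.append((name, size))
        # Member data is padded to an even length.
        stream.seek(size + size % 2, io.SEEK_CUR)
    return members


def ar_member(name, data):
    """Build one member (header plus padded data) for the demonstration."""
    header = b"%-16s%-12d%-6d%-6d%-8s%-10d`\n" % (
        name, 0, 0, 0, b"100644", len(data)
    )
    return header + data + (b"\n" if len(data) % 2 else b"")


# A tiny in-memory archive using the member names found in a .deb;
# the "tarballs" here are placeholder bytes, not real tarballs.
archive = io.BytesIO(
    AR_MAGIC
    + ar_member(b"debian-binary", b"2.0\n")
    + ar_member(b"control.tar.xz", b"abc")
    + ar_member(b"data.tar.xz", b"defg")
)
print(list_ar_members(archive))
```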
(The second binary package is an automatically generated debug symbol package, which isn't listed in debian/control. There's some hackery in debhelper and the Debian archive software that allows this.)
We can unpack (rather than install) a binary package using the ar and tar commands, but it's easier to use dpkg-deb -R. This unpacks the data tarball into the specified directory and the control tarball into the DEBIAN subdirectory.
Looking at DEBIAN/control, we see the metadata for this binary package generated from the debian/control file in the source package. The ${shlibs:Depends} variable has been replaced by a dependency on libc6. This metadata can also be displayed without unpacking the package, using dpkg-deb -I.
The DEBIAN/md5sums file contains checksums for all the files in the data tarball. This can be useful for checking for accidental changes or corruption.
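Here's a rough Python sketch of that kind of check against an unpacked tree. It assumes each md5sums line is an MD5 digest, whitespace, then a path relative to the root of the data tarball; find_changed_files and the demonstration tree are made up for the example.

```python
import hashlib
import os
import tempfile


def find_changed_files(md5sums_text, root):
    """Return the listed paths that are missing or whose MD5 digest differs."""
    changed = []
    for line in md5sums_text.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        try:
            with open(os.path.join(root, rel_path), "rb") as f:
                actual = hashlib.md5(f.read()).hexdigest()
        except FileNotFoundError:
            actual = None
        if actual != expected:
            changed.append(rel_path)
    return changed


# Demonstration with a made-up tree: one file is intact, one is missing.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "usr/bin"))
    with open(os.path.join(root, "usr/bin/hello"), "wb") as f:
        f.write(b"pretend binary\n")
    good = hashlib.md5(b"pretend binary\n").hexdigest()
    other = hashlib.md5(b"something else\n").hexdigest()
    md5sums = f"{good}  usr/bin/hello\n{other}  usr/share/doc/hello/NEWS\n"
    changed = find_changed_files(md5sums, root)
    print(changed)
```

MD5 is fine for spotting accidental damage like this, but it isn't a defence against deliberate tampering.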
We can install a binary package with dpkg -i or a higher-level tool like APT. dpkg will always check dependencies and will abort an installation if they're not met, but it doesn't know how to resolve them. APT knows how to resolve dependencies and to fetch and install packages in the right order.
When a package is installed, all the files in the data tarball are extracted into the root of the filesystem. The files in the control tarball are incorporated into dpkg's status files under /var/lib/dpkg. This is private to dpkg, so you shouldn't access it directly.
We can use dpkg to list all the 'data' files for an installed package:
dpkg -L hello
We can check the installed 'data' files against the package's checksums with the debsums command:
debsums hello # show status of each (non-config) file
debsums -c hello # only show changed (non-config) files
We can show the metadata for the package, mostly copied from the control file:
dpkg -s hello
We can list and show all the other files from the control tarball:
dpkg-query --control-list hello
dpkg-query --control-show hello md5sums
Manual pages are available through the man command or at manpages.debian.org.