Searching for invalid object code
Jérémy Bobbio suggested that I should explain how I looked for packages affected by a compiler bug (Debian bug 506713; gcc bug 38287). I don't claim that this is a particularly good way to do it, but here it is:
First, I identified a pattern to search for. Unfortunately I don't really understand the cause or fix for the bug, but I did have the example which led to this bug report: Debian bug 490999. The bad code was a stack-pointer-relative load immediately after a stack allocation (SPARC save instruction) where the offset was not adjusted for the stack allocation:

    save %sp, -112, %sp
    ld [ %sp + 0x40 ], %i5

I generalised this to:

    save %sp, offset1, %sp
    ...
    ld [ %sp + offset2 ], register
where offset1 + offset2 < 0. Of course, this may be valid if the intervening instructions include a restore, branch or store to the effective stack location that the last instruction loads from. I ended up allowing up to 10 intervening instructions and examining a disassembly to work out which cases were valid.
I looked up the instruction encoding for these two instructions. Thankfully SPARC is RISC so they are simple and regular:
    save %sp, offset1, %sp

is encoded as

    0x9de3a000 | (offset1 & 0x1fff)

and

    ld [ %sp + offset2 ], register

is encoded as

    0xc003a000 | (reg << 25) | (offset2 & 0x1fff)
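To make those bit patterns concrete, here is a sketch of matcher functions for the two encodings (the function names are mine). Note that the 13-bit immediate field is sign-extended, which is what allows offset1 to be negative:

```python
# Base words for the two instructions, with register/immediate fields zeroed.
SAVE_SP = 0x9de3a000   # save %sp, 0, %sp
LD_SP = 0xc003a000     # ld [ %sp + 0 ], %g0

def sign_extend_13(imm):
    # simm13 is a 13-bit two's-complement field, so e.g. 0x1f90 decodes to -112.
    return imm - 0x2000 if imm & 0x1000 else imm

def is_save_sp(word):
    # Match "save %sp, simm13, %sp" for any immediate.
    return word & ~0x1fff == SAVE_SP

def is_ld_sp(word):
    # Match "ld [ %sp + simm13 ], reg" for any destination register
    # (bits 29-25) and any immediate (bits 12-0).
    return word & ~(0x1f << 25 | 0x1fff) == LD_SP

# The instructions from the original example; %i5 is register 29.
assert is_save_sp(0x9de3a000 | (-112 & 0x1fff))
assert is_ld_sp(0xc003a000 | (29 << 25) | 0x40)
assert sign_extend_13(-112 & 0x1fff) == -112
```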
I took the dumb but effective approach of scanning entire files for this pattern rather than only scanning their code sections. This seemed to work - I got no false hits for non-code - but might not work for other patterns that could match ASCII text.
I wrote the scanning program in Python, which is my default choice of
language unless I know it's going to be too slow. I was hoping to be able
to read the code files into arrays, but unfortunately the Python array type only supports the native byte order (SPARC is big-endian and I was intending to use an x86, which is little-endian). I tried reading into a
tuple using struct.unpack, which does support explicit byte-ordering, but
this used so much memory for larger files that the program swapped to a
crawl. So finally I resorted to reading the file into a string, doing a
string search for '\x9d\xe3', rejecting matches that weren't appropriately
aligned, then unpacking and comparing the code words from the point of the
string match.
(In Python 3.0 I would have to use the bytes type for this, as str is a Unicode string type.)
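Putting that together, the inner scan might look something like this (a simplified Python 3 sketch using bytes as noted above; the real script also had to classify the intervening instructions rather than just count them):

```python
import struct

WINDOW = 10  # maximum number of intervening instructions allowed

def sign_extend_13(imm):
    # simm13 fields are 13-bit two's-complement immediates.
    return imm - 0x2000 if imm & 0x1000 else imm

def scan(data):
    """Return offsets of suspicious save/ld pairs in big-endian SPARC code."""
    matches = []
    pos = data.find(b'\x9d\xe3')
    while pos >= 0:
        # Instructions are 4-byte aligned; reject matches in the middle of
        # other words (or in data that happens to contain these bytes).
        if pos % 4 == 0 and len(data) >= pos + 4:
            word, = struct.unpack_from('>I', data, pos)
            if word & ~0x1fff == 0x9de3a000:  # save %sp, simm13, %sp
                offset1 = sign_extend_13(word & 0x1fff)
                # Look at the following instruction words, allowing up to
                # WINDOW intervening instructions before the load.
                for i in range(1, WINDOW + 2):
                    if len(data) < pos + 4 * i + 4:
                        break
                    word, = struct.unpack_from('>I', data, pos + 4 * i)
                    if word & ~(0x1f << 25 | 0x1fff) == 0xc003a000:
                        # ld [ %sp + simm13 ], reg
                        offset2 = sign_extend_13(word & 0x1fff)
                        if offset1 + offset2 < 0:
                            matches.append(pos)
                        break
        pos = data.find(b'\x9d\xe3', pos + 2)
    return matches
```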
So that's how I scanned single files. The next step was to find, unpack
and scan all the SPARC shared libraries in the archive. (This particular
code generation bug is understood to affect only PIC code, and that is
normally only used in shared libraries.) I wrote functions to search
Contents-sparc for shared library files - assumed to match the pattern ([^\s]*/lib[^/\s]+\.so(?:\.[^/\s]*)?) - and to parse Packages to find the filenames for the packages containing those files. The latter uses the debian_bundle.deb822 module from python-debian.
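The Contents side of that can be sketched as follows (the helper name and example lines are mine; the Packages lookup with debian_bundle.deb822 is omitted here since python-debian is a third-party module):

```python
import re

# Shared library paths, matched with the pattern described above.
SHLIB_RE = re.compile(r'([^\s]*/lib[^/\s]+\.so(?:\.[^/\s]*)?)')

def find_shared_libraries(contents_lines):
    """Yield (path, package) pairs for shared libraries in a Contents file.

    Each data line has the form "path section/package[,section/package...]".
    """
    for line in contents_lines:
        fields = line.rsplit(None, 1)
        if len(fields) != 2:
            continue
        path, packages = fields
        if SHLIB_RE.fullmatch(path):
            for qualified in packages.split(','):
                yield path, qualified.rsplit('/', 1)[-1]

# Illustrative lines in the Contents-sparc format.
lines = [
    'usr/lib/libfoo.so.1.2  libs/libfoo1',
    'usr/share/doc/libfoo1/changelog.gz  libs/libfoo1',
]
print(list(find_shared_libraries(lines)))
```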
The last key function downloads and unpacks a package using wget and dpkg-deb. I could have used the httplib module for downloading but I correctly anticipated that I'd need to restart the script several times, so I wanted to cache the packages, which was easier to do using wget.
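That function might be sketched like this (the cache directory and helper names are mine; wget's -nc option provides the caching by skipping files that are already present):

```python
import os
import subprocess

def build_commands(url, cache_dir='cache'):
    """Return the wget and dpkg-deb invocations for one package URL.

    -nc (no-clobber) makes wget skip packages already in the cache, so the
    script can be restarted without re-downloading everything.
    """
    deb = os.path.join(cache_dir, os.path.basename(url))
    return [
        ['wget', '-nc', '-P', cache_dir, url],
        # dpkg-deb -x extracts the package's file system tree for scanning.
        ['dpkg-deb', '-x', deb, deb + '.unpacked'],
    ]

def fetch_and_unpack(url, cache_dir='cache'):
    # Assumes wget and dpkg-deb are installed, as in the original script.
    os.makedirs(cache_dir, exist_ok=True)
    for command in build_commands(url, cache_dir):
        subprocess.check_call(command)
```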
So, that's the explanation. If you really want to see it, here's the code.