Email: email@example.com • Twitter: @benhutchingsuk • Debian: benh • Gitweb: git.decadent.org.uk • Github: github.com/bwhacks
I was assigned 12.75 hours of work by Freexian's Debian LTS initiative and carried over 5.5 from December. I worked only 3 hours, so I carry over 15.25 hours - but I will probably give up some of those to the general pool.
I spent some time finishing off the linux security update mentioned in my December entry. I also backported the current version of wireless-regdb - not a security update, but an important one anyway - and issued DLA 785-1.
There are a fair number of outstanding security issues in the Linux kernel for Debian 8 "jessie", but none of them were considered serious enough to issue a security update and DSA. Instead, most of them are being fixed through the point release (8.7) which will be released this weekend. Don't forget that you need to reboot to complete a kernel upgrade.
This update to linux (version 3.16.39-1) also adds the perf security mitigation feature from Grsecurity. You can disable unprivileged use of perf entirely by setting sysctl kernel.perf_event_paranoid=3. (This is the default for Debian "stretch".)
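To check the setting from a script, something like the following minimal sketch could be used. It assumes the sysctl is exposed at its usual procfs path; the helper names are hypothetical, and the meaning of level 3 applies only to kernels that carry the patch described above.

```python
from pathlib import Path

# Usual procfs location of the sysctl kernel.perf_event_paranoid.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def unprivileged_perf_disabled(level: int) -> bool:
    """Return True if this level blocks all unprivileged use of perf.

    Assumption: the running kernel carries the Grsecurity-derived patch,
    so that level 3 (or higher) has this meaning.
    """
    return level >= 3


def read_paranoid_level(path: Path = PARANOID_PATH) -> int:
    """Read the current sysctl value as an integer."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    if PARANOID_PATH.exists():
        level = read_paranoid_level()
        print(f"kernel.perf_event_paranoid = {level}")
        print("unprivileged perf disabled:", unprivileged_perf_disabled(level))
    else:
        print("sysctl not available on this system")
```

Run as an ordinary user, this only reads the value; changing it still requires root (for example through sysctl).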
I was assigned 13.5 hours of work by Freexian's Debian LTS initiative and carried over 2 from November. I worked only 10 hours, so I carry over 5.5 hours.
As in the last few months, I spent all of this time working on the linux (kernel) package. I backported several security fixes and did some testing of the more invasive changes.
I also added the option to mitigate security issues in the performance events (perf) subsystem by disabling use by unprivileged users. This feature comes from Grsecurity and has been included in Debian unstable and Android kernels for a while. However, for Debian 7 LTS it has to be explicitly enabled by setting sysctl kernel.perf_event_paranoid=3.
I uploaded these changes as linux 3.2.84-1 and then (on 1st January) issued DLA 722-1.
I was assigned 11 hours of work by Freexian's Debian LTS initiative. I worked 9 hours and carry over 2 hours.
In my role as Linux 3.2 stable maintainer, I made a 3.2.84 release with a large number of backported fixes. I then rebased wheezy's linux package on this and made some additional changes to maintain the kernel module ABI. This will probably be released some time in December or January, depending on what security issues turn up.
I attended this year's Linux Kernel Summit in Santa Fe, NM, USA and made notes on some of the sessions that were relevant to Debian. LWN also reported many of the discussions. This is the second and last part of my notes; part 1 is here.
Updated: I corrected the description of which Intel processors support SMEP. Updated again: I made several more corrections, thanks to PaX Team.
Kees Cook presented the ongoing work on upstream kernel hardening, also known as the Kernel Self-Protection Project or KSPP.
The kernel build system can now build and use GCC plugins to implement some protections. This requires gcc 4.5 or later, with the plugin headers installed. It has been tested on x86, arm, and arm64. It is disabled by CONFIG_COMPILE_TEST because CI systems using allmodconfig/allyesconfig probably don't have the headers installed, but this ought to be changed at some point.
There was a question as to how plugin headers should be installed for cross-compilers or custom compilers, but I didn't hear a clear answer to this. Kees has been prodding distribution gcc maintainers to package them. Mark Brown mentioned the Linaro toolchain being widely used; Kees has not talked to its maintainers yet.
These protections are based on hidden state that an attacker will need to discover in order to make an effective attack; they reduce the probability of success but don't prevent it entirely.
Kernel address space layout randomisation (KASLR) has now been implemented on x86, arm64, and mips for the kernel image. (Debian enables this.) However, there are still lots of information leaks that defeat it. This could theoretically be improved by relocating different sections or smaller parts of the kernel independently, but this requires re-linking at boot. Aside from software information leaks, the branch target predictor on (common implementations of) x86 provides a side channel to find addresses of branches in the kernel.
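The reason a single information leak is so damaging can be shown with a little arithmetic. This is only a toy model: the alignment, the number of slots, and the symbol address in the usage note are hypothetical values, not those of any real kernel. Because the whole image moves by one offset, one leaked run-time address of a known symbol gives away the location of everything else.

```python
import secrets

# Hypothetical parameters, chosen only for illustration.
ALIGN = 0x200000  # assumed alignment of the kernel image
SLOTS = 512       # assumed number of possible load positions


def pick_slide() -> int:
    """Choose a random, aligned offset for the whole image at boot."""
    return secrets.randbelow(SLOTS) * ALIGN


def recover_slide(leaked_address: int, link_time_address: int) -> int:
    """Recover the offset from one leaked address of a known symbol."""
    return leaked_address - link_time_address


def runtime_address(link_time_address: int, slide: int) -> int:
    """Address of any other symbol once the offset is known."""
    return link_time_address + slide
```

For example, if an attacker knows from the distribution's kernel image that a symbol was linked at 0xffffffff81234560 and sees its run-time address in a log message, recover_slide() yields the offset, and runtime_address() then locates every other symbol.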
Page and heap allocation, etc., is still quite predictable.
struct randomisation (RANDSTRUCT plugin from grsecurity) reorders members in (a) structures containing only function pointers and (b) explicitly marked structures. This makes it very hard to attack custom kernels where the kernel image is not readable. But even for distribution kernels, it increases the maintenance burden for attackers.
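As a toy model of the idea (not the plugin's actual algorithm), member order can be derived from a per-build seed. The field list and function name below are hypothetical. The same seed always gives the same layout, so one build is internally consistent, while an attacker who doesn't know the seed can't assume the source order.

```python
import random

# Hypothetical table of function pointers, in source order.
FILE_OPS = ["open", "read", "write", "ioctl", "mmap", "release"]


def randomised_layout(fields, seed):
    """Return the field names in an order determined by a per-build seed."""
    order = list(fields)                # leave the caller's list untouched
    random.Random(seed).shuffle(order)  # deterministic for a given seed
    return order


if __name__ == "__main__":
    for seed in (1, 2):
        print(seed, randomised_layout(FILE_OPS, seed))
```

For a distribution kernel the seed is effectively public, because anyone can inspect the shipped binaries, which is why the benefit there is limited to extra work for the attacker on each new build.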
These protections block a class of attacks completely.
Read-only protection of kernel memory is either mandatory or enabled by default on x86, arm, and arm64. (Debian enables this.)
Protections against execution of user memory in kernel mode are now implemented in hardware on x86 (SMEP, in Intel processors from [...] onward) and on arm64 (PXN, from ARMv8.1). But Skylake is not available for servers and ARMv8.1 is not yet implemented at all! s390 always had this protection.
It may be possible to 'emulate' this using other hardware protections. arm (v7) and arm64 now have this, but x86 doesn't. Linus doesn't like the overhead of previously proposed implementations for x86. It is possible to do this using PCID (in Intel processors from [...] onward), which has already been done in PaX - and this should be [...]
Virtually mapped stacks protect against stack overflow attacks. They were implemented as an option for x86 only in 4.9. (Debian enables this.)
Copies to or from user memory sometimes use a user-controlled size that is not properly bounded. Hardened usercopy, implemented as an option in 4.8 for many architectures, protects against this. (Debian enables this.)
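The kind of check involved can be sketched as follows. This is a deliberately simplified model with hypothetical names: it only compares the requested range against the size of a single object, and doesn't attempt to cover everything the kernel feature checks.

```python
class UsercopyError(Exception):
    """Raised when a copy would run outside the source object."""


def checked_copy_out(obj: bytes, offset: int, size: int) -> bytes:
    """Copy `size` bytes of `obj` starting at `offset`, or refuse.

    The size stands in for a user-controlled value, so it is validated
    against the real object bounds before anything is copied.
    """
    if offset < 0 or size < 0 or offset + size > len(obj):
        raise UsercopyError(
            f"copy of {size} bytes at offset {offset} exceeds "
            f"object of {len(obj)} bytes"
        )
    return obj[offset:offset + size]
```

Without the check, an oversized request would silently read whatever happens to follow the object in memory; with it, the request fails loudly instead.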
Memory wiping (zero on free) protects against some information leaks and use-after-free bugs. It was already implemented as a debug feature with a non-zero poison value, but at some performance cost. Zeroing can be cheaper since it allows the allocator to skip zeroing on reallocation. That was implemented as an option in 4.6. (Debian does not currently enable this but we might do if the performance cost is low enough.)
Constification (with the CONSTIFY gcc plugin) reduces the amount of static data that can be written to. As with RANDSTRUCT, this is applied to function pointer tables and explicitly marked structures. Instances of some types need to be modified very occasionally. In PaX/Grsecurity this is done with [...] globally disable write protection temporarily. It would be preferable to override write protection in a more directed way, so that the permission to write doesn't leak into any other code that interrupts this process.
The feature is not in mainline yet.
Atomic wrap detection protects against reference-counting bugs which can result in a use-after-free. Overflow and underflow are trapped and result in an 'oops'. There is no measurable performance impact. It would be applied to all operations on the atomic_t type, but there needs to be an opt-out for atomics that are not ref-counters - probably by adding an atomic_wrap_t type for them. This has been implemented for x86, arm, and arm64 but is not in mainline yet.
For the second year running, Jiri Kosina raised the problem of 'freezing' kthreads (kernel-mode threads) in preparation for system suspend (suspend to RAM, or hibernation). What are the semantics? What invariants should be met when a kthread gets frozen? They are not defined anywhere.
Most freezable threads don't actually need to be quiesced. Also, many non-freezable threads are pointlessly calling try_to_freeze() (probably due to copying code without understanding it).
At a system level, what we actually need is I/O and filesystem consistency. This should be achieved by:
The system suspend code should not need to directly freeze threads.
Jon Corbet and Mauro Carvalho presented the recent work on kernel documentation.
The kernel's documentation system was a house of cards involving DocBook and a lot of custom scripting. Both the DocBook templates and plain text files are gradually being converted by Sphinx. However, manual page generation is currently 'broken' for documents processed by [...]
There are about 150 files at the top level of the documentation tree, which are gradually being moved into subdirectories. The most popular files, which are likely to be referenced in external documentation, have been replaced by placeholders.
Sphinx is highly extensible and this has been used to integrate kernel-doc. It would be possible to add extensions that parse and include the MAINTAINERS file and Documentation/ABI/ files, which have their own formats, but the documentation maintainers would prefer not to add extensions that can't be pushed to Sphinx upstream.
There is lots of obsolete documentation, and patches to remove it would be welcome.
Linus objected to PDF files recently added under the Documentation/media directory - they are not the source format so should not be there! They should be generated from the corresponding SVG or image files at build time.
Steve Rostedt and Shuah Khan led a discussion about tracepoints.
Currently each maintainer decides which tracepoints to create. The cost of each added tracepoint is minimal, but the cost of very many tracepoints is more substantial. So there is such a thing as too many tracepoints, and we need a policy to decide when they are justified. They advised not to create tracepoints [...] case, since kprobes can be used for tracing (almost) anywhere.
There was some support for requiring documentation of each new tracepoint. That may dissuade introduction of obscure tracepoints, but also creates a higher expectation of stability.
Tools such as bcc and IOVisor are now being created that depend on specific tracepoints or even function names (through kprobes). Should we care about breaking them?
Linus said that we should strive to be polite to developers and users relying on tracepoints, but if it's too painful to maintain a tracepoint then we should go ahead and change it. Where the end users of the tool are themselves developers it's more reasonable to expect them to upgrade the tool and we should care less about changing it. In some cases tracepoints could provide dummy data for compatibility (as is done in some places in procfs).