Better living through software

Ben Hutchings's diary of life and technology

Email: ben@decadent.org.uk • Twitter: @benhutchingsuk • Debian: benh • Gitweb: git.decadent.org.uk • Github: github.com/bwhacks

Mon, 05 Dec 2016

Linux Kernel Summit 2016, part 2

I attended this year's Linux Kernel Summit in Santa Fe, NM, USA and made notes on some of the sessions that were relevant to Debian. LWN also reported many of the discussions. This is the second and last part of my notes; part 1 is here.

Updated: I corrected the description of which Intel processors support SMEP.

Kernel Hardening

Kees Cook presented the ongoing work on upstream kernel hardening, also known as the Kernel Self-Protection Project or KSPP.

GCC plugins

The kernel build system can now build and use GCC plugins to implement some protections. This requires gcc 4.5 and the plugin headers installed. It has been tested on x86, arm, and arm64. It is disabled by CONFIG_COMPILE_TEST because CI systems using allmodconfig/allyesconfig probably don't have those installed, but this ought to be changed at some point.

There was a question as to how plugin headers should be installed for cross-compilers or custom compilers, but I didn't hear a clear answer to this. Kees has been prodding distribution gcc maintainers to package them. Mark Brown mentioned the Linaro toolchain being widely used; Kees has not talked to its maintainers yet.

Probabilistic protections

These protections are based on hidden state that an attacker will need to discover in order to make an effective attack; they reduce the probability of success but don't prevent it entirely.

Kernel address space layout randomisation (KASLR) has now been implemented on x86, arm64, and mips for the kernel image. (Debian enables this.) However there are still lots of information leaks that defeat this. This could theoretically be improved by relocating different sections or smaller parts of the kernel independently, but this requires re-linking at boot. Aside from software information leaks, the branch target predictor on (common implementations of) x86 provides a side channel to find addresses of branches in the kernel.

Page and heap allocation, etc., is still quite predictable.

struct randomisation (RANDSTRUCT plugin from grsecurity) reorders members in (a) structures containing only function pointers (b) explicitly marked structures. This makes it very hard to attack custom kernels where the kernel image is not readable. But even for distribution kernels, it increases the maintenance burden for attackers.

Deterministic protections

These protections block a class of attacks completely.

Read-only protection of kernel memory is either mandatory or enabled by default on x86, arm, and arm64. (Debian enables this.)

Protections against execution of user memory in kernel mode are now implemented in hardware on x86 (SMEP, in Intel processors from Skylake Broadwell onward) and on arm64 (PXN, from ARMv8.1). But Skylake Broadwell is not available for servers in high-end server variants and ARMv8.1 is not yet implemented at all! s390 always had this protection.

It may be possible to 'emulate' this using other hardware protections. arm (v7) and arm64 now have this, but x86 doesn't. Linus doesn't like the overhead of previously proposed implementations for x86. It is possible to do this using PCID (in Intel processors from Sandy Bridge onward), which has already been done in PaX - and this should be fast enough.

Virtually mapped stacks protect against stack overflow attacks. They were implemented as an option for x86 only in 4.9. (Debian enables this.)

Copies to or from user memory sometimes use a user-controlled size that is not properly bounded. Hardened usercopy, implemented as an option in 4.8 for many architectures, protects against this. (Debian enables this.)

Memory wiping (zero on free) protects against some information leaks and use-after-free bugs. It was already implemented as debug feature with non-zero poison value, but at some performance cost. Zeroing can be cheaper since it allows allocator to skip zeroing on reallocation. That was implemented as an option in 4.6. (Debian does not currently enable this but we might do if the performance cost is low enough.)

Constification (with the CONSTIFY gcc plugin) reduces the amount of static data that can be written to. As with RANDSTRUCT, this is applied to function pointer tables and explicitly marked structures. Instances of some types need to be modified very occasionally. In PaX/Grsecurity this is done with pax_{open,close}_kernel() which globally disable write protection temporarily. It would be preferable to override write protection in a more directed way, so that the permission to write doesn't leak into any other code that interrupts this process. The feature is not in mainline yet.

Atomic wrap detction protects against reference-counting bugs which can result in a use-after-free. Overflow and underflow are trapped and result in an 'oops'. There is no measurable performance impact. It would be applied to all operations on the atomic_t type, but there needs to be an opt-out for atomics that are not ref-counters - probably by adding an atomic_wrap_t type for them. This has been implemented for x86, arm, and arm64 but is not in mainline yet.

Kernel Freezer Hell

For the second year running, Jiri Kosina raised the problem of 'freezing' kthreads (kernel-mode threads) in preparation for system suspend (suspend to RAM, or hibernation). What are the semantics? What invariants should be met when a kthread gets frozen? They are not defined anywhere.

Most freezable threads don't actually need to be quiesced. Also many non-freezable threads are pointlessly calling try_to_freeze() (probably due to copying code without understanding it)).

At a system level, what we actually need is I/O and filesystem consistency. This should be achieved by:

The system suspend code should not need to directly freeze threads.

Kernel Documentation

Jon Corbet and Mauro Carvalho presented the recent work on kernel documentation.

The kernel's documentation system was a house of cards involving DocBook and a lot of custom scripting. Both the DocBook templates and plain text files are gradually being converted to reStructuredText format, processed by Sphinx. However, manual page generation is currently 'broken' for documents processed by Sphinx.

There are about 150 files at the top level of the documentation tree, that are being gradually moved into subdirectories. The most popular files, that are likely to be referenced in external documentation, have been replaced by placeholders.

Sphinx is highly extensible and this has been used to integrate kernel-doc. It would be possible to add extensions that parse and include the MAINTAINERS file and Documentation/ABI/ files, which have their own formats, but the documentation maintainers would prefer not to add extensions that can't be pushed to Sphinx upstream.

There is lots of obsolete documentation, and patches to remove those would be welcome.

Linus objected to PDF files recently added under the Documentation/media directory - they are not the source format so should not be there! They should be generated from the corresponding SVG or image files at build time.

Issues around Tracepoints

Steve Rostedt and Shuah Khan led a discussion about tracepoints. Currently each maintainer decides which tracepoints to create. The cost of each added tracepoint is minimal, but the cost of very many tracepoints is more substantial. So there is such a thing as too many tracepoints, and we need a policy to decide when they are justified. They advised not to create tracepoints just in case, since kprobes can be used for tracing (almost) anywhere dynamically.

There was some support for requiring documentation of each new tracepoint. That may dissuade introduction of obscure tracepoints, but also creates a higher expectation of stability.

Tools such as bcc and IOVisor are now being created that depend on specific tracepoints or even function names (through kprobes). Should we care about breaking them?

Linus said that we should strive to be polite to developers and users relying on tracepoints, but if it's too painful to maintain a tracepoint then we should go ahead and change it. Where the end users of the tool are themselves developers it's more reasonable to expect them to upgrade the tool and we should care less about changing it. In some cases tracepoints could provide dummy data for compatibility (as is done in some places in procfs).

posted at: 00:01 | path: / | permanent link to this entry

Sat, 03 Dec 2016

Linux Kernel Summit 2016, part 1

I attended this year's Linux Kernel Summit in Santa Fe, NM, USA and made notes on some of the sessions that were relevant to Debian. LWN also reported many of the discussions. This is the first of two parts of my notes; part 2 is here.

Stable process

Jiri Kosina, in his role as a distribution maintainer, sees too many unsuitable patches being backported - e.g. a fix for a bug that wasn't present or a change that depends on an earlier semantic change so that when cherry-picked it still compiles but isn't quite right. He thinks the current review process is insufficient to catch them.

As an example, a recent fix for a minor information leak (CVE-2016-9178) depended on an earlier change to page fault handling. When backported by itself, it introduced a much more serious security flaw (CVE-2016-9644). This could have been caught very quickly by a system call fuzzer.

Possible solutions: require 'Fixes' field, not just 'Cc: stable'. Deals with 'bug wasn't present', but not semantic changes.

There was some disagreement whether 'Fixes' without 'Cc: stable' should be sufficient for inclusion in stable. Ted Ts'o said he specifically does that in some cases where he thinks backporting is risky. Greg Kroah-Hartman said he takes it as a weaker hint for inclusion in stable.

Is it a good idea to keep 'Cc: stable' given the risk of breaking embargo? On balance, yes, it only happened once.

Sometimes it's hard to know exactly how/when the bug was introduced. Linus doesn't want people to guess and add incorrect 'Fixes' fields. There is still the option to give some explanation and hints for stable maintainers in the commit message. Ideally the upstream developer should provide a test case for the bug.

Is Linus happy?

Linus complained about minor fixes coming later in the release cycle. After rc2, all fixes should either be for new code introduced in the current release cycle or for important bugs. However, new, production-ready drivers without new infrastructure dependencies are welcome at almost any point in the release cycle.

He was unhappy about some big changes in RDMA, but I'm not sure what those were.

Bugzilla and bug tracking

Laura Abbott started a discussion of bugzilla.kernel.org, talking about subsystems where maintainers ignore it and any responses come from random people giving bad advice. This is a terrible experience for users. Several maintainers are actively opposed to using it, and the email bridge no longer works (or not well?). She no longer recommends Fedora bug submitters to submit reports there.

Are there any alternatives? None were proposed.

Someone asked whether Bugzilla could tell reporters to use email for certain products/components instead of continuing with the bug entry process.

Konstantin Ryabitsev talked about the difficulty of upgrading a customised instance of Bugzilla. Much customisation requires patches which don't apply to next version (maybe due to limitations of the extension mechanism?). He has had to drop many such patches.

Email is hard to track when a bug is handed over from one maintainer to another. Email archives are very unreliable. Linus: I'll take Bugzilla over mail-archive.

No-one is currently keeping track of bugs across the kernel and making sure they get addressed by an appropriate maintainer. It's (at least) a full-time job but no individual company has business case for paying for this. Konstantin suggested (I think) that CII might pay for this.

There was some discussion of what information should be included in a bug report. The Cut here line in oops messages was said to be a mistake because there are often relevant messages before it. The model of computer is often important. Beyond that, there was not much interest in the automated information gathering that distributions do. Distribution maintainers should curate bugs before forwarding upstream.

There was a request for custom fields per component in Bugzilla. Konstantin says this is doable (possibly after upgrade to version 5); it doesn't require patches.

The future of the Kernel Summit

The kernel community is growing, and the invitation list for the core day is too small to include all the right people for technical subjects. For 2017, the core half-day will have an even smaller invitation list, only ~30 subsystem maintainers that Linus pulls from. The entire technical track will be open (I think).

Kernel Summit 2017 and some mini-summits will be held in Prague alongside Open Source Summit Europe (formerly LinuxCon Europe) and Embedded Linux Conference Europe. There were some complaints that LinuxCon is not that interesting to kernel developers, compared to Linux Plumbers Conference (which followed this year's Kernel Summit). However, the Linux Foundation is apparently soliciting more hardcore technical sessions.

Kernel Summit and Linux Plumbers Conference are quite small, and it's apparently hard to find venues for them in cities that also have major airports. It might be more practical to co-locate them both with Open Source Summit in future.

time_t and 2038

On 32-bit architectures the kernel's representation of real time (time_t etc.) will break in early 2038. Fixing this in a backward-compatible way is a complex problem.

Arnd Bergmann presented the current status of this process. There has not yet been much progress in mainline, but more fixes have been prepared. The changes to struct inode and to input events are proving to be particularly problematic. There is a need to add new system calls, and he intends to add these for all (32-bit) achitectures at once.

Copyright retention

James Bottomley talked about how developers can retain copyright on their contributions. It's hard to renegotiate within an existing employment; much easier to do this when preparing to sign a new contract.

Some employers expect you to fill in a document disclosing 'prior inventions' you have worked on. Depending on how it's worded, this may require the employer to negotiate with you again whenever they want you to work on that same software.

It's much easier for contractors to retain copyright on their work - customers expect to have a custom agreement and don't expect to get copyright on contractor's software.

posted at: 23:54 | path: / | permanent link to this entry

Mon, 14 Nov 2016

Debian LTS work, October 2016

I was assigned 13.75 hours of work by Freexian's Debian LTS initiative and worked all of them.

I reviewed the fix for CVE-2016-7796 in wheezy's systemd, which needed substantial changes and a few iterations to get right.

I updated linux to the 3.2.82 stable release (and 3.2.82-rt119 for PREEMPT_RT), and added fixes for several security issues including CVE-2016-5195 "Dirty Cow". I uploaded and issued DLA-670-1.

In my role as Linux 3.2 stable maintainer, I made a 3.2.83 release fixing just that issue, and started to prepare a 3.2.84 release with many more fixes.

I cleaned up my work on imagemagick, but didn't go further through the backlog of issues. I put the partly updated package on people.debian.org for another LTS maintatainer to pick up.

posted at: 17:03 | path: / | permanent link to this entry

Thu, 06 Oct 2016

Debian LTS work, September 2016

I was assigned 12.3 hours of work by Freexian's Debian LTS initiative and carried over 1.45 from last month. I was unwell for much of this month and only worked 6 hours on LTS. I returned 7 hours to the pool and carry over 0.75 hours.

I wrote and sent the DLA for linux 3.2.81-2, and I discussed various handling of various issues on the debian-lts mailing list. Most of my time was spent working on the long backlog of security issues in imagemagick. I hope to complete this and upload a fixed version this month.

posted at: 18:39 | path: / | permanent link to this entry

Tue, 27 Sep 2016

Debian LTS work, August 2016

I was assigned 14.75 hours of work by Freexian's Debian LTS initiative and carried over 0.7 from last month. I worked a total of 14 hours, carrying over 1.45 hours.

I finished preparing and finally uploaded an update for linux (3.2.81-2). This took longer than expected due to the difficulty of reproducing CVE-2016-5696 and verifying the backported fix. I also released an upstream stable update (3.2.82) which will go into the next update in wheezy LTS.

I discussed a few other security updates and issues on the debian-lts mailing list.

posted at: 11:41 | path: / | permanent link to this entry