Here's the last chunk of notes I took at Linux Plumbers Conference earlier this month. See part 1 and part 2 if you missed them.

Real-time track

Etherpad: https://etherpad.net/p/LPC2019_Real_Time/timeslider#4945

Core scheduling for RT

Speaker: Peter Zijlstra

Details: https://linuxplumbersconf.org/event/4/contributions/417/

LWN article: https://lwn.net/Articles/799454/

This was about restricting which tasks share a core on CPUs with SMT/hyperthreading. There is current interest in doing this as a mitigation for speculation leaks, instead of disabling SMT altogether.

SMT also makes single-thread processing speed quite unpredictable, which is bad for RT, so it would be useful to prevent scheduling any other tasks on the same core as an RT task.

Gen-Z Linux Sub-system

Speakers: Jim Hull and Betty Dall of HPE

Details: https://linuxplumbersconf.org/event/4/contributions/301/

Summary
  • New interconnect protocol developed by a large consortium
  • Memory-like fabric scalable to large numbers of components
  • Multiple PHY types supported (PCIe gen4, 25/50 Gbit Ethernet PHYs) for different reach/bandwidth/latency trade-offs
  • Can support unmodified OS through "logical PCI devices" and ACPI device description

Connections are point-to-point between "components". Switch components provide fan-out.

Components can be subdivided into "resources" and also have "interfaces".

There is no requirement for a single root (as in typical PCIe), and there can be redundant connections forming a mesh.

Fabric can span multiple logical computers (OS instances). Fabric manager assigns components and resources to them, and configures routing.

The protocol is reliable: all writes are acknowledged (by default). However, delivery is not ordered by default.

Components have a single control space (like config space?) and a single data space (up to 2⁶⁴ bytes). The control space has a fixed header followed by additional structures for optional and per-interface registers.

Each component has a 12-bit component ID (CID), which may be combined with a 16-bit subnet ID (SID) to form a 28-bit global component ID (GCID).

Coherence is managed by software.

A bridge from the CPU to Gen-Z needs MMUs to map between the local physical address space and the fabric address space. It normally also has DMA engines ("data movers") that can send and receive all types of Gen-Z packets, not just reads and writes. These bridges are configured by the local OS instance, not the fabric manager.

Adding a Gen-Z subsystem

Needed to:

  • Enable native device drivers that know how to share resources
  • Enable user-space fabric managers and local management service

Should behave similarly to PCI and USB, so far as possible. Leave policy to user-space. Deal with the fact that most features are optional.

The Gen-Z subsystem needs to provide APIs for tracking PASIDs in the IOMMU and ZMMU. Similar requirements exist in PCIe; should this be generic?

How can Gen-Z device memories be mapped with huge pages?

Undecided whether a generic kernel API for data movers is desirable. This would help kernel I/O drivers but not user-space I/O (like RDMA).

Interrupts work very differently from MSI. Bridge may generate interrupts for explicit interrupt packets, data mover completions, and Unsolicited Event Packets (link change, hotplug, …).

Device discovery

All nodes run local management services. On Linux these will be in user-space (LLaMaS).

(This means LLaMaS will need to be included in the initramfs if the boot device is attached through Gen-Z.)

The manager will use netlink to announce when a resource has been assigned to the local node. The kernel then creates a device for it.

Live patching

Etherpad: https://etherpad.net/p/LPC2019_Live_Patching/timeslider#3799

Do we need a Livepatch Developers Guide?

Moderator: Joe Lawrence

Details: https://linuxplumbersconf.org/event/4/contributions/512/

Reflections on kernel development process, quality and testing

Speaker: Dmitry Vyukov

Details: https://linuxplumbersconf.org/event/4/contributions/554/

Slides: https://linuxplumbersconf.org/event/4/contributions/554/attachments/353/584/Reflections__Kernel_Summit_2019.pdf

Dmitry outlined how the current kernel development processes are failing:

  • Processes are inconsistent between subsystems, and often undocumented
  • Regressions don't consistently get fixed even when they are reported
  • Test coverage is poor, and there are several independent automated testing initiatives that partially overlap
  • Important fixes don't always get backported to the stable branches that need them

It takes a long time for new developers to become productive, or for developers to contribute to unfamiliar subsystems.

(None of this was new to me, but spelling out all these issues definitely had an impact.)

He advocates more consolidation and consistency, so that:

  • Tools can work with and report on proposed/committed changes across the kernel
  • Developers see all test results for a change in one place
  • There is less duplicated work on tools, testing, reporting

There was further discussion of this at the Kernel Maintainer Summit, reported in https://lwn.net/Articles/799134/.