Looking back on my time at Solarflare, part 1
Friday was my last day at Solarflare Communications, which I joined in 2006 (originally as Level 5 Networks).
When I started, Level 5 Networks (L5N) had a 2-port gigabit Ethernet controller called EF1 and a user-level network stack called EtherFabric which ran on Linux. It had nearly completed the Falcon project to develop a 10-gigabit controller (also supporting a 2-port gigabit mode) and was porting EtherFabric to Windows. Later that year, however, L5N merged with Solarflare Communications (SF), which was developing a 10GBASE-T PHY (for 10-gigabit Ethernet over twisted-pair cables). The investors and management saw the opportunity to create an integrated 10GBASE-T LAN-on-motherboard chip that would be attractive to OEMs and so had the potential for mass sales. EtherFabric was put on the back-burner, but regular net drivers for Linux, Windows and other operating systems became more important.
In a change from my previous jobs, my initial role at L5N was to maintain and extend the in-house test automation software, Runbench, written in Python. I worked with Dickon Reed, who had written most of Runbench up to then. One of my first major tasks was to extend it to cover DUTs running Windows. Aside from changing Runbench itself, I wrote a simple SSH server to be installed on DUTs, using the Twisted.Python framework. (Cygwin's port of OpenSSH wasn't suitable for running a lot of native Windows commands.) I rewrote the test-installation script for the Windows driver itself, converting it to Python (with a C extension to get around the WOW64 nonsense). Windows does not allow a driver to be loaded just once (as insmod does on Linux), except by using the kernel debugger. I therefore added a time check in the driver so that the test-installation script could make it fail early when reloaded after a reboot, which meant that a crasher bug in the probe path would usually be recoverable.
In 2007 I persuaded management in Cambridge to loan several older servers to DebConf. I took them there and back by car, and they were used to convert video for live streams and recordings. (This was repeated in 2009.)
I was also asked to work on other small development tasks, including some cleanup on the Linux net driver (sfc). In mid-2007, when SF decided it was time to get sfc upstream (in-tree), I joined Steve Hodgson and Robert Stonehouse in working on that.
Out-of-tree (separately distributed) drivers for Linux normally need a large number of preprocessor conditions for compatibility with Linux's kernel module API across a range of supported versions. They may also implement some features with a driver-specific interface that should be replaced with a standard interface (possibly including the work to define that standard). They may not comply with the kernel coding style or unwritten conventions for kernel code. All of these problems were present and had to be dealt with. As part of this work, I extended unifdef so that it could be used to remove all of the backward-compatibility and not-for-upstream code without introducing extra blank lines. I could then export the out-of-tree driver from CVS(!) into a kernel source tree automatically and turn it into a patch or patch series that would stand a chance of being acceptable upstream.
Our first few attempts got little response, but eventually we newbies got the message that the driver was simply too big for a single submission and that putting patches on a web site because they're too big for the list was not a solution. Over the following week, I removed about half of the code (on a git branch), resulting in a relatively lean driver that could be sent to the netdev list. And finally the 9th (I think) submission was accepted.
Following this, I continued to work on sfc but was also brought into the Siena project. Siena (SFL9021) was the LAN-on-motherboard chip that had been planned following the merger. It combined a dual-port controller based on Falcon with a 10GBASE-T PHY and a management controller (MC) to support Lights-Out Management (LOM). (Since 10GBASE-T has not taken off, it is now mostly sold as the SFC9020 variant in which the PHY is disabled.) I worked on some of the test framework and test cases for validating the controller design in software simulation and FPGA. In the course of this I learned to read and (somewhat) understand the chip design written in Verilog, but only made a single trivial fix to it.
In August 2009, the first Siena ASICs came back from the fab. There were a few ASIC-specific bugs but all of them were quickly worked around in firmware, so it would soon be ready for production. But there was much work still to be done on the driver and firmware. The firmware had until then been running in a smaller block of RAM with no peripherals to manage, while sfc had earlier been modified just enough to configure a sim or FPGA for running a userland test application. sfc now had many special cases scattered around it, and would tell the MC firmware to peek and poke registers that were no longer directly accessible from the host.
The software and firmware developers had agreed that, in order to support LOM, the MC would be responsible for managing the ports and all peripherals, and that drivers would send higher-level requests to the MC. As a bonus, this removed the need for the driver to know the details of each new board, so that it would not be necessary to backport sfc into distribution kernels very often. Over the next few months, the firmware team did an excellent job of defining and implementing those operations, including writing the driver-side functions to invoke them. Meanwhile, Steve and I concentrated on refactoring sfc so that most of the Falcon/Siena differences could be abstracted through a few structures with function pointers.
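For readers unfamiliar with the pattern, abstracting hardware differences through structures of function pointers looks roughly like the sketch below. It is only an illustration: every name in it (struct nic_type, nic_bring_up, the counters and so on) is made up for this post and is not taken from sfc, and the idea that the Falcon paths touch registers directly while the Siena paths send requests to the MC is a simplification of what is described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; these are not the real sfc structures. */
struct nic;

/* One table of operations per controller family. */
struct nic_type {
    const char *name;
    int (*probe)(struct nic *nic);
    int (*reconfigure_port)(struct nic *nic);
};

struct nic {
    const struct nic_type *type;
    int register_writes;  /* stand-in for direct register accesses */
    int requests_to_mc;   /* stand-in for higher-level requests to the MC */
};

/* Assumed Falcon behaviour: the driver programs the hardware itself. */
static int falcon_probe(struct nic *nic)
{
    nic->register_writes++;
    return 0;
}

static int falcon_reconfigure_port(struct nic *nic)
{
    nic->register_writes++;
    return 0;
}

/* Assumed Siena behaviour: the driver asks the MC to do the work. */
static int siena_probe(struct nic *nic)
{
    nic->requests_to_mc++;
    return 0;
}

static int siena_reconfigure_port(struct nic *nic)
{
    nic->requests_to_mc++;
    return 0;
}

static const struct nic_type falcon_type = {
    .name = "falcon",
    .probe = falcon_probe,
    .reconfigure_port = falcon_reconfigure_port,
};

static const struct nic_type siena_type = {
    .name = "siena",
    .probe = siena_probe,
    .reconfigure_port = siena_reconfigure_port,
};

/* Common code never tests which chip it has; it only calls through the table. */
static int nic_bring_up(struct nic *nic, const struct nic_type *type)
{
    int rc;

    assert(nic != NULL && type != NULL);
    nic->type = type;

    rc = nic->type->probe(nic);
    if (rc)
        return rc;
    return nic->type->reconfigure_port(nic);
}
```

Calling nic_bring_up(&nic, &siena_type) runs the same common code as it would with &falcon_type. The attraction of this layout is that the chip-specific special cases live in one place per chip rather than being scattered through the common code.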
The refactoring and new code for Siena were completed and submitted upstream in several large patch series in October and November. In fact, there were so many patches that Solarflare appeared in LWN's table of who wrote Linux 2.6.33 (as did I, thanks also to another patch series adding firmware metadata to many drivers). Sadly, we had missed Linux 2.6.32, which was a longterm stable branch and used in many distributions, but I was able to get this version backported into all the major distributions over the next year.
(To be continued.)