If anyone knows Linux kernel driver development, it’s Greg Kroah-Hartman, who’s been working deep in Linux for over a decade. In this interview, Greg talks about how the Linux project has accommodated the accelerating rate of change for the kernel, and offers some insight into where Linux is headed.
- How development on the Linux kernel has changed in the past several years
- Drawing in the larger community with the Linux Driver Project
- The breadth of uses for and contributors to the kernel
- The future trajectory of Linux, including functional balance with hardware
- Overcoming the challenges of maintaining universal hardware device support
- The range of change and growth in the kernel
Scott Swigart: Can you please introduce yourself and your association with Linux?
Greg Kroah-Hartman: Sure. I’m a Novell fellow, and I’m working on the Linux kernel with the SUSE Labs division at Novell.
I’ve been doing Linux kernel work for more than 10 years now. I’m a maintainer of several different subsystems, including USB, the driver core, and some other minor things. I’m also responsible for releasing the stable Linux kernels. I used to be a maintainer of a lot of other driver subsystems and such over the years, but I’ve thankfully handed them off to other people.
Scott: Let me start with a really big-picture question, since you’ve been involved in Linux for a long time. Obviously, there’s still lots of stuff happening on the mailing list, but fundamentally, the way code makes it into the kernel hasn’t really changed, and yet the scale of development is much larger than it was, say, five years ago.
How have you seen Linux kernel development change to handle the scale of the effort that’s going on today?
Greg: I’ve been tracking this for over five years now, and we’ve gotten a lot better. There are a lot more people doing the work.
To give an example, for the 2.5 to 2.6 kernel development series, which took about two years, the top 30 people did 80 percent of the work. Now, the top 30 people do 30 percent of the work. The sheer number of developers has also increased. We were running a couple hundred developers, and now we’re running a couple thousand.
Scott: How does that get managed? Are there more layers in the hierarchy?
Greg: We’ve invented some tools to help us out a lot. It used to be that Linus was the only one who could commit anything, or who would accept anything. It was all done through email.
And then BitKeeper was developed, and a number of us started using that. It was very nice, because we had a much faster feedback loop with Linus. I could have him pull 50 patches, see that he pulled them, and then I could hit him with another 50 patches, if I wanted to, within a day. Before, we had a lag of a couple of weeks.
Then, when we couldn’t use BitKeeper anymore, Linus wrote Git, and that sped things up even more, because everybody can use Git. Our development model increased its capacity, and part of that was that we started trusting more people. Now we have subsystem maintainers and such.
As I said, I maintain the subsystems such as USB, and I have people who I trust enough that if they send me a patch, I’ll take it, no questions asked. Because the most important thing is I know that they will still be around in case there’s a problem with it. [laughs]
And then I send stuff off to Linus. So, Linus trusts 10 to 15 people, and I trust 10 to 15 people. And I’m one of the subsystem maintainers. So, it’s a big, giant web of trust helping this go on.
We’ve developed some procedures that help guide how and when we do merges and releases. And we have a very regular release schedule: every three months, we do a release. We have a two-week merge window, and all those patches in that merge window have to have been tested. We’ve gotten the development process down really well over the past four years.
We’re also increasing the rate of change in our development. The same amount of work one of the top 10 developers did last year wouldn’t have even made it into the top 20 this year. Our individual developers have gotten the workflow down, so we can actually contribute more, to an extent that’s amazing.
Scott: I understand that one of the things you’re involved with is the Linux Driver Project. It seems like there was a certain tension between Linux and certain hardware vendors, in the sense that, for a long time, hardware vendors had been really used to not shipping driver code.
They were used to shipping for Windows, and they didn’t have to be involved with drivers. They just provided an installer. And then, as Linux started to become a lot more popular in the server space and make some inroads in the desktop space, they wanted access to that market.
A lot of companies went through a kind of culture shift as they figured out how to provide things like drivers in a way that matched what the Linux community expected. Take the example of some hypothetical Acme hardware vendor that wants to submit a driver but has never done it before. What does the process look like?
Greg: First, people need to realize that our driver model is different from that of other operating systems, because all our drivers ship with the kernel. The license requires our drivers to be open, so everything is in the main kernel tree.
Because of our huge rate of change, they pretty much have to be in the kernel tree. Otherwise, keeping a driver outside the kernel is technically a very difficult thing to do, because our internal kernel APIs change very, very rapidly.
In other operating systems, APIs change more slowly, since their development process is comparatively very slow. So, it makes sense, from a technical standpoint and just to save money, to get your code into the kernel. It is maintained that way, and when changes happen within the kernel, your driver code will be fixed for you. If I have to change an API, I change it everywhere.
That approach also lets us see commonality. If I see that a driver’s doing the same thing as another driver, I might merge them together in the shared core. That way, everybody benefits. The new driver gets smaller and easier to maintain.
That’s happened a lot. We’ve merged a lot of drivers from a lot of different companies, and everybody’s happy about that. The companies are happy, the users are happy with the smaller code base, and it’s easier to maintain.
The Linux Driver Project started out because of the perception that Linux doesn’t support many devices. It turns out that Linux supports all devices out there. There’s really nothing manufactured today for a major consumer market that Linux doesn’t support. There are some one-offs, like some small video-capture devices I know of that we don’t have support for, but people are actually working on those.
The initial goal of the Linux Driver Project was to remove all barriers that could possibly be there, which were mostly managerial. To attain that goal, we said that we will write and maintain any driver for free for any company.
It turned out that not very many companies really needed that. A few niche markets needed it, and we’re writing some drivers for them, but most companies had existing drivers floating around inside their company. Their challenge was typically that they needed to get it into the kernel tree, and they didn’t know how.
So, the Linux Driver Project, over the past couple years, has just been a big educational thing. I work with the companies and show them how to get the code in. I maintain the code, massage it and get it cleaned up, and then merge it into the kernel tree. That’s been the majority of the work we’ve been doing over the past two years.
We’ve also now codified a few ways that we can play in the kernel. We can accept code that doesn’t meet our normal standards into a staging area of the kernel. Everybody can clean it up there, and then the code graduates into the real stuff. That’s worked out really well.
Scott: That makes good sense, because someone coming in from the outside isn’t necessarily going to know everything from coding standards to the best practices for doing things the right way.
And so what you’re saying is that you’ve got a place to get that code in, where people who know the right way to do things can look for things that can be factored out of drivers, drivers that can be merged, and that kind of stuff.
It doesn’t force every hardware vendor or everybody who needs to get a driver into the kernel to be a Linux driver development expert. Do I have that right?
Greg: Yeah, although we do have lots of documentation now. We have free books on how to write Linux drivers, and we have documentation of things like our coding style and how to submit code. If people don’t know where it is, we’ll point them in the right direction. In case anyone’s curious, it all starts in a file called HOWTO in the kernel, so they can start there.
Even though we do have those education tools, though, we will also work with companies that want it. We have lots of people who have done this work before and who will do it for you.
It’s a great place for people wanting to get involved in the kernel to start out, because they can just run a script to find some errors and go fix them. We have people that have started off just wanting to get involved who have ended up maintaining whole drivers and subsystems over time.
Still, in the end, my goal is actually to work with engineers from those companies and have them maintain and own it, so they become full-fledged members of the kernel community. That’s the only way we’re going to grow, and it’s working. The number of companies involved in the kernel has grown year over year.
Companies want to get the most value out of Linux, so I counsel them that they should drive the development of their driver and of Linux as a whole in the direction that they think makes the most sense. If they rely on Linux and feel that Linux is going to be part of their business, I think they should become involved so they can help change it for the better.
Scott: That makes sense. In a different area, I saw somewhere that around half of the Linux code that’s being contributed nowadays is driver code. Do I have that right?
Greg: That’s about right. Like I said, everything’s in the kernel itself. We’re at something like six or seven million lines of code, and over 50 percent of those lines of code are drivers.
I think 30 percent is architecture-specific stuff, for things like processors and networking. The core kernel is like five percent of the overall code. Those numbers have stayed pretty much the same over the past four or five years.
We change something like 5,000 lines a day, which is just scary. Fifty percent of that change will be in the drivers, and five percent will be in the core kernel. In other words, the kernel is being modified everywhere at that rate of change.
Scott: Do I understand correctly that the core kernel is memory management, process scheduling, and those kinds of fundamental things that an OS has to do?
Greg: Exactly. Basic system calls, memory management, scheduling, and inter-process communications.
And the only reason we’re changing is because people want that change. It’s not like we’re sitting there and going, “Hey! Let’s rewrite the scheduler again!”
We’re doing this stuff because we have to in order to survive, because people want it and because people need it. We do it for fun, but we don’t gratuitously change things. A big part of what drives that change is that what Linux is being used for is evolving. We’re the only operating system in something like 85 percent of the world’s top 500 supercomputers today, and we’re also in the number-one-selling phone for the past year, by quantity.
It’s the same exact kernel, and the same exact code base, which is pretty amazing. Nobody’s ever created something like this before. And we’re doing it in a way that doesn’t follow the traditional software design methodologies, which is fun.
I go and talk at a lot of colleges, and they’re changing their education system based on how the kernel has changed development models for large-scale systems.
Scott: Like you mentioned, the kernel runs on some of the smallest devices, as well as some of the largest. What makes that work is a lot of what is or isn’t included, whether you’ve got fairly bare-metal embedded systems, desktops and music players, or systems that are optimized for number crunching.
The kernel is an interesting thing. When you talk to people who aren’t really in the Linux space, they often don’t make a big distinction between the Linux kernel and a Linux distro. There’s a huge distinction though, right? The kernel is a fairly small piece of a distro.
Greg: Sure, but consider Android, which threw away everything from Linux except the kernel, and they built something totally new on top of it. That’s a great proof point that the Linux kernel itself has to be really, really flexible to let people do something like that. It still meets the needs of a very big market, which is pretty funny to watch. From an engineering standpoint it was a pretty neat hack.
Scott: Give us some sense of the kinds of people who are contributing to the Linux kernel. Obviously, there are hardware vendors. The distros out there are contributing; Red Hat and Novell, for instance, are big contributors. Give me a sense of the categories of contributors working on the kernel.
Greg: You’ve got a pretty good short list there. Some distros contribute; some do not, and the same is true of hardware companies. Intel has come on like crazy these past couple of years, and they’re now one of the major contributors to Linux. Consulting groups and educational institutions play a big role as well.
There are also a lot of companies that just modify code for one-off little things, making support better in their hardware. Linux runs a lot of autopilot steering systems on yachts, and we’ve had some contributions from there.
More than five or ten percent of the US’s power is generated on turbines that are controlled by systems based on Linux, and those guys contribute some changes they need. It’s also interesting that 20-25 percent of all of our contributions by quantity are done by people that have no affiliation with anything.
Scott: The proverbial guy in his basement or dorm room or whatever.
Greg: Yeah. They needed something fixed, so they did it and moved on. The quantity there is still large, but on the other hand, 75 to 80 percent is done by people who are getting paid to do it.
Scott: What do you feel like is driving change? You mentioned Intel, and Linux seems kind of tailor made for a chip company, because it provides a great opportunity to put new things in a processor or a chip set. Contributing to Linux lets them build support for those kinds of things and to see how they are behaving in a real production operating system.
I can kind of see that new hardware architectures and capabilities drive a certain amount of change in the Linux kernel. I can just see the creation of new peripherals driving a certain amount of change. What other types of change drive the Linux kernel forward?
Greg: Well, those two are major. New hardware accounts for the majority of our changes. We get more processors with more and more cores. We had to do a lot of work to make 8-, 16-, and 32-way machines run really well, and now we’re running 8,000-processor machines really well. The scalability is there, and I actually know of people who have booted larger ones, but we’re not allowed to talk about it.
USB 3.0 was sponsored by Intel, and it was shown on Linux first; it was the first implementation on any operating system. Intel gets their hardware working well fast, and working with the Linux community lets them get it out quickly to developers and other people who are making devices. The fact that they can do it for Linux much faster than they can for other operating systems is a huge driving factor for what we need to do for things like 10 Gigabit Ethernet or 40 Gigabit Ethernet.
Some of those speeds have required major changes to the way we do networking and the way we handle processes or data flowing through the kernel. We also have to accommodate workload issues, like real-time workloads, which required big changes to the scheduler, to the way we do locking, and to other things we’ve merged in.
Note that those are all requirements driven by people who are using Linux. The real-time guys want to use operating systems like Linux for their stuff. Linux also runs Wall Street, the Chicago Mercantile Exchange, the Tokyo Stock Exchange, and the London Stock Exchange.
Scott: What do you see on the horizon, in terms of hardware use cases that aren’t quite mainstream or available yet but that are going to drive some additional kind of significant change all the way down to the level of the kernel?
Greg: 40 Gigabit Ethernet is going to need some changes, and the networking guys are actually working on that already. There are also going to be more and more cores, better real-time operation, and USB 3.0. The USB 3.0 devices aren’t out yet, so once we start seeing more of those, we’ll start optimizing things there. SSDs are great; they’re growing fast.
People seem to forget that kernel changes they want, changes it seems like no one else will want, very often end up being valuable to a lot of people. For example, maybe the embedded group makes some power management changes, and it turns out that the server guys really want that, too. Then the server guys want better scalability, and it turns out that the embedded guys love those scalability changes, because they’re running dual-core ARM systems now.
Part of what’s helped us make things work is that it turns out that a lot of changes benefit everybody. We’ve become such a flexible operating system by requiring changes to work for everything, and it’s made everything work even better.
There are lots of power management reworks in the pipeline that help address a lot of the issues the embedded guys have come up with, issues that servers traditionally don’t have. In servers, you can’t shut off that many discrete components, whereas in embedded, you can pretty much shut off every single discrete component in your system, depending how much you want to do with it.
Scott: I remember when we talked to the OLPC guys, one of their requirements was to be able to go to sleep, but not lose the network connection.
Greg: Mesh networking, right?
Scott: Yes. And they also needed super-aggressive power management, because they wanted to get the most battery life possible, because power is such a big issue in developing countries. If I remember correctly, they were actually putting the processor in a sleep state between keystrokes.
Greg: That’s right; they were, but it turns out that the hardware guys can do it better. If you look at more modern processors, the hardware is putting itself to sleep faster. In modern PCs and these new netbooks, the OS really doesn’t have control of the power management stuff anymore, because the hardware can do it better.
It turns out it’s better to run as fast as possible to get your work done, then go to sleep, than to run your CPU at medium speed. The hardware guys realize this, and they know they can handle their workloads better than the operating system can.
Scott: I don’t want to mischaracterize it, and I am intentionally overstating this to some extent, but when you talk to people at Intel, they always say, “Well, that should be in silicon.” It’s kind of like there’s this drive to put more and more and more in silicon, but at the same time, obviously software is growing at a geometric rate.
Talk a little bit about that balance between what ought to be in hardware and what ought to be in software, as well as the notion that something that ought to be in one place today should maybe be in the other a year or two from now.
Greg: The hardware guys are always like, “Oh, I can solve that in hardware. Let me do that. You software guys aren’t ever going to get around to doing this.” And part of that is true.
If you look at the software adoption rates of some operating systems, it’s very, very slow. So, the fact that they make a change to the hardware, and they can save power today in the new processor, means that overall, the amount of power consumption in the world goes down.
People run old versions of Linux and other operating systems very frequently, so there’s sometimes an advantage to solving a problem in both places. They can add it to the hardware, and then they also add it to the software.
In the kernel, we know we typically can’t change user space applications, so we have to make changes in order to accommodate how they want to do things, even if we feel that they’ve done it the wrong way. So, there’s also a tension there.
Some companies actually talk to the engineers really, really well. Intel and IBM are very good at this. They sit down, and they talk to the kernel people once a year. “Hey, this is what we’re thinking of doing. How should we do this? How are we going to implement this?” And they get feedback from us directly. And that’s turned out to be a very, very valuable feedback loop.
It often works out very well in terms of how chips are designed, and how the operating system will take advantage of that. Hopefully, we’re all talking to all the right people, and everybody’s interacting well.
Scott: Let me spin back to something you said very early on, which is basically that all hardware is supported in the Linux kernel, even though there seems to be the perception that not all the hardware works.
Certainly, one of the things that gets brought up is that if I buy a new device, it’ll come with a driver disk that probably won’t have Linux drivers on it. Talk a little bit about where you feel this disconnect comes from between the people working on the drivers in the Linux kernel and those incorrect user perceptions, even from people who are fairly experienced with Linux.
Greg: That perception has existed for many, many years. Back in 2006, I gave a big keynote showing how it was all wrong. Everything is supported, but, yes, that complaint comes up a lot. I joke with Jim Zemlin of the Linux Foundation about this issue all the time. If he publicly says, “We’re still working on drivers,” I always say, “No, we’re done.”
So, there’s a disconnect there. Part of the issue is that we support more devices than any other operating system ever has. That’s a fact that’s been verified by other companies. The problem is that people only care about the devices that they have. Therefore, if your device doesn’t work for some reason, you don’t care how many thousands of other devices out there work.
When I started the Linux driver project, I’d been hearing this from all the major companies that were shipping Linux and cared about Linux. So, I went around to them individually and said, “OK, what do you need me to do? What needs to be worked on?”
Every single major hardware company said, “Hey, you’re right. Everything works on Linux. We’re fine.” That’s shown by these companies that ship Linux on their machines. Dell, HP, and all the big hardware companies and laptop manufacturers now ship Linux, and we’re working with them.
Part of the disconnect is that if you buy a machine that doesn’t have Linux on it, getting it to work sometimes requires weird BIOS tricks to get all the function keys working on a laptop, and things like that. New hardware, or hardware you add when you’re updating a system, should work. If not, then we messed up.
The graphics guys have issues that they’ve been addressing to help make that work much better. Intel hardware works wonderfully now on Linux. Their developers are very good. ATI and NVIDIA both ship closed-source drivers that work on all their devices. Those all work and have very good support systems. The drivers aren’t in the kernel, but they’re out there.
Anything storage related or networking related has been working with Linux forever, and all new devices support Linux too. They have to, because that’s what those markets are going for.
Then you get to the tiny consumer devices, which is where I like having fun, because I do USB. I work really hard to get everything supported, and we don’t know of anything these days that isn’t. I was in Tokyo the other day for the Kernel Summit and walked around Akihabara trying to find devices that we don’t support. We had all the kernel developers there, and we couldn’t find anything.
I know there’s a disconnect, but if you look on a mailing list for the Windows guys, they have a lot of driver problems, where things aren’t working. Mac OS X does not support very many devices at all.
If issues arise with exotic configurations, I’ll be glad to work through them all. If something doesn’t work, tell me so I know to fix it.
Scott: Is my perception correct that when hardware companies get into Linux, they sometimes start out with a bare-bones driver? It seems that often, the piece of hardware will work, but it won’t necessarily handle the full spectrum of services like power management with the device. Then they’ll provide more capabilities, gradually contributing code to use the device to its full capabilities.
Am I at all accurate in that perception, or is it more often the case that the driver they contribute offers the same level of functionality as on any other OS?
Greg: You’re right. Some things don’t work, although we’ve made it easier for companies to write a driver that will do everything, because we put core functionality like power management into the core. Power management is now handled by the core of the kernel, so you don’t have to add that. You just have to add a few hooks in your driver and then you’re done.
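Greg’s point that power management lives in the core, with drivers adding only a few hooks, can be illustrated with a small user-space C sketch. This is a hypothetical, heavily simplified model and not kernel code: the names `toy_device`, `toy_pm_hooks`, `core_suspend_all`, and `core_resume_all` are invented for illustration (in the real kernel, drivers supply such callbacks through `struct dev_pm_ops`). The "core" walks every device, calls each driver’s optional suspend or resume hook, tracks the state itself, and rolls back on failure, so the driver contributes only two small functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified model of "power management lives in the core".
 * NOT the kernel API; the real kernel uses struct dev_pm_ops. */
struct toy_device;

/* All a driver supplies: optional suspend/resume hooks. */
struct toy_pm_hooks {
    int (*suspend)(struct toy_device *dev); /* returns 0 on success */
    int (*resume)(struct toy_device *dev);
};

struct toy_device {
    const char *name;
    const struct toy_pm_hooks *pm; /* NULL if the driver needs no hooks */
    int suspended;                  /* state tracked by the core, not the driver */
};

/* Core: suspend every device in order, calling its hook if it has one.
 * Returns n on success. If a hook fails, already-suspended devices are
 * resumed again and -1 is returned. */
static int core_suspend_all(struct toy_device *devs, size_t n)
{
    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct toy_device *d = &devs[i];
        if (d->pm && d->pm->suspend && d->pm->suspend(d) != 0) {
            /* Roll back devices 0..i-1 in reverse order. */
            while (i-- > 0) {
                struct toy_device *p = &devs[i];
                if (p->pm && p->pm->resume)
                    p->pm->resume(p);
                p->suspended = 0;
            }
            return -1;
        }
        d->suspended = 1;
    }
    return (int)n;
}

/* Core: resume every device in reverse order of suspension. */
static void core_resume_all(struct toy_device *devs, size_t n)
{
    assert(devs != NULL || n == 0);
    for (size_t i = n; i-- > 0;) {
        struct toy_device *d = &devs[i];
        if (d->pm && d->pm->resume)
            d->pm->resume(d);
        d->suspended = 0;
    }
}

/* Example "driver": its entire power management support is two small hooks. */
static int webcam_suspend(struct toy_device *dev)
{
    printf("%s: power off\n", dev->name);
    return 0;
}

static int webcam_resume(struct toy_device *dev)
{
    printf("%s: power on\n", dev->name);
    return 0;
}

static const struct toy_pm_hooks webcam_pm = {
    .suspend = webcam_suspend,
    .resume = webcam_resume,
};
```

For example, an array holding `{ "webcam0", &webcam_pm, 0 }` and a hookless device `{ "legacy-port", NULL, 0 }` can be passed to `core_suspend_all`; the core marks both devices suspended, and only the webcam’s hook runs. Keeping ordering, state tracking, and rollback in the core is what lets each driver stay small.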
The average driver for Linux is about one third the size of an equivalent driver for another operating system, so you have less code to write and maintain. Still, to your point, there are some new devices that come out whose value add is that they do X and Y differently.
Sometimes we don’t support that yet, because the manufacturer didn’t tell us how to do it or we have to figure out how to fit it into the Linux framework to support it properly. Those little webcams are a very good example. New devices come out that can support features like additional bit rates, frame sizes, or screen sizes, and we need to go and add support to the core of the kernel to support those.
So, your statement is very fair, and that contributes to the disgruntlement you’ll see on my face at times.
If a brand new device that says it can do X, Y, and Z is only doing X on Linux, we immediately start to try to figure out how to get it to do the other things. Sometimes, we just have to work with the manufacturers to do that.
A lot of companies are getting better at working with us to come out with drivers at the same time the hardware comes out. Sometimes we lag by six months just by virtue of the fact that Linux isn’t the first system they care about, so it takes us a while to get it going. Also, if you’re using an older distro, you may need to switch versions.
There are all sorts of configuration issues there, which is why Jim Zemlin of the Linux Foundation continues to say we need to work on driver development, and I agree with him there. There’s always going to be work, so I think I have a lifetime job here.
Scott: Correct me if I’m wrong, but drivers never really leave the kernel, right? Is it more or less true that that’s one of the reasons that Linux supports old hardware so well? That once it’s in the kernel, it’s probably going to be in the kernel for a very long time, so I can grab a six-year-old laptop or six-year-old server, throw Linux at it, and have pretty good confidence it’s going to work?
Greg: Yes, and actually that’s turned out to be a problem. Because our kernel size is growing so much, we want to make changes, and we’re running into places where we have drivers for hardware that we know hasn’t been made in the past ten years. It can be complicated to excise those pieces.
That came up at the Kernel Summit last week, and actually, I am going to be working on that. We’re going to take drivers that we think are broken, and that we’re pretty sure nobody out there uses anymore, and move them into the staging tree. We’ll keep them there for about a year, and if nobody complains, then we’ll remove them.
On the other hand, if anybody shows up and says they need one of those drivers, we can use our source control tool to restore and test it. If there is at least one user of an old device in the world, I will gladly maintain the support.
Scott: We’ve covered a lot of great stuff, and now I want to turn to a lazy interviewer trick and ask you what really interesting things I haven’t asked about. In other words, I’d like for you to supply both the insightful question and the brilliant answer.
Greg: Well, just to touch back on that rate of change that I mentioned before, I just looked it up, and we add 11,000 lines, remove 5,500 lines, and modify 2,200 lines every single day.
People ask whether we can keep that up, and I have to tell you that every single year, I say there’s no way we can go any faster than this. And then we do. We keep growing, and I don’t see that slowing down at all anywhere.
I mean, the giant server guys love us, the embedded guys love us, and there are entire processor families that only run Linux, so they rely on us. The fact that we’re out there everywhere in the world these days is actually pretty scary from an engineering standpoint. And even at that rate of change, we maintain a stable kernel.
It’s something that no one company can keep up with. It would actually be impossible at this point to create an operating system to compete against us. You can’t sustain that rate of change on your own, which makes it interesting to consider what might come after Linux.
For my part, I think the only thing that’s going to come after Linux is Linux itself, because we keep changing so much. I don’t see how companies can really compete with that.
Scott: Like you said, a lot of the rate of change in Linux as a whole is due to changes in the drivers. It’s fairly unique to Linux that it carries the drivers in the kernel, so as the ecosystem grows, you’ve got more and more people contributing code to Linux.
That’s not the case with other operating systems.
Greg: Right. And while that rate of change is consistent across the whole thing, it’s also easier to write drivers. As I said before, your driver for Linux is one third the size of your driver for Windows, so even at this rate of change, writing a driver for Linux is less work than it is for other operating systems.
In Linux, we’ve rewritten our USB stack three or four times. Windows has done the same thing, but they had to keep their old USB stack and a lot of their old code in order to keep those old drivers working. So, their maintenance burden goes up over time while ours doesn’t.
Also, as people change jobs, they generally stay with the Linux kernel community. I’ve been doing USB work for over 10 years now for Linux, and that’s a big body of knowledge. Within other companies, engineers usually move around, and that body of knowledge doesn’t necessarily stick around.
For driver maintainers and driver authors within Linux, it does stick around. The networking guys have been doing that work for 15 years with Linux, and they know this stuff in and out. They can tell you what you need to change in your driver to make it work really well.
They’re unique, and they’ll help out any company.
Scott: Thanks; we’re out of time, and that’s a good place to close.
Greg: All right. Thanks a lot; this was fun.