Cross-compile Linux kernel for ARM and run on QEMU

In the process of trying to get Linux to boot on my Surface, I wanted to first get a kernel booting on QEMU, so as to do as little of the work and investigation as possible on the actual device.

Clearly that was a wise choice, as I obviously have no idea what I am doing. This post documents the process and will hopefully save time for others who arrive here with the same questions I initially had.

This blog post got me the furthest, but it seems to be dated because of the device tree changes the kernel has gone through since then (more on that below).

To begin with, cross-compilation means compiling a piece of software for an architecture different from the one on which the compilation takes place. In my case, I am compiling the kernel for ARM on an x86 machine. For the rest of the post, I'll be referring to the GNU toolchain (GCC and binutils). If you want to use something different, this may not apply.

It seems that on most distributions, the compiler and binutils (assembler, linker, etc.) are built to target only the host architecture, so you need to get a toolchain for the desired target architecture. On Arch that was very easy. When you start looking at cross-compilation toolchains, you will see that they follow a specific naming convention:

Toolchain naming convention

arch-vendor-(os-)abi

arch is the architecture (e.g. arm), vendor is the organization that provides the toolchain, os is the platform the compiled program is expected to run on, and abi is the application binary interface, that is, the conventions for calling functions, which registers are used for what, how system calls are made, and so on.
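To make the convention concrete, here is a small POSIX shell sketch that splits a toolchain prefix into its fields. The `split_triplet` helper is hypothetical, not part of any toolchain. It assumes a three-part name is arch-vendor-abi and a four-part name is arch-vendor-os-abi; real-world names are not always that regular (arm-linux-gnueabi, for instance, drops the vendor), so take it as an illustration of the convention only. GCC can report its own target name with `-dumpmachine`, which gives you something to feed it.

```shell
#!/bin/sh
# Hypothetical helper: split a cross-toolchain prefix into its fields.
# Assumes 3 fields = arch-vendor-abi and 4 fields = arch-vendor-os-abi;
# irregular names (e.g. arm-linux-gnueabi, which omits the vendor) will be
# mislabeled, so this only illustrates the convention.
split_triplet() {
    triplet=${1%-}              # accept CROSS_COMPILE-style prefixes like "arm-none-eabi-"
    old_ifs=$IFS
    IFS=-                       # split on dashes
    set -f                      # no globbing while splitting
    set -- $triplet
    set +f
    IFS=$old_ifs
    case $# in
        3) printf 'arch=%s vendor=%s abi=%s\n' "$1" "$2" "$3" ;;
        4) printf 'arch=%s vendor=%s os=%s abi=%s\n' "$1" "$2" "$3" "$4" ;;
        *) echo "unexpected toolchain name: $triplet" >&2; return 1 ;;
    esac
}

# Example (requires the toolchain to be installed):
#   split_triplet "$(arm-none-eabi-gcc -dumpmachine)"
```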

EABI vs. GNUEABI

In my case, I am using arm-none-eabi, that is, a toolchain for compiling freestanding (sometimes called bare-metal) ARM code using the EABI. The software being compiled is expected to run on an ARM processor without any supervising operating system, following ARM's Embedded ABI conventions. The bare-metal/freestanding part (also interestingly described here) means that there is no operating system underneath the application. What does that imply? If you compile code that uses the standard C library, that library cannot call into an operating system to allocate a new memory page or to use any other kernel service. Conversely, if you compiled the same application for different operating systems, the standard library would have to behave differently depending on which operating system it interacts with.

Now you may be thinking: "Why bother? The kernel doesn't use libc." That's correct, but there is a relevant detail: this Linaro FAQ says that EABI is meant for bare-metal targets, whereas GNUEABI is meant for Linux. The difference is not only in the standard library; the FAQ also mentions actual size differences for certain types, such as wchar_t.

Based on that, I am inclined to believe that if you were to compile your kernel using arm-none-eabi and then your user-space applications with arm-linux-gnueabi, you could run into issues with things like the size of a wide char, as mentioned in the Linaro FAQ. So, in short, if you have to pick a toolchain for compiling the kernel, go with gnueabi.
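One way to check the wchar_t difference on your own machine is to ask each compiler for its predefined macros: GCC (and Clang) dump them with `-dM -E`, and `__SIZEOF_WCHAR_T__` holds the size in bytes. The sketch below assumes arm-none-eabi-gcc and arm-linux-gnueabi-gcc as the compiler names and simply skips whichever is not on your PATH; I am not quoting the resulting numbers here, so compare them on your own setup.

```shell
#!/bin/sh
# Print the size of wchar_t, in bytes, that a C compiler assumes by default,
# using the predefined-macro dump (-dM -E) supported by GCC and Clang.
# Usage: wchar_size arm-none-eabi-gcc
wchar_size() {
    cc=$1
    # Preprocess an empty C translation unit and dump all predefined macros;
    # the line of interest looks like "#define __SIZEOF_WCHAR_T__ <n>".
    echo | "$cc" -dM -E -x c - | awk '$2 == "__SIZEOF_WCHAR_T__" { print $3 }'
}

# Compare the bare-metal and Linux toolchains, if they are installed.
for cc in arm-none-eabi-gcc arm-linux-gnueabi-gcc; do
    if command -v "$cc" >/dev/null 2>&1; then
        printf '%s: sizeof(wchar_t) = %s\n' "$cc" "$(wchar_size "$cc")"
    fi
done
```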

Against my own advice, I'll stick with none-eabi because, well, that's what I originally installed when I compiled GRUB, and because I do want to see what happens when I compile my user-space applications with gnueabi and run them on a kernel compiled with eabi. Then I can come back here and update this post.

Besides all the above, there is this great presentation about cross-compilation toolchains.

Cross compiling the kernel

git clone https://github.com/torvalds/linux.git kernel
cd kernel
# optionally, check out a specific release: git checkout <your_desired_version_tag>

# you can get a list of predefined configs for ARM under arch/arm/configs/
# this configures the kernel compilation parameters
make ARCH=arm versatile_defconfig

# this compiles the kernel; add "-j <number_of_cpus>" to use multiple CPUs and reduce build time
make ARCH=arm CROSS_COMPILE=arm-none-eabi-
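The same two steps can be wrapped in a small shell function that also uses every available CPU. This is a sketch: the function name is mine, `nproc` comes from GNU coreutils, and CROSS_COMPILE can be overridden from the environment. Passing CROSS_COMPILE to the defconfig step as well does no harm, and newer kernels probe the compiler while configuring, so it is a good habit.

```shell
#!/bin/sh
# Hypothetical wrapper: configure and build an ARM kernel with all CPUs.
# Run from the top of the kernel tree.
# Usage: build_arm_kernel [defconfig]    (defaults to versatile_defconfig)
build_arm_kernel() {
    defconfig=${1:-versatile_defconfig}
    cross=${CROSS_COMPILE:-arm-none-eabi-}
    # nproc reports the number of available CPUs; fall back to 1 job
    jobs=$(nproc 2>/dev/null || echo 1)

    make ARCH=arm CROSS_COMPILE="$cross" "$defconfig" &&
    make ARCH=arm CROSS_COMPILE="$cross" -j"$jobs"
}
```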

Once the build finishes, you should have the self-decompressing image (gzip-compressed by default) at arch/arm/boot/zImage, while arch/arm/boot/Image is the uncompressed kernel image.

Running the kernel on QEMU

The naive way
qemu-system-arm -M versatilepb -kernel ./zImage -nographic -append "ignore_loglevel log_buf_len=10M print_fatal_signals=1 LOGLEVEL=8 earlyprintk=vga,keep sched_debug"  

Running it from arch/arm/boot, where the build leaves zImage, got me this:

Uncompressing Linux... done, booting the kernel.

Error: unrecognized/unsupported machine ID (r1 = 0x00000183).

Available machine support:

ID (hex)    NAME  
ffffffff    Generic DT based system  
ffffffff    ARM-Versatile (Device Tree Support)

Please check your kernel config and/or bootloader.

That's because the kernel doesn't seem to have support for the board I told QEMU to emulate (versatilepb). But how can that be, if we explicitly configured the kernel for it before compiling?
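About the r1 value in that error: under the older, pre-device-tree ARM boot protocol, whoever boots the kernel (here QEMU itself, since we hand it the zImage directly) puts a machine type number in r1, and the kernel looks it up among the boards it was built for. The registry of those numbers, arch/arm/tools/mach-types, lists them in decimal, so a conversion helps when reading it. The `lookup_mach_type` helper below is hypothetical and assumes the file's usual four-column layout; if your tree no longer lists the board, it prints nothing.

```shell
#!/bin/sh
# Hypothetical helper: look up a machine type number (as printed in r1) in the
# kernel's machine registry. Assumes the usual four-column layout:
#   machine_is_xxx  CONFIG_xxxx  MACH_TYPE_xxx  number
lookup_mach_type() {
    id=$(printf '%d' "$1")                       # 0x183 -> 387
    mach_types=${2:-arch/arm/tools/mach-types}   # path relative to the kernel tree
    awk -v id="$id" '$1 !~ /^#/ && $4 == id { print $1 }' "$mach_types"
}

# Example, from the top of the kernel tree:
#   lookup_mach_type 0x183
```

0x183 is 387 in decimal, which, as far as I can tell, is the number registered for versatile_pb.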

Device tree file

The reasons are described in this presentation by Thomas Petazzoni. In short, all the SoC and board code used to live inside the kernel, which became a maintenance nightmare (as you can imagine, given the sheer number of ARM boards out there). So the ARM kernel maintainers decided to move the hardware description out of the kernel binary and into device tree files. That also explains the error above: both listed entries have ID ffffffff, which marks machine descriptors that can only be matched through a device tree, so a boot that supplies only a machine ID has nothing to match.

The device tree files live in the arch/arm/boot/dts folder. The .dts/.dtsi files are the sources, and the compiled .dtb files are the binary format the boot loader is supposed to hand to the kernel. Since we are not using a separate boot loader with QEMU right now, you can use QEMU's -dtb argument to pass the device tree binary to the kernel.
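Here is a small sketch for locating a compiled device tree blob and turning it back into readable source. As far as I know, since Linux 6.5 the 32-bit ARM .dts files live in per-vendor subdirectories (the Versatile and Versatile Express ones under arch/arm/boot/dts/arm/), so searching is more robust than hard-coding a path. `find_dtb` is a hypothetical helper; `make dtbs` and the kernel's bundled dtc (built under scripts/dtc/) are the real tools.

```shell
#!/bin/sh
# Hypothetical helper: print the first path under a directory (by default the
# ARM dts tree) whose file name matches the given .dtb name.
find_dtb() {
    name=$1
    dir=${2:-arch/arm/boot/dts}
    find "$dir" -name "$name" -print | head -n 1
}

# Typical use from the top of the kernel tree (not run here):
#   make ARCH=arm CROSS_COMPILE=arm-none-eabi- dtbs
#   dtb=$(find_dtb versatile-pb.dtb)
#   scripts/dtc/dtc -I dtb -O dts "$dtb" | less    # decompile to readable source
```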

Running QEMU with a device tree file
qemu-system-arm -M versatilepb -kernel ./zImage -dtb ./dts/versatile-pb.dtb -nographic -append "ignore_loglevel log_buf_len=10M print_fatal_signals=1 LOGLEVEL=8 earlyprintk=vga,keep sched_debug"  

And now you should see plenty of boot messages, followed by the expected kernel panic caused by the missing root filesystem:

Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)  
CPU: 0 PID: 1 Comm: swapper Not tainted 4.12.0-rc4+ #3  
Hardware name: ARM-Versatile (Device Tree Support)  
[<c001b8f8>] (unwind_backtrace) from [<c0018b1c>] (show_stack+0x10/0x14)
[<c0018b1c>] (show_stack) from [<c00757e0>] (panic+0xb8/0x230)
[<c00757e0>] (panic) from [<c043715c>] (mount_block_root+0x1a8/0x294)
[<c043715c>] (mount_block_root) from [<c0437438>] (mount_root+0xf4/0x120)
[<c0437438>] (mount_root) from [<c04375bc>] (prepare_namespace+0x158/0x1ac)
[<c04375bc>] (prepare_namespace) from [<c0436da4>] (kernel_init_freeable+0x17c/0x1c4)
[<c0436da4>] (kernel_init_freeable) from [<c035a880>] (kernel_init+0x8/0xf0)
[<c035a880>] (kernel_init) from [<c0015290>] (ret_from_fork+0x14/0x24)
---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
random: fast init done  
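To avoid sitting at a hung panic, and to check for it from a script, two existing knobs help: the kernel's panic=-1 parameter makes it reboot immediately after a panic, and QEMU's -no-reboot flag makes QEMU exit instead of rebooting. The sketch assumes that reset path works on the emulated Versatile board; if it does not, wrapping the command in timeout(1) is a fallback. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: boot headless and capture the console log to a file.
# panic=-1 -> the kernel reboots right after panicking;
# -no-reboot -> QEMU exits instead of rebooting, so this call returns.
# Usage: boot_versatile zImage versatile-pb.dtb [logfile]
boot_versatile() {
    qemu-system-arm -M versatilepb -kernel "$1" -dtb "$2" -nographic \
        -no-reboot -append "panic=-1" > "${3:-boot.log}" 2>&1
}

# Succeeds if the log shows the "no root filesystem" panic seen above.
saw_rootfs_panic() {
    grep -q 'VFS: Unable to mount root fs' "$1"
}
```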

In the next post, I'll go over the rootfs creation.

Note on vexpress-a9

Since vexpress-a9 uses a Cortex-A9, the same CPU core as the Surface's Tegra 3 SoC, running a kernel on this emulated board seemed like a step closer. The changes needed to get there were:

  • Configure the kernel with vexpress_defconfig
  • Use vexpress-v2p-ca9.dtb device tree
  • Change the -M argument to vexpress-a9
  • Append console=ttyAMA0 to kernel parameters (or remove the -nographic QEMU argument)
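Putting those differences together, here is a sketch of the full invocation. The `run_vexpress` function and its DRY_RUN switch are hypothetical conveniences; the default paths assume you run it from the top of a kernel tree configured with `make ARCH=arm vexpress_defconfig` and then built with the same build command as before.

```shell
#!/bin/sh
# Hypothetical wrapper: boot a vexpress_defconfig kernel on QEMU's vexpress-a9.
# Run from the top of the kernel tree. On Linux 6.5 and later the dtb is under
# arch/arm/boot/dts/arm/ instead, so pass its path explicitly there.
# Set DRY_RUN=1 to print the command instead of running it.
run_vexpress() {
    zimage=${1:-arch/arm/boot/zImage}
    dtb=${2:-arch/arm/boot/dts/vexpress-v2p-ca9.dtb}
    # -nographic wires the emulated serial port (ttyAMA0 in Linux) to this
    # terminal, so the kernel console needs to go there as well to be visible.
    set -- qemu-system-arm -M vexpress-a9 -kernel "$zimage" -dtb "$dtb" \
        -nographic -append "console=ttyAMA0"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```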

The last item made me curious. The best explanation I could find was this forum post.