Speeding up AOSP compilation

Lately, I have been working on accelerating the compilation of an AOSP-based OS. The topics I’ll discuss are compilation times, memory consumption, distcc, and ccache.

I won’t share any details about my project as this is confidential data. However, everything here is applicable to anyone working in a similar environment.

Project specifics

Let’s start with explaining the size of the project and the required resources.

Disk requirements

Source code size: 89GB
Compiled code size: 82GB
ccache size: 13GB
Combined: 185GB

The compiled project has 1 197 527 files and 198 307 directories.

Several compilations per day increase the space requirements by several tens of GBs. Add a typical Linux installation of 10GB-15GB, about 10GB for the IntelliJ IDEA index, and some free space to protect from file fragmentation. In my experience the minimum safe requirement is about 280GB.

Of course I’m using an SSD, and the space requirements call for at least a 500GB one.

RAM requirements

The Android Jack compiler works with a 4GB Java heap, but it’s faster with a larger one; 6GB seems to be the sweet spot. IntelliJ IDEA also works well with a 5GB heap. Strictly speaking a 16GB machine is sufficient, but if you want to compile, use the IDE and browse the Internet at the same time without touching swap, go for 32GB of RAM.

CPU requirements

The faster the better. I have experience with several CPUs for this project:

Intel Core i7-6700 CPU @ 3.4GHz (Turbo 4GHz), 4C/8T, 8M cache

Intel Core i7-4770 CPU @ 3.4GHz (Turbo 3.9GHz), 4C/8T, 8M cache

Intel Core i9-9900K CPU @ 3.6GHz (Turbo 5.0GHz), 8C/16T, 16M cache

Older Xeon 6C/12T whose parameters I won’t share

Linux distro and Docker

The project is built in a Docker container using a base image with a particular Ubuntu version. The container receives the project source as a mounted volume and executes a custom build script I wrote, which takes many things into consideration. This is required because the developer machines and the automated build nodes may run different distributions, and a stable, predictable build environment is needed.

Starting point

When I started to work on this project, it took 3 hours to compile on any of the automated CI/CD build nodes with HDD drives. Compiling locally on a relatively recent CPU with an SSD took probably 2.5 hours for a clean compilation and something like 15 minutes for a simple change.

As you can see, the development process was inefficient: even a small bugfix required half a working day to produce an image for QA to test and validate.

Analysis

I came to the conclusion that:

  1. /tmp/aosp folder mapped inside Docker should use tmpfs
  2. The build wasn’t started with the optimal number of threads
  3. The Jack compiler had a low amount of heap on lower memory machines, resulting in lower performance.
  4. distcc wasn’t used
  5. ccache wasn’t used

Let’s get to work.

Improving compilation speed

tmpfs

Mounting /tmp as tmpfs is a pretty good idea on the host as well. Later, the Docker container will receive its own folder /tmp/aosp on top of that tmpfs.

So, add this line in /etc/fstab and reboot:

tmpfs /tmp tmpfs rw,nosuid,nodev
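
If you don’t want to reboot right away, the same mount can be applied immediately and then verified. Note that tmpfs defaults to half of the system RAM unless you pass a size option, and that mounting over a live /tmp hides whatever is already in it:

sudo mount -t tmpfs -o rw,nosuid,nodev tmpfs /tmp
findmnt /tmp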

Multithreaded compilation parameters

In the custom build script, I saw make -j6, which is far from optimal. As the Bash script is executed on a variety of machines, the best approach is to auto-detect the number of hardware threads the CPU offers, increment that number by one and use the result. So if you have a 4C/8T CPU, the optimal number of compilation threads would be 9:

LOGICAL_CPUS=`grep -c ^processor /proc/cpuinfo`
COMPILATION_THREADS=$((LOGICAL_CPUS+1))
make -j${COMPILATION_THREADS}
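
An equivalent and slightly shorter variant uses nproc from coreutils (shipped by virtually every distribution), which also respects CPU affinity masks:

# same idea, one fewer pipeline
COMPILATION_THREADS=$(($(nproc)+1))
make -j${COMPILATION_THREADS}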

Jack compiler heap

The AOSP version I’m working on uses Google’s custom-developed Jack compiler. Basically, it starts a server that accepts compilation requests. It keeps running in the background in a warmed-up JVM, and if no requests arrive for some time, it quits.

What’s funny is the way the JVM detects the maximum heap a Java app can use: by inspecting the physical memory of the system and taking 25% of that value. When I decided to run Linux in a VM with “only” 8GB of RAM, the maximum heap was even less than 2GB and the compilation failed with the Jack compiler running out of memory. I described this problem in my other post, AOSP development from Windows.
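
If you are curious what default the JVM picks on a particular machine (or inside the build container), HotSpot can print its final flag values; MaxHeapSize is reported in bytes:

java -XX:+PrintFlagsFinal -version | grep -i maxheapsize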

For some time I forced Jack’s heap to 4GB until I realized, again by accident, that while it compiles successfully, increasing this number to 8GB results in a faster compilation! Wow, what a mess. Fortunately, on a 32GB machine the default heap is around 8GB, so this isn’t an issue there.

The fix is to add an option in ~/.jack-settings that overrides the JVM’s -Xmx value. As this file doesn’t exist in the Docker image, I have to “install” Jack first so that it creates the file and populates it with some important variables, and then append the override. Failing to do that and just creating ~/.jack-settings prevents Jack from working correctly.

prebuilts/sdk/tools/jack-admin install-server prebuilts/sdk/tools/jack-launcher.jar prebuilts/sdk/tools/jack-server-4.11.ALPHA.jar > /dev/null 2>&1
echo "JACK_SERVER_VM_ARGUMENTS=\"-Dfile.encoding=UTF-8 -XX:+TieredCompilation -Xmx8G\"" >> ~/.jack-settings

distcc

The AOSP source code is mostly C/C++ and Java. To speed up the initial C/C++ compilation I configured distcc. This is a server/client framework that allows remote computers to participate in local compilation jobs. The remote machines should have the same toolchain installed in the same location.

I figured out that the VMs used as CI/CD build nodes are perfect candidates for distcc servers. As AOSP uses prebuilt gcc/clang compilers (in the prebuilts directory), it was even better, as I only needed to symlink this folder inside the Docker image I built for each distcc node.

distcc server configuration

I created a new project with the following Dockerfile configuration:

FROM ubuntu:eoan
RUN apt-get update -y && apt-get upgrade -y

ENV DEBIAN_FRONTEND noninteractive

RUN DEBIAN_FRONTEND='noninteractive' apt-get install -o Dpkg::Options::='--force-confdef' -o Dpkg::Options::='--force-confold' -y bison build-essential curl flex git gnupg gperf liblz4-tool libncurses5-dev libsdl1.2-dev libwxgtk3.0-dev libxml2 libxml2-utils lzop maven openjdk-8-jdk pngcrush schedtool squashfs-tools xsltproc zip zlib1g-dev bc g++-multilib gcc-multilib lib32ncurses5-dev lib32readline6-dev lib32z1-dev python-yaml python-lxml git-core libc6-dev-i386 lib32ncurses5-dev x11proto-core-dev libx11-dev lib32z-dev libgl1-mesa-dev unzip ccache distcc libncurses5

ENV USER root

USER root

WORKDIR /

ENTRYPOINT [\
  "distccd", \
  "--daemon", \
  "--no-detach", \
  "--port", "3632", \
  "--stats", \
  "--stats-port", "3633", \
  "--log-stderr", \
  "--allow", "0.0.0.0/0", \
  "--listen", "0.0.0.0", \
  "--enable-tcp-insecure" \
]

I used a newer Ubuntu release as the base image because it ships the latest distcc server, which is compatible with the distcc client from the latest Ubuntu as well. This is needed for one of my use cases, described later.

The daemon runs as root, which isn’t optimal, and you should definitely fix that. Create a dedicated user, restrict the allowed IPs and drop --enable-tcp-insecure depending on your use case and security requirements. The configuration above simply “just works”, without the more involved security measures.
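
As a sketch of a slightly hardened variant (the user name and subnet below are assumptions, adjust them to your environment): create a dedicated user in the Dockerfile and restrict the daemon to it and to your build subnet. Dropping --enable-tcp-insecure is also possible, but distccd then only executes compilers whitelisted under its masquerade directory, which doesn’t play well with AOSP’s prebuilt toolchain out of the box.

# assumed unprivileged user; the distcc package may already create a similar one
RUN useradd --system --no-create-home distcc

ENTRYPOINT [\
  "distccd", \
  "--daemon", \
  "--no-detach", \
  "--user", "distcc", \
  "--port", "3632", \
  "--stats", \
  "--stats-port", "3633", \
  "--log-stderr", \
  "--allow", "192.168.1.0/24", \
  "--listen", "0.0.0.0", \
  "--enable-tcp-insecure" \
]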

Next is the docker-compose.yaml:

version: '3'
services:
  distcc:
    container_name: distccnode
    image: distccnode:latest
    ports:
      - '3632:3632'
      - '3633:3633'
    restart: always
    volumes:
      - /tmp/distcc:/tmp
      - /root/path-to-actual-aosp-project/prebuilts:/tmp/prebuilts

It was very important to map the AOSP prebuilts directory to /tmp/prebuilts as that’s the expected path, apparently.

Now the only thing you need to do on the distcc server node is to run:

docker-compose up -d

This starts the container with distccd inside it in the background, and thanks to restart: always the container will come back up automatically, even after the system reboots.
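
To check that a node is alive, you can look at the container logs and poke the stats port, which speaks plain HTTP (192.168.1.2 here is just an example address, matching the client configuration below):

docker logs distccnode
curl http://192.168.1.2:3633/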

distcc client configuration

The client configuration is much simpler. To fully understand it, you should read the next section about ccache. Distcc can be used without ccache, but since I’m using them together, I’ll explain this scenario.

sudo apt update && sudo apt install -y distcc
export CCACHE_PREFIX=distcc
export DISTCC_HOSTS="--randomize 192.168.1.2,lzo 192.168.1.3,lzo 192.168.1.4,lzo"

Of course, substitute the IP addresses with those of the servers you configured in the previous part.

In my case this is part of a script executed inside the Docker container, as those two variables should be exported there before launching the compilation.
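
To verify that jobs are actually being distributed during a build, the distcc package ships a simple text monitor. Run it wherever the distcc client actually executes (in my setup that means inside the build container), here with a 2-second refresh:

distccmon-text 2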

ccache

With distcc configured, each time the source code is compiled the files are compiled on the remote hosts as well as the local one. This accelerates the compilation a lot, but we can do even better.

The ccache tool caches each compiled file locally so it doesn’t have to be recompiled at all. With distcc and ccache we get the best of both worlds. Each time a new workstation needs to compile the source code for the first time, the distcc nodes will help. This populates the local ccache, and in the future the distcc nodes won’t be needed at all.

Configuring it is easy, with one big caveat. It turns out that the built-in ccache support in AOSP relies on a prebuilt ccache which uses a different configuration file from the one installed in the system (which we don’t need at all). Let’s configure it:

export USE_CCACHE=1
export CCACHE_DIR=/ccache
export PATH=/usr/lib/ccache/bin/:$PATH
# Android's ccache, the only one needed actually
prebuilts/misc/linux-x86/ccache/ccache -M20G 

I’m using a 20GB maximum cache size. Please note that enabling ccache + distcc requires just exporting one variable, as shown in the previous section:

export CCACHE_PREFIX=distcc

Again, this is executed from the Docker image, which expects a mapped /ccache directory.
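
After a couple of builds it’s worth checking how well the cache performs. The prebuilt ccache understands the usual -s flag for printing hit/miss statistics; run it with the same CCACHE_DIR the build uses:

export CCACHE_DIR=/ccache
prebuilts/misc/linux-x86/ccache/ccache -s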

Configuring Docker and the build script

I’m not sharing the full Dockerfile, the Python script outside the container or the build script inside the container, as they are quite complex and my employer’s property. But the Docker command is the following, executed from the AOSP source root and assuming the ccache directory is located at ~/ccache:

time docker run --rm -it -w /src --volume $PWD:/src --volume $HOME/ccache/:/ccache --volume /tmp/aosp:/tmp --name build aospimage ionice -c 3 nice -n19 ./build/tools/build.sh

I’m using time to measure how long the build takes, ionice -c 3 to lower the I/O priority so it doesn’t interfere with my system, and nice -n19 to lower the CPU priority for the same reason.

The custom build script build/tools/build.sh is huge, accepts many options and autodetects a lot of stuff, but a simplified version looks like this:

CCACHE_SIZE_GB=20
export USE_CCACHE=1
export CCACHE_DIR=/ccache
export PATH=/usr/lib/ccache/bin/:$PATH
prebuilts/misc/linux-x86/ccache/ccache -M${CCACHE_SIZE_GB}G > /dev/null 2>&1
export DISTCC_HOSTS="--randomize 192.168.1.2,lzo 192.168.1.3,lzo 192.168.1.4,lzo"
export CCACHE_PREFIX=distcc
# Install Jack in order to create ~/.jack-settings
prebuilts/sdk/tools/jack-admin install-server prebuilts/sdk/tools/jack-launcher.jar prebuilts/sdk/tools/jack-server-4.11.ALPHA.jar > /dev/null 2>&1
echo "JACK_SERVER_VM_ARGUMENTS=\"-Dfile.encoding=UTF-8 -XX:+TieredCompilation -Xmx8G\"" >> ~/.jack-settings
LOGICAL_CPUS=`grep -c ^processor /proc/cpuinfo`
COMPILATION_THREADS=$((LOGICAL_CPUS+1))
source build/envsetup.sh
lunch my-device
make -j${COMPILATION_THREADS}

The result

Well, the result: on two different machines with i7-6700 and i7-4770 CPUs and SSDs (within 2-3 minutes of each other), starting from an empty out directory while keeping several hidden directories inside it for caching purposes, the build went down from 3 hours to roughly 44 minutes and 43 seconds, about a 4x speedup!

Bonus

If you want even more speed, here are some bonus tips:

Disable kernel security mitigations

Do not do this on your personal machine, the one you use for online banking or anything else important to you! This is applicable only to build-farm nodes or to a Linux VM used solely for building the source.

It turns out that disabling the Linux kernel mitigations for the security vulnerabilities discovered lately (and the list keeps growing) results in a huge speedup!

Add mitigations=off to the relevant line in /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mitigations=off"
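
On Ubuntu/Debian-style systems the GRUB configuration then has to be regenerated, followed by a reboot; afterwards you can confirm the mitigations are really off by reading the kernel’s vulnerability status files:

sudo update-grub
sudo reboot
# after the reboot, the entries should report "Vulnerable" instead of "Mitigation: ..."
grep . /sys/devices/system/cpu/vulnerabilities/*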

This brings the build down from 44 minutes and 43 seconds to 37 minutes and 20 seconds on the Intel Core i7-6700, about a 17% reduction in build time.

Use XFS instead of ext4

Although I only tested XFS as part of my experiments inside a VM, the speedup was a healthy 15%, which would theoretically (not tested directly) have brought the compilation time from 37 minutes and 20 seconds down to 31 minutes and 44 seconds.

Ditch Docker

Again as part of my VM tests: for reasons unknown, building without Docker results in another 9% speedup. I still can’t explain it, but it is a fact.

Use faster CPU and storage

Everything until now was running on a SATA SSD and an Intel Core i7-6700 CPU. Well, let’s upgrade those! NVMe storage plus an Intel Core i9-9900K CPU accomplishes the task in 19 minutes, still using Docker and ext4 and without disabling the security mitigations. For a project of this size, that’s fast!

Final thoughts

Including the bonus, but still using Docker on the original hardware, the final measured compilation time is 37 minutes and 20 seconds, a speedup of about 4.8x over the starting point (roughly 5.7x if you also count the theoretical XFS gain), which is… well… awesome. Of course, we then switched to a faster machine and now the build takes 19 minutes. While technically not part of the software optimizations I performed, the overall result is 3 hours down to 19 minutes, or 9.47x faster.

I guess I have no more excuses.
