+++
date = '2025-04-07T20:29:48+02:00'
draft = true
title = 'Debloating your rust binary'
tags = ['rust', 'servicepoint']
+++

At CCC Berlin, there is a big pixel matrix hanging on the wall that we call the "ServicePoint display". Anyone on the local network can send UDP packets containing commands that the display will execute. The commands are sent as binary data structures and cover things like very basic text rendering and overwriting parts of the pixel buffer. I wrote (most of) the Rust library servicepoint, which implements serialisation and deserialisation of those packets. There are also bindings for other languages, including C.

Some weeks ago, the only user of those C bindings I know said they'll stop using them, with a big grin on their face. While I know from experience that writing such a library is great fun (and thus does not need another reason), I immediately wanted to know why. The main reason they cited was binary size, and while there's probably something wrong with your computer if you do not have 1 MB to spare, I agreed that it was too big for what it does. Thus, I was immediately nerd-sniped and could not think about anything else in my spare time for a whole week. I had to find out why it was so big, and there had to be a way to fix it.

This is part one, where I optimize the core library for size. In a future post, I also want to document how I got the C bindings smaller, as those use all features by default. There are also probably some additional challenges like ABI for shared libraries worth facing.

Most of what I cover here is described in Minimizing Rust Binary Size, though I hope the specific example I provide makes the topic more interesting.

Starting point

The commit I started on was fe67160974d9fed542eb37e5e9a202eaf6fe00dc, which is not part of main at the time of writing.

As I needed some binary to compare, I chose the example announce:

//! An example for how to send text to the display.

// [1]
use clap::Parser;
use servicepoint::{
    CharGrid, CharGridCommand, ClearCommand, Connection, UdpConnection,
    TILE_WIDTH,
};

// [2]
#[derive(Parser, Debug)]
struct Cli {
    #[arg(
        short,
        long,
        default_value = "localhost:2342",
        help = "Address of the display"
    )]
    destination: String,
    #[arg(short, long, num_args = 1.., value_delimiter = '\n',
        help = "Text to send - specify multiple times for multiple lines")]
    text: Vec<String>,
    #[arg(
        short,
        long,
        default_value_t = true,
        help = "Clear screen before sending text"
    )]
    clear: bool,
}

/// example: `cargo run -- --text "Hallo" --text "CCCB"`
fn main() {
    // [3]
    let mut cli = Cli::parse();
    if cli.text.is_empty() {
        cli.text.push("Hello, CCCB!".to_string());
    }

    // [4]
    let connection = UdpConnection::open(&cli.destination)
        .expect("could not connect to display");

    // [5]
    if cli.clear {
        connection.send(ClearCommand).expect("sending clear failed");
    }

    let text = cli.text.join("\n"); // [6]
    let command: CharGridCommand = CharGrid::wrap_str(TILE_WIDTH, &text).into(); // [7]
    connection.send(command).expect("sending text failed"); // [8]
}

Let me quickly walk you through the program.

  1. Some imports of the used libraries.
  2. The structure Cli is defined to hold the command line arguments. clap is used to automatically derive a Parser from the attributes on the fields.
  3. The command line arguments are parsed and a default value for the text to send is set.
  4. A UDP connection is opened (see footnote 1).
  5. Depending on the arguments, the screen is cleared.
  6. All text snippets provided as an argument are concatenated with newlines in between. --text "Hallo" --text "CCCB" turns into Hallo\nCCCB.
  7. The string is wrapped to the width of the display, resulting in a CharGrid, which is then immediately turned into a CharGridCommand. No fields are changed after this, so the text will be rendered in the top left of the screen when executed on the display.
  8. The command is sent to the display.
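
Steps 6 and 7 can be approximated with the standard library alone. The following is a minimal sketch, not the servicepoint implementation: `wrap_to_width` is a hypothetical helper that hard-wraps each line after a fixed number of characters, and the width of 5 is chosen only to keep the example small (the real program uses TILE_WIDTH).

```rust
/// Step 6: concatenate all snippets with newlines in between.
fn join_snippets(snippets: &[String]) -> String {
    snippets.join("\n")
}

/// Hypothetical stand-in for step 7: splits the text at newlines and
/// hard-wraps every line after `width` characters.
/// This is an assumption for illustration, not what `CharGrid::wrap_str`
/// does internally.
fn wrap_to_width(text: &str, width: usize) -> Vec<String> {
    assert!(width > 0, "width must be at least 1");
    let mut rows = Vec::new();
    for line in text.split('\n') {
        let chars: Vec<char> = line.chars().collect();
        if chars.is_empty() {
            // Keep empty lines as empty rows.
            rows.push(String::new());
            continue;
        }
        for chunk in chars.chunks(width) {
            rows.push(chunk.iter().collect());
        }
    }
    rows
}

fn main() {
    let snippets = vec!["Hallo".to_string(), "CCCB".to_string()];
    let text = join_snippets(&snippets);
    for row in wrap_to_width(&text, 5) {
        println!("{row}");
    }
}
```

Each resulting row corresponds to one line of text on the display, starting in the top left.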

At some steps, the program panics with an error message in case something went wrong.
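
The failure handling follows the same pattern everywhere: a fallible call returns a Result, and expect turns an error into a panic with the given message. Here is a rough sketch of steps 4 and 8 using only `std::net::UdpSocket`. The helpers `open` and `send_bytes` and the payload bytes are assumptions for illustration, not the servicepoint API or its packet format, and a local socket stands in for the display.

```rust
use std::io;
use std::net::{ToSocketAddrs, UdpSocket};

/// Hypothetical helper: "opening a connection" here only means binding a
/// local UDP socket and remembering the destination address.
fn open(destination: impl ToSocketAddrs) -> io::Result<UdpSocket> {
    // Port 0 lets the operating system pick a free local port.
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    // `connect` on a UDP socket only sets the default peer; no handshake happens.
    socket.connect(destination)?;
    Ok(socket)
}

/// Hypothetical helper: sends one datagram to the remembered destination.
fn send_bytes(socket: &UdpSocket, payload: &[u8]) -> io::Result<usize> {
    socket.send(payload)
}

fn main() {
    // A local socket stands in for the display so the sketch runs anywhere.
    let display = UdpSocket::bind("127.0.0.1:0").expect("could not bind stand-in display");
    let destination = display.local_addr().expect("no local address");

    // Same pattern as in the example: `expect` panics with this message on error.
    let socket = open(destination).expect("could not connect to display");
    // Placeholder bytes, not a valid display command.
    send_bytes(&socket, &[0x00, 0x01]).expect("sending failed");

    let mut buffer = [0u8; 16];
    let (length, _sender) = display.recv_from(&mut buffer).expect("receiving failed");
    println!("display received {:?}", &buffer[..length]);
}
```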

I started with rustc 1.82.0 (f6e511eec 2024-10-15) from nixpkgs 0ff09db9d034a04acd4e8908820ba0b410d7a33a. For compiling the example, I just used the usual cargo build --release --example announce and checked the binary size with ll -B target/release/examples.

The resulting size was 1.1 MB, which should be easy enough to beat.

Low-hanging fruit

The first thing that came to mind was -Os, i.e. compiling for binary size. The Rust equivalent is opt-level = "s", or "z" to additionally turn off loop vectorization.

| Option | Size in isolation (bytes) | Cumulative size (bytes) |
|---|---|---|
| baseline | 1.137.384 | 1.137.384 |
| opt-level = 'z' | 1.186.104 | 1.186.104 |
| opt-level = 's' | 1.120.416 | 1.120.416 |
| lto = true | 914.496 | 808.528 |
| codegen-units = 1 | 982.904 | 775.888 |
| panic = 'abort' | 979.840 | 703.096 |
| strip = true | 915.944 | 580.056 |
| switching back to opt-level = 'z' | | 555.480 |

So it turns out that a few flags on stable Rust are enough to halve the binary size. The settings do not combine linearly, though: opt-level = 'z' produced a larger binary than the baseline in isolation, yet it gave the smallest result once combined with the other settings.
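
To put a number on the reduction, the relative change can be computed from the byte counts in the table. A small sketch; `percent_change` is a helper introduced here for illustration:

```rust
/// Relative size change in percent; negative values mean the binary got smaller.
fn percent_change(before: u64, after: u64) -> f64 {
    (after as f64 - before as f64) / before as f64 * 100.0
}

fn main() {
    let baseline = 1_137_384;
    // Cumulative sizes from the table, in the order the options were added
    // (skipping the first opt-level = 'z' attempt, which was replaced by 's').
    let steps = [
        ("opt-level = 's'", 1_120_416),
        ("lto = true", 808_528),
        ("codegen-units = 1", 775_888),
        ("panic = 'abort'", 703_096),
        ("strip = true", 580_056),
        ("opt-level = 'z'", 555_480),
    ];
    for (option, size) in steps {
        println!("{option:<20} {size:>9} bytes ({:+.1} %)", percent_change(baseline, size));
    }
}
```

For the last row this comes out at roughly -51 % compared to the baseline.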

To only compile like this in specific scenarios, you can add a new profile to a crate's Cargo.toml like this:

[profile.size-optimized]
inherits = "release"
opt-level = 's'     # Optimize for size
lto = true          # Enable link-time optimization
codegen-units = 1   # Reduce number of codegen units to increase optimizations
panic = 'abort'     # Abort on panic
strip = true        # Strip symbols from binary

The profile can be used by passing --profile=size-optimized instead of --release to cargo build. Because of the different profile, the binary ends up in a different folder (ll -B target/size-optimized/examples to check size).

Digging deeper

While this was a big improvement already, the binary was still 50 times the size of the C program.

If halving it was this easy, could I do it a second time?

Everything from here on required unstable features, which I got from the Rust flake for RedoxOS development. The version I ended up with was rustc 1.88.0-nightly (5e17a2a91 2025-04-05). The executables I got with the unstable version were already a bit smaller again (546.528 bytes).

The first thing I noticed was that I got some new warnings when compiling, all of which I fixed immediately. Since they were mostly in the documentation, I did not expect the fixes to affect file size.

Next up, I added cargo-bloat to my flake. This tool shows which functions take up the most space in your binary. The invocation is similar to building: cargo bloat --example announce --profile=size-optimized resulted in the following output:

File  .text     Size        Crate Name
 1.0%   5.5%  21.0KiB clap_builder clap_builder::parser::parser::Parser::get_matches_with
 0.9%   5.3%  20.5KiB          std std::backtrace_rs::symbolize::gimli::Cache::with_global
 0.6%   3.3%  12.6KiB          std std::backtrace_rs::symbolize::gimli::Context::new
 0.4%   2.4%   9.2KiB          std gimli::read::dwarf::Unit<R>::new
 0.4%   2.1%   7.9KiB          std addr2line::line::LazyLines::borrow
 0.3%   2.0%   7.5KiB     announce announce::main
 0.3%   1.8%   7.1KiB          std miniz_oxide::inflate::core::decompress
 0.3%   1.6%   6.3KiB          std addr2line::unit::ResUnit<R>::find_function_or_location::{{closure}}
 0.3%   1.5%   5.6KiB clap_builder clap_builder::builder::command::Command::_build_self
 0.2%   1.4%   5.3KiB clap_builder clap_builder::output::help_template::HelpTemplate::write_templated_help
 0.2%   1.3%   5.1KiB clap_builder clap_builder::error::Error<F>::print
 0.2%   1.3%   4.9KiB clap_builder clap_builder::parser::parser::Parser::react
 0.2%   1.2%   4.8KiB clap_builder clap_builder::output::help_template::HelpTemplate::write_args
 0.2%   1.2%   4.6KiB          std gimli::read::unit::parse_attribute
 0.2%   1.1%   4.4KiB          std addr2line::function::Function<R>::parse_children
 0.2%   1.0%   3.7KiB clap_builder clap_builder::output::help_template::HelpTemplate::write_subcommands
 0.2%   1.0%   3.7KiB clap_builder clap_builder::output::usage::Usage::write_arg_usage
 0.2%   1.0%   3.7KiB          std gimli::read::rnglists::RngListIter<R>::next
 0.1%   0.8%   3.1KiB          std std::backtrace_rs::symbolize::gimli::elf::<impl std::backtrace_rs::symbolize::gimli::Mapping>::new_debug
 0.1%   0.8%   3.0KiB clap_builder clap_builder::parser::parser::Parser::match_arg_error
10.8%  61.8% 237.3KiB              And 993 smaller methods. Use -n N to show more.
17.5% 100.0% 384.2KiB              .text section size, the file size is 2.1MiB

Wait what? Why is the binary 2.1MB now? ll -B target/size-optimized/examples


  1. Yes, I know UDP does not have connections. Internally, this just opens a UDP socket.