+++
date = '2025-04-07T20:29:48+02:00'
draft = true
title = 'Debloating your rust binary'
tags = ['rust', 'servicepoint']
+++

In [CCC Berlin](https://berlin.ccc.de/), there is a big pixel matrix hanging on the wall that we call the "ServicePoint display". Anyone in the local network can send UDP packets containing commands that the display will execute. The commands are sent in a binary data structure and cover things like very basic text rendering and overwriting parts of the pixel buffer.

I wrote (most of) the Rust library [servicepoint](https://crates.io/crates/servicepoint), which implements serialisation and deserialisation of those packets. There are also bindings for other languages, [including C](https://git.berlin.ccc.de/servicepoint/servicepoint-binding-c).

Some weeks ago, the only user of those C bindings I know of said they would stop using them, with a big grin on their face. While I know from experience that writing such a library is great fun (and thus does not need another reason), I immediately wanted to know why.

The main reason they cited was binary size, and while there's probably something wrong with your computer if you do not have 1 MB to spare, I agreed that it was too big for what it does.

Thus, I was immediately nerd-sniped and could not think about anything else in my spare time for a whole week. I _had_ to find out why it was so big, and there would _have_ to be a way to fix it.

This is part one, where I optimize the core library for size. In a future post, I also want to document how I got the C bindings smaller, as those use all features by default. There are probably also some additional challenges worth facing, like the ABI for shared libraries.

Most of what I cover here is described in [Minimizing Rust Binary Size](https://github.com/johnthagen/min-sized-rust), though I hope the specific example I provide makes the topic more interesting.

## Starting point

The commit I started on was [fe67160974d9fed542eb37e5e9a202eaf6fe00dc](https://git.berlin.ccc.de/servicepoint/servicepoint/src/tag/tiny-rust-binaries-before), which is not part of `main` as of this writing.

As I needed a binary to compare against, I chose the example [announce](https://git.berlin.ccc.de/servicepoint/servicepoint/src/tag/tiny-rust-binaries-before/examples/announce.rs):

```rust
//! An example for how to send text to the display.

// [1]
use clap::Parser;
use servicepoint::{
    CharGrid, CharGridCommand, ClearCommand, Connection, UdpConnection,
    TILE_WIDTH,
};

// [2]
#[derive(Parser, Debug)]
struct Cli {
    #[arg(
        short,
        long,
        default_value = "localhost:2342",
        help = "Address of the display"
    )]
    destination: String,
    #[arg(short, long, num_args = 1.., value_delimiter = '\n',
        help = "Text to send - specify multiple times for multiple lines")]
    text: Vec<String>,
    #[arg(
        short,
        long,
        default_value_t = true,
        help = "Clear screen before sending text"
    )]
    clear: bool,
}

/// example: `cargo run -- --text "Hallo" --text "CCCB"`
fn main() {
    // [3]
    let mut cli = Cli::parse();
    if cli.text.is_empty() {
        cli.text.push("Hello, CCCB!".to_string());
    }

    // [4]
    let connection = UdpConnection::open(&cli.destination)
        .expect("could not connect to display");

    // [5]
    if cli.clear {
        connection.send(ClearCommand).expect("sending clear failed");
    }

    let text = cli.text.join("\n"); // [6]
    let command: CharGridCommand = CharGrid::wrap_str(TILE_WIDTH, &text).into(); // [7]
    connection.send(command).expect("sending text failed"); // [8]
}
```

Let me quickly walk you through the program.

1. Some imports of the used libraries.
2. The structure `Cli` is defined to hold the command line arguments. [clap](https://crates.io/crates/clap) is used to automatically derive a `Parser` from the attributes on the fields.
3. The command line arguments are parsed and a default value for the text to send is set.
4. A UDP connection is opened.[^1]
5. Depending on the arguments, the screen is cleared.
6. All text snippets provided as arguments are concatenated with newlines in between. `--text "Hallo" --text "CCCB"` turns into `Hallo\nCCCB`.
7. The string is wrapped to the width of the display, resulting in a `CharGrid`, which is then immediately turned into a `CharGridCommand`. No fields are changed after this, so the text will be rendered in the top left of the screen when executed on the display.
8. The command is sent to the display.

At several of these steps, the program panics with an error message if something goes wrong.

I started with `rustc 1.82.0 (f6e511eec 2024-10-15)` from nixpkgs `0ff09db9d034a04acd4e8908820ba0b410d7a33a`.

For compiling the example, I just used the usual `cargo build --release --example announce` and checked the binary size with `ll -B target/release/examples`.

The resulting size was 1.1 MB, which should be easy enough to beat.

## Low-hanging fruit

The first thing that came to mind was `-Os`, i.e. compiling for binary size. The Rust equivalent is `opt-level = "s"`, or `z` to also disable loop vectorization.

| Option | Size in isolation (bytes) | Size cumulative (bytes) |
| - | - | - |
| baseline | 1.137.384 | 1.137.384 |
| opt-level = 'z' | 1.186.104 | 1.186.104 |
| opt-level = 's' | 1.120.416 | 1.120.416 |
| lto = true | 914.496 | 808.528 |
| codegen-units = 1 | 982.904 | 775.888 |
| panic = 'abort' | 979.840 | 703.096 |
| strip = true | 915.944 | 580.056 |
| switching back to opt-level = 'z' | | 555.480 |

So it turns out that a few flags in stable Rust are enough if you want to halve your binary size.

The effects of those settings also do not add up linearly, and sometimes what resulted in a smaller binary before now increased the size, or the other way around: `opt-level = 'z'` made the binary bigger on its own, but smaller once it was combined with all the other settings.

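
To put the cumulative column into perspective, here is a small sketch that expresses each step as a percentage of the baseline. The sizes are copied from the table above; the helper `percent_of` and the output layout are made up for this illustration and are not part of servicepoint.

```rust
/// Size of `part` as a percentage of `base` (hypothetical helper).
fn percent_of(part: u64, base: u64) -> f64 {
    part as f64 / base as f64 * 100.0
}

fn main() {
    // Baseline and cumulative sizes in bytes, taken from the table.
    let baseline: u64 = 1_137_384;
    let steps: [(&str, u64); 6] = [
        ("opt-level = 's'", 1_120_416),
        ("lto = true", 808_528),
        ("codegen-units = 1", 775_888),
        ("panic = 'abort'", 703_096),
        ("strip = true", 580_056),
        ("opt-level = 'z'", 555_480),
    ];

    for (name, size) in steps {
        println!(
            "{name:<20} {size:>9} bytes  {:5.1}% of baseline",
            percent_of(size, baseline)
        );
    }
}
```

The last row comes out at roughly 48.8% of the baseline, which is where the "halve your binary size" claim comes from.
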
To only compile with these settings in specific scenarios, you can add a new profile to a crate's `Cargo.toml` like this:

```toml
[profile.size-optimized]
inherits = "release"
opt-level = 's'     # Optimize for size
lto = true          # Enable link-time optimization
codegen-units = 1   # Reduce number of codegen units to increase optimizations
panic = 'abort'     # Abort on panic
strip = true        # Strip symbols from binary
```

The profile can be used by passing `--profile=size-optimized` instead of `--release` to `cargo build`. Because of the different profile, the binary ends up in a different folder (`ll -B target/size-optimized/examples` to check the size).

## Digging deeper

While this was a big improvement already, the result was still 50 times the size of the C program. _If halving it was this easy, can I do that a second time?_

Everything from here on required unstable features of Rust, with a nightly toolchain set up along the lines of the [flake for RedoxOS-development](https://gitlab.redox-os.org/redox-os/redox/-/blob/cb34b9bd862f46729c0082c37a41782a3b1319c3/flake.nix#L38). The version I ended up with was `rustc 1.88.0-nightly (5e17a2a91 2025-04-05)`.

The executables I got with the unstable version were already a bit smaller again (546.528 bytes).

The first thing I noticed was that I got some new warnings when compiling, all of which I fixed immediately. As they were mostly inside the documentation, I did not expect this to affect file size.

Next up, I added cargo-bloat to my flake. This tool can show you which functions take up most of the space in your binary.
The invocation is similar to building - `cargo bloat --example announce --profile=size-optimized` resulted in the following output:

```
 File  .text     Size        Crate Name
 1.0%   5.5%  21.0KiB clap_builder clap_builder::parser::parser::Parser::get_matches_with
 0.9%   5.3%  20.5KiB          std std::backtrace_rs::symbolize::gimli::Cache::with_global
 0.6%   3.3%  12.6KiB          std std::backtrace_rs::symbolize::gimli::Context::new
 0.4%   2.4%   9.2KiB          std gimli::read::dwarf::Unit::new
 0.4%   2.1%   7.9KiB          std addr2line::line::LazyLines::borrow
 0.3%   2.0%   7.5KiB     announce announce::main
 0.3%   1.8%   7.1KiB          std miniz_oxide::inflate::core::decompress
 0.3%   1.6%   6.3KiB          std addr2line::unit::ResUnit::find_function_or_location::{{closure}}
 0.3%   1.5%   5.6KiB clap_builder clap_builder::builder::command::Command::_build_self
 0.2%   1.4%   5.3KiB clap_builder clap_builder::output::help_template::HelpTemplate::write_templated_help
 0.2%   1.3%   5.1KiB clap_builder clap_builder::error::Error::print
 0.2%   1.3%   4.9KiB clap_builder clap_builder::parser::parser::Parser::react
 0.2%   1.2%   4.8KiB clap_builder clap_builder::output::help_template::HelpTemplate::write_args
 0.2%   1.2%   4.6KiB          std gimli::read::unit::parse_attribute
 0.2%   1.1%   4.4KiB          std addr2line::function::Function::parse_children
 0.2%   1.0%   3.7KiB clap_builder clap_builder::output::help_template::HelpTemplate::write_subcommands
 0.2%   1.0%   3.7KiB clap_builder clap_builder::output::usage::Usage::write_arg_usage
 0.2%   1.0%   3.7KiB          std gimli::read::rnglists::RngListIter::next
 0.1%   0.8%   3.1KiB          std std::backtrace_rs::symbolize::gimli::elf::::new_debug
 0.1%   0.8%   3.0KiB clap_builder clap_builder::parser::parser::Parser::match_arg_error
10.8%  61.8% 237.3KiB              And 993 smaller methods. Use -n N to show more.
17.5% 100.0% 384.2KiB              .text section size, the file size is 2.1MiB
```

Wait what? Why is the binary 2.1MB now?

`ll -B target/size-optimized/examples`

[^1]: Yes, I know UDP does not have connections. Internally, this just opens a UDP socket.