diff --git a/content/posts/rust-tiny-binaries.md b/content/posts/rust-tiny-binaries.md new file mode 100644 index 0000000..9e992b0 --- /dev/null +++ b/content/posts/rust-tiny-binaries.md @@ -0,0 +1,178 @@ ++++ +date = '2025-04-07T20:29:48+02:00' +draft = true +title = 'Debloating your rust binary' +tags = ['rust', 'servicepoint'] ++++ + +At [CCC Berlin](https://berlin.ccc.de/), there is a big pixel matrix hanging on the wall that we call the "ServicePoint display". +Anyone on the local network can send UDP packets containing commands that the display will execute. +The commands are sent in a binary data structure and cover things like very basic text rendering and overwriting parts of the pixel buffer. +I wrote (most of) the Rust library [servicepoint](https://crates.io/crates/servicepoint), which implements serialisation and deserialisation of those packets. +There are also bindings for other languages, [including C](https://git.berlin.ccc.de/servicepoint/servicepoint-binding-c). + +Some weeks ago, the only user of those C bindings I know of told me, with a big grin on their face, that they would stop using them. +While I know from experience that writing such a library is great fun (and thus does not need another reason), I immediately wanted to know why. +The main reason they cited was binary size, and while there's probably something wrong with your computer if you do not have 1 MB to spare, I agreed that it was too big for what it does. +Thus, I was immediately nerd-sniped and could not think about anything else in my spare time for a whole week. +I _had_ to find out why it was so big, and there _had_ to be a way to fix it. + +This is part one, where I optimize the core library for size. +In a future post, I also want to document how I got the C bindings smaller, as those use all features by default. +There are probably also some additional challenges worth facing there, like the ABI of shared libraries. 
+ +Most of what I cover here is described in [Minimizing Rust Binary Size](https://github.com/johnthagen/min-sized-rust), though I hope the specific example I provide makes the topic more interesting. + +## Starting point + +The commit I started on was [fe67160974d9fed542eb37e5e9a202eaf6fe00dc](https://git.berlin.ccc.de/servicepoint/servicepoint/src/tag/tiny-rust-binaries-before), which is not part of `main` at the time of writing. + +As I needed some binary to compare, I chose the example [announce](https://git.berlin.ccc.de/servicepoint/servicepoint/src/tag/tiny-rust-binaries-before/examples/announce.rs): + +```rust +//! An example for how to send text to the display. + +// [1] +use clap::Parser; +use servicepoint::{ + CharGrid, CharGridCommand, ClearCommand, Connection, UdpConnection, + TILE_WIDTH, +}; + +// [2] +#[derive(Parser, Debug)] +struct Cli { + #[arg( + short, + long, + default_value = "localhost:2342", + help = "Address of the display" + )] + destination: String, + #[arg(short, long, num_args = 1.., value_delimiter = '\n', + help = "Text to send - specify multiple times for multiple lines")] + text: Vec<String>, + #[arg( + short, + long, + default_value_t = true, + help = "Clear screen before sending text" + )] + clear: bool, +} + +/// example: `cargo run -- --text "Hallo" --text "CCCB"` +fn main() { + // [3] + let mut cli = Cli::parse(); + if cli.text.is_empty() { + cli.text.push("Hello, CCCB!".to_string()); + } + + // [4] + let connection = UdpConnection::open(&cli.destination) + .expect("could not connect to display"); + + // [5] + if cli.clear { + connection.send(ClearCommand).expect("sending clear failed"); + } + + let text = cli.text.join("\n"); // [6] + let command: CharGridCommand = CharGrid::wrap_str(TILE_WIDTH, &text).into(); // [7] + connection.send(command).expect("sending text failed"); // [8] +} +``` + +Let me quickly walk you through the program. + +1. Some imports of the used libraries. +2. 
The structure `Cli` is defined to hold the command line arguments. [clap](https://crates.io/crates/clap) is used to automatically derive a `Parser` from the attributes on the fields. +3. The command line arguments are parsed and a default value for the text to send is set. +4. A UDP connection is opened.[^1] +5. Depending on the arguments, the screen is cleared. +6. All text snippets provided as arguments are concatenated with newlines in between. `--text "Hallo" --text "CCCB"` turns into `Hallo\nCCCB`. +7. The string is wrapped to the width of the display, resulting in a `CharGrid`, which is then immediately turned into a `CharGridCommand`. No fields are changed after this, so the text will be rendered in the top left of the screen when executed on the display. +8. The command is sent to the display. + +At some steps, the program panics with an error message in case something went wrong. + +I started with `rustc 1.82.0 (f6e511eec 2024-10-15)` from nixpkgs `0ff09db9d034a04acd4e8908820ba0b410d7a33a`. +For compiling the example, I just used the usual `cargo build --release --example announce` and checked the binary size with `ll -B target/release/examples`. + +The resulting size was 1.1 MB, which should be easy enough to beat. + +## Low-hanging fruit + +The first thing that came to mind was `-Os`, i.e. optimizing for binary size. The Rust equivalent is `opt-level = "s"`, or `z` to also disable loop vectorization. + +| Option | size in isolation (bytes) | size cumulative (bytes) | +| - | - | - | +| baseline | 1.137.384 | 1.137.384 | +| opt-level = 'z' | 1.186.104 | 1.186.104 | +| opt-level = 's' | 1.120.416 | 1.120.416 | +| lto = true | 914.496 | 808.528 | +| codegen-units = 1 | 982.904 | 775.888 | +| panic = 'abort' | 979.840 | 703.096 | +| strip = true | 915.944 | 580.056 | +| switching back to opt-level = 'z' | | 555.480 | + +So it turns out that a few flags in stable Rust are enough to halve your binary size. 
Also, the effects of those settings do not add up linearly, and a setting can help in one combination and hurt in another: `opt-level = 'z'` increased the size on its own, but gave the smallest binary on top of all the other settings. + +To only compile like this in specific scenarios, you can add a new profile to a crate's `Cargo.toml` like this: + +```toml +[profile.size-optimized] +inherits = "release" +opt-level = 's' # Optimize for size +lto = true # Enable link-time optimization +codegen-units = 1 # Reduce number of codegen units to increase optimizations +panic = 'abort' # Abort on panic +strip = true # Strip symbols from binary +``` + +The profile can be used by passing `--profile=size-optimized` instead of `--release` to `cargo build`. +Because of the different profile, the binary ends up in a different folder (`ll -B target/size-optimized/examples` to check size). + +## Digging deeper + +While this was a big improvement already, this was still 50 times the size of the C program. + +_If halving it was this easy, can I do it a second time?_ + +Everything from here on required unstable features of the rust [flake for RedoxOS-development](https://gitlab.redox-os.org/redox-os/redox/-/blob/cb34b9bd862f46729c0082c37a41782a3b1319c3/flake.nix#L38). The version I ended up with was `rustc 1.88.0-nightly (5e17a2a91 2025-04-05)`. The executables I got with the unstable version were already a bit smaller again (546.528 bytes). + +The first thing I noticed was that I got some new warnings when compiling, all of which I fixed immediately. As they were mostly in the documentation, I did not expect this to affect file size. + +Next up, I added cargo-bloat to my flake. This tool can show you which functions take up most of the space in your binary. 
+The invocation is similar to building - `cargo bloat --example announce --profile=size-optimized` resulted in the following output: + +``` +File .text Size Crate Name + 1.0% 5.5% 21.0KiB clap_builder clap_builder::parser::parser::Parser::get_matches_with + 0.9% 5.3% 20.5KiB std std::backtrace_rs::symbolize::gimli::Cache::with_global + 0.6% 3.3% 12.6KiB std std::backtrace_rs::symbolize::gimli::Context::new + 0.4% 2.4% 9.2KiB std gimli::read::dwarf::Unit::new + 0.4% 2.1% 7.9KiB std addr2line::line::LazyLines::borrow + 0.3% 2.0% 7.5KiB announce announce::main + 0.3% 1.8% 7.1KiB std miniz_oxide::inflate::core::decompress + 0.3% 1.6% 6.3KiB std addr2line::unit::ResUnit::find_function_or_location::{{closure}} + 0.3% 1.5% 5.6KiB clap_builder clap_builder::builder::command::Command::_build_self + 0.2% 1.4% 5.3KiB clap_builder clap_builder::output::help_template::HelpTemplate::write_templated_help + 0.2% 1.3% 5.1KiB clap_builder clap_builder::error::Error::print + 0.2% 1.3% 4.9KiB clap_builder clap_builder::parser::parser::Parser::react + 0.2% 1.2% 4.8KiB clap_builder clap_builder::output::help_template::HelpTemplate::write_args + 0.2% 1.2% 4.6KiB std gimli::read::unit::parse_attribute + 0.2% 1.1% 4.4KiB std addr2line::function::Function::parse_children + 0.2% 1.0% 3.7KiB clap_builder clap_builder::output::help_template::HelpTemplate::write_subcommands + 0.2% 1.0% 3.7KiB clap_builder clap_builder::output::usage::Usage::write_arg_usage + 0.2% 1.0% 3.7KiB std gimli::read::rnglists::RngListIter::next + 0.1% 0.8% 3.1KiB std std::backtrace_rs::symbolize::gimli::elf::::new_debug + 0.1% 0.8% 3.0KiB clap_builder clap_builder::parser::parser::Parser::match_arg_error +10.8% 61.8% 237.3KiB And 993 smaller methods. Use -n N to show more. +17.5% 100.0% 384.2KiB .text section size, the file size is 2.1MiB +``` + +Wait what? Why is the binary 2.1MB now? `ll -B target/size-optimized/examples ` + +[^1]: Yes, I know UDP does not have connections. Internally, this just opens a UDP socket. 
\ No newline at end of file diff --git a/content/projects/servicepoint/servicepoint.md b/content/projects/servicepoint/servicepoint.md index 1b58710..580bdf3 100644 --- a/content/projects/servicepoint/servicepoint.md +++ b/content/projects/servicepoint/servicepoint.md @@ -3,6 +3,3 @@ date = '2025-04-06T12:24:08+02:00' draft = true title = 'servicepoint' +++ - - -this is servicepoint