+++
date = '2025-04-07T20:29:48+02:00'
draft = true
title = 'Debloating your rust binary'
tags = ['rust', 'servicepoint']
+++

At [CCC Berlin](https://berlin.ccc.de/), there is a big pixel matrix hanging on the wall that we call the "ServicePoint display".
Anyone in the local network can send UDP packets containing commands that the display will execute.
The commands are sent in a binary format and cover things like very basic text rendering and overwriting parts of the pixel buffer.
I wrote (most of) the rust library [servicepoint](https://crates.io/crates/servicepoint), which implements serialisation and deserialisation of those packets.
There are also bindings for other languages, [including C](https://git.berlin.ccc.de/servicepoint/servicepoint-binding-c).

Some weeks ago, the only user of those C bindings I know of told me, with a big grin on their face, that they were going to stop using them.
While I know from experience that writing such a library yourself is great fun (and thus needs no other reason), I immediately wanted to know why.
The main reason they cited was binary size, and while there's probably something wrong with your computer if you do not have 1 MB to spare, I agreed that it was too big for what it does.
Thus, I was immediately nerd-sniped and could not think about anything else in my spare time for a whole week.
I _had_ to find out why it was so big, and there _had_ to be a way to fix it.

This is part one, where I optimize the core library for size.
In a future post, I also want to document how I got the C bindings smaller, as those use all features by default.
There are probably also some additional challenges worth facing there, like the ABI for shared libraries.

Most of what I cover here is described in [Minimizing Rust Binary Size](https://github.com/johnthagen/min-sized-rust), though I hope the specific example I provide makes the topic more interesting.

## Starting point

The commit I started on was [fe67160974d9fed542eb37e5e9a202eaf6fe00dc](https://git.berlin.ccc.de/servicepoint/servicepoint/src/tag/tiny-rust-binaries-before), which is not part of `main` at the time of writing.

As I needed a binary to compare, I chose the example [announce](https://git.berlin.ccc.de/servicepoint/servicepoint/src/tag/tiny-rust-binaries-before/examples/announce.rs):

```rust
//! An example for how to send text to the display.

// [1]
use clap::Parser;
use servicepoint::{
    CharGrid, CharGridCommand, ClearCommand, Connection, UdpConnection,
    TILE_WIDTH,
};

// [2]
#[derive(Parser, Debug)]
struct Cli {
    #[arg(
        short,
        long,
        default_value = "localhost:2342",
        help = "Address of the display"
    )]
    destination: String,
    #[arg(short, long, num_args = 1.., value_delimiter = '\n',
        help = "Text to send - specify multiple times for multiple lines")]
    text: Vec<String>,
    #[arg(
        short,
        long,
        default_value_t = true,
        help = "Clear screen before sending text"
    )]
    clear: bool,
}

/// example: `cargo run -- --text "Hallo" --text "CCCB"`
fn main() {
    // [3]
    let mut cli = Cli::parse();
    if cli.text.is_empty() {
        cli.text.push("Hello, CCCB!".to_string());
    }

    // [4]
    let connection = UdpConnection::open(&cli.destination)
        .expect("could not connect to display");

    // [5]
    if cli.clear {
        connection.send(ClearCommand).expect("sending clear failed");
    }

    let text = cli.text.join("\n"); // [6]
    let command: CharGridCommand = CharGrid::wrap_str(TILE_WIDTH, &text).into(); // [7]
    connection.send(command).expect("sending text failed"); // [8]
}
```

Let me quickly walk you through the program.

1. Some imports of the used libraries.
2. The structure `Cli` is defined to hold the command line arguments. [clap](https://crates.io/crates/clap) is used to automatically derive a `Parser` from the attributes on the fields.
3. The command line arguments are parsed and a default value for the text to send is set.
4. A UDP connection is opened.[^1]
5. Depending on the arguments, the screen is cleared.
6. All text snippets provided as an argument are concatenated with newlines in between. `--text "Hallo" --text "CCCB"` turns into `Hallo\nCCCB`.
7. The string is wrapped to the width of the display, resulting in a `CharGrid`, which is then immediately turned into a `CharGridCommand`. No fields are changed after this, so the text will be rendered in the top left of the screen when executed on the display.
8. The command is sent to the display.

At several steps, the program panics with an error message if something goes wrong.

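Steps 4, 6 and 8 can be approximated with nothing but the standard library. The following is a minimal sketch, not the code of the servicepoint crate: `join_lines` and `send_datagram` are hypothetical helpers, and the payload is plain text rather than a serialised display command, so a real display would not understand it.

```rust
use std::io;
use std::net::{ToSocketAddrs, UdpSocket};

/// Joins the text snippets with newlines in between, like step 6.
fn join_lines(snippets: &[String]) -> String {
    snippets.join("\n")
}

/// Hypothetical helper: sends one datagram to `destination`
/// and returns the number of bytes handed to the OS.
fn send_datagram<A: ToSocketAddrs>(destination: A, payload: &[u8]) -> io::Result<usize> {
    // Bind to any free local port. UDP has no handshake, so nothing is sent yet.
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    // On a UDP socket, `connect` only records the default peer for `send`.
    socket.connect(destination)?;
    socket.send(payload)
}

fn main() {
    let text = join_lines(&["Hallo".to_string(), "CCCB".to_string()]);
    // NOTE: raw text is NOT a valid display command, this only shows the transport.
    match send_datagram("localhost:2342", text.as_bytes()) {
        Ok(sent) => println!("sent {sent} bytes"),
        Err(err) => eprintln!("sending failed: {err}"),
    }
}
```

Since `connect` on a UDP socket only records the peer address, the sketch will usually report success even if nothing is listening at the destination.
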
I started with `rustc 1.82.0 (f6e511eec 2024-10-15)` from nixpkgs `0ff09db9d034a04acd4e8908820ba0b410d7a33a`.
To compile the example, I used the usual `cargo build --release --example announce` and checked the binary size with `ll -B target/release/examples`.

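If you do not have an `ll` alias at hand, the same check can be sketched in a few lines of rust. `file_size` is a hypothetical helper, and the default path is an assumption based on the build command above.

```rust
use std::{env, fs, io};

/// Returns the size of the file at `path` in bytes.
fn file_size(path: &str) -> io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}

fn main() {
    // The path is taken from the first argument.
    // The fallback is an assumed location of the release build of the example.
    let path = env::args()
        .nth(1)
        .unwrap_or_else(|| "target/release/examples/announce".to_string());
    match file_size(&path) {
        Ok(bytes) => println!("{path}: {bytes} bytes"),
        Err(err) => eprintln!("could not read {path}: {err}"),
    }
}
```
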
The resulting size was 1.1 MB, which should be easy enough to beat.

## Low-hanging fruit

The first thing that came to mind was `-Os`, i.e. compiling for binary size. The rust equivalent is `opt-level = "s"`, or `opt-level = "z"` to also disable loop vectorization.

| Option | Size in isolation (bytes) | Cumulative size (bytes) |
| - | - | - |
| baseline | 1.137.384 | 1.137.384 |
| opt-level = 'z' | 1.186.104 | 1.186.104 |
| opt-level = 's' | 1.120.416 | 1.120.416 |
| lto = true | 914.496 | 808.528 |
| codegen-units = 1 | 982.904 | 775.888 |
| panic = 'abort' | 979.840 | 703.096 |
| strip = true | 915.944 | 580.056 |
| switching back to opt-level = 'z' | | 555.480 |

So it turns out that a few flags on stable rust are enough to halve your binary size. The settings do not combine linearly, though, and the effect of a setting can flip depending on what it is combined with: `opt-level = 'z'`, for example, made the binary larger on its own but smaller once combined with the other flags.

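To put the cumulative column into perspective, the following sketch computes the reduction relative to the baseline for each step. The numbers are copied from the table; `reduction_percent` is a helper made up for this illustration.

```rust
/// Relative size reduction in percent when going from `before` to `after` bytes.
fn reduction_percent(before: u64, after: u64) -> f64 {
    (1.0 - after as f64 / before as f64) * 100.0
}

fn main() {
    let baseline = 1_137_384_u64;
    // (option, cumulative size in bytes), copied from the table
    let steps = [
        ("opt-level = 's'", 1_120_416_u64),
        ("lto = true", 808_528),
        ("codegen-units = 1", 775_888),
        ("panic = 'abort'", 703_096),
        ("strip = true", 580_056),
        ("opt-level = 'z'", 555_480),
    ];
    for (option, size) in steps {
        let saved = reduction_percent(baseline, size);
        println!("{option:<20} {size:>9} bytes  -{saved:.1}%");
    }
}
```

For the last row this comes out at a bit over 51%, which is where the "halve your binary size" comes from.
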
To only compile like this in specific scenarios, you can add a new profile to a crate's `Cargo.toml` like this:

```toml
[profile.size-optimized]
inherits = "release"
opt-level = 's'     # Optimize for size
lto = true          # Enable link-time optimization
codegen-units = 1   # Reduce number of codegen units to increase optimizations
panic = 'abort'     # Abort on panic
strip = true        # Strip symbols from binary
```

The profile can be used by passing `--profile=size-optimized` instead of `--release` to `cargo build`.
Because of the different profile, the binary ends up in a different folder (use `ll -B target/size-optimized/examples` to check the size).

## Digging deeper

While this was a big improvement already, the binary was still 50 times the size of the C program.

_If halving it was this easy, could I do it a second time?_

Everything from here on required unstable rust features, so I switched to a nightly toolchain, set up with the help of the [flake for RedoxOS-development](https://gitlab.redox-os.org/redox-os/redox/-/blob/cb34b9bd862f46729c0082c37a41782a3b1319c3/flake.nix#L38). The version I ended up with was `rustc 1.88.0-nightly (5e17a2a91 2025-04-05)`. The executables I got with the unstable version were already a bit smaller again (546.528 bytes).

The first thing I noticed was that I got some new warnings when compiling, all of which I fixed immediately. As they were mostly inside the documentation, I did not expect this to affect file size.

Next up, I added cargo-bloat to my flake. This tool shows which functions take up the most space in your binary.
The invocation is similar to building: `cargo bloat --example announce --profile=size-optimized` resulted in the following output:

```
 File  .text     Size        Crate Name
 1.0%   5.5%  21.0KiB clap_builder clap_builder::parser::parser::Parser::get_matches_with
 0.9%   5.3%  20.5KiB          std std::backtrace_rs::symbolize::gimli::Cache::with_global
 0.6%   3.3%  12.6KiB          std std::backtrace_rs::symbolize::gimli::Context::new
 0.4%   2.4%   9.2KiB          std gimli::read::dwarf::Unit<R>::new
 0.4%   2.1%   7.9KiB          std addr2line::line::LazyLines::borrow
 0.3%   2.0%   7.5KiB     announce announce::main
 0.3%   1.8%   7.1KiB          std miniz_oxide::inflate::core::decompress
 0.3%   1.6%   6.3KiB          std addr2line::unit::ResUnit<R>::find_function_or_location::{{closure}}
 0.3%   1.5%   5.6KiB clap_builder clap_builder::builder::command::Command::_build_self
 0.2%   1.4%   5.3KiB clap_builder clap_builder::output::help_template::HelpTemplate::write_templated_help
 0.2%   1.3%   5.1KiB clap_builder clap_builder::error::Error<F>::print
 0.2%   1.3%   4.9KiB clap_builder clap_builder::parser::parser::Parser::react
 0.2%   1.2%   4.8KiB clap_builder clap_builder::output::help_template::HelpTemplate::write_args
 0.2%   1.2%   4.6KiB          std gimli::read::unit::parse_attribute
 0.2%   1.1%   4.4KiB          std addr2line::function::Function<R>::parse_children
 0.2%   1.0%   3.7KiB clap_builder clap_builder::output::help_template::HelpTemplate::write_subcommands
 0.2%   1.0%   3.7KiB clap_builder clap_builder::output::usage::Usage::write_arg_usage
 0.2%   1.0%   3.7KiB          std gimli::read::rnglists::RngListIter<R>::next
 0.1%   0.8%   3.1KiB          std std::backtrace_rs::symbolize::gimli::elf::<impl std::backtrace_rs::symbolize::gimli::Mapping>::new_debug
 0.1%   0.8%   3.0KiB clap_builder clap_builder::parser::parser::Parser::match_arg_error
10.8%  61.8% 237.3KiB              And 993 smaller methods. Use -n N to show more.
17.5% 100.0% 384.2KiB              .text section size, the file size is 2.1MiB
```

Wait, what? Why is the binary 2.1 MB now? `ll -B target/size-optimized/examples`

[^1]: Yes, I know UDP does not have connections. Internally, this just opens a UDP socket.