+++ date = '2025-04-07T20:29:48+02:00' draft = true title = 'Debloating your rust binary' tags = ['rust', 'servicepoint'] +++
In CCC Berlin, there is a big pixel matrix hanging on the wall that we call "ServicePoint display". Anyone in the local network can send UDP packets containing commands that the display will execute. The commands are sent in a binary data structure and contain things like very basic text rendering and overwriting parts of the pixel buffer. I wrote (most of) the rust library servicepoint, which implements serialisation and deserialisation of those packets. There are also bindings for other languages, including C.
Some weeks ago, the only user of those C bindings I know said they'll stop using them, with a big grin on their face. While I know from experience that writing such a library is great fun (and thus does not need another reason), I immediately wanted to know why. The main reason they cited was binary size, and while there's probably something wrong with your computer if you do not have 1MB to spare, I agreed that it was too big for what it does. Thus, I was immediately nerd-sniped and I could not think about anything else in my spare time for a whole week. I had to find out why it was so big, and there would have to be a way to fix it.
This is part one, where I optimize the core library for size. The order in which I tried all the options is changed for a better text structure, but the results are re-created in the order they appear using the stated tools. In a future post, I also want to document how I got the C bindings smaller, as those use all features by default.
Most of the techniques I used are described in Minimizing Rust Binary Size, though I hope the specific example I provide makes the topic interesting to readers not writing rust code.
Let's get hacking!
Starting point
The commit I started on was fe67160974d9fed542eb37e5e9a202eaf6fe00dc, which is not part of `main` as of the writing of this post.
As I needed some binary to compare, I chose the example `announce`:
//! An example for how to send text to the display.
// [1]
use clap::Parser;
use servicepoint::{
    CharGrid, CharGridCommand, ClearCommand, Connection, UdpConnection,
    TILE_WIDTH,
};
// [2]
#[derive(Parser, Debug)]
struct Cli {
    #[arg(short, long, default_value = "localhost:2342",
        help = "Address of the display")]
    destination: String,
    #[arg(short, long, num_args = 1.., value_delimiter = '\n',
        help = "Text to send - specify multiple times for multiple lines")]
    text: Vec<String>,
    #[arg(short, long, default_value_t = true,
        help = "Clear screen before sending text")]
    clear: bool,
}
/// example: `cargo run -- --text "Hallo" --text "CCCB"`
fn main() {
    // [3]
    let mut cli = Cli::parse();
    if cli.text.is_empty() {
        cli.text.push("Hello, CCCB!".to_string());
    }
    // [4]
    let connection = UdpConnection::open(&cli.destination)
        .expect("could not connect to display");
    // [5]
    if cli.clear {
        connection.send(ClearCommand).expect("sending clear failed");
    }
    let text = cli.text.join("\n"); // [6]
    let command: CharGridCommand = CharGrid::wrap_str(TILE_WIDTH, &text).into(); // [7]
    connection.send(command).expect("sending text failed"); // [8]
}
Let me quickly run you through the program.
1. Some imports of the used libraries.
2. The structure `Cli` is defined to hold the command line arguments. clap is used to automatically derive a `Parser` from the attributes on the fields.
3. The command line arguments are parsed and a default value for the text to send is set.
4. A UDP connection is opened[^1].
5. Depending on the arguments, the screen is cleared.
6. All text snippets provided as an argument are concatenated with newlines in between. `--text "Hallo" --text "CCCB"` turns into `Hallo\nCCCB`.
7. The string is wrapped to the width of the display, resulting in a `CharGrid`, which is then immediately turned into a `CharGridCommand`. No fields are changed after this, so the text will be rendered in the top left of the screen when executed on the display.
8. The command is sent to the display.
At some steps, the program panics with an error message in case something went wrong.
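To make steps 6 and 7 more concrete, here is a minimal std-only sketch of joining the snippets and hard-wrapping the result into rows. The function `wrap_to_rows` is a hypothetical stand-in written for this post; it is not how `CharGrid::wrap_str` is implemented, and the real function may wrap differently (for example at word boundaries).

```rust
/// Hard-wraps `text` into rows of at most `width` characters.
/// Explicit newlines always start a new row.
/// Hypothetical helper for illustration, not part of servicepoint.
fn wrap_to_rows(width: usize, text: &str) -> Vec<String> {
    assert!(width > 0, "width must be positive");
    let mut rows = Vec::new();
    for line in text.split('\n') {
        let chars: Vec<char> = line.chars().collect();
        if chars.is_empty() {
            // Keep empty lines as empty rows.
            rows.push(String::new());
            continue;
        }
        for chunk in chars.chunks(width) {
            rows.push(chunk.iter().collect());
        }
    }
    rows
}

fn main() {
    // Step 6: the snippets are joined with newlines in between.
    let text = ["Hallo", "CCCB"].join("\n");
    // Step 7 (simplified): the text is cut into rows that fit the width.
    for row in wrap_to_rows(4, &text) {
        println!("{row:?}");
    }
}
```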
I started with `rustc 1.82.0 (f6e511eec 2024-10-15)` from nixpkgs `0ff09db9d034a04acd4e8908820ba0b410d7a33a`.
For compiling the example, I just used the usual `cargo build --release --example announce` and checked the binary size with `ll -B target/release/examples`.
The resulting size was 1.1 MB, which should be easy enough to beat.
Low hanging fruits
Compiler options
The first thing that came to mind was `-Os`, so compiling for binary size. The rust equivalent is `opt-level = "s"`, or `z` to also disable loop vectorization.
Option | size in isolation (bytes) | size cumulative (bytes) |
---|---|---|
baseline | 1.137.384 | 1.137.384 |
opt-level = 'z' | 1.186.104 | 1.186.104 |
opt-level = 's' | 1.120.416 | 1.120.416 |
lto = true | 914.496 | 808.528 |
codegen-units = 1 | 982.904 | 775.888 |
panic = 'abort' | 979.840 | 703.096 |
strip = true | 915.944 | 580.056 |
switching back to opt-level = 'z' | | 555.480 |
So it turns out that a few flags in stable rust are enough if you want to halve your binary size. The settings do not add up linearly, though: `opt-level = 'z'` produced a larger binary than `'s'` in isolation, but a smaller one once the other options were set. The only compromise apart from compilation time is the change in panic behavior, as a panic now aborts the process immediately instead of unwinding the stack[^2].
To only compile like this in specific scenarios, you can add a new profile to a crate's `Cargo.toml` like this:
[profile.size-optimized]
inherits = "release"
opt-level = 's' # Optimize for size
lto = true # Enable link-time optimization
codegen-units = 1 # Reduce number of codegen units to increase optimizations
panic = 'abort' # Abort on panic
strip = true # Strip symbols from binary
The profile can be used by passing `--profile=size-optimized` instead of `--release` to `cargo build`.
Because of the different profile, the binary ends up in a different folder (`ll -B target/size-optimized/examples` to check size).
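Since `panic = 'abort'` is the one setting here that changes runtime behavior, it can be useful to see which strategy a binary was built with. The following is a small sketch using the stable `cfg!(panic = "...")` check; the function name `panic_strategy` is made up for this example.

```rust
/// Reports the panic strategy this binary was compiled with.
/// Hypothetical helper for illustration.
fn panic_strategy() -> &'static str {
    if cfg!(panic = "abort") {
        "abort"
    } else {
        "unwind"
    }
}

fn main() {
    println!("panic strategy: {}", panic_strategy());
    // With unwinding, catch_unwind can recover from the panic below.
    // With panic = 'abort', the process ends at the panic instead
    // and the last line is never reached.
    let result = std::panic::catch_unwind(|| {
        panic!("boom");
    });
    println!("recovered from panic: {}", result.is_err());
}
```

Built with the default release profile, the program should reach the last line; built with the `size-optimized` profile above, it should terminate at the panic.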
Features
Rust has a very handy way to represent variability in a library, called features.
The `servicepoint` library has the following declaration in its `Cargo.toml`:
[features]
default = ["compression_lzma", "protocol_udp", "cp437"]
compression_zlib = ["dep:flate2"]
compression_bzip2 = ["dep:bzip2"]
compression_lzma = ["dep:rust-lzma"]
compression_zstd = ["dep:zstd"]
all_compressions = ["compression_zlib", "compression_bzip2", "compression_lzma", "compression_zstd"]
rand = ["dep:rand"]
protocol_udp = []
protocol_websocket = ["dep:tungstenite"]
cp437 = ["dep:once_cell"]
Line two means that by default, cargo will enable LZMA compression, sending via UDP sockets and conversion between CP-437 and UTF-8. Apart from `protocol_udp`, each of those features pulls in an optional dependency (which is why I made those features toggleable in the first place). In the example code, CP-437 and compression are not needed[^3], but UDP is obviously used.
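Inside a crate, a feature shows up as a `cfg` value that code can be conditionally compiled on. As a rough sketch of the idea (the enum and function below are invented for this post and are not the actual servicepoint code), a default that depends on the `compression_lzma` feature could look like this:

```rust
/// Compression choices, loosely modelled on the feature names above.
/// Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
enum Compression {
    None,
    Lzma,
}

/// Picks LZMA only if the crate was built with that feature enabled.
#[allow(unexpected_cfgs)] // the feature is not declared when compiled standalone
fn default_compression() -> Compression {
    if cfg!(feature = "compression_lzma") {
        Compression::Lzma
    } else {
        Compression::None
    }
}

fn main() {
    println!("default compression: {:?}", default_compression());
}
```

With `#[cfg(feature = "...")]` on whole items instead of `cfg!` in an expression, the disabled code is not compiled at all, which is what allows the optional dependency to be dropped.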
Features can be toggled on the command line, which means the invocation can be changed to the following: `cargo build --example announce --profile=size-optimized --no-default-features --features=protocol_udp`[^4].
The result is a 555.480 Byte binary, which is exactly the same as without those flags. This is not really surprising, as we enabled a bunch of compiler options that help remove whole sections of code that are not needed, especially link time optimization.
In the rest of this post, I will omit those parameters, probably to the detriment of compilation time.
Digging deeper
While this was a big improvement already, this was still 50 times the size of the C program.
If it was this easy halving it, can I do that a second time?
Everything from here on required unstable features of the rust flake for RedoxOS-development. The version I ended up with was `rustc 1.88.0-nightly (5e17a2a91 2025-04-05)`.
In my environment, I had to call nightly cargo with `rustup run nightly cargo`, but that part is not included in the rest of the commands.
The executables I got with the unstable version were already a bit smaller again (546.528 bytes).
The first thing I noticed was that I got some new warnings when compiling, all of which I fixed immediately. As it was mostly inside of the documentation, I did not expect this to affect file size.
Next up, I added cargo-bloat to my flake. This tool can show you which functions take up most of the space in your binary.
The invocation is similar to building - `cargo bloat --example announce --profile=size-optimized` resulted in the following output:
File .text Size Crate Name
1.0% 5.5% 21.0KiB clap_builder clap_builder::parser::parser::Parser::get_matches_with
0.9% 5.3% 20.5KiB std std::backtrace_rs::symbolize::gimli::Cache::with_global
0.6% 3.3% 12.6KiB std std::backtrace_rs::symbolize::gimli::Context::new
0.4% 2.4% 9.2KiB std gimli::read::dwarf::Unit<R>::new
0.4% 2.1% 7.9KiB std addr2line::line::LazyLines::borrow
0.3% 2.0% 7.5KiB announce announce::main
0.3% 1.8% 7.1KiB std miniz_oxide::inflate::core::decompress
0.3% 1.6% 6.3KiB std addr2line::unit::ResUnit<R>::find_function_or_location::{{closure}}
0.3% 1.5% 5.6KiB clap_builder clap_builder::builder::command::Command::_build_self
0.2% 1.4% 5.3KiB clap_builder clap_builder::output::help_template::HelpTemplate::write_templated_help
0.2% 1.3% 5.1KiB clap_builder clap_builder::error::Error<F>::print
0.2% 1.3% 4.9KiB clap_builder clap_builder::parser::parser::Parser::react
0.2% 1.2% 4.8KiB clap_builder clap_builder::output::help_template::HelpTemplate::write_args
0.2% 1.2% 4.6KiB std gimli::read::unit::parse_attribute
0.2% 1.1% 4.4KiB std addr2line::function::Function<R>::parse_children
0.2% 1.0% 3.7KiB clap_builder clap_builder::output::help_template::HelpTemplate::write_subcommands
0.2% 1.0% 3.7KiB clap_builder clap_builder::output::usage::Usage::write_arg_usage
0.2% 1.0% 3.7KiB std gimli::read::rnglists::RngListIter<R>::next
0.1% 0.8% 3.1KiB std std::backtrace_rs::symbolize::gimli::elf::<impl std::backtrace_rs::symbolize::gimli::Mapping>::new_debug
0.1% 0.8% 3.0KiB clap_builder clap_builder::parser::parser::Parser::match_arg_error
10.8% 61.8% 237.3KiB And 993 smaller methods. Use -n N to show more.
17.5% 100.0% 384.2KiB .text section size, the file size is 2.1MiB
From the table, we can already see some interesting stuff.
- For some reason, the `.text` section (the machine code) is only a small part of the executable, and the total size increased by a factor of 4.
- The biggest function and a bunch of other big ones are from `clap_builder`, a crate that is part of the command line argument parser.
- `std` takes up most of the rest.
- `servicepoint` does not even show up in the top list.
Let's cover those points in order.
1. Unexpected binary size when building via cargo-bloat
Using GNU `size`, we can check the size per section in the ELF binary.
Using the `-G` or `-B` output formats does not work for this, as they only show the `.text` and `.data` sections, which in this case only make up around 500KB.
Thus the command I used was `size -A --common target/size-optimized/examples/announce`, giving the following result:
section size addr
.dynsym 1680 856
.dynstr 1198 3500
.rela.dyn 22800 4704
.gcc_except_table 3728 27552
.rodata 36592 31280
.eh_frame_hdr 8116 67872
.eh_frame 52488 75992
.text 393449 132576
.data.rel.ro 18760 530248
.relro_padding 2888 550072
.data 2400 554168
.debug_abbrev 1810 0
.debug_info 525404 0
.debug_aranges 6256 0
.debug_ranges 157856 0
.debug_str 726991 0
.debug_line 149936 0
Total 2115147
(I filtered out the rows <1KB for brevity)
Turns out `cargo-bloat` disables symbol stripping, because it needs the symbols to show function names to the user.
And it is not just the symbols that release builds include by default - all the debugging information is included as well.
That means I can ignore that problem and focus on the `.text` size.
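To put a number on that, here is a small sketch that parses the `size -A` output quoted above and sums the `.debug_*` sections separately from everything else. The helper names are made up for this post; the sample data is the filtered table from this section.

```rust
/// Output of `size -A` as quoted in this post (rows under 1KB were filtered out).
const SIZE_OUTPUT: &str = "\
section size addr
.dynsym 1680 856
.dynstr 1198 3500
.rela.dyn 22800 4704
.gcc_except_table 3728 27552
.rodata 36592 31280
.eh_frame_hdr 8116 67872
.eh_frame 52488 75992
.text 393449 132576
.data.rel.ro 18760 530248
.relro_padding 2888 550072
.data 2400 554168
.debug_abbrev 1810 0
.debug_info 525404 0
.debug_aranges 6256 0
.debug_ranges 157856 0
.debug_str 726991 0
.debug_line 149936 0
Total 2115147
";

/// Extracts (section name, size) pairs, skipping the header and total rows.
fn parse_sections(output: &str) -> Vec<(&str, u64)> {
    output
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            let name = fields.next()?;
            let size: u64 = fields.next()?.parse().ok()?;
            name.starts_with('.').then_some((name, size))
        })
        .collect()
}

/// Sums the sizes of all sections whose name satisfies `keep`.
fn sum_where(output: &str, keep: impl Fn(&str) -> bool) -> u64 {
    parse_sections(output)
        .into_iter()
        .filter(|(name, _)| keep(*name))
        .map(|(_, size)| size)
        .sum()
}

fn debug_bytes(output: &str) -> u64 {
    sum_where(output, |name| name.starts_with(".debug_"))
}

fn other_bytes(output: &str) -> u64 {
    sum_where(output, |name| !name.starts_with(".debug_"))
}

fn main() {
    println!("debug sections: {} bytes", debug_bytes(SIZE_OUTPUT));
    println!("everything else: {} bytes", other_bytes(SIZE_OUTPUT));
}
```

For the listed rows this comes out at 1.568.253 bytes of debug information and 544.099 bytes for everything else, which is in the same range as the stripped binary.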
2. Removing clap
While clap is super handy, it looks like the code needed to parse two simple arguments blows up the executable. That's probably a mix of complex parsing logic with error handling, constant strings and data-dependent code paths the compiler cannot detect as not being used. As the C program I was comparing against had all the parameters hard-coded, I just ripped out the dependency and hard-coded the values I needed.
The result is the first version of `tiny_announce`, as I did not want to change the existing example.
//! An example for how to send text to the display.
use servicepoint::{
CharGrid, CharGridCommand, ClearCommand, Connection, UdpConnection,
TILE_WIDTH,
};
/// example: `cargo run -- --text "Hallo" --text "CCCB"`
fn main() {
let text = "Hello, CCCB!";
let connection = UdpConnection::open("127.0.0.1:2342")
.expect("could not connect to display");
connection.send(ClearCommand).expect("sending clear failed");
let command: CharGridCommand = CharGrid::wrap_str(TILE_WIDTH, &text).into();
connection.send(command).expect("sending text failed");
}
The command to compile changed slightly because of the new name. `cargo build --example tiny_announce --profile=size-optimized && ll -B target/size-optimized/examples/tiny_announce` gave me the new binary size.
Drumroll... 324.624 Bytes!
40% of the binary was argument parsing.
This also makes the main disappear from the top sized functions.
While removing a library you do not really need is also available in stable rust, I was only able to notice that with tooling only available on nightly, so I am putting it into that category.
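If hard-coding everything is too drastic for your use case, a hand-rolled parser over `std::env::args` is a possible middle ground. The sketch below is not what I did and I have not measured its size impact; the `--no-clear` flag and the `parse_args` function are made up for this example, and there is no help output or short options.

```rust
/// Settings the example needs; the field names mirror the clap version.
#[derive(Debug, PartialEq)]
struct Cli {
    destination: String,
    text: Vec<String>,
    clear: bool,
}

/// Minimal argument parser (hypothetical, for illustration only).
fn parse_args(args: impl IntoIterator<Item = String>) -> Result<Cli, String> {
    let mut cli = Cli {
        destination: "localhost:2342".to_string(),
        text: Vec::new(),
        clear: true,
    };
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        match arg.as_str() {
            "--destination" => {
                cli.destination = args.next().ok_or("--destination needs a value")?
            }
            "--text" => cli.text.push(args.next().ok_or("--text needs a value")?),
            "--no-clear" => cli.clear = false,
            other => return Err(format!("unknown argument: {other}")),
        }
    }
    // Same default text as in the original example.
    if cli.text.is_empty() {
        cli.text.push("Hello, CCCB!".to_string());
    }
    Ok(cli)
}

fn main() {
    // Skip the program name, then parse the rest.
    match parse_args(std::env::args().skip(1)) {
        Ok(cli) => println!("{cli:?}"),
        Err(message) => {
            eprintln!("{message}");
            std::process::exit(2);
        }
    }
}
```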
3. build-std
Looking at the biggest functions again (now `cargo bloat --example tiny_announce --profile=size-optimized`) showed that all the big functions left were from `std`.
Most of that looked like stack unwinding and debug data parsing, which is odd as we added `panic = 'abort'` in the first chapter.
As it turns out, as an optimization for the development workflow, cargo does not recompile the standard library by default.
Instead, a prebuilt version included in the toolchain is used.
This is possible because the compiler knows `stdlib` was compiled with the exact same compiler version as the user's program; otherwise, the missing ABI stability in Rust would come into play.
The compiler arguments for that prebuilt version are fixed, and to change them we need the unstable option `-Zbuild-std` and have to list which sub-crates we want to build (which is pretty much all of them).
Because we also have `panic = "abort"` set, we need to also pass in `-Zbuild-std-features="panic_immediate_abort"` so there is no compilation error.
cargo build --example tiny_announce --profile=size-optimized -Zbuild-std="core,std,alloc,proc_macro,panic_abort" -Zbuild-std-features="panic_immediate_abort"
This produces a binary that is now only 30.992 bytes!
to_socket_addrs
The remaining top 3 functions were:
File .text Size Crate Name
4.4% 11.0% 2.0KiB std <&T as std::net::socket_addr::ToSocketAddrs>::to_socket_addrs
3.8% 9.4% 1.7KiB tiny_announce tiny_announce::main
2.9% 7.3% 1.4KiB [Unknown] main
Finally our code shows up again! But what is that? 4.4% used by `to_socket_addrs`?
We found the last string parsing code, this time in the standard library, to read the IP and Port from a string.
After changing it in the example, it still showed up, which brings me to the first and only time I actually changed the `servicepoint` library as a result of this saga.
- let socket = UdpSocket::bind("0.0.0.0:0")?;
+ let addr = SocketAddr::from(([0, 0, 0, 0], 0));
+ let socket = UdpSocket::bind(addr)?;
This seemed to remove other functions as well, as the size was down to 17.272 bytes, nearly halving the size again. It is now smaller than this article as plain text markdown.
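To show that the two forms describe the same address, here is a small std-only sketch. The function name `unspecified_addr` is made up for this example; the point is that the tuple conversion builds the `SocketAddr` directly, without going through the string parser.

```rust
use std::net::{SocketAddr, UdpSocket};

/// Builds 0.0.0.0:0 from plain numbers, without any string parsing.
fn unspecified_addr() -> SocketAddr {
    SocketAddr::from(([0, 0, 0, 0], 0))
}

fn main() -> std::io::Result<()> {
    // The parsed form is equal, but pulls in the address parser.
    let parsed: SocketAddr = "0.0.0.0:0".parse().expect("valid address literal");
    assert_eq!(unspecified_addr(), parsed);

    // Port 0 lets the operating system pick a free port.
    let socket = UdpSocket::bind(unspecified_addr())?;
    println!("bound to {}", socket.local_addr()?);
    Ok(())
}
```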
no_main
You'd think that now `main` is the top function, but `Iter::next` is now the biggest function for some reason.
Still, `[Unknown] main` and the actual main take up 10% of the remaining size according to `cargo bloat`.
We surely cannot reduce that, right? Wrong!
With `#![no_main]`, you can tell rust to not add any initialization code.
This means the normal `fn main()` does not get used, and the linker complains about the missing function.
To fix this, the function can be converted to a C-style main.
I also removed some more code by initializing the `CharGrid` directly instead of wrapping a string, which saved 400 bytes.
#![no_main]
use servicepoint::{
CharGrid, CharGridCommand, ClearCommand, Connection, UdpConnection,
};
use std::net::SocketAddr;
#[unsafe(no_mangle)]
pub extern "C" fn main(_argc: isize, _argv: *const *const u8) -> isize {
// not parsing the address from str removes 3KB
let addr = SocketAddr::from(([172, 23, 42, 29], 80));
let connection = UdpConnection::open(addr).unwrap();
connection.send(ClearCommand).unwrap(); // <--
let grid = CharGrid::from_vec(5, vec!['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd']).unwrap();
connection.send(CharGridCommand::from(grid)).unwrap();
0
}
This resulted in an 8.064 byte executable, finally beating both GCC and LLVM compiling the minimal C program (around 10KB). If we were to remove the marked line and not clear the screen, we could drop it further to 7.696 bytes.
Advanced compiler abuse
There are two things left to reach the absolute bottom without ripping out the standard library altogether.
In rust, a function can tell the compiler to get the calling location as a parameter to the function.
With `-Zlocation-detail=none`, we instruct the rust compiler to just not bother with that.
`-Zfmt-debug=none` is similar but worse, because it changes all the default `Debug` implementations to do nothing at all.
The change in behavior is not obvious in this example, but do this in an application that has logging and it will be horribly broken.
As the icing on the cake, those two options cannot be passed via `cargo` arguments, so we have to use the environment variable `RUSTFLAGS` to pass them through to `rustc` when it is invoked.
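To see what those two flags take away, here is a small sketch of the features they affect. `#[track_caller]` together with `std::panic::Location::caller()` is the mechanism for receiving the calling location, and `#[derive(Debug)]` generates the default `Debug` implementation. The names `caller_line` and `Point` are made up for this example.

```rust
use std::panic::Location;

/// Returns the line number of whoever called this function.
#[track_caller]
fn caller_line() -> u32 {
    Location::caller().line()
}

/// A type with the default, derived Debug implementation.
#[allow(dead_code)] // the fields are only read through Debug
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    // In a normal build this prints the line of this call.
    println!("called from line {}", caller_line());
    // In a normal build this prints the struct name and its fields.
    println!("{:?}", Point { x: 1, y: 2 });
}
```

In a normal build both lines print useful information. With the two flags set, I would expect the first to lose the real line number and the second to print nothing useful, which is exactly what makes logging fall apart.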
The final command to build the tiniest possible announce in all its glory:
RUSTFLAGS="-Zlocation-detail=none -Zfmt-debug=none" \
cargo build \
--example tiny_announce \
--profile=size-optimized \
--no-default-features \
--features=protocol_udp \
-Zbuild-std="core,std,alloc,proc_macro,panic_abort" \
-Zbuild-std-features="panic_immediate_abort"
All of this reduces the final binary size to 7.696 bytes.
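As a recap, the sizes reported throughout this post can be put next to each other. The following sketch only does the arithmetic on those numbers; the step labels and the `percent_saved` helper are made up for this summary.

```rust
/// Binary sizes in bytes as reported in this post, in the order they appear.
static STEPS: [(&str, u64); 8] = [
    ("baseline release build", 1_137_384),
    ("size-optimized profile", 555_480),
    ("nightly toolchain", 546_528),
    ("clap removed", 324_624),
    ("build-std", 30_992),
    ("no address parsing", 17_272),
    ("no_main", 8_064),
    ("final command", 7_696),
];

/// Percentage of `before` that was saved by going down to `after`.
fn percent_saved(before: u64, after: u64) -> f64 {
    (before - after) as f64 / before as f64 * 100.0
}

fn main() {
    // Compare each step with the one directly before it.
    for pair in STEPS.windows(2) {
        let (_, before) = pair[0];
        let (name, after) = pair[1];
        println!("{name}: {after} bytes ({:.1}% smaller)", percent_saved(before, after));
    }
    let first = STEPS[0].1;
    let last = STEPS[STEPS.len() - 1].1;
    println!("overall: {:.1}% smaller", percent_saved(first, last));
}
```

The clap step comes out at roughly 40%, matching the estimate above, and the overall reduction from the first to the last build is a bit over 99%.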
Conclusion
[^1]: Yes, I know UDP does not have connections. Internally, this just opens a UDP socket.
[^2]: Technically, you can catch a panic while unwinding and there may even be a weird performance argument for doing that, see
[^3]: Some commands can be compressed, but the text ones (both CP-437 and UTF-8) cannot. Clear is a very simple command that does not have any payload, so no compression there either. If a `BitmapCommand` was used instead, using `into()` on a `Bitmap` would have hidden the fact that the default compression is used in that case. The default compression in turn is either LZMA or no compression, depending on whether the LZMA feature is enabled.
[^4]: This works here because `announce` is an example inside of the library itself. As an actual dependent, you would specify this in your `Cargo.toml`.