proof read tiny rust binaries, change url

This commit is contained in:
Vinzenz Schroeter 2025-04-09 23:58:25 +02:00
parent 718163d969
commit 19a412f05b
2 changed files with 44 additions and 32 deletions

+++
date = '2025-04-09T20:29:48+02:00'
draft = false
title = 'Optimizing Rust binary size'
tags = ['rust', 'servicepoint']
+++
In [CCC Berlin](https://berlin.ccc.de/), there is a large pixel matrix hanging on the wall that we call "ServicePoint display".
It receives commands from the local network via UDP, which contain things like very basic text rendering and overwriting parts of the pixel buffer.
The commands are sent in an efficient binary data structure.
I wrote (most of) the Rust library [servicepoint](https://crates.io/crates/servicepoint), which implements this protocol, including serialisation, deserialisation and a bunch of extras for easily creating these packets.
There are also bindings for other languages, [including C](https://git.berlin.ccc.de/servicepoint/servicepoint-binding-c).

A few weeks ago, the only user of those C bindings I know informed me, with a big grin on their face, that they'd stop using the library and instead wanted to write everything by hand.
While I know from experience that writing such a library is great fun (and thus does not need another reason), I was intrigued and wanted to know why.
The main reason they cited was binary size, and while there's probably something wrong with your computer if you do not have 1 MB to spare, I agreed that it was too big for what it does and that I would investigate.

They knew what they were doing, and it worked: I was immediately nerd-sniped and could not think about anything else in my spare time for a whole week.
I _had_ to find out why it was so big, and there would _have_ to be a way to fix it.

This is part one, where I optimize the core library for size, for fun and experience.
The order in which I tried all the options is changed for a better text structure, but the results are re-created in the order they appear using the stated tools.
In a future post, I also want to document how I got the C bindings smaller, as those use all features by default and cannot be reasoned about as much by the Rust compiler.

Most of the techniques I used are described in [Minimizing Rust Binary Size](https://github.com/johnthagen/min-sized-rust), though I hope the specific example I provide makes the topic interesting to readers not writing Rust code.
The resulting size was 1.1 MB, which should be easy enough to beat.
### Compiler options

The first thing that came to mind was telling the compiler to optimize for size, like with `gcc -Os`. The Rust equivalent is `opt-level = "s"`, and for even more optimization, `z` also disables loop vectorization.

| Option | size in isolation (change) | size cumulative (change) |
| - | - | - |
| … | … | … |
| strip = true | 915.944 | 580.056 |
| switching back to opt-level = 'z' | | 555.480 |
So it turns out, if you want to halve your binary size, a few flags are enough in stable Rust.
The most significant impacts came from link time optimization (LTO) and stripping of symbols from the binary.
Interestingly, different combinations of these settings didn't scale the way I would have intuitively thought.

The only compromise apart from compilation time with these settings is the change in panic behavior, as this means no stack traces on crash[^panic-abort].
To only compile like this in specific scenarios, you can add a new profile to a crate's `Cargo.toml` like this:
```toml
[profile.size-optimized]
inherits = "release"
opt-level = "z"
lto = true
panic = "abort"
strip = true
```
Line two means by default, cargo will enable LZMA compression, sending via UDP, and CP-437 text support.
Each of those features pulls in an optional dependency (which is why I made those features toggleable in the first place).
In the code, CP-437 and compression are not needed[^2], but UDP is obviously used.
Features can be toggled on the command line, which means the invocation can be changed to the following: `cargo build --example announce --profile=size-optimized --no-default-features --features=protocol_udp`[^3].
Doing that means less library code and fewer dependencies are pulled into the compilation process.
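For context, this is roughly how such optional, feature-gated dependencies are declared in a library's `Cargo.toml`. This is a sketch: besides `protocol_udp`, the feature names and the dependency mapping here are illustrative, not the crate's actual manifest:

```toml
[features]
# what you get when building with default features
default = ["compression_lzma", "cp437", "protocol_udp"]
# each feature enables an optional dependency and/or extra code
compression_lzma = ["dep:rust-lzma"]
cp437 = []
protocol_udp = []
```

With `--no-default-features --features=protocol_udp`, only the last of these ends up in the build.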
The result is a 555.480 Byte binary, which is exactly the same as without those flags.
This is not really surprising, as we enabled a bunch of compiler options that help remove whole sections of code that are not needed, especially link time optimization.
It is cool to see that the binary is identical, though.

In the rest of this post, I will omit those parameters, probably to the detriment of compilation time.
While this was a big improvement already, this was still 50 times the size of the C version.
_If it was this easy halving it, can I do that a second time?_

Everything from here on required unstable features of the Rust toolchain, both because the tooling depends on them for more information about the program, and because the compiler options from here on are not yet (and maybe never will be) stabilized.
The version I ended up with was `rustc 1.88.0-nightly (5e17a2a91 2025-04-05)`.
In my environment, I had to call nightly cargo with `rustup run nightly cargo`, but that part is not included in the rest of the commands.
The executables I got with the unstable version were already a bit smaller again (546.528 bytes).
```
File .text Size Crate Name
…
17.5% 100.0% 384.2KiB .text section size, the file size is 2.1MiB
```
The functions are listed by size, starting with the largest.
From the table, we can already see some interesting stuff.

1. For some reason, the `.text` section (the machine code) is only a small part of the executable, and the total size increased by a factor of 4.
2. The biggest function and a bunch of other big ones are from `clap_builder`, a crate that is part of the command line argument parser.
3. `std` takes up most of the rest.
4. `main` is unexpectedly huge?
5. `servicepoint` does not even show up in the top list.

Let's cover those points in order.
That means I can ignore that problem and focus on the `.text` size.
### 2. Removing clap

While clap is super handy, it looks like the code needed to parse two simple arguments blows up the executable.
That's probably a mix of complex parsing logic with error handling, constant strings, and data-dependent code paths that the compiler cannot prove to be unused.
As the C program I was comparing against had all the parameters hard-coded, I just ripped out the dependency and hard-coded the values I needed.
The result is the first version of `tiny_announce`, as I did not want to change the existing example.
The command to compile changed slightly because of the new name. `cargo build --example tiny_announce --profile=size-optimized && ll -B target/size-optimized/examples/tiny_announce` gave me the new binary size.

Drumroll... 324.624 Bytes!
With argument parsing removed, we saved 40% of the remaining binary size.
This also makes the main disappear from the top-sized functions for now.
While removing a library you do not really need is also possible in stable Rust, I was only able to notice it with tooling that requires nightly, so I am putting it into that category.
Most of that looked like stack unwinding and debug data parsing, which is odd, as we configured `panic = "abort"`.
As it turns out, as an optimization for the development workflow, by default cargo does not recompile the standard library.
Instead, a prebuilt version included in the toolchain is used.
The compiler arguments for that are fixed, and to change that and get more control over how `stdlib` is compiled and linked, we need the unstable option `-Zbuild-std` and have to list which sub-crates we want to build (which is pretty much all of them).
Because we also have `panic = "abort"` set, we need to also pass in `-Zbuild-std-features="panic_immediate_abort"` so there is no compilation error.

`cargo build --example tiny_announce --profile=size-optimized -Zbuild-std="core,std,alloc,proc_macro,panic_abort" -Zbuild-std-features="panic_immediate_abort"`

This produces a binary that is now only 30.992 bytes!
### In-between find: to_socket_addrs
The remaining top 3 functions were:
```
…
2.9% 7.3% 1.4KiB [Unknown] main
```
Main shows up again! But what is that? 4.4% used by `to_socket_addrs`?
We found the last string parsing code, this time in the standard library, to read the IP and port from a string.

After changing it in the example, it still showed up, which brings me to the first and only time I actually changed the `servicepoint` library as a result of this saga.
This seemed to remove other functions as well, as the size was down to 17.272 bytes, nearly halving the size _again_.
It is now smaller than this article as plain text markdown.
### 4. no_main

You'd think that `main` is now the top function, but `Iter::next` is the biggest for some reason.
Still, `[Unknown] main` and the actual main take up 10% of the remaining size according to `cargo bloat`.
The full command:

```
cargo build \
    --example tiny_announce \
    --profile=size-optimized \
    -Zbuild-std="core,std,alloc,proc_macro,panic_abort" \
    -Zbuild-std-features="panic_immediate_abort"
```
All of this reduces the final binary size to 8.064 bytes.
## Conclusion

Through this journey, I've managed to reduce the size of the binary from an unwieldy 1.1 MB to an impressive 8 KB, without sacrificing all of the standard library.
For me, the most unexpected part was the size of the `clap` code, though I learned dozens of things at every step about the intricacies of how `cargo build` produces your final binary.

There is no single option that in itself is the solution; it's a matter of experimenting with a combination of compiler flags, feature toggles, and code optimizations.
While extreme options can be great if you want to squeeze out the last bytes, it's probably not worth using those in a "normal" computer scenario.

The key takeaway is that optimizing binary size in Rust, while not always straightforward, is achievable with the right techniques.
It is certainly easier to create a big binary than in C, but calling Rust bloated is blaming the language a bit too much.

Stay tuned for part two, in which I will try to do something similar with a C version of the example, using the C bindings of the `servicepoint` crate.
[^1]: Yes, I know UDP does not have connections. Internally, this just opens a UDP socket.
[^panic-abort]: Technically, you can catch a panic while unwinding and there may even be a weird performance argument for doing that, see <!-- TODO find article about making serde faster with panic catching -->
[^2]: Some commands can be compressed, but the text ones (both CP-437 and UTF-8) cannot. Clear is a _very_ simple command that does not have any payload, so no compression there either. If a `BitmapCommand` was used instead, using `into()` on a `Bitmap` would have hidden the fact that the default compression is used in that case. The default compression in turn is either LZMA or no compression, depending on whether the LZMA feature is enabled.

This is useful, for example, if the target environment you're developing for can…
For me, the trade-offs are worth it, as they provide greater transparency and control over the flake configuration.

That being said, I fully acknowledge that `flake-utils` can still be a great choice for many people.
It simplifies things and reduces the need to write boilerplate code, which can be a plus depending on your needs and workflow.
Ultimately, it's also a matter of personal preference.
[^1]: If you check the history, you will see I am not mentioned. I am still a bit salty about that, as it was my first contribution to a bigger OSS project.