add missing deltas
parent 19a412f05b
commit ad8b48fbc3
@@ -92,7 +92,7 @@ Let's just run you through the program quickly.
 At some steps, the program panics with an error message in case something went wrong.
 
 I started with `rustc 1.82.0 (f6e511eec 2024-10-15)` from nixpkgs `0ff09db9d034a04acd4e8908820ba0b410d7a33a`.
-For compiling the example, I just used the usual `cargo build --release --example announce` and checked the binary size with `ll -B target/release/examples`.
+For compiling the example, I just used the usual `cargo build --release --example announce` and checked the binary size with `ll -B target/release/examples`.
 
 The resulting size was 1.1 MB, which should be easy enough to beat.
 
@@ -100,20 +100,20 @@ The resulting size was 1.1 MB, which should be easy enough to beat.
 
 ### Compiler options
 
-The first thing that came to mind was telling the compiler to optimize for size, like with `gcc -Os`. The Rust equivalent is `opt-level = "s"`, and for even more optimization, `z` also disables loop vectorization.
+The first thing that came to mind was telling the compiler to optimize for size, like with `gcc -Os`. The Rust equivalent is `opt-level = "s"`, and for even more optimization, `z` also disables loop vectorization.
 
-| Option | size in isolation (change) | size cumulative (change) |
-| - | - | - |
-| baseline | 1.137.384 | 1.137.384 |
-| opt-level = 'z' | 1.186.104 | 1.186.104 |
-| opt-level = 's' | 1.120.416 | 1.120.416 |
-| lto = true | 914.496 | 808.528 |
-| codegen-units = 1 | 982.904 | 775.888 |
-| panic = 'abort' | 979.840 |703.096|
-| strip = true | 915.944 | 580.056 |
-| switching back to opt-level = 'z' | | 555.480 |
+| Option | size in isolation (change) | size cumulative (change) |
+|-----------------------------------|----------------------------|--------------------------|
+| baseline | 1.137.384 | 1.137.384 |
+| opt-level = 'z' | 1.186.104 (+48.720) | 1.186.104 (+48.720) |
+| opt-level = 's' | 1.120.416 (-16.968) | 1.120.416 (-65.688) |
+| lto = true | 914.496 (-222.888) | 808.528 (-311.888) |
+| codegen-units = 1 | 982.904 (-154.480) | 775.888 (-32.640) |
+| panic = 'abort' | 979.840 (-157.544) | 703.096 (-72.792) |
+| strip = true | 915.944 (-221.440) | 580.056 (-123.040) |
+| switching back to opt-level = 'z' | | 555.480 (-24.576) |
 
-So it turns out, if you want to halve your binary size, a few flags are enough in stable Rust.
+So it turns out, if you want to halve your binary size, a few flags are enough in stable Rust.
 The most significant impacts came from link time optimization (LTO) and stripping of symbols from the binary.
 Interestingly, different combinations of these settings didn't scale the way I would have intuitively thought.
 
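The deltas this commit adds to the table can be re-derived from the raw byte counts. Below is a minimal Rust sketch, assuming the dots in the table are thousands separators. The function names are made up for illustration and are not part of the article or the library. The "isolation" column is measured against the baseline, while the "cumulative" column is measured against the row directly above.

```rust
/// Signed change in bytes when going from `before` to `after`.
fn delta(before: u64, after: u64) -> i64 {
    after as i64 - before as i64
}

/// Delta of each size relative to a fixed baseline
/// (the "size in isolation (change)" column).
fn deltas_from_baseline(baseline: u64, sizes: &[u64]) -> Vec<i64> {
    sizes.iter().map(|&size| delta(baseline, size)).collect()
}

/// Delta of each size relative to the entry before it
/// (the "size cumulative (change)" column).
fn stepwise_deltas(sizes: &[u64]) -> Vec<i64> {
    sizes.windows(2).map(|pair| delta(pair[0], pair[1])).collect()
}

fn main() {
    const BASELINE: u64 = 1_137_384;

    // Sizes in bytes, copied from the table in row order.
    let isolated: [u64; 6] = [1_186_104, 1_120_416, 914_496, 982_904, 979_840, 915_944];
    let cumulative: [u64; 8] = [
        BASELINE, 1_186_104, 1_120_416, 808_528, 775_888, 703_096, 580_056, 555_480,
    ];

    println!("isolation:  {:?}", deltas_from_baseline(BASELINE, &isolated));
    println!("cumulative: {:?}", stepwise_deltas(&cumulative));

    // Share of the baseline that is left after applying all options.
    println!("remaining:  {:.1}%", 100.0 * 555_480.0 / BASELINE as f64);
}
```

Running it should reproduce the parenthesized values in both columns, and it shows that the final stable-Rust binary is a little under half the size of the baseline.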
@@ -157,8 +157,8 @@ Line two means by default, cargo will enable LZMA compression, sending via UDP s
 Each of those features pulls in an optional dependency (which is why I made those features toggleable in the first place).
 In the code, CP-437 and compression are not needed[^2], but UDP is obviously used.
 
-Features can be toggled on the command line, which means the invocation can be changed to the following: `cargo build --example announce --profile=size-optimized --no-default-features --features=protocol_udp`[^3].
-Doing that means less library code and fewer dependencies are pulled into the compilation process.
+Features can be toggled on the command line, which means the invocation can be changed to the following: `cargo build --example announce --profile=size-optimized --no-default-features --features=protocol_udp`[^3].
+Doing that means less library code and fewer dependencies are pulled into the compilation process.
 
 The result is a 555.480 Byte binary, which is exactly the same as without those flags.
 This is not really surprising, as we enabled a bunch of compiler options that help remove whole sections of code that are not needed, especially link time optimization.
@@ -172,7 +172,7 @@ While this was a big improvement already, this was still 50 times the size of th
 
 _If it was this easy halving it, can I do that a second time?_
 
-Everything from here on required unstable features of the Rust toolchain, both because tooling depends on it for more information about the program, and because the compiler options from here on are not (and maybe never will be) stabilized.
+Everything from here on required unstable features of the Rust toolchain, both because tooling depends on it for more information about the program, and because the compiler options from here on are not (and maybe never will be) stabilized.
 
 The version I ended up with was `rustc 1.88.0-nightly (5e17a2a91 2025-04-05)`.
 In my environment, I had to call nightly cargo with `rustup run nightly cargo`, but that part is not included in the rest of the commands.
@@ -209,7 +209,7 @@ File .text Size Crate Name
 17.5% 100.0% 384.2KiB .text section size, the file size is 2.1MiB
 ```
 
-Starting with the largest, the biggest functions in the program are shown.
+Starting with the largest, the biggest functions in the program are shown.
 From the table, we can already see some interesting stuff.
 
 1. For some reason, the `.text` section (the machine code) is only a small part of the executable, and the total size increased by a factor of 4.
@@ -222,7 +222,7 @@ Let's cover those points in order.
 
 ### 1. Unexpected binary size when building via cargo-bloat
 
-Using `GNU size`, we can check the size per section in the ELF binary.
+Using `GNU size`, we can check the size per section in the ELF binary.
 Using `-G` or `-B` output formats does not work for this, as it will only show the `.text` and `.data` section, which in this case only make up around 500KB.
 Thus the command I used was `size -A --common target/size-optimized/examples/announce`, giving the following result:
 
@@ -295,7 +295,7 @@ While removing a library you do not really need is also available in stable Rust
 Looking at the biggest functions again (now `cargo bloat --example tiny_announce --profile=size-optimized`) showed that all the big functions left were from `std`.
 Most of that looked like stack unwinding and debug data parsing, which is odd as we added `panic = 'abort'` in the first chapter.
 
-As it turns out, as an optimization for the development workflow, by default cargo does not recompile the standard library.
+As it turns out, as an optimization for the development workflow, by default cargo does not recompile the standard library.
 Instead, a prebuilt version included in the toolchain is used.
 The compiler arguments for that are fixed, and to change that and get more control over how `stdlib` is compiled and linked, we need the unstable option `-Zbuild-std` and have to list which sub-crates we want to build (which is pretty much all of them).
 Because we also have `panic = "abort"` set, we need to also pass in `-Zbuild-std-features="panic_immediate_abort"` so there is no compilation error.
@@ -330,7 +330,7 @@ It is now smaller than this article as plain text markdown.
 
 ### 4. no_main
 
-You'd think that now `main` is the top function, but `Iter::next` is now the biggest function for some reason.
+You'd think that now `main` is the top function, but `Iter::next` is now the biggest function for some reason.
 Still, `[Unknown] main` and the actual main take up 10% of the remaining size according to `cargo bloat`.
 
 We surely cannot reduce that, right? Wrong!
@@ -355,7 +355,7 @@ pub extern "C" fn main(_argc: isize, _argv: *const *const u8) -> isize {
 
     let connection = UdpConnection::open(addr).unwrap();
     connection.send(ClearCommand).unwrap(); // <--
-    
+
     let grid = CharGrid::from_vec(5, vec!['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd']).unwrap();
     connection.send(CharGridCommand::from(grid)).unwrap();
     0
@@ -397,7 +397,7 @@ All of this reduces the final binary size to 8.064 bytes.
 Through this journey, I've managed to reduce the size of the binary from an unwieldy 1.1 MB to an impressive 8 KB, without sacrificing all of the standard library.
 For me the most unexpected was the size of `clap` code, though I learned dozens of things at every step about the intricacies of how `cargo build` produces your final binary.
 
-There is no single option that in itself is the solution, it’s a matter of experimenting with a combination of compiler flags, feature toggles, and code optimizations.
+There is no single option that in itself is the solution, it’s a matter of experimenting with a combination of compiler flags, feature toggles, and code optimizations.
 While extreme options can be great if you want to squeeze out the last bytes, it's probably not worth using those in a "normal" computer scenario.
 
 The key takeaway is that optimizing binary size in Rust, while not always straightforward, is achievable with the right techniques.
@@ -408,4 +408,4 @@ Stay tuned for part two, in which I will try to do something similar with a C ve
 [^1]: Yes, I know UDP does not have connections. Internally, this just opens a UDP socket
 [^panic-abort]: Technically, you can catch a panic while unwinding and there may even be a weird performance argument for doing that, see <!-- TODO find article about making serde faster with panic catching -->
 [^2]: Some commands can be compressed, but the text ones (both CP-437 and UTF-8) cannot. Clear is a _very_ simple command that does not have any payload, so no compression there either. If a `BitmapCommand` was used instead, using `into()` on a `Bitmap` would have hidden the fact that the default compression is used in that case. The default compression in turn is either LZMA or no compression, depending on whether the LZMA feature is enabled.
-[^3]: This works here because `announce` is an example inside of the library itself. As an actual dependent, you would specify this in your `Cargo.toml`.
+[^3]: This works here because `announce` is an example inside of the library itself. As an actual dependent, you would specify this in your `Cargo.toml`.