Gets rid of `orig_palette`, we now always have only 2 palettes:
1. `logical_palette`
This palette has color cycling / swapping applied but no global
effects such as brightness / fade-in.
2. `system_palette`
This palette is the actual palette used for rendering.
It is usually `logical_palette` with the global brightness setting
and fade-in/out applied.
Additionally, we now keep the k-d tree around and use it to
update single colors.
The colors that are color-cycled / swapped are never included
in the k-d tree, so the tree does not need updating on color
cycles/swaps.
Our values are `uint8_t`, so we can get the median somewhat faster
than `nth_element`.
```
Comparing build-reld-master/palette_blending_benchmark to build-reld/palette_blending_benchmark
Benchmark Time CPU Time Old Time New CPU Old CPU New
-----------------------------------------------------------------------------------------------------------------------------------
BM_GenerateBlendedLookupTable_pvalue 0.0002 0.0002 U Test, Repetitions: 10 vs 10
BM_GenerateBlendedLookupTable_mean -0.0198 -0.0198 2272848 2227790 2270291 2225348
BM_GenerateBlendedLookupTable_median -0.0199 -0.0199 2272884 2227649 2270323 2225212
BM_GenerateBlendedLookupTable_stddev +0.4575 +0.6710 536 781 396 661
BM_GenerateBlendedLookupTable_cv +0.4870 +0.7047 0 0 0 0
OVERALL_GEOMEAN -0.0198 -0.0198 0 0 0 0
```
1. Achieves near-perfect balancing on non-pathological data
by using a per-node pivot and calculating.
2. Increases the depth to 5 levels, which seems to be the
sweet spot.
New benchmark vs baseline:
```
Comparing build-reld-palette-cleanup/palette_blending_benchmark to build-reld/palette_blending_benchmark
Benchmark Time CPU Time Old Time New CPU Old CPU New
-----------------------------------------------------------------------------------------------------------------------------------
BM_GenerateBlendedLookupTable_pvalue 0.0002 0.0002 U Test, Repetitions: 10 vs 10
BM_GenerateBlendedLookupTable_mean -0.8768 -0.8767 18432956 2270752 18417846 2270141
BM_GenerateBlendedLookupTable_median -0.8767 -0.8767 18421978 2270802 18417838 2270167
BM_GenerateBlendedLookupTable_stddev -0.9860 -0.8051 33641 473 1222 238
BM_GenerateBlendedLookupTable_cv -0.8860 +0.5815 0 0 0 0
OVERALL_GEOMEAN -0.8768 -0.8767 0 0 0 0
```
Uses a k-d tree to quickly find the best match
for a color when generating the palette blending
lookup table.
https://en.wikipedia.org/wiki/K-d_tree
This is 3x faster than the previous naive approach:
```
Benchmark Time CPU Time Old Time New CPU Old CPU New
-----------------------------------------------------------------------------------------------------------------------------------
BM_GenerateBlendedLookupTable_pvalue 0.0002 0.0002 U Test, Repetitions: 10 vs 10
BM_GenerateBlendedLookupTable_mean -0.7153 -0.7153 18402641 5239051 18399111 5238025
BM_GenerateBlendedLookupTable_median -0.7153 -0.7153 18403261 5239042 18398841 5237497
BM_GenerateBlendedLookupTable_stddev -0.2775 +0.3858 2257 1631 1347 1867
BM_GenerateBlendedLookupTable_cv +1.5379 +3.8677 0 0 0 0
OVERALL_GEOMEAN -0.7153 -0.7153 0 0 0 0
```
The distribution is somewhat poor with just 3 levels, so this can be improved further.
For example, here is the leaf size distribution in the cathedral:
```
r0.g0.b0: 88
r0.g0.b1: 10
r0.g1.b0: 2
r0.g1.b1: 32
r1.g0.b0: 27
r1.g0.b1: 4
r1.g1.b0: 12
r1.g1.b1: 81
```
The previous implementation didn't behave quite like A-* is supposed to.
After trying to figure out what's causing it and giving up,
I've reimplemented it in a straightforward manner.
Now it seems to work a lot better.
Also increases maximum player path length to 100 steps.
We still only store the first 25 steps in the save file for vanilla
compatibility.
In C++, globals initialization order accross translation units is not
defined. Accessing a global via a function ensures that it is initialized.
This will be needed for #7638, which will statically initialize change
handlers after the Options object has been initialized.
1. Moves more assets-related stuff from `init` to `engine/assets`.
2. Removes `SDL_audiolib` dependency from `soundsample.h`.
3. Cleans up some unused/missing includes.
Our implementation has a more modern interface and only
supports the features that we care about.
It always outputs `\n` as newlines and does not output BOM.
The modern interface eliminates awkward `c_str()/data()` conversions.
This implementation preserves comments and the file order of sections
and keys. New keys are written in insertion order.
We now also support modifying and adding default comments,
which may be a useful thing to do for the especially tricky
ini options (this PR doesn't add any but adds the ability to do so).
Sadly, this increases the RG99 binary size by 24 KiB.
I'm guessing this is because the map implementation generates
quite a bit of code.
Note that while it might seem that using `std::string` for every key and
value would do a lot of allocations, most of these strings are
small and thus benefit from Small String Optimization (= no allocations).