This gives a very slight FPS boost:
1140 to 1143 FPS on my machine, as measured by:
```bash
tools/linux_reduced_cpu_variance_run.sh tools/measure_timedemo_performance.py -n 5 --binary build-rel/devilutionx
```
@Trihedraf reported that the game crashed when killing Diablo.
1536 should be enough for Diablo and any custom sprites (if not, we can increase it further).
The new algorithm is a lot less code, slightly faster, and results
in a smaller binary (-40 KiB on rg99).
The previous algorithm filled all the pixels around every solid pixel.
The new algorithm only fills pixels that will be visible.
We first collect the outline pixels into an array (which may contain a
small number of duplicates). Then, we render the entire array in a
single loop. This turns out to be slightly faster than rendering inline,
at the cost of ~4 KiB of stack (basically free).
To collect the pixels, we go through the CLX sprite, keeping track
of the solid runs in the current row, and the filled pixels on the line
above and the line below.
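The collect-then-render idea can be sketched on a plain boolean mask. This is a simplified illustration, not the CLX implementation: it assumes a 4-neighbour outline, uses a heap-allocated `std::vector` rather than a stack array, produces no duplicates, and all names (`PixelPos`, `CollectOutline`, `RenderOutline`) are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct PixelPos {
	int x;
	int y;
};

// Phase 1: collect outline pixels for a row-major mask (non-zero = solid).
// Only transparent positions next to a solid pixel are kept, because an
// outline pixel under a solid pixel would never be visible.
inline std::vector<PixelPos> CollectOutline(const std::vector<std::uint8_t> &mask, int width, int height)
{
	std::vector<PixelPos> result;
	auto solid = [&](int x, int y) {
		return x >= 0 && x < width && y >= 0 && y < height
		    && mask[static_cast<std::size_t>(y) * width + x] != 0;
	};
	// The outline extends one pixel beyond the sprite on every side.
	for (int y = -1; y <= height; ++y) {
		for (int x = -1; x <= width; ++x) {
			if (solid(x, y))
				continue;
			if (solid(x - 1, y) || solid(x + 1, y) || solid(x, y - 1) || solid(x, y + 1))
				result.push_back(PixelPos { x, y });
		}
	}
	return result;
}

// Phase 2: render the whole array in a single loop.
// `dst` must have a one-pixel margin around the sprite, hence the +1 offsets.
inline void RenderOutline(const std::vector<PixelPos> &pixels, std::uint8_t color,
    std::vector<std::uint8_t> &dst, int dstPitch)
{
	for (const PixelPos &p : pixels)
		dst[static_cast<std::size_t>(p.y + 1) * dstPitch + (p.x + 1)] = color;
}
```

For a single solid pixel this yields the four pixels directly above, below, left, and right of it.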
To be able to quickly test the pixels above and below, we introduce a
new data structure, `StaticBitVector`. It is similar to a bitset,
except the size is determined at runtime (capacity is fixed),
and it supports quick updates of entire subspans.
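A minimal sketch of such a structure is shown below. It is not the actual `StaticBitVector` from the codebase: the member names, the 64-bit word size, and the `set(begin, length)` signature are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: fixed compile-time capacity, size chosen at runtime.
template <std::size_t Capacity>
class StaticBitVector {
	using Word = std::uint64_t;
	static constexpr std::size_t kBits = 64;
	static constexpr std::size_t kWords = (Capacity + kBits - 1) / kBits;

public:
	explicit StaticBitVector(std::size_t size)
	    : size_(size)
	{
	}

	std::size_t size() const { return size_; }

	bool test(std::size_t pos) const
	{
		return ((words_[pos / kBits] >> (pos % kBits)) & Word { 1 }) != 0;
	}

	void set(std::size_t pos)
	{
		words_[pos / kBits] |= Word { 1 } << (pos % kBits);
	}

	// Sets `length` bits starting at `begin`, touching each word only once.
	void set(std::size_t begin, std::size_t length)
	{
		if (length == 0)
			return;
		const std::size_t last = begin + length - 1;
		const std::size_t firstWord = begin / kBits;
		const std::size_t lastWord = last / kBits;
		const Word headMask = ~Word { 0 } << (begin % kBits);
		const Word tailMask = ~Word { 0 } >> (kBits - 1 - last % kBits);
		if (firstWord == lastWord) {
			words_[firstWord] |= headMask & tailMask;
			return;
		}
		words_[firstWord] |= headMask;
		for (std::size_t i = firstWord + 1; i < lastWord; ++i)
			words_[i] = ~Word { 0 };
		words_[lastWord] |= tailMask;
	}

private:
	std::size_t size_;
	Word words_[kWords] = {};
};
```

Marking a solid run of a row is then a single `set(begin, length)` call, and checking the row above or below is a single `test(x)`.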
Inlines blit command parsing.
We previously had blit commands because we supported rendering multiple
formats (CEL, CL2, CLX) but now we only ever render CLX, so this is
no longer necessary.
The original Blizzard encoder is slightly less optimal than our encoder.
While size in RAM is less of a concern for the non-`UNPACKED_MPQS`
build, smaller files are faster to render.
Savings for unpacked and minified MPQs:
* diabdat.mpq: 918,311 bytes.
* hellfire.mpq: 313,882 bytes.
Example player graphics (note that only a few are loaded at any given time for single player):
* diabdat/plrgfx/warrior/: 366,564 bytes.
Example monster graphics savings:
* diabdat/monsters/skelbow: 5,391 bytes.
Based on the implementation from https://github.com/diasurgical/devilutionx-graphics-tools/pull/6
The format is almost identical to CL2, except it uses the frame header
to store frame width and height instead of 5 32-line offsets.
This means we always have access to frame dimensions, so we can use it
as an on-disk format for our graphics as well.
Additionally, we may be able to optimize the rendering even more
in the future now that we have guaranteed knowledge of frame dimensions.
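As a rough illustration of why this helps, reading the dimensions becomes a constant-time lookup at the start of each frame. The exact byte layout below (little-endian 16-bit header size, then width, then height) is an assumption made for this sketch, and `FrameSize`, `LoadLE16`, and `GetFrameSize` are hypothetical names.

```cpp
#include <cstdint>

struct FrameSize {
	std::uint16_t width;
	std::uint16_t height;
};

// Reads a little-endian 16-bit value regardless of host endianness.
inline std::uint16_t LoadLE16(const std::uint8_t *p)
{
	return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Assumed layout: bytes 0-1 header size, bytes 2-3 width, bytes 4-5 height.
inline FrameSize GetFrameSize(const std::uint8_t *frame)
{
	return FrameSize { LoadLE16(frame + 2), LoadLE16(frame + 4) };
}
```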
Convert CEL files to CL2 at load time. CL2 format is more efficient and is about as fast to render.
CEL vs CL2 sizes, on dLvl 5: https://gist.github.com/glebm/9bbdd76962abcd4fd2405ecd3379af97
Memory:
* Peak memory (while loading): -300 KiB
* Memory in-game (dLvl5): -700 KiB
* RG99 binary size: -15 KiB (1333096 -> 1317192)
Performance on rg99:
* On average, -1 FPS in town.
* Same FPS in dungeon (20 FPS on dLvl 1).
This OOB happened when rendering a sprite so that it is exactly
off-screen (touching the border but not visible) on top/bottom
while also being only partly off-screen on the left or right.
Makes `CelSprite` unowned and adds a new `OwnedCelSprite` class for
owned sprites.
This clarifies ownership and makes the code cleaner in a number of
places.
Additionally, because the `CelSprite` class is now tiny (1 less
pointer), we can pass it by-value instead of by-reference, removing a
pointer indirection in the rendering functions.
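The split can be sketched roughly as follows. This is a simplified illustration rather than the actual classes: the members and the `Data`, `Width`, and `View` accessors are hypothetical.

```cpp
#include <cstdint>
#include <memory>
#include <type_traits>
#include <utility>

// Non-owning view: a pointer plus metadata, cheap to copy.
class CelSprite {
public:
	CelSprite(const std::uint8_t *data, std::uint16_t width)
	    : data_(data)
	    , width_(width)
	{
	}

	const std::uint8_t *Data() const { return data_; }
	std::uint16_t Width() const { return width_; }

private:
	const std::uint8_t *data_;
	std::uint16_t width_;
};

static_assert(std::is_trivially_copyable<CelSprite>::value, "cheap to pass by value");

// Owning counterpart: frees the data when destroyed.
class OwnedCelSprite {
public:
	OwnedCelSprite(std::unique_ptr<std::uint8_t[]> data, std::uint16_t width)
	    : data_(std::move(data))
	    , width_(width)
	{
	}

	// The returned view must not outlive this object.
	CelSprite View() const { return CelSprite { data_.get(), width_ }; }

private:
	std::unique_ptr<std::uint8_t[]> data_;
	std::uint16_t width_;
};

// Rendering-style functions take the view by value, so accessing the data
// does not first dereference a reference to the sprite object.
inline std::uint16_t SpriteWidth(CelSprite sprite)
{
	return sprite.Width();
}
```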
1. Move `SetPixel` definition to the header to make it easier for the
compiler to inline (make it inlinable even without LTO).
2. Add an `operator[](Point)` overload to `CelOutputBuffer`.
Uses integer math only. This speeds up the rendering and eliminates some
zoom artifacts.
Improves player indicator look -- it's now symmetric and more legible.
This gives us the option to specify what type a file should be loaded
as, avoiding the need to cast it, and does some automatic checks on the
fitness of the data, while making the process simpler.
If no type is given, the type defaults to `std::byte`, which limits
what operations can be performed on the data.
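A minimal sketch of the idea follows. It reads from a `std::istream` so that it is self-contained, reports errors with an exception, and treats "size is a multiple of `sizeof(T)`" as the fitness check; the name `LoadInMem` and all of these choices are assumptions for illustration, not the actual loader.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <iterator>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical typed loader; T defaults to std::byte (requires C++17).
template <typename T = std::byte>
std::unique_ptr<T[]> LoadInMem(std::istream &in, std::size_t &numElements)
{
	static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
	const std::string raw { std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>() };
	// Fitness check: the data must contain a whole number of T elements.
	if (raw.size() % sizeof(T) != 0)
		throw std::runtime_error("data size is not a multiple of the element size");
	numElements = raw.size() / sizeof(T);
	std::unique_ptr<T[]> result { new T[numElements] };
	std::memcpy(result.get(), raw.data(), raw.size());
	return result;
}
```

Calling `LoadInMem<std::uint16_t>(stream, count)` yields a typed array with no cast at the call site, while `LoadInMem(stream, count)` yields `std::byte` elements, which do not support arithmetic without an explicit conversion.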
`std::make_unique<T[]>(size)` always zero-initializes the array.
C++20 has `std::make_unique_for_overwrite` to avoid that, while
older C++ versions can use the approach applied here.
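One common pre-C++20 way to do this is to construct the `std::unique_ptr` from a plain `new T[size]`, which default-initializes the elements. Whether this matches the change exactly is an assumption, and the helper name `MakeArrayForOverwrite` is hypothetical.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper. `new T[size]` (without trailing parentheses or braces)
// default-initializes the array, so elements of trivial types are left
// uninitialized instead of being zeroed.
template <typename T>
std::unique_ptr<T[]> MakeArrayForOverwrite(std::size_t size)
{
	return std::unique_ptr<T[]> { new T[size] };
}
```

The contents are indeterminate until written, so this is only appropriate when the buffer is about to be filled in full, for example by reading a file into it.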
Instead of passing the CEL sprite width when drawing, store the CEL
width at load time in the new `CelSprite` struct.
Implemented for most sprites, with the exception of towners, missiles, and monsters.