Triangles in the dungeon CEL data have two redundant 0x00 pixels on every other row.
Re-encodes the dungeon CEL data to remove these pixels, which saves RAM and simplifies the rendering code.
Example RAM savings:
```
VERBOSE: Re-encoding dungeon CELs: 1,119 frames, 738,836 bytes
VERBOSE: Re-encoded dungeon CELs: 1,119 frames, 722,552 bytes
```
Performance remains the same. The rendering code is now a bit simpler.
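A minimal sketch of the re-encoding idea for a single triangle frame. The row layout used here is an assumption for illustration, not something the message above states: 31 rows with widths 2, 4, ..., 32, 30, ..., 2, where each row whose width is not a multiple of 4 starts with two 0x00 bytes. `RowWidth` and `StripTrianglePadding` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout (illustrative only): 31 rows, widths 2, 4, ..., 32, 30, ..., 2.
constexpr std::size_t RowWidth(std::size_t row)
{
	return row < 16 ? 2 * (row + 1) : 2 * (31 - row);
}

// Returns a copy of the frame with the two leading 0x00 bytes removed from
// every row whose width is not a multiple of 4 (that is, every other row).
std::vector<std::uint8_t> StripTrianglePadding(const std::vector<std::uint8_t> &src)
{
	assert(src.size() == 544); // 512 pixels + 16 padded rows * 2 bytes
	std::vector<std::uint8_t> dst;
	dst.reserve(512);
	std::size_t pos = 0;
	for (std::size_t row = 0; row < 31; ++row) {
		const std::size_t width = RowWidth(row);
		if (width % 4 != 0) {
			assert(src[pos] == 0 && src[pos + 1] == 0);
			pos += 2; // skip the redundant padding
		}
		dst.insert(dst.end(), src.data() + pos, src.data() + pos + width);
		pos += width;
	}
	assert(pos == src.size());
	return dst;
}
```

Under this assumed layout, each triangle frame shrinks from 544 to 512 bytes.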
Four optional arguments are unwieldy, especially when you want
to pass only the first and the last one.
With a struct, there is no need to spell out the default values
for the arguments in between.
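A small sketch of the options-struct pattern. `DrawOptions`, its fields, and `TextWidthSketch` are hypothetical names chosen for illustration; they are not the actual API touched by this change.

```cpp
#include <cassert>

// Hypothetical options struct: every field carries its own default.
struct DrawOptions {
	int flags = 0;
	int spacing = 1;
	int lineHeight = -1;
	bool highlight = false;
};

// Stand-in for a draw call: returns the horizontal advance for `length`
// characters, assuming 8-pixel-wide glyphs.
inline int TextWidthSketch(int length, const DrawOptions &opts = {})
{
	assert(length >= 0);
	return length * (8 + opts.spacing) + (opts.highlight ? 2 : 0);
}

// Callers set only the fields they care about; `spacing` and `lineHeight`
// keep their defaults without being spelled out.
inline int HighlightedWidth(int length)
{
	DrawOptions opts;
	opts.flags = 1;
	opts.highlight = true;
	return TextWidthSketch(length, opts);
}
```

With C++20 designated initializers, the call site could shrink further to something like `TextWidthSketch(3, { .flags = 1, .highlight = true })`.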
1. Unifies the underlying CLX and dun_render blitters.
2. Optimizes them by unrolling loops and using pointer comparison rather
than length comparison (saves a length decrement).
3. In `dun_render`, extracts `RenderLineTransparent/Opaque` branches into
functions via explicit template specialization.
Example RG-99 FPS (non-PGO'd): 17.4 -> 18.4
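The loop shape from point 2 can be sketched roughly as follows. `CopyPixelsSketch` is a hypothetical stand-in, not the actual blitter: the main loop is unrolled by 4, and both loops end on a pointer comparison, so no separate length counter is decremented.

```cpp
#include <cassert>
#include <cstdint>

// Copies `n` pixels from `src` to `dst`.
inline void CopyPixelsSketch(std::uint8_t *dst, const std::uint8_t *src, unsigned n)
{
	assert(n == 0 || (dst != nullptr && src != nullptr));
	const std::uint8_t *const unrolledEnd = src + (n & ~3U); // largest multiple of 4
	const std::uint8_t *const end = src + n;
	while (src != unrolledEnd) {
		dst[0] = src[0];
		dst[1] = src[1];
		dst[2] = src[2];
		dst[3] = src[3];
		dst += 4;
		src += 4;
	}
	// Remaining 0 to 3 pixels.
	while (src != end) {
		*dst++ = *src++;
	}
}
```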
As we recently confirmed, Square and Left/RightTriangle primitives
never use masks other than Transparent and Solid.
Simplify the code to take advantage of that.
We notice that masks can be described by 2 parameters:
1. Whether they have 0 or 1 as their high bits.
2. Whether they shift to the left or to the right on the next line.
Describing masks this way allows us to lift them to template variables and simplify the code.
We also avoid handling the mask in the `RenderLine` loop entirely.
Also fixes a foliage rendering bug: transparent foliage pixels were previously blended, but they should simply have been skipped.
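One way to picture the two-parameter mask description is the sketch below. It rests on assumptions not stated above: a mask row is modeled as a run of `prefix` high bits followed by the opposite value, a 1 bit means "opaque", and the per-line shift is modeled as a change in the prefix length. All names are hypothetical, and `BlendSketch` stands in for the real blending.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the real blend (assumption for this sketch).
inline std::uint8_t BlendSketch(std::uint8_t dst, std::uint8_t src)
{
	return static_cast<std::uint8_t>((dst + src) / 2);
}

inline void RunOpaque(std::uint8_t *dst, const std::uint8_t *src, int n)
{
	for (int i = 0; i < n; ++i)
		dst[i] = src[i];
}

inline void RunBlended(std::uint8_t *dst, const std::uint8_t *src, int n)
{
	for (int i = 0; i < n; ++i)
		dst[i] = BlendSketch(dst[i], src[i]);
}

// HighBitsSet: whether the leading `prefix` pixels of each row are opaque.
// PrefixDelta: how the prefix length changes from one row to the next.
template <bool HighBitsSet, int PrefixDelta>
void RenderMaskedSketch(std::uint8_t *dst, const std::uint8_t *src, int width, int rows, int prefix)
{
	for (int row = 0; row < rows; ++row, dst += width, src += width, prefix += PrefixDelta) {
		assert(prefix >= 0 && prefix <= width);
		// `HighBitsSet` is a compile-time constant, so only one arm survives
		// in each instantiation and the inner loops never test a mask bit.
		if (HighBitsSet) {
			RunOpaque(dst, src, prefix);
			RunBlended(dst + prefix, src + prefix, width - prefix);
		} else {
			RunBlended(dst, src, prefix);
			RunOpaque(dst + prefix, src + prefix, width - prefix);
		}
	}
}
```

Each row becomes two straight runs, which is why the per-pixel mask handling can disappear from the line loop.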
Turns the `RenderLine` branches into template parameters, allowing the
compiler to eliminate the branches and fully inline the function.
Example FPS change:
* In dungeon: 1450 -> 1530
* In town: 1655 -> 1700
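The branch-to-template-parameter idea can be sketched as follows. The names (`LineMode`, `RenderLineSketch`, `RenderTileSketch`) and the averaging blend are hypothetical stand-ins, not the actual `RenderLine` code.

```cpp
#include <cassert>
#include <cstdint>

enum class LineMode { Opaque, Blended }; // hypothetical

// Stand-in for the real blend (assumption for this sketch).
inline std::uint8_t BlendStandIn(std::uint8_t dst, std::uint8_t src)
{
	return static_cast<std::uint8_t>((dst + src) / 2);
}

template <LineMode Mode>
inline void RenderLineSketch(std::uint8_t *dst, const std::uint8_t *src, int width)
{
	for (int i = 0; i < width; ++i) {
		// `Mode` is known at compile time, so each instantiation keeps one arm.
		if (Mode == LineMode::Opaque)
			dst[i] = src[i];
		else
			dst[i] = BlendStandIn(dst[i], src[i]);
	}
}

template <LineMode Mode>
void RenderRowsSketch(std::uint8_t *dst, const std::uint8_t *src, int width, int rows)
{
	for (int row = 0; row < rows; ++row)
		RenderLineSketch<Mode>(dst + row * width, src + row * width, width);
}

// The runtime decision is made once per tile instead of inside the line loop.
inline void RenderTileSketch(std::uint8_t *dst, const std::uint8_t *src, int width, int rows, LineMode mode)
{
	assert(width >= 0 && rows >= 0);
	if (mode == LineMode::Opaque)
		RenderRowsSketch<LineMode::Opaque>(dst, src, width, rows);
	else
		RenderRowsSketch<LineMode::Blended>(dst, src, width, rows);
}
```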
Also splits `RenderLine` into 3 functions.
This is easier to read and also gives more useful profiling output.
Apparently, most of the time is spent in `RenderLineOpaque`.
Also marks them `inline`, because that apparently hints GCC to inline
the functions (a later refactoring can introduce the `always_inline`
attribute instead where supported).
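A possible shape for the later `always_inline` refactoring, as a sketch only. The macro name `SKETCH_ALWAYS_INLINE` and the `LookupPixel` helper are hypothetical; the attribute spellings are the standard GCC/Clang and MSVC ones.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define SKETCH_ALWAYS_INLINE __forceinline
#else
// Fallback: plain `inline` remains only a hint.
#define SKETCH_ALWAYS_INLINE inline
#endif

// Small hot-path helper of the kind one would want inlined unconditionally.
SKETCH_ALWAYS_INLINE std::uint8_t LookupPixel(const std::uint8_t *table, std::uint8_t index)
{
	assert(table != nullptr);
	return table[index];
}
```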