As we recently confirmed, Square and Left/RightTriangle primitives
never use masks other than Transparent and Solid.
Simplify the code to take advantage of that.
We notice that masks can be described by 2 parameters:
1. Whether they have 0 or 1 as their high bits.
2. Whether they shift to the left or to the right on the next line.
Describing masks this way allows us to lift them to template parameters and simplify the code.
We also avoid handling the mask in the `RenderLine` loop entirely.
Also fixes a foliage rendering bug: Transparent foliage pixels were previously blended but they should have been simply skipped.
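
A minimal sketch of the two-parameter mask description above, not the actual implementation. The name `NextLineMask`, the 32-bit mask width, and the `step` argument are assumptions made for illustration. The mask is treated as a run of high bits followed by a run of low bits with the opposite value, so the bits shifted in must keep that shape.

```cpp
#include <cstdint>

// Hypothetical helper: computes the mask for the next line.
// HighBitsSet: the high bits of the mask are 1 (the low bits are then 0), or vice versa.
// ShiftLeft:   the mask moves left on the next line, otherwise right.
// `step` is assumed to be in the range 1..31.
template <bool HighBitsSet, bool ShiftLeft>
constexpr std::uint32_t NextLineMask(std::uint32_t mask, unsigned step)
{
    return ShiftLeft
        // Shifting left: fill the vacated low bits with the low-run value.
        ? static_cast<std::uint32_t>((mask << step) | (HighBitsSet ? 0u : ((1u << step) - 1u)))
        // Shifting right: fill the vacated high bits with the high-run value.
        : static_cast<std::uint32_t>((mask >> step) | (HighBitsSet ? ~(0xFFFFFFFFu >> step) : 0u));
}
```

Because both parameters are compile-time constants, each of the four combinations compiles to a single shift plus an optional OR.
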
Turns the branches in `RenderLine` into template parameters, allowing the
compiler to eliminate the branches and fully inline the function.
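
A rough illustration of the idea, assuming a single hypothetical `UseLightTable` flag and a simplified signature. The real `RenderLine` takes different parameters. The condition is a template argument, so each instantiation contains only one side of the branch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified line renderer.
// UseLightTable is known at compile time, so the compiler can drop the
// untaken branch from each instantiation of the loop.
template <bool UseLightTable>
void RenderLine(std::uint8_t *dst, const std::uint8_t *src, std::size_t width, const std::uint8_t *lightTable)
{
    for (std::size_t i = 0; i < width; ++i) {
        if (UseLightTable)
            dst[i] = lightTable[src[i]]; // remap the color through the light table
        else
            dst[i] = src[i]; // fully lit: plain copy
    }
}
```

The caller picks the instantiation once per tile, for example `RenderLine<true>(dst, src, width, table)`, instead of testing the flag for every pixel.
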
Example FPS change:
* In dungeon: 1450 -> 1530
* In town: 1655 -> 1700
Also splits `RenderLine` into 3 functions.
This is easier to read and also gives more useful profiling output.
Apparently, most of the time is spent in `RenderLineOpaque`.
Also marks them `inline`, because that apparently hints GCC to inline
the functions (in a later refactoring we can introduce the
`always_inline` attribute instead where supported).
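
One possible shape for the later `always_inline` refactoring, as a sketch only. The macro name `SKETCH_ALWAYS_INLINE` is made up, and the body of `RenderLineOpaque` below is a stand-in, not the real function.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical macro: force inlining where the compiler supports it,
// and fall back to a plain `inline` hint elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define SKETCH_ALWAYS_INLINE __forceinline
#else
#define SKETCH_ALWAYS_INLINE inline
#endif

// Stand-in body: an opaque line is a straight copy of `width` pixels.
SKETCH_ALWAYS_INLINE void RenderLineOpaque(std::uint8_t *dst, const std::uint8_t *src, std::size_t width)
{
    std::memcpy(dst, src, width);
}
```
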
This is part of the work to allow us to eliminate buffer padding.
As this is a hotspot, we have 4 separate functions for each non-square
primitive, resulting in quite a bit of code:
1. Unclipped ("Full")
2. Vertical-only clip
3. Vertical + Left clip
4. Vertical + Right clip
FPS at 640x480: 1420 -> 1530
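
A sketch of how a caller might choose between the 4 variants listed above. `ClipAmounts`, `ClipKind`, and `ChooseClipKind` are hypothetical names, and the sketch assumes a tile is never clipped on both the left and the right at once.

```cpp
// Hypothetical: how many pixels of the tile fall outside the buffer on each side.
struct ClipAmounts {
    int top;
    int bottom;
    int left;
    int right;
};

// One value per specialized render function.
enum class ClipKind { Full, Vertical, VerticalLeft, VerticalRight };

// The horizontal variants also handle vertical clipping, so they are checked first.
inline ClipKind ChooseClipKind(const ClipAmounts &clip)
{
    if (clip.left > 0)
        return ClipKind::VerticalLeft;
    if (clip.right > 0)
        return ClipKind::VerticalRight;
    if (clip.top > 0 || clip.bottom > 0)
        return ClipKind::Vertical;
    return ClipKind::Full; // nothing is clipped: take the fastest path
}
```
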
Instead of passing the CEL sprite width when drawing, store the CEL
width at load time in the new `CelSprite` struct.
Implemented for most sprites, with the exception of towners, missiles, and monsters.
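
To illustrate the idea, a minimal sketch of a sprite that carries its own width. The member names and the helpers `MakeCelSprite` and `RightEdge` are assumptions for this example and do not reflect the real `CelSprite` layout or loader.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical layout: the CEL bytes together with the width recorded at load time.
struct CelSprite {
    std::unique_ptr<std::uint8_t[]> data;
    int width;
};

// Stand-in for a loader. Real code would read the bytes from a file;
// here the buffer is only zero-initialized.
inline CelSprite MakeCelSprite(std::size_t byteCount, int width)
{
    return CelSprite { std::unique_ptr<std::uint8_t[]>(new std::uint8_t[byteCount]()), width };
}

// Callers no longer pass the width: it is read from the sprite itself.
inline int RightEdge(const CelSprite &sprite, int x)
{
    return x + sprite.width;
}
```
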
`CelOutputBuffer` now contains an `SDL_Surface` and an `SDL_Rect`.
We now have access to SDL surface manipulation functions.
`gpBuffer` and `gpBufEnd` are completely gone 🧹
This results in some FPS loss (250 -> 195), which is recovered in a
subsequent commit.