Turns the `RenderLine` branches into template parameters, allowing the
compiler to eliminate the branches and fully inline the function.
Example FPS change:
* In dungeon: 1450 -> 1530
* In town: 1655 -> 1700
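As a rough illustration of turning branches into template parameters, the sketch below moves two per-pixel decisions out of the loop. The names `Fill`, `Light`, `RenderLineSketch`, and `RenderLineDispatch` are hypothetical and not taken from the actual code; the real `RenderLine` options may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical per-line options; the real renderer's options may differ.
enum class Fill { Opaque, Stippled };
enum class Light { Unlit, Lit };

template <Fill F, Light L>
inline void RenderLineSketch(std::uint8_t *dst, const std::uint8_t *src, std::size_t n, const std::uint8_t *lightTable)
{
	for (std::size_t i = 0; i < n; ++i) {
		// F and L are compile-time constants, so the compiler can drop
		// the unused path from each instantiation.
		if (F == Fill::Stippled && (i % 2) != 0)
			continue; // stippled: skip every other pixel
		dst[i] = (L == Light::Lit) ? lightTable[src[i]] : src[i];
	}
}

// The runtime decision is made once per line, outside the pixel loop.
inline void RenderLineDispatch(std::uint8_t *dst, const std::uint8_t *src, std::size_t n,
    const std::uint8_t *lightTable, bool stippled, bool lit)
{
	if (stippled) {
		if (lit)
			RenderLineSketch<Fill::Stippled, Light::Lit>(dst, src, n, lightTable);
		else
			RenderLineSketch<Fill::Stippled, Light::Unlit>(dst, src, n, lightTable);
	} else {
		if (lit)
			RenderLineSketch<Fill::Opaque, Light::Lit>(dst, src, n, lightTable);
		else
			RenderLineSketch<Fill::Opaque, Light::Unlit>(dst, src, n, lightTable);
	}
}
```

Each of the four instantiations is a separate function, which is also why they show up individually in a profiler.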
Also splits `RenderLine` into 3 functions.
This is easier to read and gives more useful profiling output.
Apparently, most of the time is spent in `RenderLineOpaque`.
Also marks them `inline`, because that apparently hints GCC to inline
the functions (in a later refactoring we can introduce the
`always_inline` attribute instead where supported).
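One possible shape for the `always_inline` follow-up is a small portability macro. This is only a sketch: the macro name `SKETCH_ALWAYS_INLINE` and the helper `ApplyLightSketch` are made up for illustration, and the plain `inline` fallback is an assumption about what "where supported" would mean.

```cpp
#include <cstdint>

// Hypothetical macro: force inlining where the compiler offers a way to,
// otherwise fall back to a plain `inline` hint.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define SKETCH_ALWAYS_INLINE __forceinline
#else
#define SKETCH_ALWAYS_INLINE inline
#endif

// Hypothetical hot-path helper that should never become a real call.
SKETCH_ALWAYS_INLINE std::uint8_t ApplyLightSketch(const std::uint8_t *lightTable, std::uint8_t color)
{
	return lightTable[color];
}
```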
This is part of the work to allow us to eliminate buffer padding.
As this is a hotspot, we have 4 separate functions for each non-square
primitive, resulting in quite a bit of code:
1. Unclipped ("Full")
2. Vertical-only clip
3. Vertical + Left clip
4. Vertical + Right clip
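The four variants listed above could be selected once per primitive, roughly as sketched here. `ClipKind`, `RectSketch`, and `ChooseClip` are hypothetical names, and the sketch assumes a primitive is never wider than the buffer, so it cannot clip on both the left and the right at once.

```cpp
// Hypothetical names; not taken from the actual code.
enum class ClipKind { Full, Vertical, VerticalAndLeft, VerticalAndRight };

struct RectSketch {
	int x, y, w, h;
};

// Picks which of the four specialised functions would handle a primitive.
// Assumption: the primitive is narrower than the buffer, so left and right
// clipping never happen together.
inline ClipKind ChooseClip(const RectSketch &prim, const RectSketch &buffer)
{
	const bool clipsLeft = prim.x < buffer.x;
	const bool clipsRight = prim.x + prim.w > buffer.x + buffer.w;
	const bool clipsVertical = prim.y < buffer.y || prim.y + prim.h > buffer.y + buffer.h;
	if (clipsLeft)
		return ClipKind::VerticalAndLeft; // this variant also handles vertical clipping
	if (clipsRight)
		return ClipKind::VerticalAndRight; // likewise
	if (clipsVertical)
		return ClipKind::Vertical;
	return ClipKind::Full;
}
```

Choosing the variant up front keeps the per-pixel loops in the unclipped case free of bounds checks, at the cost of the extra code noted above.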
FPS at 640x480: 1420 -> 1530
Instead of passing the CEL sprite width when drawing, store the CEL
width at load time in the new `CelSprite` struct.
Implemented for most sprites, with the exception of towners, missiles,
and monsters.
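A minimal sketch of the idea, assuming the struct only needs to pair the sprite data with its width. `CelSpriteSketch`, `LoadCelSketch`, and `CelDrawSketch` are hypothetical stand-ins; the actual `CelSprite` layout and loader are not shown here and may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical stand-in for `CelSprite`: owns the data and remembers the width.
class CelSpriteSketch {
public:
	CelSpriteSketch(std::unique_ptr<std::uint8_t[]> data, int width)
	    : data_(std::move(data))
	    , width_(width)
	{
	}

	const std::uint8_t *Data() const { return data_.get(); }
	int Width() const { return width_; }

private:
	std::unique_ptr<std::uint8_t[]> data_;
	int width_;
};

// Hypothetical loader: the width is supplied once, at load time.
// A real loader would read a file; this one just allocates a zeroed buffer.
inline CelSpriteSketch LoadCelSketch(std::size_t byteSize, int width)
{
	return CelSpriteSketch(std::unique_ptr<std::uint8_t[]>(new std::uint8_t[byteSize]()), width);
}

// Draw calls take the sprite only, with no separate width argument.
// Returns the width it would use, purely so the sketch is observable.
inline int CelDrawSketch(const CelSpriteSketch &sprite, int x, int y)
{
	(void)x;
	(void)y;
	return sprite.Width();
}
```

Keeping the width next to the data means a call site cannot pass a width that does not match the sprite.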
`CelOutputBuffer` now contains an `SDL_Surface` and an `SDL_Rect`.
We now have access to SDL surface manipulation functions.
`gpBuffer` and `gpBufEnd` are completely gone 🧹
This results in some FPS loss (250 -> 195), which is recovered in a
subsequent commit.
* Improvements to the `RenderLine()` function:
  - Simplify by using indices instead of incrementing pointers
  - Improve performance in the case where `mask != -1` by only processing the bits that are set
- Add file documentation to about 1/4 of the files in Source
- Copy over a lot of the documentation from the sanctuary/notes repo
- Standardise all the existing documentation
- Create a configuration for Doxygen
- Add more documentation (engine.cpp is now fully documented)
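Returning to the `RenderLine()` mask change listed earlier: processing only the set bits can be sketched as below. This is not the actual implementation. The function names are hypothetical, and the sketch assumes that bit `i` of the mask (least significant first) enables pixel `i` and that a line is at most 32 pixels; the real bit order may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Portable stand-in for a count-trailing-zeros intrinsic. Requires v != 0.
inline unsigned CountTrailingZerosSketch(std::uint32_t v)
{
	unsigned n = 0;
	while ((v & 1U) == 0) {
		v >>= 1;
		++n;
	}
	return n;
}

// Assumptions: bit i of `mask` enables pixel i, and n <= 32.
inline void RenderLineMaskedSketch(std::uint8_t *dst, const std::uint8_t *src, std::size_t n, std::uint32_t mask)
{
	if (mask == 0xFFFFFFFFU) { // the `mask == -1` case: plain indexed copy
		for (std::size_t i = 0; i < n; ++i)
			dst[i] = src[i];
		return;
	}
	if (n < 32)
		mask &= (std::uint32_t(1) << n) - 1; // ignore bits beyond the line
	while (mask != 0) {
		const unsigned i = CountTrailingZerosSketch(mask); // index of lowest set bit
		dst[i] = src[i];
		mask &= mask - 1; // clear that bit and move on to the next set one
	}
}
```

The loop runs once per set bit rather than once per pixel, so sparse masks do proportionally less work.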