As we recently confirmed, Square and Left/RightTriangle primitives
never use masks other than Transparent and Solid.
Simplify the code to take advantage of that.
We notice that masks can be described by 2 parameters:
1. Whether they have 0 or 1 as their high bits.
2. Whether they shift to the left or to the right on the next line.
Describing masks this way allows us to lift them to template variables and simplify the code.
We also avoid handling the mask in the `RenderLine` loop entirely.
Also fixes a foliage rendering bug: Transparent foliage pixels were previously blended but they should have been simply skipped.
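
The two-parameter description of masks above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the names `MaskForBoundary`, `NextBoundary`, and `FillLineMasks` are hypothetical, and it assumes a 32-bit mask whose 0/1 boundary moves by one bit per line.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; an illustrative sketch, not the project's code.
// HighBitsSet: true  -> bits at and above the boundary are 1, bits below are 0.
//              false -> bits at and above the boundary are 0, bits below are 1.
template <bool HighBitsSet>
constexpr std::uint32_t MaskForBoundary(unsigned boundary)
{
    assert(boundary <= 32);
    // Set bits [0, boundary). Shifting a 32-bit value by 32 is undefined,
    // so the full-width case is handled separately.
    const std::uint32_t low = boundary == 32
        ? 0xFFFFFFFFu
        : ((std::uint32_t { 1 } << boundary) - 1u);
    return HighBitsSet ? ~low : low;
}

// ShiftLeft: true  -> the boundary moves toward the high bits on the next line.
//            false -> the boundary moves toward the low bits on the next line.
// Assumption: the boundary moves by exactly one bit per line and is clamped to [0, 32].
template <bool ShiftLeft>
constexpr unsigned NextBoundary(unsigned boundary)
{
    return ShiftLeft
        ? (boundary < 32 ? boundary + 1 : 32)
        : (boundary > 0 ? boundary - 1 : 0);
}

// Writes one mask per line into `out`. Both mask properties are template
// parameters, so each combination compiles to its own branch-free loop.
template <bool HighBitsSet, bool ShiftLeft>
void FillLineMasks(unsigned boundary, unsigned lines, std::uint32_t *out)
{
    for (unsigned i = 0; i < lines; ++i) {
        out[i] = MaskForBoundary<HighBitsSet>(boundary);
        boundary = NextBoundary<ShiftLeft>(boundary);
    }
}
```

Because both properties are known at compile time, the per-line loop never has to inspect a runtime mask type.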
A popup-like error dialog in selhero resulted in a heap-use-after-free:
https://gist.github.com/glebm/f014bd87f066d2b79965b7c48bd8f6d7
This is because the popup's `Deinit()` freed the background art.
The fix is simply to not free the background art.
This is OK because the popup never has a background.
It used to load an empty background just to load the palette, but it no
longer does (otherwise this would require more work).
Also fixes dialog rendering:
1. Fixes what is rendered behind the dialog.
2. Draws the mouse (if possible) regardless of whether the background is
present.
3. Clears the screen if the background doesn't cover it completely.
Fixes #4195
1. Fixes the return value (bytes rendered).
2. Fixes line wrapping / end-of-rendering based on the given rectangle:
   1. Accounts for `BaseLineOffset`.
   2. Fixes an off-by-one error for the y coordinate.
   3. Wraps the cursor when needed.
3. Fixes chat input box dimensions (height is 3 * line height).
4. Sets the hint that indicates that we do not render the current
   IME suggestion (SDL_TEXTEDITING). This tells the IME
   that it should render the suggestion instead.
1. Unifies SDL1 and SDL2 text input handling.
2. Moves game-specific text input handling out of misc_msg.
3. Disables Unicode processing when not inputting text in SDL1.
This fixes gold withdrawal in SDL1.
Example:
```
$ tools/measure_timedemo_performance.py -n 32 --binary build-reld-sdl1-nonet-nosound-unpk/devilutionx
Run 1 of 32: 6.02 seconds 170.8 FPS
Run 2 of 32: 6.22 seconds 165.5 FPS
Run 3 of 32: 6.01 seconds 171.1 FPS
Run 4 of 32: 6.03 seconds 170.6 FPS
Run 5 of 32: 6.05 seconds 170.2 FPS
Run 6 of 32: 6.04 seconds 170.3 FPS
Run 7 of 32: 6.03 seconds 170.5 FPS
Run 8 of 32: 6.03 seconds 170.5 FPS
Run 9 of 32: 6.01 seconds 171.1 FPS
Run 10 of 32: 6.04 seconds 170.3 FPS
Run 11 of 32: 6.03 seconds 170.7 FPS
Run 12 of 32: 6.03 seconds 170.7 FPS
Run 13 of 32: 6.04 seconds 170.3 FPS
Run 14 of 32: 6.03 seconds 170.6 FPS
Run 15 of 32: 6.04 seconds 170.3 FPS
Run 16 of 32: 6.04 seconds 170.4 FPS
Run 17 of 32: 6.03 seconds 170.6 FPS
Run 18 of 32: 6.03 seconds 170.5 FPS
Run 19 of 32: 6.06 seconds 169.9 FPS
Run 20 of 32: 6.04 seconds 170.3 FPS
Run 21 of 32: 6.03 seconds 170.7 FPS
Run 22 of 32: 6.02 seconds 171.0 FPS
Run 23 of 32: 6.02 seconds 170.8 FPS
Run 24 of 32: 6.02 seconds 170.8 FPS
Run 25 of 32: 6.04 seconds 170.3 FPS
Run 26 of 32: 6.03 seconds 170.8 FPS
Run 27 of 32: 6.04 seconds 170.4 FPS
Run 28 of 32: 6.07 seconds 169.4 FPS
Run 29 of 32: 6.03 seconds 170.7 FPS
Run 30 of 32: 5.99 seconds 171.7 FPS
Run 31 of 32: 6.01 seconds 171.1 FPS
Run 32 of 32: 6.06 seconds 169.9 FPS
6.038 ± 0.037 seconds, 170.400 ± 0.988 FPS
```
`-O2` increases the binary size by 0.5 MiB, which we can now afford.
Increasing buffer size to `768` improves performance (seems to be the
sweet spot).
15-20 FPS in dungeon on the max-mem OD fork.
When using `UNPACKED_MPQS`, avoid all the SDL machinery for reading
files.
This is beneficial not only due to reduced indirection but also because
we can test for the file's existence and get the file size without
opening it, which is much faster.
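
As a rough illustration of checking existence and size without opening the file, here is a minimal sketch. It assumes a POSIX platform with `stat()`; the helper name `GetFileSizeWithoutOpening` is hypothetical and not taken from the codebase.

```cpp
#include <cstdint>
#include <sys/stat.h>

// Hypothetical helper; assumes a POSIX platform.
// Returns true and stores the size in `*size` if `path` names a regular file.
// Returns false and leaves `*size` untouched otherwise.
// A single stat() call answers both "does it exist?" and "how big is it?"
// without acquiring a file handle.
bool GetFileSizeWithoutOpening(const char *path, std::uintmax_t *size)
{
    struct stat st;
    if (::stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return false;
    *size = static_cast<std::uintmax_t>(st.st_size);
    return true;
}
```

A caller can allocate a buffer of the reported size up front and only then open the file for reading.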