We know that length is never 0.
Letting the compiler know that allows it to optimize one instruction
away.
Moreover, for Fill runs, we also know that the length is at least 2.
Inlines blit command parsing.
We previously had blit commands because we supported rendering multiple
formats (CEL, CL2, CLX) but now we only ever render CLX, so this is
no longer necessary.
1. Unifies the underlying CLX and dun_render blitters.
2. Optimizes them by unrolling loops and using pointer comparison rather
than length comparison (saves a length decrement).
3. In `dun_render`, extracts `RenderLineTransparent/Opaque` branches into
functions via explicit template specialization.
Example RG-99 FPS (non-PGO'd): 17.4->18.4
The format is almost identical to CL2, except it uses the frame header
to store frame width and height instead of 5 32-line offsets.
This means we always have access to frame dimensions, so we can use it
as an on-disk format for our graphics as well.
Additionally, we may be able to optimize the rendering even more
in the future now that we have guaranteed knowledge of frame dimensions.