1. Unifies the underlying CLX and dun_render blitters.
2. Optimizes them by unrolling loops and using pointer comparison rather
than length comparison (saves a length decrement).
3. In `dun_render`, extracts `RenderLineTransparent/Opaque` branches into
functions via explicit template specialization.
Example RG-99 FPS (non-PGO'd): 17.4->18.4
The format is almost identical to CL2, except it uses the frame header
to store frame width and height instead of 5 32-line offsets.
This means we always have access to frame dimensions, so we can use it
as an on-disk format for our graphics as well.
Additionally, we may be able to optimize the rendering even more
in the future now that we have guaranteed knowledge of frame dimensions.