The new algorithm is a lot less code, slightly faster, and results
in a smaller binary (-40 KiB on rg99).
The previous algorithm filled all the pixels around every solid pixel.
The new algorithm only fills pixels that will be visible.
We first collect the outline pixels into an array (which may contain a
small amount of duplicates). Then, we render the entire array in a
single loop. This turns out to be slightly faster than rendering inline,
at the cost of ~4 KiB of stack (basically free).
To collect the pixels, we go through the CLX sprite, keeping track
of the solid runs in the current row, and the filled pixels on the line
above and the line below.
To be able to quickly test the pixels above and below, we introduce a
new data structure, `StaticBitVector`. It is similar to a bitset,
except the size is determined at runtime (capacity is fixed),
and it supports quick updates of entire subspans.