Enable these options for all platforms that use GCC.
They do not seem to significantly affect build times on our codebase,
while resulting in a slight 2-3% performance increase on low-end devices,
such as the RG99.
Enables another option that improves performance. From the GCC documentation:
> Stream extra information needed for aggressive devirtualization when running the link-time optimizer in local transformation mode. This option enables more devirtualization but significantly increases the size of streamed data. For this reason it is disabled by default.
From the GCC documentation:
> Perform interprocedural pointer analysis and interprocedural modification and reference analysis.
> This option can cause excessive memory and compile-time usage on large compilation units.
> It is not enabled by default at any optimization level.
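The quoted descriptions match the GCC options `-fdevirtualize-at-ltrans` and `-fipa-pta`. Since not every cross toolchain ships the same GCC version, it can be useful to check that a compiler accepts a flag before enabling it. A minimal sketch, where the helper name `cc_accepts_flag` is hypothetical and not part of this repository:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository).
# cc_accepts_flag <compiler> <flag>
# Succeeds if <compiler> compiles a trivial C++ translation unit with <flag>.
cc_accepts_flag() {
  local cc="$1" flag="$2"
  # Feed the source on stdin and discard the object file and all diagnostics.
  echo 'int main() { return 0; }' |
    "$cc" -x c++ "$flag" -c -o /dev/null - >/dev/null 2>&1
}
```

For example, `cc_accepts_flag gcc -fipa-pta && echo supported` prints `supported` only when the compiler accepts the flag. For a cross build, pass the toolchain's compiler instead of `gcc`.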
Also forces the CMake generator to `make` because `ninja` gets into an
infinite loop on my laptop for some unknown reason.
A PGO'd binary can be built as follows:
1. Build with `-DDEVILUTIONX_PROFILE_GENERATE=ON`.
2. Run the timedemo.
3. Build with `-DDEVILUTIONX_PROFILE_USE=ON`.
By default, the profile directory is at `${HOME}/devilutionx-profile`.
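For a native (non-OPK) build, the two build phases could be scripted roughly as follows. This is a sketch, not the project's own tooling: only the two `-D` options come from the steps above, while the `pgo_build` and `run` helpers, the `build-pgo` directory name, and the `DRY_RUN` switch are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; only the two DEVILUTIONX_PROFILE_* options are
# taken from the steps above.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# pgo_build <generate|use> [build-dir]
pgo_build() {
  local phase="$1" build_dir="${2:-build-pgo}"
  local gen=OFF use=OFF
  case "$phase" in
    generate) gen=ON ;;
    use)      use=ON ;;
    *) echo "usage: pgo_build <generate|use> [build-dir]" >&2; return 2 ;;
  esac
  # Set both options explicitly: CMake caches option values between runs,
  # so a stale GENERATE=ON would otherwise carry over into the "use" phase.
  run cmake -S . -B "$build_dir" \
    "-DDEVILUTIONX_PROFILE_GENERATE=$gen" \
    "-DDEVILUTIONX_PROFILE_USE=$use" &&
  run cmake --build "$build_dir"
}
```

Run `pgo_build generate`, then the timedemo, then `pgo_build use`. Prefixing a call with `DRY_RUN=1` prints the `cmake` commands without executing them.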
Example for the RG99:
```bash
# Build the OPK for profiling data collection:
TOOLCHAIN=/opt/rs90-toolchain Packaging/OpenDingux/build.sh rg99 --profile-generate
# Copy the OPK to RG99:
scp -O build-rg99/devilutionx-rg99.opk rg99:/media/sdcard/apps
# Now, run the OPK. It will run the timedemo instead of the actual game and will take a couple of hours.
# ☕☕☕
# Copy the profiling data from RG99
scp -r -O rg99:/media/data/local/home/devilutionx-profile /tmp/devilutionx-profile
# Build the OPK using the collected profiling data:
TOOLCHAIN=/opt/rs90-toolchain Packaging/OpenDingux/build.sh rg99 --profile-use --profile-dir /tmp/devilutionx-profile
# Copy the resulting binary back to RG99
scp -O build-rg99/devilutionx-rg99.opk rg99:/media/sdcard/apps
```
`-O2` increases the binary size by 0.5 MiB, which we can now afford.
Increasing the buffer size to `768` improves performance (seems to be the
sweet spot).
15-20 FPS in dungeon on the max-mem OD fork.