Compiler Flags: `-Ofast` and `-flto=full` #106

Nathan-MV · 2023-10-26T22:17:59Z

Add -pipe to CFLAGS
Replace -O3 with -Ofast
Stop using -flto only for Ruby and use -flto=full for everything

Nathan-MV · 2023-10-27T04:46:53Z

I won't touch the macOS compilation flags anymore, since it's too easy to break the build there; macOS will be a second-class citizen until #99 is implemented.

Ancurio · 2023-10-27T08:00:24Z

What is the intention behind each of these flags?

Nathan-MV · 2023-10-27T11:54:09Z

> What is the intention behind each of these flags?

I was going to clean it up after I woke up. I was using some of them to try to reduce the size of the binary, but they didn't prove effective in mkxp-z.

-pipe is my attempt to reduce the compilation time, as I thought that compilation would take longer after switching to -flto=full and -Ofast
https://wiki.gentoo.org/wiki/GCC_optimization#-pipe

Ancurio · 2023-10-27T16:05:18Z

> -pipe is my attempt to reduce the compilation time, as I thought that compilation would take longer after switching to -flto=full and -Ofast
> https://wiki.gentoo.org/wiki/GCC_optimization#-pipe

And, did it reduce the compilation times? If yes, by how much?
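For anyone wanting to answer this with numbers, here is a minimal sketch of one way to time a clean build with and without `-pipe`. The helper names `median_wall_time` and `percent_change` are hypothetical and not part of mkxp-z; the build command itself is left to the caller, since it depends on how the dependencies are built locally.

```python
import statistics
import subprocess
import time


def median_wall_time(cmd, runs=3):
    """Run `cmd` (an argv list) `runs` times; return median wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True makes a failed build raise instead of being timed silently.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def percent_change(baseline, candidate):
    """Relative change of candidate vs. baseline; negative means faster."""
    return (candidate - baseline) / baseline * 100.0
```

The idea would be to call `median_wall_time` once on a clean build without `-pipe` in CFLAGS and once with it, then pass both medians to `percent_change`. The median is used so that one slow outlier run does not dominate the comparison.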

Nathan-MV · 2023-10-28T02:21:24Z

> And, did it reduce the compilation times? If yes, by how much?

I'll just remove it.

Splendide-Imaginarius · 2023-11-02T09:54:03Z

Hi, thanks for the PR, really appreciate it. It's fine to ignore macOS for the time being, I've done the same elsewhere.

One trivial concern, and one bigger one (but both of them should be resolvable):

1. These changes don't appear to cover the mkxp-z executable itself (which is handled by the Meson config rather than the Makefile). This will prevent -Ofast from covering the mkxp-z repo's code, and I think it will prevent the LTO changes from having any effect at all. I also see that CXXFLAGS isn't set, which I believe will prevent these optimizations from having any effect on C++ projects (there aren't a lot of them, but there are some). Let me know if you think I'm mistaken on any of this (I might be!).
2. And the bigger concern: according to Red Hat (who, in my experience, generally have their shit together on this kind of thing), different applications run fastest at different optimization levels, because sometimes the instruction cache effect of the code size actually makes a bigger difference than how fast the code runs. So, there are 4 different optimization levels that might actually be the fastest in practice: -O2, -O3, -Os, and -Ofast. My educated guess is that -Ofast will probably be the fastest for mkxp-z, since it uses SDL and I've seen people saying on the SDL issue tracker that it works great there (and I'd expect a lot of other mkxp-z deps, and the mkxp-z repo itself, to have workloads that resemble SDL).

   However, I am not comfortable with making changes that might actually make things worse until I see some benchmark results indicating that they yield some kind of improvement (I don't trust my educated guesses that much here). Ideally these benchmark results would at least have some resemblance to something a real-world game would do. If it's a real-world game, I'd be happier if it's a game that I can get for free (legally). If it's a simulated benchmark, providing a customScript that I can run would be fine, as long as it comes with some kind of explanation for why it resembles something a real game would do.
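To make the comparison between the four levels concrete, here is a rough sketch of how per-level timing samples could be summarized once they exist. It assumes each build has been run several times on the same workload and the wall-clock seconds collected by hand or by a script; `rank_by_median` and `speedup_over` are hypothetical helper names, and no real measurements are implied.

```python
import statistics


def rank_by_median(samples_by_flag):
    """Return (flag, median_seconds) pairs sorted fastest first.

    `samples_by_flag` maps an optimization flag such as "-O2" to a list
    of wall-clock timings (seconds) for the same benchmark workload.
    """
    medians = {flag: statistics.median(times)
               for flag, times in samples_by_flag.items()}
    return sorted(medians.items(), key=lambda item: item[1])


def speedup_over(baseline_flag, samples_by_flag):
    """Map each flag to baseline_median / flag_median (above 1.0 = faster)."""
    medians = dict(rank_by_median(samples_by_flag))
    base = medians[baseline_flag]
    return {flag: base / value for flag, value in medians.items()}
```

Using the current -O3 build as `baseline_flag` would show directly whether -Ofast (or -O2, or -Os) is an improvement or a regression for a given workload.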

Let me know if you have any trouble resolving either of the above things, I'm happy to help as needed.

(And sorry it took a while for me to reply.)

Splendide-Imaginarius · 2023-12-15T00:46:17Z

> And the bigger concern: according to Red Hat (who, in my experience, generally have their shit together on this kind of thing), different applications exhibit faster behavior with different optimization levels, because sometimes the instruction cache effect of the code size actually makes a bigger difference than how fast the code runs. So, there are 4 different optimization levels that might actually be the fastest in practice: -O2, -O3, -Os, and -Ofast. My educated guess is that -Ofast will probably be the fastest for mkxp-z, since it uses SDL and I've seen people saying on the SDL issue tracker that it works great there (and I'd expect a lot of other mkxp-z deps, and the mkxp-z repo itself, to have workloads that resemble SDL). However, I am not comfortable with making changes that might actually make things worse until I see some benchmark results indicating that it yields some kind of improvement (I don't trust my educated guesses that much here). Ideally these benchmark results would at least need to have some resemblance to something a real-world game would do. If it's in a real-world game, I'd be happier if it's a game that I can get for free (legally). If it's a simulated benchmark, providing a customScript that I can

It looks like threaded surfaces are primarily CPU-bound and are a major bottleneck in some use cases (e.g. mine). I'm planning to test them to see whether tweaking the compiler optimization flags makes any difference there. I'm also planning to test whether GCC vs Clang makes a major difference. (As noted, my prediction is that LTO + -Ofast will be a clear winner; I also suspect, with lower confidence, that Clang will beat GCC.)
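The test plan above amounts to a small build matrix: compiler, optimization level, and LTO on or off. A sketch of enumerating it follows. Two assumptions are baked in and flagged in the comments: GCC is given plain `-flto` while Clang gets `-flto=full`, and the same flags go into both CFLAGS and CXXFLAGS so that C++ dependencies are covered too. `build_matrix` is a hypothetical name, not something in the repo.

```python
from itertools import product

# Assumption: Clang spells full LTO as -flto=full; GCC is given plain -flto here.
LTO_FLAG = {"gcc": "-flto", "clang": "-flto=full"}
CXX_FOR = {"gcc": "g++", "clang": "clang++"}
OPT_LEVELS = ("-O2", "-O3", "-Os", "-Ofast")


def build_matrix(compilers=("gcc", "clang"), opt_levels=OPT_LEVELS):
    """Return one environment dict per (compiler, opt level, LTO) combination."""
    configs = []
    for cc, opt, lto in product(compilers, opt_levels, (False, True)):
        flags = " ".join([opt] + ([LTO_FLAG[cc]] if lto else []))
        configs.append({
            "CC": cc,
            "CXX": CXX_FOR[cc],
            # Same flags for C and C++ so C++ projects are not left out.
            "CFLAGS": flags,
            "CXXFLAGS": flags,
        })
    return configs
```

With the defaults this yields 16 configurations (2 compilers × 4 levels × LTO off/on), each of which would still need to be passed to whatever build and benchmark steps are actually used.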

Splendide-Imaginarius · 2024-01-14T03:44:44Z

In addition to threaded surfaces (as noted above), we probably also should test #148 with the various optimization flags, since that PR is likely to expose a lot of previously unnoticed CPU bottlenecks.

Nathan-MV changed the title ~~Stop using -flto only for Ruby and add some new flags~~ Compiler Flags Oct 26, 2023

Nathan-MV force-pushed the pr-flags branch 4 times, most recently from a747950 to d0f7778 Compare October 27, 2023 04:45

Nathan-MV force-pushed the pr-flags branch 2 times, most recently from 9b878eb to 3b1d4a2 Compare October 27, 2023 11:53

Nathan-MV force-pushed the pr-flags branch from 9c5d55c to 64d3643 Compare October 28, 2023 02:22

flags

5420136

Nathan-MV force-pushed the pr-flags branch from 122bace to 5420136 Compare November 2, 2023 16:27

Nathan-MV marked this pull request as draft November 22, 2023 16:01

Splendide-Imaginarius changed the title ~~Compiler Flags~~ Compiler Flags: -Ofast and -flto=full Dec 15, 2023
