Compiler Flags: -Ofast and -flto=full #106

Draft
wants to merge 1 commit into
base: dev

@Nathan-MV Nathan-MV commented Oct 26, 2023

Add -pipe to CFLAGS
Replace -O3 with -Ofast
Stop using -flto only for Ruby and use -flto=full for everything

@Nathan-MV Nathan-MV changed the title Stop using -flto only for Ruby and add some new flags Compiler Flags Oct 26, 2023
@Nathan-MV Nathan-MV force-pushed the pr-flags branch 4 times, most recently from a747950 to d0f7778 Compare October 27, 2023 04:45
@Nathan-MV
Author

I won't touch the macOS compilation flags anymore, since it's too easy to break the build there; macOS will be a second-class citizen until #99 is implemented.

@Ancurio

Ancurio commented Oct 27, 2023

What is the intention behind each of these flags?

@Nathan-MV Nathan-MV force-pushed the pr-flags branch 2 times, most recently from 9b878eb to 3b1d4a2 Compare October 27, 2023 11:53
@Nathan-MV
Author

What is the intention behind each of these flags?

I was going to clean it up after I woke up. I was using some of the flags to try to reduce the size of the binary, but they didn't prove effective in mkxp-z.

-pipe is my attempt to reduce the compilation time, since I expected that switching to -flto=full and -Ofast would make compilation take longer
https://wiki.gentoo.org/wiki/GCC_optimization#-pipe

@Ancurio

Ancurio commented Oct 27, 2023

-pipe is my attempt to reduce the compilation time, since I expected that switching to -flto=full and -Ofast would make compilation take longer
https://wiki.gentoo.org/wiki/GCC_optimization#-pipe

And, did it reduce the compilation times? If yes, by how much?
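
One way to answer the "by how much" question is to time several clean rebuilds with and without -pipe and compare the medians. The following is only a minimal sketch: the helper names (time_command, compare) and the build commands in the usage comment are hypothetical and are not taken from the mkxp-z build scripts.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=3, runner=subprocess.run, clock=time.perf_counter):
    """Run cmd `runs` times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = clock()
        runner(cmd, check=True)  # raises CalledProcessError if the build fails
        durations.append(clock() - start)
    return durations


def compare(baseline, candidate):
    """Return the relative change of the candidate median versus the baseline median.

    A negative value means the candidate was faster (-0.05 is 5% faster).
    """
    base = statistics.median(baseline)
    cand = statistics.median(candidate)
    return (cand - base) / base


# Hypothetical usage; substitute the project's real clean and build steps.
# Each run must start from a clean tree, otherwise make has nothing to rebuild.
#   without_pipe = time_command(["sh", "-c", "make clean && make CFLAGS='-O3'"])
#   with_pipe = time_command(["sh", "-c", "make clean && make CFLAGS='-O3 -pipe'"])
#   print(f"relative change with -pipe: {compare(without_pipe, with_pipe):+.1%}")
```

The median is used instead of the mean so that one run disturbed by unrelated disk or CPU activity does not dominate the comparison.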

@Nathan-MV
Author

And, did it reduce the compilation times? If yes, by how much?

I'll just remove it.

(two images attached)

@Splendide-Imaginarius

Hi, thanks for the PR, really appreciate it. It's fine to ignore macOS for the time being, I've done the same elsewhere.

One trivial concern, and one bigger one (but both of them should be resolvable):

  • These changes don't appear to cover the mkxp-z executable itself (which is handled by the Meson config rather than the Makefile). This will prevent the -Ofast from covering the mkxp-z repo's code, and I think it will prevent the LTO changes from having any effect at all. I also see that CXXFLAGS isn't set, which I believe will prevent these optimizations from having any effect on C++ projects (there aren't a lot of them, but there are some). Let me know if you think I'm mistaken on any of this (I might be!).
  • And the bigger concern: according to Red Hat (who, in my experience, generally have their shit together on this kind of thing), different applications exhibit faster behavior with different optimization levels, because sometimes the instruction cache effect of the code size actually makes a bigger difference than how fast the code runs. So, there are 4 different optimization levels that might actually be the fastest in practice: -O2, -O3, -Os, and -Ofast. My educated guess is that -Ofast will probably be the fastest for mkxp-z, since it uses SDL and I've seen people saying on the SDL issue tracker that it works great there (and I'd expect a lot of other mkxp-z deps, and the mkxp-z repo itself, to have workloads that resemble SDL). However, I am not comfortable with making changes that might actually make things worse until I see some benchmark results indicating that it yields some kind of improvement (I don't trust my educated guesses that much here). Ideally these benchmark results would at least need to have some resemblance to something a real-world game would do. If it's in a real-world game, I'd be happier if it's a game that I can get for free (legally). If it's a simulated benchmark, providing a customScript that I can run would be fine, as long as it has some kind of explanation for why it resembles something a real game would do.
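
To illustrate the first point, here is a rough sketch of passing the same flags to both C and C++ builds and to the Meson side. The function names are hypothetical. c_args, cpp_args and b_lto are Meson built-in options, but whether mkxp-z's Meson config needs anything beyond them is an assumption that has not been checked. Also note that, as far as I know, -flto=full is the spelling Clang documents, while GCC documents -flto, -flto=auto, -flto=jobserver and a numeric job count.

```python
import os


def build_env(opt_flags, base_env=None):
    """Return a copy of the environment with opt_flags appended to CFLAGS and CXXFLAGS."""
    env = dict(os.environ if base_env is None else base_env)
    for var in ("CFLAGS", "CXXFLAGS"):
        existing = env.get(var, "")
        # Keep any flags that were already set and append the new ones.
        env[var] = " ".join(part for part in (existing, opt_flags) if part)
    return env


def meson_setup_args(build_dir, extra_flags, lto=True):
    """Assemble a `meson setup` command line that applies the flags to C and C++.

    Assumption: LTO is requested through Meson's b_lto option here, so
    extra_flags should not also contain an -flto flag.
    """
    return [
        "meson", "setup", build_dir,
        f"-Dc_args={extra_flags}",
        f"-Dcpp_args={extra_flags}",
        f"-Db_lto={'true' if lto else 'false'}",
    ]
```

For the Makefile-driven dependencies, something like build_env("-Ofast -flto=full") could be passed as the env argument of subprocess.run, and meson_setup_args("build", "-Ofast") would cover the mkxp-z executable itself.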

Let me know if you have any trouble resolving either of the above things, I'm happy to help as needed.

(And sorry it took a while for me to reply.)

@Splendide-Imaginarius

  • And the bigger concern: according to Red Hat (who, in my experience, generally have their shit together on this kind of thing), different applications exhibit faster behavior with different optimization levels, because sometimes the instruction cache effect of the code size actually makes a bigger difference than how fast the code runs. So, there are 4 different optimization levels that might actually be the fastest in practice: -O2, -O3, -Os, and -Ofast. My educated guess is that -Ofast will probably be the fastest for mkxp-z, since it uses SDL and I've seen people saying on the SDL issue tracker that it works great there (and I'd expect a lot of other mkxp-z deps, and the mkxp-z repo itself, to have workloads that resemble SDL). However, I am not comfortable with making changes that might actually make things worse until I see some benchmark results indicating that it yields some kind of improvement (I don't trust my educated guesses that much here). Ideally these benchmark results would at least need to have some resemblance to something a real-world game would do. If it's in a real-world game, I'd be happier if it's a game that I can get for free (legally). If it's a simulated benchmark, providing a customScript that I can

It looks like threaded surfaces are primarily CPU-bound and are a major bottleneck in some use cases (e.g. mine). I'm planning to test them to see whether tweaking the compiler optimization flags makes any difference there. I'm also planning to test whether GCC vs Clang makes a major difference. (As noted, my prediction is that LTO + -Ofast will be a clear winner; I also suspect, with lower confidence, that Clang will beat GCC.)
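
The test plan above amounts to a small matrix of compilers, optimization levels and LTO settings. A rough sketch of enumerating that matrix and ranking the results follows; the function names are hypothetical, and how the timing samples are collected (for example, by running a customScript that exercises threaded surfaces) is deliberately left out.

```python
import itertools
import statistics

COMPILERS = ["gcc", "clang"]
OPT_LEVELS = ["-O2", "-O3", "-Os", "-Ofast"]


def build_matrix(compilers=COMPILERS, opt_levels=OPT_LEVELS, lto_options=(False, True)):
    """Enumerate every (compiler, optimization level, LTO on/off) combination."""
    return list(itertools.product(compilers, opt_levels, lto_options))


def rank_results(samples):
    """Sort configurations by median measured time, fastest first.

    `samples` maps a configuration tuple to a list of measured times
    (for example, seconds per benchmark run; lower is better).
    """
    medians = {config: statistics.median(times) for config, times in samples.items()}
    return sorted(medians.items(), key=lambda item: item[1])
```

With two compilers, four optimization levels and LTO on or off, that is 16 builds to measure, so it may be worth pruning the matrix once an obviously slow configuration shows up.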

@Splendide-Imaginarius Splendide-Imaginarius changed the title Compiler Flags Compiler Flags: -Ofast and -flto=full Dec 15, 2023
@Splendide-Imaginarius

In addition to threaded surfaces (as noted above), we probably also should test #148 with the various optimization flags, since that PR is likely to expose a lot of previously unnoticed CPU bottlenecks.

3 participants