Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Beware using /Os optimization on macOS/Clang #1176

Open
kode54 opened this issue Jul 24, 2022 · 6 comments
Open

[Bug] Beware using /Os optimization on macOS/Clang #1176

kode54 opened this issue Jul 24, 2022 · 6 comments
Labels

Comments

@kode54
Copy link
Contributor

kode54 commented Jul 24, 2022

There is a bug with the automatic inlining system if you use too heavy optimization. It will cause the entire VGMStream init list of functions to be compounded into a single function which has gigantic stack usage, and even a single or two level deep recursion caused by .TXTP files is enough to exhaust the 512KiB stack size limit of secondary threads. It will explode consistently at about 40% through the initializer list, as it attempts to use stack variables which are beyond the stack limit.

@kode54 kode54 added the bug label Jul 24, 2022
@kode54
Copy link
Contributor Author

kode54 commented Jul 24, 2022

I had to use /O1 on Intel initially, but as of Xcode 14, I now have to use /O1 on AArch64 as well.

@bnnm
Copy link
Collaborator

bnnm commented Jul 29, 2022

Tried using IDA with the auto-build mac version (https://github.com/vgmstream/vgmstream/actions/runs/2723884453). If you meant init_vgmstream_functions callbacks in init_vgmstream_from_STREAMFILE are getting inlined, those look as expected and not inlined. Not sure which /O it uses but there are some inlines and optimizations around.

Is this for regular CLI (with cmake) or Cog maybe? If it's the later, maybe there are some extra flags that are affecting default inlining? I see clang has stuff like inlinehint-threshold N to force massive inlines.

@kode54
Copy link
Contributor Author

kode54 commented Sep 7, 2022

Proposition for this. This happens because we use entirely too much stack. Especially for things like filename tests, which stuff a 32KB buffer on the stack every place they're used.

We should find ways to either avoid copying the full file path around everywhere it's checked for things, or we should be using heap allocated buffers for this, and cleaning up after them everywhere an error is handled and they're still allocated.

@kode54
Copy link
Contributor Author

kode54 commented Sep 7, 2022

F.EX. macOS allows main threads to have 2MiB of stack, and all secondary threads by default have up to 512KiB of stack. I've tried increasing stack size limits beyond the 512KiB for created threads manually, but App Store Review apparently configures machines to disregard attempts to do this and force the system limit regardless of what I do with NSThread creation.

@bnnm
Copy link
Collaborator

bnnm commented Sep 7, 2022

Have you tried setting PATH_LIMIT to something like 1024?
https://github.com/vgmstream/vgmstream/blob/master/src/vgmstream.h#L10

@kode54
Copy link
Contributor Author

kode54 commented Sep 7, 2022

The problem is, I don't want to break the edge case where someone actually needs a long path. I made the limit the absolute maximum for Windows UTF-16, which can actually be longer as UTF-8 in the worst case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants