Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LZ4F_compressFrame() seems to use an abnormally large amount of call stack on Arm Compiler 6.15 (armclang) #1145

Open
borbmizzet opened this issue Aug 17, 2022 · 9 comments

Comments

@borbmizzet
Copy link

borbmizzet commented Aug 17, 2022

LZ4F_compressFrame() seems to use an abnormally large amount of call stack on Arm Compiler 6.15 (armclang).

Compiler flags:
-D_NO_UNALIGNED_ACCESS = True
-D_ARM_TARGET_CPU = 'Cortex-A8'
-D_ARM_TARGET_CPU_REV = 'r3p2'
-D_ARM_TARGET_FPU = 'softvfp+vfpv3'
-D_ARMCLANG_TARGET_CPU = 'cortex-a8'
-D_ARMCLANG_TARGET_FPU = 'neon'
-D_ARMCLANG_MFLOAT_ABI = 'softfp'
-gdwarf-3
-fno-strict-aliasing
-fwrapv
-fshort-enums
-fhonor-nans
-fhonor-infinities
-O1
-std=gnu99
-g

Attempted to compress the attached binary data (the contents of the data.bin file inside the attached data.zip file) using the following code:

void compressFrame(vector<uint8_t>& dest, span<uint8_t> src) {
  LZ4F_preferences_t prefs{};
  prefs.frameInfo.contentSize = src.size();
  prefs.compressionLevel = 0;

  auto max = LZ4F_compressFrameBound(src.size(), &prefs);
  dest.resize(max);

  auto lz4 = LZ4F_compressFrame(dest.data(), dest.size(), src.data(), src.size(), &prefs); // stack overflowed in here

  if (LZ4F_isError(lz4)) {
    assert(false);
    dest.clear();
    return;
  }

  dest.resize(lz4);
}
@t-mat
Copy link
Contributor

t-mat commented Aug 19, 2022

Hi @borbmizzet, thanks for the report.

I've tried the following Cortex-A8 emulation, and it works fine on my environment.

sudo apt-get install -y qemu-system-arm qemu-user-static gcc-arm-linux-gnueabihf

make V=1 platformTest CC=arm-linux-gnueabihf-gcc QEMU_SYS="qemu-arm-static -cpu cortex-a8"

file programs/lz4
# programs/lz4: ELF 32-bit LSB executable, ARM, EABI5 version 1 (GNU/Linux),
# statically linked, BuildID[sha1]=00cd4a718dbd4007e303e484178032cd6ae70c4a,
# for GNU/Linux 3.2.0, not stripped

Could you try to build genuine lz4cli and test it?
Since it internally uses LZ4F API, if something wrong it will fail to compress your data.

@Cyan4973
Copy link
Member

LZ4F_compressFrame() seems to use an abnormally large amount of call stack on Arm Compiler 6.15 (armclang).

Could you define "abnormally large" ?

@Cyan4973
Copy link
Member

It would be interesting to know what is considered "large" for this setup.

LZ4F_compressFrame() needs space to allocate the compression state.
The state is defined here : https://github.com/lz4/lz4/blob/v1.9.4/lib/lz4frame.c#L262 .
It's not "large" by any mean, probably < 100 bytes, but not tiny either.

If that amount is considered too much, even that can be exported onto the heap,
by compiling lz4frame.c with build macro LZ4F_HEAPMODE set to 1.

@borbmizzet
Copy link
Author

I don't have any specifics on the stack usage without lz4, but our stack size for the thread in question was capped at 10KB on a 32-bit build, but even with lz4 with pretty much identical call stack up to that point, none of our other 32 or 64 bit builds even come close to hitting that cap. It seems like it would be some sort of recursive call in lz4 based on some compiler-specific code or something, because doubling the stack size for that thread on that build fixed the stack overflow, though it still topped off at 18KB stack usage.

@Cyan4973
Copy link
Member

It looks like the 16KB hash table is a candidate member of this stack budget.

But it shouldn't be: in "recent" versions, this budget has been moved to heap.

Which begs the question : which version are you using in your tests ?

@borbmizzet
Copy link
Author

1.9.3

@Cyan4973
Copy link
Member

Cyan4973 commented Sep 18, 2022

Indeed, that's where the stack memory budget is.

The solution is to compile the library with LZ4F_HEAPMODE,
which will push the stack memory budget onto the heap:
https://github.com/lz4/lz4/tree/dev/lib#build-macros .

If you can't recompile, the alternative is to not use LZ4F_compressFrame(),
and use instead any other entry points using an explicit compression context.
Alternatives include LZ4F_compressFrame_usingCDict() (static linking only) with NULL dictionary,
or the tri-functions LZ4F_compressBegin(), LZ4F_compressUpdate(), LZ4F_compressEnd().

@borbmizzet
Copy link
Author

I'll give that a shot, thanks!

@borbmizzet
Copy link
Author

borbmizzet commented Nov 14, 2022

Updating to 1.9.4 fixed the issue. However, I've looked through the commit log between 1.9.3 and 1.9.4, and have been unable to pinpoint the exact commit(s) that makes this change. Could someone point it out for me? Also, the reason our other builds didn't stack overflow was because the stack size for all threads on our other builds was, unbeknownst to me, overridden to 1MB on a lower layer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants