wrenCall -> foreign call causes memory corruption #1185

redthing1 · 2024-04-04T03:26:45Z

Issue

I want to wrenCall a function in a wren script, and get its return value. This function will call into a foreign function.

The foreign function needs to do some list/dictionary processing with nested data, so I need some more slots. I use wrenEnsureSlots.

However, this causes some pretty bad corruption. In valgrind I see invalid reads, and this also results in the foreign call not actually properly returning a value, and the wren stack being corrupted. The program also sometimes segfaults when trying to free the VM.

I understand that this is related to this previous issue.

My understanding is that Wren doesn't like when you wrenEnsureStack within a foreign call; however, I am not aware of any other way than wrenEnsureSlots to ensure I have enough slots to process nested data.

Is there a better way I can do this? Am I using the library incorrectly?

It seems like a fairly standard use case to use some additional slots in a foreign call implementation. But calling wrenEnsureSlots causes corruption.

Interestingly, if I don't call wrenEnsureSlots inside the foreign call, because I called it before wrenCall, the wren stack doesn't get corrupted and does return the bool, but valgrind of course reports a lot of miscellaneous memory corruption.

Overview of my code

The overview of the scripts is as follows:

builtin.wren:

class Test {
  foreign static cool_func_impl(name, platforms, hashes)

  static cool_func(meta) {
    var meta_name = meta.name
    var meta_platforms = meta.platforms
    var meta_hashes = meta.hashes

    return cool_func_impl(meta_name, meta_platforms, meta_hashes)
  }
}

test.wren:

import "builtin" for Test

System.print("foreign call test")

class ProgramMeta {
    construct new() { }

    name { "foreign call test" }
    platforms { ["windows", "linux", "macos"] }
    hashes {{
        "windows" : [
            "hash1",
            "hash2"
        ],
        "linux" : [
            "hash1",
            "hash2"
        ],
        "macos" : [
            "hash1",
            "hash2"
        ]
    }}
}

class Script {
    static run() {
        System.print("script is now running")

        return Test.cool_func(ProgramMeta.new())
    }
}

Implementation of the foreign call:

#define WREN_NEEDED_SLOTS 12

// ...

static void fcall_cool_func_impl(WrenVM* vm) {
  // read the arguments to make a CoolMeta struct
  // if we can parse successfully, return true

  // ensure that we have plenty of slots
  wrenEnsureSlots(vm, WREN_NEEDED_SLOTS);
  size_t slot_temp = 10;

  // get name
  size_t arg_name_slot = 1;
  assert_msg(wrenGetSlotType(vm, arg_name_slot) == WREN_TYPE_STRING, "name must be a string");
  const char* name = wrenGetSlotString(vm, arg_name_slot);

  // get platforms
  size_t arg_platforms_slot = 2;
  assert_msg(wrenGetSlotType(vm, arg_platforms_slot) == WREN_TYPE_LIST, "platforms must be a list");
  WrenHandle* platforms_list = wrenGetSlotHandle(vm, arg_platforms_slot);
  std::vector<std::string> platforms;
  size_t platforms_count = wrenGetListCount(vm, arg_platforms_slot);
  for (size_t i = 0; i < platforms_count; i++) {
    size_t platform_list_el_slot = slot_temp;
    wrenGetListElement(vm, arg_platforms_slot, i, platform_list_el_slot);
    assert_msg(wrenGetSlotType(vm, platform_list_el_slot) == WREN_TYPE_STRING, "platform must be a string");
    const char* platform = wrenGetSlotString(vm, platform_list_el_slot);
    platforms.push_back(platform);
  }

  // return true
  wrenSetSlotBool(vm, 0, true);
}

My program then does the following:

  // now, we want to call Script.run()

  wrenEnsureSlots(vm, WREN_NEEDED_SLOTS);

  // get handle to Script class
  wrenGetVariable(vm, source_module, "Script", 0);
  WrenHandle* script_class = wrenGetSlotHandle(vm, 0);
  // make call handle to run method
  WrenHandle* run_method = wrenMakeCallHandle(vm, "run()");

  // invoke Script.run()
  wrenSetSlotHandle(vm, 0, script_class);
  printf("wren: call Script.run()\n");
  wrenCall(vm, run_method);

  // get the boolean return value
  if (wrenGetSlotType(vm, 0) != WREN_TYPE_BOOL) {
    printf("wren: Script.run() must return a boolean, but it returned %d\n", wrenGetSlotType(vm, 0));
    assert_msg(false, "Script.run() returned invalid type");
  }
  bool run_result = wrenGetSlotBool(vm, 0);
  printf("wren: Script.run() returned %s\n", run_result ? "true" : "false");

Reproduction of Issue

I have made a full reproduction of the issue here: https://github.com/redthing1/wrenvm_tests/tree/f3985888d39168bd69b281083ff6c3a77dd9ec3d/foreigncall

Instructions to build the repro

install prerequisites:

c/c++ compiler
meson
ninja

get submodules (wren is a submodule):

git submodule update --init --recursive

build a demo, such as ./foreigncall:

cd ./foreigncall
meson setup build
ninja -C build
./build/foreigncall_test

The text was updated successfully, but these errors were encountered:

ruby0x1 · 2024-04-04T03:31:11Z

Quick question: Did you run in debug (assertions are currently only active in debug, which guards against some misplaced uses).

One other thing to try: call wrenEnsureSlots later. You should likely read the values for your arguments and receiver (slot 0-N) before calling ensure slots.

Edit: also thanks for the detailed post and reproduction ✨

redthing1 · 2024-04-04T03:38:55Z

Quick question: Did you run in debug (assertions are currently only active in debug, which guards against some misplaced uses).

I tried enabling debug assertions with #define DEBUG in wren_common.h:
I don't see any different results with that however. It doesn't seem like it catches any assertions.

One other thing to try: call wrenEnsureSlots later. You should likely read the values for your arguments and receiver (slot 0-N) before calling ensure slots.

I just tried this: I read all the arguments out of slots first.

But even with the list processing commented out, simply that wrenEnsureSlots call alone seems to cause all the issues.

static void fcall_cool_func_impl(WrenVM* vm) {
  // read the arguments to make a CoolMeta struct
  // if we can parse successfully, return true

  // get name
  size_t arg_name_slot = 1;
  assert_msg(wrenGetSlotType(vm, arg_name_slot) == WREN_TYPE_STRING, "name must be a string");
  const char* name = wrenGetSlotString(vm, arg_name_slot);

  // get platforms
  size_t arg_platforms_slot = 2;
  assert_msg(wrenGetSlotType(vm, arg_platforms_slot) == WREN_TYPE_LIST, "platforms must be a list");
  WrenHandle* platforms_list = wrenGetSlotHandle(vm, arg_platforms_slot);
  std::vector<std::string> platforms;

  // get hashes
  size_t arg_hashes_slot = 3;
  assert_msg(wrenGetSlotType(vm, arg_hashes_slot) == WREN_TYPE_MAP, "hashes must be a map");
  WrenHandle* hashes_map = wrenGetSlotHandle(vm, arg_hashes_slot);
  std::unordered_map<std::string, std::string> hashes;

  // ensure that we have plenty of slots
  wrenEnsureSlots(vm, WREN_NEEDED_SLOTS);
  size_t slot_temp = 10;

  // size_t platforms_count = wrenGetListCount(vm, arg_platforms_slot);
  // for (size_t i = 0; i < platforms_count; i++) {
  //   size_t platform_list_el_slot = slot_temp;
  //   wrenGetListElement(vm, arg_platforms_slot, i, platform_list_el_slot);
  //   assert_msg(wrenGetSlotType(vm, platform_list_el_slot) == WREN_TYPE_STRING, "platform must be a string");
  //   const char* platform = wrenGetSlotString(vm, platform_list_el_slot);
  //   platforms.push_back(platform);
  // }

  // release handles
  wrenReleaseHandle(vm, platforms_list);
  wrenReleaseHandle(vm, hashes_map);

  // return true
  wrenSetSlotBool(vm, 0, true);
}

Edit: also thanks for the detailed post and reproduction ✨

You're welcome! I like to make your job easier, I know what it's like to be on your end :)

mhermier · 2024-04-04T07:57:57Z

wrenEnsureSlots is forbidden in foreign functions. It break some internal assumptions about the stack, in case the stack needs to be reallocated.

ruby0x1 · 2024-04-04T08:13:41Z

Not quite right,

// When your foreign function is called, you are given one slot for the receiver
// and each argument to the method. The receiver is in slot 0 and the arguments
// are in increasingly numbered slots after that. You are free to read and
// write to those slots as you want. If you want more slots to use as scratch
// space, you can call wrenEnsureSlots() to add more.

...

// Ensures that the foreign method stack has at least [numSlots] available for
// use, growing the stack if needed.
//
// Does not shrink the stack if it has more than enough slots.
//
// It is an error to call this from a finalizer.
WREN_API void wrenEnsureSlots(WrenVM* vm, int numSlots);

This API is and has always been FOR foreign functions use to add more to the scratch space for your methods. I have several thousand foreign methods using this function this way in my engine just fine.

There are times when using it can definitely create problems, though, and needs to be addressed with either proper assertions or a fix. e.g if you call it in a finalizer and outside of a foreign.

So reproducible examples like this are plenty helpful in isolating stuff.

redthing1 · 2024-04-04T17:16:09Z

I was also under the impression that it was explicitly stated in the documentation that wrenEnsureSlots is for use in foreign functions.

In the meantime, is there an alternative way to accomplish the same thing?
One idea that looked promising as a hacky workaround was to pass in dummy parameters to provide additional temp slots. I don't like this however.

As for diagnosing this issue: Valgrind showed pretty clearly that Wren tries to reallocate stuff when I try to wrenEnsureSlots which then leads to invalid reads.

mhermier · 2024-04-04T19:24:10Z

The easiest workaround is to hack the VM configuration to increase the initial stack size so that the fiber stack never reallocate.

redthing1 · 2024-04-05T02:33:03Z

So, I did a little bit of looking into this.

Current code

I added some logs to wren, and here are the results for the current code:

wrenNewFiber: fiber->stackCapacity = 4
wrenEnsureStack: fiber->stackCapacity = 4, needed = 5
wrenEnsureStack: stack capacity is insufficient, growing
wrenEnsureStack: new capacity = 8
wrenEnsureStack: reallocating stack
wrenEnsureStack: fiber->stackCapacity = 8, needed = 6
wrenEnsureStack: fiber->stackCapacity = 8, needed = 9
wrenEnsureStack: stack capacity is insufficient, growing
wrenEnsureStack: new capacity = 16
wrenEnsureStack: reallocating stack
foreign call test
wren: ensure slots
wrenEnsureSlots: numSlots=12
wrenEnsureSlots: creating fiber
wrenNewFiber: fiber->stackCapacity = 1
wrenEnsureSlots: currentSize=0
wrenEnsureSlots: current size is insufficient, resizing
wrenEnsureSlots: wrenEnsureStack(needed=12)
wrenEnsureStack: fiber->stackCapacity = 1, needed = 12
wrenEnsureStack: stack capacity is insufficient, growing
wrenEnsureStack: new capacity = 16
wrenEnsureStack: reallocating stack
wren: get handle to Script class
wren: set slot 0 to Script class
wren: call Script.run()
wrenEnsureStack: fiber->stackCapacity = 16, needed = 2
wrenEnsureStack: fiber->stackCapacity = 16, needed = 4
wrenEnsureStack: fiber->stackCapacity = 16, needed = 6
wrenEnsureStack: fiber->stackCapacity = 16, needed = 9
script is now running
wrenEnsureStack: fiber->stackCapacity = 16, needed = 4
wrenEnsureStack: fiber->stackCapacity = 16, needed = 5
wrenEnsureStack: fiber->stackCapacity = 16, needed = 11
wrenEnsureStack: fiber->stackCapacity = 16, needed = 6
wrenEnsureStack: fiber->stackCapacity = 16, needed = 8
wrenEnsureStack: fiber->stackCapacity = 16, needed = 11
fcall_cool_func_impl: enter
wrenEnsureSlots: numSlots=12
wrenEnsureSlots: currentSize=4
wrenEnsureSlots: current size is insufficient, resizing
wrenEnsureSlots: wrenEnsureStack(needed=18)
wrenEnsureStack: fiber->stackCapacity = 16, needed = 18
wrenEnsureStack: stack capacity is insufficient, growing
wrenEnsureStack: new capacity = 32
wrenEnsureStack: reallocating stack
wren: Script.run() must return a boolean, but it returned 7
assertion failed: Script.run() returned invalid type

Notes

As we can see above, the wrenEnsureSlots(WREN_NEEDED_SLOTS) I do before the wrenCall ends up growing the stack to size 16. However, my call to the very same thing within the foreign function results in requesting a stack capacity of 18: I'm not sure what the cause of this discrepancy is, please explain it to me.

This got me thinking that if I can make my initial call allocate a large enough stack, then my wrenEnsureSlots in the foreign function won't have to reallocate the stack.

Potential workaround

So, if I do wrenEnsureSlots(WREN_NEEDED_SLOTS * 2) (or some other slack factor), I can get the initial stack reallocate prior to wrenCall to create a stack of size 32. Then, my wrenEnsureSlots call within the foreign function implementation does not need to reallocate, and no memory corruption happens.

So it looks like the issue can be worked around like this, but I would like someone to explain to me why the requested stack size differs from calls to ensure the same number of slots. Also, I think it would be beneficial to add some notes about this to the documentation. I can volunteer to do this as well, once I understand the root cause behind it.

Thank you all for your help in troubleshooting this issue.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

wrenCall -> foreign call causes memory corruption #1185

wrenCall -> foreign call causes memory corruption #1185

redthing1 commented Apr 4, 2024 •

edited

ruby0x1 commented Apr 4, 2024 •

edited

redthing1 commented Apr 4, 2024 •

edited

mhermier commented Apr 4, 2024 via email

ruby0x1 commented Apr 4, 2024 •

edited

redthing1 commented Apr 4, 2024

mhermier commented Apr 4, 2024 via email

redthing1 commented Apr 5, 2024 •

edited

wrenCall -> foreign call causes memory corruption #1185

wrenCall -> foreign call causes memory corruption #1185

Comments

redthing1 commented Apr 4, 2024 • edited

Issue

Overview of my code

Reproduction of Issue

Instructions to build the repro

ruby0x1 commented Apr 4, 2024 • edited

redthing1 commented Apr 4, 2024 • edited

mhermier commented Apr 4, 2024 via email

ruby0x1 commented Apr 4, 2024 • edited

redthing1 commented Apr 4, 2024

mhermier commented Apr 4, 2024 via email

redthing1 commented Apr 5, 2024 • edited

Current code

Notes

Potential workaround

redthing1 commented Apr 4, 2024 •

edited

ruby0x1 commented Apr 4, 2024 •

edited

redthing1 commented Apr 4, 2024 •

edited

ruby0x1 commented Apr 4, 2024 •

edited

redthing1 commented Apr 5, 2024 •

edited