Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OS_ENABLE_HW_BOUND_CHECK can cause resource leak #3320

Open
yamt opened this issue Apr 16, 2024 · 4 comments
Open

OS_ENABLE_HW_BOUND_CHECK can cause resource leak #3320

yamt opened this issue Apr 16, 2024 · 4 comments

Comments

@yamt
Copy link
Collaborator

yamt commented Apr 16, 2024

on SEGV on the guard pages, the signal handler performs longjmp to recover the control flow.
for aot frames it's fine.
however, when it's caused by native code like host functions, it can end up with resource leaks.

possible solutions:

a. consider native functions hitting guard pages a bug. insert "manual" overflow detection logic to appropriate places instead.

b. insert "resource cleanup callbacks" into the jmpbuf_node stack. execute them after a jump.

@wenyongh
Copy link
Contributor

Maybe the second way is better, since the issue not only occurs in the host function, but also may occur in runtime, .e.g. allocating argv buffer in wasm_runtime_invoke_native:

if (argc1 > sizeof(argv_buf) / sizeof(uint64)) {
size = sizeof(uint64) * (uint64)argc1;
if (!(argv1 = runtime_malloc((uint32)size, exec_env->module_inst, NULL,
0))) {
return false;
}

It didn't happen since it is allocated only when the default buffer in local variable isn't enough and by default the latter is large enough (allows to call a wasm/host function with 31 arguments).

If we only care about the memory allocated by wasm_runtime_malloc, maybe we can auto handle it wasm_runtime_malloc, wasm_runtime_free and after longjmp:

  • in wasm_runtime_malloc, check whether runtime is calling the wasm/host function (e.g. whether the global exec_env_tls is not NULL and its jmp stack isn't NULL), and if yes, allocate an extra node to take the memory allocated, and add the node into current WASMJmpBuf's node list
  • in wasm_runtime_free, remove node which has the memory pointer to free in current WASMJmpBuf's node list
  • after longjmp, free current WASMJmpBuf's node list and the memory pointers inside them

@yamt
Copy link
Collaborator Author

yamt commented Apr 17, 2024

Maybe the second way is better, since the issue not only occurs in the host function, but also may occur in runtime, .e.g. allocating argv buffer in wasm_runtime_invoke_native:

right now i'm inclined towards the first approach because those checks are necessary for OS_ENABLE_HW_BOUND_CHECK=0 anyway.

@wenyongh
Copy link
Contributor

OK, and it would be good if we also fix the argv buffer allocation issue in wasm_runtime_invoke_native.

yamt added a commit to yamt/wasm-micro-runtime that referenced this issue Apr 19, 2024
This is a test code to examine native stack overflow detection logic.

The current output on my environment (macOS amd64):

```shell
====== Interpreter
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 - 14704 | failed | leaked | Exception: native stack overflow
14704 - 17904 | failed |     ok | Exception: native stack overflow
17904 - 24576 |     ok |     ok |

====== AOT
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 - 18176 | failed | leaked | Exception: native stack overflow
18176 - 24576 |     ok |     ok |

====== AOT WAMR_DISABLE_HW_BOUND_CHECK=1
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 -  1968 | failed |     ok | Exception: native stack overflow
 1968 - 24576 |     ok |     ok |
```

This is a preparation to work on relevant issues, including:

bytecodealliance#3325
bytecodealliance#3320
bytecodealliance#3314
bytecodealliance#3297
wenyongh pushed a commit that referenced this issue Apr 19, 2024
This is a test code to examine native stack overflow detection logic.

The current output on my environment (macOS amd64):

```shell
====== Interpreter
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 - 14704 | failed | leaked | Exception: native stack overflow
14704 - 17904 | failed |     ok | Exception: native stack overflow
17904 - 24576 |     ok |     ok |

====== AOT
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 - 18176 | failed | leaked | Exception: native stack overflow
18176 - 24576 |     ok |     ok |

====== AOT WAMR_DISABLE_HW_BOUND_CHECK=1
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 -  1968 | failed |     ok | Exception: native stack overflow
 1968 - 24576 |     ok |     ok |
```

This is a preparation to work on relevant issues, including:

#3325
#3320
#3314
#3297
victoryang00 pushed a commit to victoryang00/wamr-aot-gc-checkpoint-restore that referenced this issue May 1, 2024
This is a test code to examine native stack overflow detection logic.

The current output on my environment (macOS amd64):

```shell
====== Interpreter
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 - 14704 | failed | leaked | Exception: native stack overflow
14704 - 17904 | failed |     ok | Exception: native stack overflow
17904 - 24576 |     ok |     ok |

====== AOT
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 - 18176 | failed | leaked | Exception: native stack overflow
18176 - 24576 |     ok |     ok |

====== AOT WAMR_DISABLE_HW_BOUND_CHECK=1
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 -  1968 | failed |     ok | Exception: native stack overflow
 1968 - 24576 |     ok |     ok |
```

This is a preparation to work on relevant issues, including:

bytecodealliance#3325
bytecodealliance#3320
bytecodealliance#3314
bytecodealliance#3297
victoryang00 pushed a commit to victoryang00/wamr-aot-gc-checkpoint-restore that referenced this issue May 2, 2024
This is a test code to examine native stack overflow detection logic.

The current output on my environment (macOS amd64):

```shell
====== Interpreter
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 - 14704 | failed | leaked | Exception: native stack overflow
14704 - 17904 | failed |     ok | Exception: native stack overflow
17904 - 24576 |     ok |     ok |

====== AOT
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 - 18176 | failed | leaked | Exception: native stack overflow
18176 - 24576 |     ok |     ok |

====== AOT WAMR_DISABLE_HW_BOUND_CHECK=1
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 -  1968 | failed |     ok | Exception: native stack overflow
 1968 - 24576 |     ok |     ok |
```

This is a preparation to work on relevant issues, including:

bytecodealliance#3325
bytecodealliance#3320
bytecodealliance#3314
bytecodealliance#3297
Signed-off-by: victoryang00 <victoryang00@ucsc.edu>
@yamt
Copy link
Collaborator Author

yamt commented May 2, 2024

another reason why (b) isn't enough is host-provided apis like libc functions.
when we hit the guard pages in the middle of libc, it's impossible to recover gracefully.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants