Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

julia_gc: don't call jl_get_ptls_states from code running during GC #5715

Closed
wants to merge 1 commit into from

Conversation

fingolfin
Copy link
Member

It is not quite safe there when multithreading, it seems. Instead get it from the task struct.

This is a tiny but necessary step towards resolving the multi threading issues we've been seeing in GAP.jl

It is not quite safe there when multithreading, it seems.
Instead get it from the task struct.
@fingolfin fingolfin added topic: kernel release notes: not needed PRs introducing changes that are wholly irrelevant to the release notes topic: julia Julia GC integration and related matters backport-to-4.13 labels May 16, 2024
@fingolfin
Copy link
Member Author

Urgh. A good thing the tests here run with a different Julia version than what I used. But I guess I really should be running this with at least Julia 1.6 and latest Julia, and ideally all versions in between, to ensure we don't trip over some issues with Julia kernel API changes sigh

Copy link
Contributor

@ThomasBreuer ThomasBreuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Julia 1.8.5, the installation works, but I can confirm the error message from the CI test with a very old Julia version (1.4.1) that is not supported by GAP.jl; perhaps GAP's configure should better refuse such old Julia versions.

Here is the output which I get with Julia 1.4.1.

src/julia_gc.c: In function ‘SafeScanTaskStack’:
src/julia_gc.c:458:46: warning: implicit declaration of function ‘jl_get_safe_restore’; did you mean ‘jl_gc_safe_enter’? [-Wimplicit-function-declaration]
  458 |     volatile jl_jmp_buf * old_safe_restore = jl_get_safe_restore();
      |                                              ^~~~~~~~~~~~~~~~~~~
      |                                              jl_gc_safe_enter
src/julia_gc.c:458:46: warning: nested extern declaration of ‘jl_get_safe_restore’ [-Wnested-externs]
src/julia_gc.c:458:46: warning: initialization of ‘volatile struct __jmp_buf_tag (*)[1]’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
src/julia_gc.c:471:9: warning: implicit declaration of function ‘jl_set_safe_restore’; did you mean ‘jl_gc_safe_enter’? [-Wimplicit-function-declaration]
  471 |         jl_set_safe_restore(&exc_buf);
      |         ^~~~~~~~~~~~~~~~~~~
      |         jl_gc_safe_enter
src/julia_gc.c:471:9: warning: nested extern declaration of ‘jl_set_safe_restore’ [-Wnested-externs]
src/julia_gc.c: In function ‘ScanTaskStack’:
src/julia_gc.c:512:24: error: ‘jl_task_t’ {aka ‘struct _jl_task_t’} has no member named ‘ptls’; did you mean ‘tls’?
  512 |     MarkFromList(task->ptls, stack);
      |                        ^~~~
      |                        tls
src/julia_gc.c: In function ‘GapRootScanner’:
src/julia_gc.c:550:37: warning: implicit declaration of function ‘jl_get_current_task’; did you mean ‘jl_current_task’? [-Wimplicit-function-declaration]
  550 |     jl_task_t * task = (jl_task_t *)jl_get_current_task();
      |                                     ^~~~~~~~~~~~~~~~~~~
      |                                     jl_current_task
src/julia_gc.c:550:37: warning: nested extern declaration of ‘jl_get_current_task’ [-Wnested-externs]
src/julia_gc.c:550:24: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
  550 |     jl_task_t * task = (jl_task_t *)jl_get_current_task();
      |                        ^
src/julia_gc.c: In function ‘GapTaskScanner’:
src/julia_gc.c:604:17: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
  604 |     if (task == (jl_task_t *)jl_get_current_task())
      |                 ^
src/julia_gc.c:625:5: warning: implicit declaration of function ‘jl_active_task_stack’ [-Wimplicit-function-declaration]
  625 |     jl_active_task_stack(task, &active_start, &active_end, &total_start, &total_end);
      |     ^~~~~~~~~~~~~~~~~~~~
src/julia_gc.c:625:5: warning: nested extern declaration of ‘jl_active_task_stack’ [-Wnested-externs]
src/julia_gc.c: In function ‘InitBags’:
src/julia_gc.c:833:32: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
  833 |         RootTaskOfMainThread = (jl_task_t *)jl_get_current_task();
      |                                ^
make: *** [Makefile.rules:426: build/obj/src/julia_gc.c.o] Error 1

@fingolfin
Copy link
Member Author

We are actually testing with Julia 1.6 (and also requiring Julia >= 1.6). But it seems task->ptls was only added in Julia 1.7. I can workaround that, but upon thinking about the code in total again, I am now unsure whether this patch is even correct -- at the same time, I empirically verified that without the patch, I get hard crashes during GC, while with the patch it is much better... I think there is a deeper issue here and I need to talk to the people who implemented the Julia multi threaded GC code.

@fingolfin fingolfin marked this pull request as draft May 16, 2024 19:56
@fingolfin
Copy link
Member Author

Also, I think we need to setup a way to test such changes in GAP against GAP.jl & multiple Julia versions. Otherwise it is too easy to make bad changes here. But I don't want to burden all GAP PRs by that. One option would be to add a new workflow file which is restricted to be triggered by commits which modify src/julia_gc* (though of course technically other unrelated changes to the GAP kernel might break the Julia integration, but I think overall the risk is not that high, and in any case we'd still be better off than with the current setup).

Anyway, for now I marked this as draft.

@fingolfin fingolfin closed this May 17, 2024
@fingolfin
Copy link
Member Author

This change is wrong, but I think I now better understand the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes: not needed PRs introducing changes that are wholly irrelevant to the release notes topic: julia Julia GC integration and related matters topic: kernel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants