Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCC 6.2.1 breaks libmill #167

Open
raedwulf opened this issue Oct 19, 2016 · 17 comments
Open

GCC 6.2.1 breaks libmill #167

raedwulf opened this issue Oct 19, 2016 · 17 comments

Comments

@raedwulf
Copy link
Contributor

raedwulf commented Oct 19, 2016

Accidentally pressed enter when creating this issue... so you might have an empty notification!
I compiled libmill and libdill with GCC 6.2.1 and it appears the coroutines no longer work in both (it segfaults).

I've only investigated so far to see if it was only my assembly context switching that was the root of the problem and it is not. It occurs with the sigsetjmp/longjmp implementation as well - suggesting it might be something else...

@raedwulf raedwulf changed the title GCC 6.2.1 b GCC 6.2.1 breaks libmill Oct 19, 2016
@raedwulf
Copy link
Contributor Author

raedwulf commented Oct 20, 2016

I've narrowed down the issue:

#include <assert.h>
#include <stdio.h>

#include "libdill.h"

coroutine void worker(int count, const char *text) {
    int i;
    for(i = 0; i != count; ++i) {
        printf("%s\n", text);
        int rc = msleep(now() + 10);
        assert(rc == 0);
    }
}


int main() {
    int cr1;
    {
        sigjmp_buf *ctx;
        int h = dill_prologue(&ctx);
        if(h >= 0) {
            if(!sigsetjmp(*ctx, 0)) {
                int dill_anchor[dill_unoptimisable1];
                dill_unoptimisable2 = &dill_anchor;
                volatile size_t sz = (char*)&dill_anchor - (char*)hdata(h, NULL);
                char dill_filler[sz]; // this line breaks because it now tries to forcefully build the stack this size!
                dill_unoptimisable2 = dill_filler;
                worker(4, "a ");
                dill_epilogue();
            }
        }
        cr1 = h;
    }

    hclose(cr1);
    return 0;
}

@raedwulf
Copy link
Contributor Author

I've figured out what GCC 6.2.1 introduces that breaks the "black magic". In older versions of GCC, the -fstack-protector does not cover all cases. They have fixed it in 6.2.1 and so it detects when the stack allocated by char dill_filler[sz]; will overflow the stack.

A workaround is to pass -fno-stack-protector in CFLAGS. However, I believe we should find a better solution to this as -fstack-protector is a common default (how common, I'm unsure).

If you want to try it yourself, this is the smallest case I could come up with which works with GCC 4.9.3 and 5.4.0 but not with 6.2.1.

#include "libdill.h"

coroutine void hello() {
    return;
}

int main() {
    int cr1;
    {
        sigjmp_buf *ctx;
        int h = dill_prologue(&ctx);
        if(h >= 0) {
            if(!sigsetjmp(*ctx, 0)) {
                int dill_anchor[dill_unoptimisable1];
                dill_unoptimisable2 = &dill_anchor;
                volatile size_t sz = (char*)&dill_anchor - (char*)hdata(h, NULL);
                char dill_filler[sz];
                dill_unoptimisable2 = dill_filler;
                hello();
                dill_epilogue();
            }
        }
        cr1 = h;
    }
    hclose(cr1);
    return 0;
}

@sustrik
Copy link
Owner

sustrik commented Oct 20, 2016

Do you have any idea what can be possibly done except of turning off stack protection? It seems to me that we are doing exactly what stack protection is supposed to prevent so there's little choice but to disable it.

@raedwulf
Copy link
Contributor Author

raedwulf commented Oct 20, 2016

My only solution in mind is inline assembly but it has the downside that it would need to be written per architecture. Making it work with -fstack-protector is desirable because we do want stack protection elsewhere as one of the main use case is for networking. I can do x86(_64) and arm (will need to setup my various SBCs and learn ARM assembly) and might be able to do powerpc (on os x) if my imac g5 still boots (dodgy hdd). I'm unable to test modern OS X. I should be able to test DragonflyBSD but I'm having difficulty with the onboard LAN card (then again I could set all the BSDs up in qemu if necessary).

@raedwulf
Copy link
Contributor Author

I've been working on a solution... and sad to say, but there's no easily-compatible solution.
The closest I arrived at was:

#define go(fn) \
    ({\
        sigjmp_buf *ctx;\
        int h = dill_prologue(&ctx);\
        if(h >= 0) {\
            if(!dill_setjmp(*ctx)) {\
                asm volatile("leaq -8(%%rax), %%rsp":: "rax" (hquery(h, NULL)));\
                fn;\
                dill_epilogue();\
            }\
        }\
        h;\
    })

This almost works but parameters can be sometimes loaded in fn before the inline assembly is executed. In which case "rsp" is used after the inline assembly to read local variables, which of, course, are invalid because the stack has been switched.

Short of a dynamic code generator, poorer syntax or reduced functionality will be introduced... sadly. C Macros are just not powerful enough.

Apart from sticking with -fno-stack-protector, the only alternative I can see is to add mandatory first parameter to the coroutine to pass the stack pointer. The coroutine will then have to execute some code to switch to the correct stack before the rest of the coroutine is executed.

@raedwulf
Copy link
Contributor Author

raedwulf commented Oct 20, 2016

I think I might have a solution! If we can override __stack_chk_fail with our own implementation, I think we might be on a winner. Then again that would introduce a lot of inefficiency.
Reference here

@raedwulf
Copy link
Contributor Author

raedwulf commented Oct 20, 2016

There's a compromise flag: -fstack-protector-explicit which allows attributes to be passed to opt in for stack protection using __attribute__((stack_protect)) but there's no equivalent opt out.

@raedwulf
Copy link
Contributor Author

raedwulf commented Oct 20, 2016

Fixed it. the unoptimisable vars and VLAs were actually doing more than I thought they were. TLDR; the VLAs are necessary to coerce the compiler to always generate a stack frame (they are unimplementable without a stack frame). The stack frame lets fn reference the local variables (EDIT: which store the function arguments) needed even when the stack pointer is changed.

This works on x86-64:

#define go(fn) \
     ({\
         sigjmp_buf *ctx;\
         int h = dill_prologue(&ctx);\
         if(h >= 0) {\
             if(!dill_setjmp(*ctx)) {\
                 int dill_anchor[dill_unoptimisable1];\
                 dill_unoptimisable2 = &dill_anchor;\
                 asm volatile("leaq -8(%%rax), %%rsp"::"rax"(hquery(h, NULL)));\
                 fn;\
                 dill_epilogue();\
             }\
         }\
         h;\
     })

When I have time, I'll port the assembly one-liners for each platform (it literally just writes to the stack pointer so that the stack protector ignores it).

@sustrik
Copy link
Owner

sustrik commented Oct 20, 2016

Ha, I wrote the code myself and only now I understand how it really works!

When making a patch, please add a comment about what the assembly line does so that people looking at the code won't have to scratch their heads over it.

@sustrik
Copy link
Owner

sustrik commented Oct 20, 2016

Also, a fallback to -fno-stack-protector would be nice.

@raedwulf
Copy link
Contributor Author

Yes, I have that in mind already :) and a comment saying it breaks if you build your application without -fno-stack-protector

@raedwulf
Copy link
Contributor Author

There is a more stable alternative to this - libcoro and lwan do coroutines through having a fixed, single, argument for each coroutine and a stack pointer that is passed through arguments. However, we lose the beauty of being able to simply pass a nested function call-style interface (although something similar could potentially be achieved through variadic macros).

@sustrik
Copy link
Owner

sustrik commented Oct 20, 2016

I think that's an easy trade-off. Stack protector may be nice to have but it is not a critical feature. A sane way to invoke coroutines, on the other hand, is huge improvement to usability/readability and I'd prefer it over stack protection any time. (Not even mentioning that you've apparently found a way to have both at the same time.)

@raedwulf
Copy link
Contributor Author

raedwulf commented Oct 20, 2016

Some of the code I came up with in one of my attempts was mostly acceptable. With some extreme macro-wizardry it might be possible to wrap libcoro's style coroutines. I'd leave this for now, but in case GCC >7.x decides to do something that breaks the current technique, this is a potential avenue to explore.

#define dill_noinl __attribute__((noinline))
#define dill_naked __attribute__((optimize("omit-frame-pointer")))
#define dill_wrap dill_noinl dill_naked
#define dill_coro dill_noinl dill_naked

#define dill_loadsp(fn, ...) dill_wrap void fn(void *s, __VA_ARGS__)\
    {asm("leaq 0(%%rax), %%rsp\njmp " #fn "_body__":: "rax" (s));}

#define coroutine(fn, ...) \
    dill_coro void fn##_body__(void *s, __VA_ARGS__);\
    dill_loadsp(fn, __VA_ARGS__)\
    dill_coro void fn##_body__(void *s, __VA_ARGS__)

/* Statement expressions are a gcc-ism but they are also supported by clang.
   Given that there's no other way to do this, screw other compilers for now.
   See https://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Statement-Exprs.html */
#define go(fn, ...) \
    ({\
        sigjmp_buf *ctx;\
        int h = dill_prologue(&ctx);\
        if(h >= 0) {\
            if(!dill_setjmp(*ctx)) {\
                fn(hquery(h, NULL), __VA_ARGS__);\
                dill_epilogue();\
            }\
        }\
        h;\
    })

Which leads to an interface:

coroutine(worker, int fd) {
     int rc = fdin(fd, -1);
     assert(rc == 0);
     event = 1;
}

int h = go(worker, fd);

P.S. This technique was promising, but haven't managed to get it to work yet - I think if I use the VLA trick from earlier, I can force the stack frame to be generated consistently. I've only managed to get this to work on -O0 so far and occasionally on earlier GCC versions.

@sustrik
Copy link
Owner

sustrik commented Oct 20, 2016

Thinking in long-term I have a better proposal: Let's infiltrate the C standardization committee and make go() part of the language. That would force compiler people to support that kind of thing in some way.

@travisdowns
Copy link

For what it's worth, it is possible to disable stack protector on a per function basis (it was mentioned above that it isn't). The trick is:

__attribute__((optimize("no-stack-protector")))

before the function definition (works from gcc 4.4 onwards or so). Mentioning it here since it may be useful to you and also because this issue appears when you search for per-function stack protection issues.

@sustrik
Copy link
Owner

sustrik commented Mar 4, 2018

Thanks, I'll look into it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants