Timers still running after main() #72

fbrausse · 2019-08-31T13:04:59Z

Hi,

I'm having trouble with object lifetimes. My (dumbed down) application looks something like this:

#include <fio.h>

struct my_context { /* ... */ };
void on_timer(void *) { /* ... */ }
void on_finish(void *udata)
{
  struct my_context *ctx = udata;
  do_some_last_thing_with(ctx);
}

int main()
{
  struct my_context ctx;
  init(&ctx);
  fio_run_every(5000, 1, on_timer, &ctx, on_finish);
  fio_start(.threads=1);
  fini(&my_context);
}

I should add, that I link it dynamically to libfacil.so.

When the application is interrupted by e.g. SIGINT, my understanding is that facil.io's custom signal handler will cause fio_start() to return to main. After main the atexit(3) handlers run, including those that call _dl_fini which calls:

facil.io/lib/facil/fio.c

Line 3531 in ef041d9

static void __attribute__((destructor)) fio_lib_destroy(void) {

Now, there is still the on_timer active and will be shutdown by calling on_finish here:

facil.io/lib/facil/fio.c

Line 3540 in ef041d9

fio_timer_clear_all();

after main already called fini() on ctx, leading to an access violation in on_finish.

Is this the intended behaviour and, if so, how am I using facil.io in the wrong way? My expectation was that on_finish (and actually the entire shutdown procedures from fio_lib_destroy) would be called before fio_start() returns and not when unloading libfacil.so.

I realize this is relatively easy to fix either by dynamic allocation of ctx and something like reference-counting, or by manually dlopen/dlclose, but I am wondering whether there was a way to avoid both of these. Maybe by exposing some kind of fio_init() and fio_fini(), which I could call around fio_start()?

Please forgive me if this issue is already covered by the documentation, I couldn't find any hint regarding timers or interaction of the .on_unsubscribe callbacks with fio_start() or the shutdown procedure of facil.io.

The text was updated successfully, but these errors were encountered:

boazsegev · 2019-08-31T19:10:58Z

Mitigation

A quick and dirty solution would be to make the variable static.

i.e.:

int main()
{
  static struct my_context ctx; /* <= here */
  init(&ctx);
  fio_run_every(5000, 1, on_timer, &ctx, on_finish);
  fio_start(.threads=1);
  fini(&my_context);
}

Another approach might make the fio_timer_clear_all() function available as a public API.

Expected behavior?

Right now, timers, unlike sockets, remain "alive" between calls to fio_stop and fio_start.

This is the current "expected behavior".

I understand that it's questionable and I'm more then happy to get some feedback about this.

The reason for this design, which you might not see as much in C as you might when using facil.io in Ruby (through the iodine gem), is that fio_start may be called multiple times.

In Ruby, I might test some code in the terminal and start and stop facil.io multiple times. In fact, I sometimes do this in C when I'm testing something (though more rarely). i.e.:

facil.io/lib/facil/fio.c

Lines 9549 to 9563 in ef041d9

    
           FIO_FUNC void fio_cycle_test(void) { 
        
             fprintf(stderr, 
        
                     "=== Testing facil.io cycling logic (partial - only tests timers)\n"); 
        
             fio_mark_time(); 
        
             fio_timer_clear_all(); 
        
             struct timespec start = fio_last_tick(); 
        
             fio_run_every(1000, 1, fio_cycle_test_task, NULL, NULL); 
        
             fio_run_every(10000, 1, fio_cycle_test_task2, NULL, NULL); 
        
             fio_start(.threads = 1, .workers = 1); 
        
             struct timespec end = fio_last_tick(); 
        
             fio_timer_clear_all(); 
        
             FIO_ASSERT(end.tv_sec == start.tv_sec + 1 || end.tv_sec == start.tv_sec + 2, 
        
                        "facil.io cycling error?"); 
        
             fprintf(stderr, "* passed.\n"); 
        
           }

Changing this behavior?

I'm not sure if I can change this behavior without a major version change, since it breaks backwards compatibility / expectations (unless we consider this unexpected behavior and then it's a bug). This might only change in the 0.8.x release.

Should I change this? I don't know. I'll need to consider this. I'd love to read some opinions.

fbrausse · 2019-08-31T20:52:22Z

Thank you for the explanations and the static suggestion! I actually did that, but without reference-counting it still doesn't help, because my context also needs to not be finalized when do_some_last_thing_with() runs. So I'm counting these now but it is not quite optimal.

If I understand you correctly, fio_stop()

is equivalent to or at least implied by SIGINT
just stops the reactor from processing more events on the registered file-descriptors / timeouts / sockets
- without necessarily running any .on_unsubscribe targets, but
- with the option to resume it by a subsequent fio_start().

fio_start() therefore plays a double role: once initializing facil.io's global state (i.e. checking whether it is initialized each time it is run) and running the reactor / event-loop - while fio_stop() just has a single purpose, to stop the reactor and not perform any deinitialization.

Exposing the fio_timer_clear_all() would help with this particular case, however, I don't have the overview about all the subsystems that are (de-)initialized. Are the timer events the only ones that survive leaving fio_start()? I suspect not (you seem to have your own memory allocation pools).

In order to restore the state of the application using facil.io to something that is easy to reason about (i.e. all I ever passed to facil.io is either unsubscribed (the API mentions that things, like .on_close, are always called - especially sockets are closed on SIGINT) or never touched again), I see three options that would help (I chose names different from the public API for the new functions):

a) split fio_start() into fio_lib_init() and fio_react() and add a public fio_lib_fini():
- fio_lib_init() assumes the global state is unitialized or in the state fio_lib_fini() left it (let's call it UNINIT); it initializes the state into INIT
- fio_react() assumes fio_lib_init() was called before (i.e. the state is INIT) and no other fio_react() is running; it puts the state into RUNNING.
- fio_lib_fini() assumes fio_lib_init() was called before, but no reactor is running, i.e., the INIT state; it deinitializes the global state and leaves it in UNINIT - such that fio_lib_init()'s assumptions are met.
- fio_stop() continues to behave as it does now: RUNNING -> INIT
- fio_start() continues to behave is it does now: optionally UNINIT -> INIT followed by INIT -> RUNNING.
b) make fio_stop() put the state back to the UNINIT state as though it calls fio_lib_fini() internally.
c) decouple SIGINT from fio_close() by saying

It closes sockets, but leaves the state undefined, don't expect anything.

Is my underlying assumption correct that only these 3 global states of facil.io exist?

In any case, regarding option b), I suppose that would also break the current expectation, because then fio_stop() would call my on_finish() which in your Ruby use-case appearently is not expected. Btw., I'm exporting my API to Python :).

Option c) also does not seem to be very convenient, but at least it would warn me that I have to be careful with raised signals. On a side note: the current behaviour of just stopping the reactor and continuing "as normal" was very intuitive for me!

I suppose, I am looking for an option to change the state from INIT to UNINIT (without resorting to dlclose()). Do you think the current design would allow that? Does fio_lib_destroy() leave the memory in a state which fio_start() can deal with? In that case, maybe exposing fio_lib_destroy() with the added warning that fio_stop() has to be called before (if fio_start() was). This is similar to option a) without the explicit fio_lib_init() (which I don't see a use-case for right now).

EDIT: added fio_start() behaviour to option a) (which is my favourite).

fbrausse · 2019-08-31T21:33:33Z

Do you have any policy regarding ABI stability? If not, another option could be to pass one of your fancy optional arguments, e.g., .do_finish, to fio_stop() which would eliminate the need to expose the deinitialization function itself and retain at least API compatibility.

boazsegev · 2019-09-01T02:10:23Z

@fbrausse ,

It's a bit late here, so I'll try to be focused. Sorry if I don't make sense (was out drinking with friends).

Thank you for the explanations and the static suggestion! I actually did that, but without reference-counting it still doesn't help, because my context also needs to not be finalized when do_some_last_thing_with() runs. So I'm counting these now but it is not quite optimal.

I believe you could, should you desire, move the fini(&my_context) to the on_finish callback.

Normally, I would suggest using a state callback, adding your fini() function to the FIO_CALL_AT_EXIT callback list... however, I think you exposed a bug where timer cleanup will run after the AT_EXIT callbacks (should be fixed).

If I understand you correctly,fio_stop() ... is equivalent to or at least implied by SIGINT...

Yes, it is. In fact, SIGINT and SIGTERM simply call fio_stop.

In addition, fio_stop will disconnect any active connections. This is unavoidable, as some of the connections might be owned by a worker process (fork) which will be shut down by fio_stop.

without necessarily running any .on_unsubscribe targets

Connection subscriptions will be terminated, so their on_unsubscribe callbacks will be called.

However, global subscriptions (connectionless subscriptions) will stay valid and they will receive new messages once the reactor is restarted.

fio_start() therefore plays a double role: once initializing facil.io's global state (i.e. checking whether it is initialized each time it is run) and running the reactor / event-loop...
...
a) split fio_start() into fio_lib_init() and fio_react() and add a public fio_lib_fini()

Actually, the global state is initialized by the fio_lib_init() function.

The fio_start() function only manages the reactor and it's concurrency state, (re)initializing the reactor in case it's been active before... so I guess that option (a) is basically implemented ;)

facil.io/lib/facil/fio.c

Line 3553 in ef041d9

static void __attribute__((constructor)) fio_lib_init(void) {

make fio_stop() put the state back to the UNINIT state as though it calls fio_lib_fini() internally.

I think maybe this is actually a valid expectation. The more I think about it the more I see the logic behind a more complete cleanup.

Some things will NOT be uninitialized. The memory allocator and some other state data should remain in effect in order to allow for new timers / connections to be setup before running the reactor again... but its a fair approach.

Maybe you could add a call to fio_timer_clear_all() into your fio_worker_cleanup() implementation and we could work a general PR for it later on, figure out if it will be back ported to version 0.7.x...

Do you have any policy regarding ABI stability

Yes, I follow semantic versioning and avoid ABI breaking changes where patch versions are concerned. Minor versions might break ABI, but I would still try to avoid it.

Then again, the library is a source code library, so ABI stability is, obviously, not something developers should expect.

Btw., I'm exporting my API to Python :)

👏🏻👏🏻🎉

…#72)

boazsegev · 2019-09-01T09:03:54Z

I've added a patch for this in the 0.7.x branch. See how that works for you.

fbrausse · 2019-09-01T10:17:26Z

Thank you for the quick fix, I will report back!

Just one quick remark: I didn't look for fio_lib_init(), thanks for the pointer. Since it is marked as constructor, to me it seems that at the moment there are not 3 states, just 2: INIT and RUNNING. That might change the options.

Unfortunately, I'm quite busy now and need time to think about it. I will come back to your detailed reply above soon.

fbrausse changed the title ~~Timers still running after~~ Timers still running after main() Aug 31, 2019

fbrausse added a commit to fbrausse/ksc that referenced this issue Aug 31, 2019

ref-count and objectify stuff, see also boazsegev/facil.io#72

370f250

boazsegev added the API design Discussions about API choices label Aug 31, 2019

fbrausse added a commit to fbrausse/ksc that referenced this issue Aug 31, 2019

ref-count and objectify stuff, see also boazsegev/facil.io#72

ed0c905

boazsegev added a commit that referenced this issue Sep 1, 2019

add Timer cleanup to worker cleanup and fix signal handlers (see issue …

ab3dca2

…#72)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Timers still running after main() #72

Timers still running after main() #72

fbrausse commented Aug 31, 2019 •

edited

boazsegev commented Aug 31, 2019

fbrausse commented Aug 31, 2019 •

edited

fbrausse commented Aug 31, 2019 •

edited

boazsegev commented Sep 1, 2019

boazsegev commented Sep 1, 2019

fbrausse commented Sep 1, 2019 •

edited

Timers still running after main() #72

Timers still running after main() #72

Comments

fbrausse commented Aug 31, 2019 • edited

boazsegev commented Aug 31, 2019

Mitigation

Expected behavior?

Changing this behavior?

fbrausse commented Aug 31, 2019 • edited

fbrausse commented Aug 31, 2019 • edited

boazsegev commented Sep 1, 2019

boazsegev commented Sep 1, 2019

fbrausse commented Sep 1, 2019 • edited

fbrausse commented Aug 31, 2019 •

edited

fbrausse commented Aug 31, 2019 •

edited

fbrausse commented Aug 31, 2019 •

edited

fbrausse commented Sep 1, 2019 •

edited