Skip to content
This repository has been archived by the owner on Dec 16, 2019. It is now read-only.

Workers staying idle, when pools are used #916

Open
ekr3peeK opened this issue Jan 16, 2019 · 6 comments
Open

Workers staying idle, when pools are used #916

ekr3peeK opened this issue Jan 16, 2019 · 6 comments

Comments

@ekr3peeK
Copy link

ekr3peeK commented Jan 16, 2019

I've encountered a strange behavior, and was wondering if this is a bug, or if it is intended this way. I am using the pthread build of 3.2.0 - 7.2 - ts - vc15 - x64, with php version of 7.2.14.

class Work extends Threaded {
    public $id;

    public function __construct($id) {
        $this->id = $id;
    }

    public function run() {
        if ($this->id == 0) {
            sleep(3);
            echo $this->id . " is ready\n";
            return;
        } else {
            echo $this->id . " is ready\n";
            return;
        }
    }
}

$pool = new Pool(2, 'Worker', []);
for ($i=0; $i<4; $i++) $pool->submit(new Work($i));
while ($pool->collect());
$pool->shutdown();

I've expected this code to output:

1 is ready
2 is ready
3 is ready
0 is ready

Instead of this, the output is:

1 is ready
3 is ready
0 is ready
2 is ready

For me, it seems that worker 1, gets assigned job 0, and job 2 at the get go, thus worker 2, after finishing job 1 and 3, just waits, instead of taking over job 2 from worker 1. If job 0 and job 2, both are time consuming operations, this script would run in much more time, then if when worker 2 gets idle, would take over job 2 from worker 1.

If this is the intended behavior, could someone please explain to me why it is done in this manner?

@LuckyFellow
Copy link

You are creating a pool of 2 workers.
A pool assigns jobs in a round robin manner. It does not distribute work to the next free worker.
So in your case the work assignment to workers is:

Worker1: Work#0
Worker2: Work#1
Worker1: Work#2
Worker2: Work#3

After assigning Work#0 to Worker1, it pauses for 3 seconds.
In the meantime Worker2 finishes Work#1 and Work#3.
Then (3 seconds later), Worker1 continues to finish Work#0 and processes Work#2 afterwards.

Hint:
Please don't use sleep() or usleep(). Read here why: #481

@ekr3peeK
Copy link
Author

I see that this is intended behaviour, but still, isn't this sub optimal? With the old isWorking() method, even with my sleep example, I was able to reach my desired output. sleep is only to see this in effect, I could easily replace the sleep function, with a for loop with 1.000.000 calculations, the results would still be the same.

@LuckyFellow
Copy link

LuckyFellow commented Jan 22, 2019

Personally I agree, that a round robin distrubution might be sub optimal.
If the tasks you submit have about the same calculation times/efforts, round robin might be fine.
In my current project, this is not the case though. So I refused using the Pool class and instantiate Workers myself and implemented my own distribution algorithm.

Personally I would appreciate, if there was a way to extend the Pool class to implement customized distribution algorithms.

Implementing my own distribution algorithm, made me aware though, that relying on an "isWorking"-State in a thread environment, is inaccurate. If such a method is provided, you tend to rely on it, but you can not and end up in endless loops, eventually.
Personally I would appreciate, if a worker would not remove a task from its stack, until the task has finished. Then the Worker::getStacked method would return >0 as long as the worker has tasks to finish.
This would be reliable and safe, as of my current understanding.

@dktapps
Copy link
Contributor

dktapps commented Jan 22, 2019

@LuckyFellow the problem with that proposal is unstack(), you can't safely do that and also permit unstacking a worker.

@dktapps
Copy link
Contributor

dktapps commented Jan 22, 2019

A possible method to implement this could be to have a central task stack in the Pool which all workers in the pool share and draw tasks from as they complete current tasks. I'm not sure how this would fit together with standalone workers though.

@LuckyFellow
Copy link

@dktapps you are familiar with the actual internals and I can sense, that things get tricky there and cause implications / side effects.

But I was wondering, if the workers internal stack was a Threaded object, then this should hold true:"Threaded objects, most importantly, provide implicit safety for the programmer; all operations on the object scope are safe."
Does it?

If it holds true, then maybe the worker could internally flag the task, which is going to be executed as "being processed" and all externally callable stack manipulating methods must ignore all tasks on the stack, which are flagged as "being processed".
If this is possible, it might be helpful to be provided a way to check, whether a task (the first task) is being processed. Because if Worker::getStacked returns 1 and my logic tries to Worker::unstack everything, I possibly create a crazy loop which consumes 100% CPU.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants