TypeError: Cannot read property 'run' of undefined #458

tzaitsev · 2015-06-17T20:03:53Z

We use WhenJS in a majority of our services and servers across a pretty large stack. Since upgrading to 3.7.3 we have started seeing the following log and have no idea where to start tracing this down:
To give some granularity it seems to hit constantly and throws constant errors after some amount of time After muting 1000 or so servers the remaining 5 have thrown this error 16million+ times in the last 3 days.

Uncaught exception: TypeError: Cannot read property 'run' of undefined
TypeError: Cannot read property 'run' of undefined
at Scheduler._drain (node_modules/when/lib/Scheduler.js:70:23)
at Scheduler.drain (node_modules/when/lib/Scheduler.js:27:9)
at process._tickDomainCallback (node.js:381:11)

Any help would be much appreciated.

briancavalier · 2015-06-17T21:17:54Z

Hey @tzaitsev. Hmmm, that's a serious-looking issue, but I'm not aware of any changes in 3.7.3 that would cause such a thing. What version did you upgrade from?

tzaitsev · 2015-06-17T21:31:01Z

We moved from 3.1.0 to 3.3.1 for a few days until we figured out how to fix an issue we had with going past 3.3.1

Then to 3.7.3 in the last few weeks

-- We had an issue with the promise monitor not allowing promise chains to be garbage collected and our processes dying when they ran out of memory so we wrote our own context implementation of promiseMonitor:

https://gist.github.com/tzaitsev/1f01349cc7a9f7a9b742

Which we use through:
var promiseMonitor = new PromiseMonitor(exports.logger); //exports.logger has a log function so we treat it as a reporter
promiseMonitor.monitor(exports.when.Promise);

Really there isn't much else custom to the usage of when.

briancavalier · 2015-06-17T22:38:59Z

Thanks. What version(s) of node or io.js are you seeing the problem on?

We had an issue with the promise monitor not allowing promise chains to be garbage collected

In general, the PromiseMonitor is fairly heavy-weight, and I don't recommend it for production use, only for debugging. That said, if you have more information on this, I'd def be interested in trying to track down and fix the leak.

we wrote our own context implementation of promiseMonitor

Well done! Especially considering the context bits aren't really documented at all :)

tzaitsev · 2015-06-17T23:03:48Z

Node v0.12.0

@xamado will provide more details about the promise monitor bug and what we saw.

xamado · 2015-06-19T01:24:26Z

Sorry I'm late to the party.

The problem we were having with the PromiseMonitor was that after several hours of running on our servers, we saw how the CPU gradually started spiking, up to the point where it was consuming 100% cpu and not doing any work. We also saw the heap memory usage go up gradually.

By doing some heapdumps and spending countless hours following traces, I saw that there were links between objects via a "context" path (path being a reference from one object to the other via a reference). And usually the "distance" (number of paths traversed to get from the first object to the last one) was a huge number, over 10k, clearly indicating some kind of cyclic reference.

My guess is that the cpu usage came from something trying to iterate this humongous path list to either extract a stack trace or to attempt garbage collection.

At first I thought maybe we were somehow linking Promises together, since I also saw thousands of Promise, Fullfilled and Rejected objects in the heap dump. But after removing the PromiseMonitor and replacing it with my own custom lean promise monitor (the one Tim just provided), the problem went away. This would seem to indicate that the problem is in when and/or PromiseMonitor itself.

As for the reason to wanting to use the PromiseMonitor in production, it's simply because it's useful for debugging promises (sometimes) and we rather pay the cost of needing extra infrastructure at this point than spending countless hours trying to debug something without proper context :)

Last but not least, on a kinda related note, by using the promiseMonitor we shared, we can now have a logging context that is passed on to any other promises that are fired in the chain (this would be a really useful tool to have in when itself). And it works wonderfully for the most part, but while testing I found out that it seems to fail as soon as I do a when.defer(). I haven't had the time to sit down and debug the internals yet, but it would seem that in such a case enterContext is not called, thus the next time you exitContext we end leaving the "log stack" in a weird place. And I'm wondering if this could not be related to the actual bug with the PromiseMonitor, where promises are getting linked together to every other promise because of a leftover in the executionContext.

Anyway, let me know if something didn't make sense or you would like additional information.

And yes, as you said it's not documented at all, and it took a lot of trial and error to get it to "work", and then again, maybe I'm still doing something wrong :)

Thanks!

briancavalier · 2015-06-19T01:26:18Z

@tzaitsev Thanks

A few thoughts. The particular stack trace points to when's internal task queue, the one it uses to trampoline all promise-related tasks (such as then'ed callbacks). It's probably one of the hottest code paths in the whole lib.

There haven't been any significant changes in code related to placing items into or reading them out of that queue in 3.7.0-3.7.3, and 3.7.0 was released 6 months ago. That's an extremely long time for such a potentially serious error to go unreported given that when.js is a reasonably popular lib. I'm not saying it's impossible :) just that I would have expected someone to hit an error like this pretty quickly.

That has me wondering if there is something particular about your environment or your application that is setting up a scenario in which some very subtle bug is triggered. I have no idea what those conditions might be, though.

Any chance you've been able to narrow down the situations under which it happens? Or possibly have found a way to create a reproducible test case? If we can find a way to reproduce it, that'd be a great start.

Also, you could try other when.js versions to see if there's a particular version where a bug seems to have been introduced. For example, you could try jumping back to 3.6.4 since there were changes to the Scheduler from 3.6.4 to 3.7.0. That might get us some useful info as well.

Another possibility, unfortunately, is that this could be a bug in v8. That may sound like a remote possibility, but a while back we found and reported a serious bug (see also #345 (long!) and #403) in the optimizing compiler of node 0.10's version of v8. So it is a possibility.

Thanks for your patience in working through this. The more info we can get, the better we'll be able to figure out what's going on.

briancavalier · 2015-06-19T01:42:20Z

Thanks for the detail, @xamado.

after several hours of running on our servers

Wow. PromiseMonitor def wasn't intended to be used in production for hours. Unfortunately, I don't really have any way of testing it under those conditions. Like I said, tho, if we can find a way to improve it, I'm totally game :)

I saw that there were links between objects via a "context" path

Yes, the monitor maintains a virtual call stack (composed of actual stack traces, since promises span many actual call stacks) in order to be able to stitch together long virtual stack traces when reporting errors. However, the data it keeps in the virtual call stack doesn't contain any back-references to promise objects (unless of course, there's bug :D). I'm not sure how promises themselves were being locked in memory ... curious!

A potential solution that comes to mind might be to simply cap the size of the virtual call stacks. I'll have to look into that.

by using the promiseMonitor we shared, we can now have a logging context that is passed on to any other promises that are fired in the chain

Hmmm, I'd be interested in hearing more about this. It sounds similar to what PromiseMonitor already does (using enter/exitContext) in order to collect contexts for the virtual call stack.

while testing I found out that it seems to fail as soon as I do a when.defer()

Interesting, that sounds like it could be a bug. I'll look into it.

Thanks!

briancavalier · 2015-07-12T19:08:03Z

@tzaitsev @xamado I still haven't been able to reproduce the original issue. Any word on a possible test case to reproduce? Thanks!

YouriT · 2017-07-22T11:12:01Z

Getting this exact same. Originally because of CPU usage but then I also got it on start:

TypeError: Cannot read property 'run' of undefined
    at Scheduler._drain (/app/node_modules/when/lib/Scheduler.js:70:23)
    at Scheduler.drain (/app/node_modules/when/lib/Scheduler.js:27:9)
    at _combinedTickCallback (internal/process/next_tick.js:73:7)
    at process._tickCallback (internal/process/next_tick.js:104:9)

Running on node 7.10.1 and when 3.7.8
Seems quite random, I got it many times in the past days but I can't find a way to reproduce yet. Will follow up.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TypeError: Cannot read property 'run' of undefined #458

TypeError: Cannot read property 'run' of undefined #458

tzaitsev commented Jun 17, 2015

briancavalier commented Jun 17, 2015

tzaitsev commented Jun 17, 2015

briancavalier commented Jun 17, 2015

tzaitsev commented Jun 17, 2015

xamado commented Jun 19, 2015

briancavalier commented Jun 19, 2015

briancavalier commented Jun 19, 2015

briancavalier commented Jul 12, 2015

YouriT commented Jul 22, 2017 •

edited

TypeError: Cannot read property 'run' of undefined #458

TypeError: Cannot read property 'run' of undefined #458

Comments

tzaitsev commented Jun 17, 2015

briancavalier commented Jun 17, 2015

tzaitsev commented Jun 17, 2015

briancavalier commented Jun 17, 2015

tzaitsev commented Jun 17, 2015

xamado commented Jun 19, 2015

briancavalier commented Jun 19, 2015

briancavalier commented Jun 19, 2015

briancavalier commented Jul 12, 2015

YouriT commented Jul 22, 2017 • edited

YouriT commented Jul 22, 2017 •

edited