Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mongoose Dictionary Type #5030

Closed
lefnire opened this issue Apr 15, 2015 · 12 comments
Closed

Mongoose Dictionary Type #5030

lefnire opened this issue Apr 15, 2015 · 12 comments
Assignees

Comments

@lefnire
Copy link
Contributor

lefnire commented Apr 15, 2015

To complete the tasks=>collection migration, we'll need a new Mongoose type Dictionary. Really this type would come in handy in many other situations, but it's quite required for the T=>C migration.

Here's the story. In MongoDB, storing arrays turns out to be stupid. Fine for very small arrays (eg, user.preferences.hooks); terrible for very large or commonly-accessed arrays (eg, user.habits). What happens is, in order to prevent race conditions on array indices, Mongoose adds an extra ._v parameter to the query on array updates. This slows down queries tremendously, creating major scalability bottlenecks at the DB layer. The solution is to store arrays as objects. Then in order to treat those objects as lists, you add a sort field for sorting, and some app logic (typically using Lodash's collections API). More and more you see vendors stop supporting arrays at the DB layer and instead use objects (eg, Firebase's post).

So as an example, instead of storing user.habits as an array:

habits: [{
  value: Number,
  text: String,
  ...
}]

we want to store it as an object:

habits: {
  <habit-id>: {
    value: Number,
    text: String,
    sort: Number,
    ...
  }
}

Which is fine and dandy. The problem here is that <habit-id>. Those familiar with Mongoose know you can't do that - you can't have dictionaries, where a dynamic key maps to an object with a defined schema. What you can do is have habits: {type: Schema.Types.Mixed, 'default': {} }, but then you don't get Schema validation, default values, path.markModified(), etc. and you have to validate your dictionaries manually. You can see this at play presently in a few situations: quests, filters, messages, and many more. In these situations we handle validation in app logic, and trigger .markModified() manually.

So from my searching, there's no Mongoose support for this style of Dictionary attributes, nor any existing Mongoose plugins for such (someone may wanna double check in case my Google Fu is failing me). The task at hand then is to create a new Mongoose plugin to add a new type called Dictionary. You can see how to add new types with this example. Existing types here, a likely type to copy/paste from is Array. Model defs then might look like the following:

habits: {type: Dictionary, schema: {
  value: Number,
  text: String,
  ...
}}

where the key is dynamic, the schema is handled as normally the case in Mongoose (pre-save validation, type-checking, etc), and .markModified() is handled automatically. Further, it may be useful to add support "value dictionaries" (for lack of a better word) where the dictionary doesn't map to an object, but a single value. Eg, user.filters is an object like {<uuid>:true, <uuid>:false} (code). In this case, we may extend our Dictionary type so that it would work as such:

filters: {type:Dictionary, schema:Boolean}

Ideally this dictionary type would be open-sourced to the mongoose plugin community (see "Community!"), maybe as mongoose-dictionary-type or something.

@paglias
Copy link
Contributor

paglias commented Apr 15, 2015

For tasks -> collections migrations it's needed just as an intermediary step? Like array -> dictionary -> collection?

@lefnire
Copy link
Contributor Author

lefnire commented Apr 15, 2015

@paglias we should G+ about this sometime, I'm thinking much different / more complex than before. Firstly, I'm thinking a top-level collection called tasks. I'll use a Mongoose discriminator to change schema based on task type, but keeping a top-level collection tasks (rather than individual collections habits, dailies, etc) will reduce the number of queries when mapping the collection to an owner.

Each task will have top-level attrs under shared which are defined by the owner (user, challenge, template, etc) and "overrides" which are pertinent where multiple users are participating in a single task (eg, with challenges). That overrides attr (called members) will has a schema, but the keys are the user IDs. So that's that dictionary type at play.

Very choppy version of my thought here, and I'll explain more about that structure over a G+ sometime.

@lefnire
Copy link
Contributor Author

lefnire commented Apr 15, 2015

And also, if Mongoose 4.x Plugins API is different than 3.x, let's favor 4.x API and upgrade our mongoose to 4.x during the T=>C migration

@paglias
Copy link
Contributor

paglias commented Apr 15, 2015

Putting my thoughts here so I won't forget them:

  1. Here there are all the breaking changes & new features in Mongoose 4.0
  2. @lefnire I noticed there are some challenges that are exceeding the 16mb Mongo document limit. I suppose 99.9% of the size is due to the users' tasks inside, we won't run into the same problem with the overrides?

@lefnire
Copy link
Contributor Author

lefnire commented Apr 15, 2015

@paglias that's a very good point. The challenge schema is pretty slim - users' tasks aren't stored in the challenge, instead they're looked up later as needed. However, all challenge.tasks' history is augmented with each user interaction, and if that's adding too much space then we'll definitely hit the same issue with this direction. Do you have a sample challenge UUID I can investigate?

@paglias
Copy link
Contributor

paglias commented Apr 15, 2015

@lefnire from New Relic I think ad13b407-c8cc-49c9-9c8e-d4c2747b9ec4 could be one of those, see this error https://rpm.newrelic.com/accounts/565514/applications/3386694/traced_errors/3104670647

@lefnire
Copy link
Contributor Author

lefnire commented Apr 15, 2015

Interesting. Indeed, the history entries for each task in that challenge are some 60k entries, a very likely culprit. (We should start pruning challenge histories, or record less). The members array is 918 entries of strings, so that's probably not the culprit here. I'll think on the new schema proposal, firstly we'd crack down on history really well should save room, but I wonder if indeed it would still require too much space...

@paglias
Copy link
Contributor

paglias commented Apr 15, 2015

The best idead that came to my mind for the dictionary type is to have a Schema.Types.Mixed field with a validator as exposed here that will use another schema to check each item in the dictionary:

var myschema = new mongoose.Schema({
   field : mongoose.Schema.Types.Mixed
});

var dictionarySchema = new mongoose.Schema({
    text: Number,
    notes: String,
    ... others fields....
});

myschema.path('field').validate(yourCustomValidator, 'your custom vaildator failed msg');

function yourCustomValidator (value) {
  Object.keys(value).forEach(function(dynamicKey){
    new dictionarySchema(value[dynamicKey]).validate(function(err){
      if(err) // an embedded document is invalid, break loop and invalidate parent document
      // go ahead
    })
  });
}

var M = mongoose.model('M', myschema);
var m = new M;
m.field = { might: {text: 'I should be a number, validation will file'}, for: {text: 321} }
m.save(); // Will fail in yourCustomValidator

I'm quite sure it works but:

  1. Performance of validating every item in the dictionary every time as we can't track edits to a Mixed type

I found this feature request in Mongoose that asks for the same thing and asked the maintainer of the project if he thinks we would have any problem developing a Dictionary type, mostly i'm worried by tracking edits/accesses to the items in the dictionary

@lefnire
Copy link
Contributor Author

lefnire commented Apr 15, 2015

Good find on the other ticket! Reading now.

I wonder if we should story history entries as a collection... they're so rarely used, we could do an AJAX lookup on request?

@paglias
Copy link
Contributor

paglias commented Apr 15, 2015

For challenges I think we should have the tasks schema defined inside the challenge and when a user joins the challenge we create a new tasks belonging to the user (so with its own document) and a reference to the challenge (task.{parent|challenge|whatever}) and then we could query for the single user's tasks when necessary.

I think it'd the best approach for perfs, mongo...

@paglias
Copy link
Contributor

paglias commented Apr 15, 2015

@lefnire I found how to implement the dictionary type, it'll use a slightly modified version of mongoose's embedded document type.

But there's something we can't avoid: when adding a new item to the dictionary like:

dictionary.<user-id> = // new obj

we cannot use a simple assignement because there's no way of intercepting it, we're going to need using something like:

dictionary.$add(<user-id>, new obj);

also if we want to set it to undefined or null and then back to an object we'll have to use $add there too.

I think I can put up a working version quite quickly and then when we'll have the rest of the code we'll make sure it fits our needs

@paglias paglias closed this as completed Jul 10, 2015
@alexmakeev
Copy link

Alternative is to wrap Array into Dict:
Like [{key:"asd", val:123}]
I have wrote such wrapper and use it when needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants