Normalising state and garbage collection #1824

Closed
olalonde opened this issue Jun 22, 2016 · 31 comments

Comments

@olalonde

olalonde commented Jun 22, 2016

The way my state is structured right now is a bit messy.

I have multiple top level keys which are related to the currently logged in user and some keys
which aren't related to the logged in user, e.g.:

{
  // logged in user
  me: {
    id: 'olalonde',
    username: 'olalonde',
    name: 'Olivier Lalonde',
    email: 'olalonde@gmail.com',
    token: 'some-json-webtoken',
  },
  // logged-in user's files
  myFiles: {
    pagination: { total: 6, limit: 3, page: 1 },
    data: [
      { id: 'a', filename: 'myfile.txt' },
      { id: 'b', filename: 'myfile.doc' },
      { id: 'c', filename: 'myfile.jpg' },
    ],
  },
  // logged-in user's events
  myEvents: {
    pagination: { total: 2, limit: 3, page: 1 },
    data: [
      { id: 'a', text: 'Some event...' },
      { id: 'b', text: 'Some other event...' },
    ],
  },
  // loaded when user browses a user's profile
  activeUserProfile: {
    id: 'billgates',
    username: 'billgates',
    name: 'Bill Gates',
  },
  // loaded when user browses a user's profile
  activeUserFiles: {
    pagination: { total: 2, limit: 3, page: 1 },
    data: [
      { id: 'd', filename: 'somefile.txt' },
      { id: 'e', filename: 'somefile.doc' },
    ],
  },
}

It's a bit messy and I was looking for suggestions to refactor.

When logging out, I'd have to check for a LOGOUT action in the me, myFiles
and myEvents reducers, which is not really maintainable if I keep adding
stuff. Another option would be to put everything related to the logged-in user
under a me key/reducer and clear me when LOGOUT is dispatched, e.g.:

{
  // logged in user
  me: {
    user: {
      id: 'olalonde',
      username: 'olalonde',
      name: 'Olivier Lalonde',
      email: 'olalonde@gmail.com',
      token: 'some-json-webtoken',
    },
    files: {
      pagination: { total: 6, limit: 3, page: 1 },
      data: [
        { id: 'a', filename: 'myfile.txt' },
        { id: 'b', filename: 'myfile.doc' },
        { id: 'c', filename: 'myfile.jpg' },
      ],
    },
    events: {
      pagination: { total: 2, limit: 3, page: 1 },
      data: [
        { id: 'a', text: 'Some event...' },
        { id: 'b', text: 'Some other event...' },
      ],
    },
  },
  // loaded when user browses a user's profile
  activeUser: {
    user: {
      id: 'billgates',
      username: 'billgates',
      name: 'Bill Gates',
    },
    files: {
      pagination: { total: 2, limit: 3, page: 1 },
      data: [
        { id: 'd', filename: 'somefile.txt' },
        { id: 'e', filename: 'somefile.doc' },
      ],
    },
  }
}
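With everything for the logged-in user nested under me, LOGOUT only has to be handled in one place. A minimal sketch, assuming hypothetical action types LOGIN_SUCCESS, MY_FILES_LOADED and MY_EVENTS_LOADED (only LOGOUT comes from the thread), plus a small combine helper standing in for Redux's combineReducers so it runs without dependencies:

```javascript
// Small stand-in for Redux's combineReducers (unlike the real one, it always
// returns a new object), so this sketch runs without any dependencies.
function combine(reducers) {
  return (state = {}, action) => {
    const next = {};
    for (const key of Object.keys(reducers)) {
      next[key] = reducers[key](state[key], action);
    }
    return next;
  };
}

const emptyList = { pagination: null, data: [] };

// Hypothetical child reducers for the nested `me` slice.
const user = (state = null, action) =>
  action.type === 'LOGIN_SUCCESS' ? action.user : state;
const files = (state = emptyList, action) =>
  action.type === 'MY_FILES_LOADED' ? action.payload : state;
const events = (state = emptyList, action) =>
  action.type === 'MY_EVENTS_LOADED' ? action.payload : state;

const combinedMe = combine({ user, files, events });

// The only reducer that knows about LOGOUT: passing `undefined` makes every
// child fall back to its initial state, so new children are reset for free.
function me(state, action) {
  return combinedMe(action.type === 'LOGOUT' ? undefined : state, action);
}
```

Adding another child reducer under me then needs no extra LOGOUT case.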

A bit cleaner... But there will be some duplication in my files and user
reducers. Another option would be to normalise the whole thing:

{
  me: 'olalonde',
  users: {
    olalonde: {
      id: 'olalonde',
      username: 'olalonde',
      name: 'Olivier Lalonde',
      email: 'olalonde@gmail.com',
      token: 'some-json-webtoken',
      files: {
        pagination: { total: 6, limit: 3, page: 1 },
        entities: ['a', 'b', 'c'],
      },
      events: {
        pagination: { total: 2, limit: 3, page: 1 },
        entities: ['a', 'b'],
      },
    },
    billgates: {
      id: 'billgates',
      username: 'billgates',
      name: 'Bill Gates',
      files: {
        pagination: { total: 2, limit: 3, page: 1 },
        entities: ['d','e'],
      },
    },
  },
  files: {
    a: { id: 'a', filename: 'myfile.txt' },
    b: { id: 'b', filename: 'myfile.doc' },
    c: { id: 'c', filename: 'myfile.jpg' },
    d: { id: 'd', filename: 'somefile.txt' },
    e: { id: 'e', filename: 'somefile.doc' },
  },
  events: {
    a: { id: 'a', text: 'Some event...' },
    b: { id: 'b', text: 'Some other event...' },
  }
}
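With this normalised shape, a component reads a user's files through a selector that follows the ID list back into the files table. A minimal sketch; selectUserFiles is a hypothetical name, and it assumes each user's list is stored under entities (as in the billgates entry):

```javascript
// Resolve a user's paginated file IDs into the file objects stored in state.files.
function selectUserFiles(state, userId) {
  const user = state.users[userId];
  if (!user || !user.files) return [];
  return user.files.entities
    .map((id) => state.files[id])
    .filter(Boolean); // skip IDs whose entity has already been removed
}

const state = {
  me: 'olalonde',
  users: {
    billgates: {
      id: 'billgates',
      files: { pagination: { total: 2, limit: 3, page: 1 }, entities: ['d', 'e'] },
    },
  },
  files: {
    d: { id: 'd', filename: 'somefile.txt' },
    e: { id: 'e', filename: 'somefile.doc' },
  },
};

console.log(selectUserFiles(state, 'billgates').map((f) => f.filename));
// prints [ 'somefile.txt', 'somefile.doc' ]
```

The .filter(Boolean) matters once entities can be garbage collected, since a list may briefly reference an ID that is no longer in the table.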

Now this kind of normalisation leads to new problems. How do I garbage collect
the state? E.g., if a user browses thousands of profiles, I don't want to end up
with a state that has thousands of entities. I could dispatch a { type: 'RESET_PROFILE_DATA', user_id: 'billgates' } action when a user's profile component is
unmounted. But how does my reducer know which pieces of data can be safely removed (e.g. the logged-in user's state shouldn't be cleared when they navigate away from their own profile page)? We
could add special cases everywhere, but that doesn't feel maintainable. Another problem is that two
separate components could be using the same piece of state, and it'd be hard to tell.

I guess we could keep a garbage collection counter for every entity, which gets incremented when the data is needed and decremented when it is no longer needed, etc. But that can get quite complex.
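The counter idea is plain reference counting. A minimal sketch, assuming hypothetical RETAIN/RELEASE actions that components would dispatch on mount/unmount, with counts keyed by "entityType:id":

```javascript
// refCounts maps "entityType:id" -> number of mounted consumers.
function refCounts(state = {}, action) {
  switch (action.type) {
    case 'RETAIN': {
      const key = `${action.entityType}:${action.id}`;
      return { ...state, [key]: (state[key] || 0) + 1 };
    }
    case 'RELEASE': {
      const key = `${action.entityType}:${action.id}`;
      const count = (state[key] || 0) - 1;
      if (count > 0) return { ...state, [key]: count };
      // Drop the key entirely; a missing key means "nobody uses this".
      const { [key]: _removed, ...rest } = state;
      return rest;
    }
    default:
      return state;
  }
}
```

Two components showing the same user would each RETAIN it, so one of them unmounting does not make it collectable; the logged-in user could simply be retained once at login.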

Also, what if I need to display a user's file list simultaneously in two different components, and both have their own pagination? Could I have filesForComponentA and filesForComponentB instead of files?
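One option is to keep the file entities shared and give each list its own pagination entry, keyed by a list ID the component chooses. A minimal sketch; the FILES_PAGE_LOADED action and list keys such as componentA are hypothetical:

```javascript
// Pagination per list instance; the file objects themselves live once in `files`.
function fileLists(state = {}, action) {
  if (action.type !== 'FILES_PAGE_LOADED') return state;
  return {
    ...state,
    [action.listKey]: {
      pagination: action.pagination,
      ids: action.files.map((f) => f.id),
    },
  };
}

// Shared lookup table: every loaded file is stored once, by ID.
function files(state = {}, action) {
  if (action.type !== 'FILES_PAGE_LOADED') return state;
  const next = { ...state };
  for (const f of action.files) next[f.id] = f;
  return next;
}
```

Compared with filesForComponentA/filesForComponentB, the reducers stay the same for any number of lists, and removing a list key on unmount would be a natural hook for cleanup.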

Anyway, I've got more ideas but would be curious how people typically handle this situation.

@gaearon
Contributor

gaearon commented Jun 22, 2016

If a user browses thousands of profiles

Is this a real use case in your app?

@olalonde
Author

@gaearon the profiles/files thing was a made-up example. But let's take something like Twitter or Facebook, for example. It's not unreasonable to think a user might browse through tons of stuff before closing the browser window, right? Maybe it's not something I should worry about, but it feels to me like it could eventually become an issue.

@olalonde
Author

olalonde commented Jun 22, 2016

Do you suggest I should go with the normalised approach and worry about "garbage collection" if/when it becomes an issue?

@aweary
Contributor

aweary commented Jun 22, 2016

I think avoiding premature optimization is generally good advice. Do some stress tests on your app first and see if performance suffers; if so, you can consider optimizations then.

@naw
Contributor

naw commented Jun 22, 2016

I think it's worth considering the perspective that normalization itself can be a premature optimization.

In the examples above, @olalonde was potentially concerned about storing files and events in more than one place in the state tree. Storing the same kind of thing in more than one place is not intrinsically a poor choice. If you're just displaying the data (but not performing any operations on it), it's likely fine to store it without normalizing it.

When you add operations against the data (say, renaming a file), you begin to cross the line into duplicating logic in more than one place, at which point you could consider normalization as a strategy to remove the duplication of logic.

If I had a phone number object that was stored inside two different kinds of things (say, a phone number for a main business profile, as well as phone numbers for each employee within an array of employees), then there wouldn't necessarily be any immediate value in normalizing business phone numbers and employee phone numbers into a combined phone_numbers hash.

On the other hand, going back to the file example, if you wanted to rename a file, and you wanted that feature to operate identically regardless of whether the file happened to reside in the me key or the activeUser key, then that's perhaps a good time to normalize in order to prevent duplication of logic in a reducer. Of course, you can also reduce duplication by building a reducer with a factory, so that identical logic runs in multiple places without actually repeating your code. In that case, you can eliminate duplication without needing to normalize.

I think you have to look at the big picture of how you intend to use the data on the page before you decide when and where to normalize. I wouldn't necessarily see your 2nd refactoring as superior to your 1st refactoring.
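The reducer-factory approach @naw mentions can be sketched as follows: one function produces the same file-list logic for each place it is mounted. This is a sketch under assumptions; the FILES_LOADED and RENAME_FILE actions and the target field are hypothetical:

```javascript
// Builds a file-list reducer; `name` distinguishes instances for load actions.
function createFilesReducer(name) {
  const initial = { pagination: null, data: [] };
  return function filesReducer(state = initial, action) {
    switch (action.type) {
      case 'FILES_LOADED':
        // Only the instance the load was meant for takes the new page.
        return action.target === name ? action.payload : state;
      case 'RENAME_FILE':
        // Every instance applies the rename, so a file shown in both places stays in sync.
        return {
          ...state,
          data: state.data.map((f) =>
            f.id === action.id ? { ...f, filename: action.filename } : f
          ),
        };
      default:
        return state;
    }
  };
}

const myFiles = createFilesReducer('me');
const activeUserFiles = createFilesReducer('activeUser');
```

The logic is written once, but the rename still has to touch every copy of the file; that is the point at which normalizing starts to pay off.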

@gaearon
Contributor

gaearon commented Jun 23, 2016

@naw solid advice

@gaearon
Contributor

gaearon commented Jun 23, 2016

Do you suggest I should go with the normalised approach and worry about "garbage collection" if/when it becomes an issue?

Normally, yes.

@gaearon
Contributor

gaearon commented Jun 23, 2016

(If you’re certain it’ll become an issue you might want to use something like Relay instead which I think handles this for you. Or you can roll your own normalized reducer factory based on schema that would keep track of what’s referenced where—but it sounds complicated.)

@olalonde
Author

olalonde commented Jun 23, 2016

@naw thanks, good advice.

@gaearon thanks, I'll have a look at Relay as well, though it looks like it would require a bigger refactor.

My app state is mostly "read only", so I think I'll keep going with the denormalised approach and maybe name my keys more explicitly to make it clear which view owns which piece of state (e.g. dashboard, profileView, etc.).

@olalonde
Author

olalonde commented Jun 23, 2016

It would be interesting to see examples of different Redux state shapes... I went through the examples/ directory, but the apps are all relatively simple, even the "real world" one. Some links to a few "complex Redux app" repos that use different state shapes would be immensely useful. :)

@markerikson
Contributor

Yeah, that's right along the lines of the stuff I'm hoping to cover in my "Structuring Reducers" recipe ( #1784 ). Which is actually kind of an issue: there are three or four overlapping topics (state shape, normalizing data, various ways to define reducers, ...), and I'm not yet sure how to approach them.

@markerikson
Contributor

That said, I do have links to a number of real and example apps over at https://github.com/markerikson/redux-ecosystem-links/blob/master/apps-and-examples.md. Looking through those apps might be instructive.

@woniesong92

woniesong92 commented Jun 26, 2016

I have a question that might be related to "normalized state." Let's assume that we're trying to build Reddit (just like in your tutorial example) and the following comes from the API server.

subreddits: [{
  title: "food",
  posts: [{
    id: "..",
    body: "..",
    comments: [{
      id: "..",
      body: "..",
    }],
  }, ...morePosts],
}, {
  title: "culture",
  posts: [{
    id: "..",
    body: "..",
    comments: [{
      id: "..",
      body: "..",
    }],
  }, ...morePosts],
}]

What's the conventional way of using reducers for the above response? I am generally following the idea of having a look-up table (e.g. subredditByTitle), but I remember seeing somewhere that I shouldn't nest PostsReducer and CommentsReducer under the same SubredditsReducer. I started out with only a subreddits reducer at the top level, but I realized that once I start nesting the reducers, I have to pass the actions related to Posts and Comments down through the subreddits reducer.

Should it look like this?

subredditByTitle: {
  food: {
    id: "subreddit_1",
    title: "food",
    posts: ["post_1", "post_2"],
  },
  culture: {
    id: "subreddit_2",
    title: "culture",
    posts: ["post_3", "post_4"],
  },
}

postsById: {
  post_1: {
    body: "..",
    comments: ["comment_1", "comment_2"],
  },
  post_2: {
    body: "..",
    comments: ["comment_3", "comment_4"],
  },
}

commentsById: {
  comment_1: {
    body: "..",
  },
  comment_2: {
    body: "..",
  },
}
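Flattening the nested response into lookup tables like these is typically done once, when the response arrives (libraries such as normalizr exist for this). A hand-written sketch for this particular shape, assuming every post and comment has a unique id and every post has a comments array:

```javascript
// Flatten subreddits -> posts -> comments into ID-keyed lookup tables.
function normalizeSubreddits(subreddits) {
  const subredditByTitle = {};
  const postsById = {};
  const commentsById = {};

  for (const sub of subreddits) {
    subredditByTitle[sub.title] = {
      title: sub.title,
      posts: sub.posts.map((p) => p.id),
    };
    for (const post of sub.posts) {
      // Keep only comment references on the post.
      postsById[post.id] = { ...post, comments: post.comments.map((c) => c.id) };
      for (const comment of post.comments) {
        commentsById[comment.id] = comment;
      }
    }
  }
  return { subredditByTitle, postsById, commentsById };
}

const response = [
  { title: 'food', posts: [{ id: 'post_1', body: '..', comments: [{ id: 'comment_1', body: '..' }] }] },
  { title: 'culture', posts: [{ id: 'post_2', body: '..', comments: [] }] },
];
const normalized = normalizeSubreddits(response);
```

Each of the three tables can then be owned by its own top-level reducer, so post and comment actions no longer have to be threaded through a subreddits reducer.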

@markerikson
Contributor

First, the typical layout for an app with normalized data would be something like:

{
    someAppData1 : { ..... },
    someAppData2 : { ..... },
    entities : {
        EntityType1 : {
            byId : {
                et1id1 : {},
                et1id2 : {},
                // etc
            },
            items : ["et1id1", "et1id2"]
        },
        EntityType2 : {
            byId : {
                et2id1 : {},
                et2id2 : {},
                // etc
            },
            items : ["et2id1", "et2id2"]
        },
    }
}

So, put all the normalized data under a parent key, have a key for each type of item, and then those keys contain the lookup tables and ID arrays.

The next issue is how to structure the logic for managing those entities. Some of your reducer logic could be fairly generic, like "look up an item based on its type name and its ID, and update its attributes". Some may be specific to a certain entity type. Some may require access to multiple entity keys at once. This is where you need to start thinking outside the box of combineReducers, and use that in conjunction with other approaches. For example, you can use combineReducers to write your "strictly per-domain" logic, write another function that takes the entire entities state and does cross-entity-type behavior, and use the reduceReducers utility to run the two in sequence and produce your overall "entities" reducer.
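That combination can be sketched in a few lines. Here reduceReducers is a minimal hand-written version (the reduce-reducers package provides one), combine stands in for combineReducers, and the EDIT_COMMENT/DELETE_POST actions and post/comment shapes are hypothetical:

```javascript
// Run reducers in sequence, feeding each one's output state to the next.
const reduceReducers = (...reducers) => (state, action) =>
  reducers.reduce((s, r) => r(s, action), state);

// Minimal stand-in for Redux's combineReducers.
const combine = (reducers) => (state = {}, action) =>
  Object.fromEntries(
    Object.entries(reducers).map(([key, r]) => [key, r(state[key], action)])
  );

// Per-domain logic: each reducer only sees its own slice.
const posts = (state = { byId: {}, items: [] }) => state; // placeholder
const comments = (state = { byId: {}, items: [] }, action) => {
  const c = action.type === 'EDIT_COMMENT' && state.byId[action.id];
  if (!c) return state;
  return { ...state, byId: { ...state.byId, [action.id]: { ...c, body: action.body } } };
};

// Cross-entity logic: deleting a post also removes its comments.
function crossEntity(state, action) {
  if (action.type !== 'DELETE_POST') return state;
  const post = state.posts.byId[action.id];
  if (!post) return state;
  const { [action.id]: _gone, ...postsById } = state.posts.byId;
  const commentsById = { ...state.comments.byId };
  for (const cid of post.comments) delete commentsById[cid];
  return {
    ...state,
    posts: { byId: postsById, items: state.posts.items.filter((id) => id !== action.id) },
    comments: {
      byId: commentsById,
      items: state.comments.items.filter((id) => id in commentsById),
    },
  };
}

const entities = reduceReducers(combine({ posts, comments }), crossEntity);
```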

@woniesong92
Copy link

@markerikson Thank you, that's helpful. One more question: where should isFetching fit in the normalized state? If I am fetching EntityType1 I can probably do the following:

EntityType1 : {
    byId : {
        et1id1 : {},
        et1id2 : {},
        // etc
    },
    items : ["et1id1", "et1id2"],
    isFetching: true
},

But what if I am updating or creating a single entity? For example, when I am fetching posts, I update isFetching on the posts entity. But when I am updating, creating, or deleting a single post, where should the isFetching state go?

@markerikson
Contributor

Haven't actually dealt with that concern myself. I suppose it might depend on whether you're expecting to fetch multiple entities with multiple requests at once, or to have only one request out at a time. You could either put it on each entity value if you're expecting to make multiple requests, or have a single fetching : "someId" value at the top of the EntityType1 section (next to byId and items).
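A variant of the per-entity option keeps the flags in a separate map next to byId and items, so several single-entity requests can be outstanding at once without mixing request status into the stored objects. A minimal sketch; the POSTS_FETCH_STARTED, POST_SAVE_STARTED and POST_SAVE_FINISHED actions and the savingById key are hypothetical:

```javascript
const initial = { byId: {}, items: [], isFetching: false, savingById: {} };

function postsReducer(state = initial, action) {
  switch (action.type) {
    case 'POSTS_FETCH_STARTED':
      return { ...state, isFetching: true }; // whole-collection request
    case 'POST_SAVE_STARTED':
      return { ...state, savingById: { ...state.savingById, [action.id]: true } };
    case 'POST_SAVE_FINISHED': {
      const { [action.id]: _done, ...savingById } = state.savingById;
      return {
        ...state,
        savingById,
        byId: { ...state.byId, [action.id]: action.post },
        items: state.items.includes(action.id) ? state.items : [...state.items, action.id],
      };
    }
    default:
      return state;
  }
}

// A component rendering one post only needs a lookup.
const isSaving = (state, id) => Boolean(state.savingById[id]);
```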

You might also want to look at some of the utilities listed in my Redux addons catalog, particularly the libs that try to handle collection CRUD, network requests, and more network requests, and see how they handle things.

@olalonde
Author

Some links to a few "complex Redux app" repos that use different state shapes would be immensely useful. :)

For future reference, I found the new WordPress to be a good example of a relatively large React/Redux app: https://github.com/Automattic/wp-calypso/tree/master/client

@woniesong92

woniesong92 commented Jun 26, 2016

@markerikson Thanks. I have one last question. Let's say I have the following DB structure where Post is a subdocument of Page and Comment is that of Post.

  Page
    Post
      Comment

Then the normalized state looks like this (using subdocuments):

pagesById: {
  page_1: {
    pageTitle: "Elmo",
    posts: ["post_1", "post_2"],
  },
},
postsById: {
  post_1: {
    body: "First post",
    comments: ["comment_1", "comment_2"],
  },
  post_2: {
    body: "Second post",
    comments: ["comment_3", "comment_4"],
  },
},
commentsById: {
  comment_1: {
    body: "First comment",
  },
  // ...
}

When I add a post, I have to do two things:

  1. In pagesById: add the newly created post_id to pagesById[page_id].posts
  2. In postsById: add the newly created post object

Doing these two things when I add a new post feels a little awkward, because the Pages reducer is responding to an action that's related to Posts. Similarly, when I want to add a comment, the Posts reducer has to respond to an action that's related to Comments. Is this normal?
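Both steps can be handled by letting two reducers respond to the same action. A minimal sketch, assuming a hypothetical ADD_POST action carrying pageId and the new post:

```javascript
function pagesById(state = {}, action) {
  switch (action.type) {
    case 'ADD_POST': {
      const page = state[action.pageId];
      if (!page) return state;
      // Step 1: record the new post's ID on its parent page.
      return { ...state, [action.pageId]: { ...page, posts: [...page.posts, action.post.id] } };
    }
    default:
      return state;
  }
}

function postsById(state = {}, action) {
  switch (action.type) {
    case 'ADD_POST':
      // Step 2: store the post object itself.
      return { ...state, [action.post.id]: { ...action.post, comments: [] } };
    default:
      return state;
  }
}
```

When both are mounted with combineReducers, the same dispatched action reaches each of them.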

On a similar note, I am curious whether using subdocuments in Mongoose is discouraged when I am using Redux. If I weren't using subdocuments, the Redux state would look slightly different:

pages: {
  allPageIds: ["page_1"],
  pagesById: {
    page_1: {
      pageTitle: "Elmo",
    },
  },
},
posts: {
  allPostIds: ["post_1", "post_2"],
  postsById: {
    post_1: {
      page_id: "page_1",
      body: "First post",
    },
    post_2: {
      page_id: "page_1",
      body: "Second post",
    },
  },
},
comments: {
  allCommentIds: ["comment_1", "comment_2", "comment_3", "comment_4"],
  commentsById: {
    comment_1: {
      post_id: "post_1",
      body: "First comment",
    },
    // ...
  },
}

With this new state, only the Posts reducer will have to respond to the ADD_POST action because we can just update posts.allPostIds and posts.postsById.

I'd appreciate any insight 😄

@markerikson
Contributor

Yeah, updating the relational info for a parent item when creating a new child (as one example) is absolutely valid. It's not that, say, your Comments reducer is the only thing that should do work when you add a comment. That's really the point of both the normalization of data, and the multiple-reducers-responding-to-one-action concept: any reducer function can respond appropriately to any given action.

It would also be entirely valid to take a similar-ish but different approach, where the entities reducer would specifically update the posts section and the comments section rather than delegating it to the individual sub-reducers, on the grounds that perhaps having the two related steps be done explicitly within one handler might be easier to follow.
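That alternative, a single entities-level handler performing both updates explicitly, might look like this. It uses the same hypothetical ADD_POST shape, with state keys pagesById and postsById as in the question:

```javascript
// One handler owns the whole ADD_POST update, so both steps are visible together.
function entities(state = { pagesById: {}, postsById: {} }, action) {
  if (action.type !== 'ADD_POST') return state;
  const { pageId, post } = action;
  const page = state.pagesById[pageId];
  if (!page) return state; // unknown parent: ignore rather than create an orphan
  return {
    ...state,
    pagesById: { ...state.pagesById, [pageId]: { ...page, posts: [...page.posts, post.id] } },
    postsById: { ...state.postsById, [post.id]: { ...post, comments: [] } },
  };
}
```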

I've never touched Mongoose myself, and have no idea what a "subdocument" is, so I can't help you there.

@woniesong92

@markerikson awesome. Thank you. @gaearon do you have an opinion on using subdocuments?

@timdorr
Member

timdorr commented Jul 31, 2016

I'm going to close this out in favor of Mark's documentation efforts. I think there's enough here to go on for your original question, and what remains is more specific to your project's wants and needs.

@Gargron

Gargron commented Apr 30, 2017

I'm interested in ways to garbage collect normalized data in Redux. This has become a problem in the Mastodon web UI, as the users have a habit of leaving it open with the firehose of content on and browsing a lot of stuff (mastodon/mastodon#787). It's unfortunate that OP's concerns were dismissed, as it would have been nice to find a ready solution here.

@neurosnap

neurosnap commented May 1, 2017

We are creating an email app where we persist the state to cache so on reboot it gets hydrated and looks like a normal desktop app. Since we are dealing with an email app, we have to be very careful about what we store in our redux state. Since we persist the state on reboot it is imperative that we properly manage it as it is extremely easy to see our thread and message slices reach 10k+ records.

As it stands today we are very aggressive with pruning, but we see a not-so-distant future where this luxury is no longer viable.

It's really not an ideal solution, but we essentially listen for when a thread gets added to a folder and then kick off a saga that scans for any dereferenced threads and cascade-removes the messages belonging to those threads.
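The scan described above is essentially mark-and-sweep over references. A minimal pure-function sketch, assuming hypothetical shapes where folders hold threadIds and each message has a threadId (the app's actual state shape isn't shown):

```javascript
// Remove threads no folder references, then messages whose thread is gone.
function pruneOrphans(state) {
  // Mark: every thread ID reachable from some folder.
  const live = new Set();
  for (const folder of Object.values(state.folders)) {
    for (const id of folder.threadIds) live.add(id);
  }
  // Sweep threads.
  const threads = {};
  for (const [id, thread] of Object.entries(state.threads)) {
    if (live.has(id)) threads[id] = thread;
  }
  // Cascade: keep only messages whose thread survived.
  const messages = {};
  for (const [id, msg] of Object.entries(state.messages)) {
    if (msg.threadId in threads) messages[id] = msg;
  }
  return { ...state, threads, messages };
}
```

Because it is a pure function, it could back a single reducer case triggered by the saga, rather than the saga removing items piecemeal.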

It does feel like we are reinventing the garbage collection "wheel" here but I'm not exactly sure what other solutions we have.

@markerikson
Contributor

Out of curiosity, how would this get handled in any other client-side application/framework?

I've noted several times that Redux is (as far as I can see) really no different than any other client-side technology in terms of caching data and memory usage, it's just that all the data is attached to one tree rather than split up into separate "model" instances or something.

@paynecodes

I, too, am curious about how others are doing this sort of thing. I keep thinking that a state path subscription model would work best, but I haven't thought it all the way through... Where do we "subscribe" or otherwise signal intent to keep a given state path (selectors)? When do we perform cleanup (after several "ticks" with no further subscriptions to that path)? How do we perform cleanup uniformly (dispatch some standard action with enough context, and allow reducers to handle it)?

Then again.. this could be a terrible idea...

@aikoven
Copy link
Collaborator

aikoven commented Jun 26, 2017

My approach for garbage collection is reference counting items that are in use by any component. I have a number of HOCs that dispatch actions on mount/unmount which result in inc/dec of counters. Then there's a task that runs in an interval that prunes unused items.

@mrpmorris

mrpmorris commented Jun 27, 2017

I dislike the way state is normalized. Nested reducers with nested state makes more sense to me, and it's so easy to garbage collect when you aren't using it any more.

@markerikson
Contributor

@mrpmorris : It's entirely up to you how you structure your own app's state. Redux doesn't enforce any particular approach. However, there are very good reasons for normalization. Quoting the Structuring Reducers - Normalizing State Shape docs page:

Compared to the original nested format, this is an improvement in several ways:

  • Because each item is only defined in one place, we don't have to try to make changes in multiple places if that item is updated.
  • The reducer logic doesn't have to deal with deep levels of nesting, so it will probably be much simpler.
  • The logic for retrieving or updating a given item is now fairly simple and consistent. Given an item's type and its ID, we can directly look it up in a couple simple steps, without having to dig through other objects to find it.
  • Since each data type is separated, an update like changing the text of a comment would only require new copies of the "comments > byId > comment" portion of the tree. This will generally mean fewer portions of the UI that need to update because their data has changed. In contrast, updating a comment in the original nested shape would have required updating the comment object, the parent post object, the array of all post objects, and likely have caused all of the Post components and Comment components in the UI to re-render themselves.

Note that a normalized state structure generally implies that more components are connected and each component is responsible for looking up its own data, as opposed to a few connected components looking up large amounts of data and passing all that data downwards. As it turns out, having connected parent components simply pass item IDs to connected children is a good pattern for optimizing UI performance in a React Redux application, so keeping state normalized plays a key role in improving performance.

@paynecodes

@aikoven That seems similar to what I'm thinking of. I'm curious, though... How are you describing the state paths which should be cleaned up? Some kind of DSL?

@aikoven
Copy link
Collaborator

aikoven commented Jul 1, 2017

@jpdesigndev The data is organized as follows:

{
  collection1: {
    pk1: item1,
    pk2: item2,
    ...
  },
  collection2: {
    pk1: item1,
    pk2: item2,
    ...
  },
} 

The garbage collector dispatches an action with the payload:

{
  collectionN: <array of pks to remove>,
  collectionM: ...,
}
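Combined with reference counts like those in @aikoven's earlier comment, the collector and the reducer that applies its payload could be sketched as follows. The GC_COLLECT type and the "collection:pk" count keys are hypothetical; only the payload shape follows the comment above:

```javascript
// Build the GC payload: for each collection, the pks with no live references.
function buildGcPayload(entities, refCounts) {
  const payload = {};
  for (const [collection, items] of Object.entries(entities)) {
    const unused = Object.keys(items).filter((pk) => !refCounts[`${collection}:${pk}`]);
    if (unused.length > 0) payload[collection] = unused;
  }
  return payload;
}

// Apply it: remove the listed pks from each collection without mutating state.
function entitiesReducer(state = {}, action) {
  if (action.type !== 'GC_COLLECT') return state;
  const next = { ...state };
  for (const [collection, pks] of Object.entries(action.payload)) {
    const items = { ...next[collection] };
    for (const pk of pks) delete items[pk];
    next[collection] = items;
  }
  return next;
}
```

A setInterval callback could read the store, build the payload, and dispatch GC_COLLECT only when the payload is non-empty.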


@reduxjs reduxjs locked as resolved and limited conversation to collaborators Oct 30, 2018