
Upserts #766

Open
tintin10q opened this issue Feb 4, 2022 · 19 comments
@tintin10q

Is your feature request related to a problem?

Often when handling data, you don't know whether you're handling new data or a record you already have. For instance, if you regularly fetch a user's data from some API, you don't know whether you have already fetched and inserted it before, especially if there are relations that might change. Take a user's watch history on Netflix: you don't know whether the shows the user watched already exist in the entities.

Currently, if you do an insert, the insert will replace already existing data on the model with default values for any fields that were not given.

Let's say I have this data:

```js
{ id: 1, age: 3, name: "tintin" }
```

If I insert this, it works fine. But if I then insert:

```js
{ id: 1, age: 4 }
```

Then this will actually remove the name field and set it to the default value.

This means that, before inserting new data, you always have to check whether it is already present and, if it is, do an update instead.
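A minimal sketch of that behaviour, assuming a hypothetical User model whose fields() declares id, age and name with defaults:

```js
import { Model } from '@vuex-orm/core'

// Hypothetical model, only for illustration.
class User extends Model {
  static entity = 'users'

  static fields () {
    return {
      id: this.attr(null),
      age: this.attr(0),
      name: this.attr('')
    }
  }
}

await User.insert({ data: { id: 1, age: 3, name: 'tintin' } })
await User.insert({ data: { id: 1, age: 4 } })

User.find(1) // => { id: 1, age: 4, name: '' } – name was reset to its default
```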

Describe the solution you'd like

I think an .upsert() method would be a nice solution for this.

upsert is a mix between insert and update. If there is no model with this id yet, it inserts a new one with the default values. If there already is a model with this id, it just updates the fields given in the input; in that case it does not generate default values for the other fields, but just leaves them as they are.
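A hypothetical call could look like this (.upsert() does not exist yet; the payload shape just mirrors insert):

```js
// Proposed API, not an existing method – this is the feature being requested.
await User.upsert({ data: { id: 1, age: 4 } })
// Record 1 exists:  only age changes, name keeps its current value.
// Record 1 missing: a new record is inserted, with defaults for name.
```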

Describe alternatives you've considered

The alternative is first checking whether a model with the id you are trying to insert or update already exists, and then taking the update or insert action accordingly. I think adding an upsert method that does this for you would be a good addition, especially when you insert a large object with a lot of relations and don't want to check each relation for existence.
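Sketched with the existing API (upsertUser is just an illustrative helper, not part of Vuex ORM):

```js
// Manual check-then-act workaround.
async function upsertUser (data) {
  if (User.find(data.id)) {
    // Record exists: update only the given fields.
    await User.update({ where: data.id, data })
  } else {
    // Record is new: insert it, letting defaults fill the missing fields.
    await User.insert({ data })
  }
}
```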

@tintin10q
Author

Is there maybe already some other way to achieve this that I missed?

@cuebit
Member

cuebit commented Feb 4, 2022

Curious to know if you've come across insertOrUpdate in the docs? This would be the upsert equivalent.
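For reference, the call shape is the same as insert:

```js
await User.insertOrUpdate({ data: { id: 1, age: 4 } })
// Existing record: only the given fields are updated, the rest stay as-is.
// No existing record: a new one is inserted with defaults for missing fields.
```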

@tintin10q
Author

That seems to be exactly what I needed, thanks! Somehow I read past it; I already found it weird that I couldn't find something like this. Thank you!!

However, I have just tried it, and it seems really, really slow! What took maybe 1-2 seconds at most to insert now takes almost 30 seconds, and the only thing I changed was insert to insertOrUpdate.

@tintin10q
Author

Yes, I just did some more testing and it is really, really slow. Doing an if to test whether the record already exists and then choosing update or insert is fast, but insertOrUpdate is really slow. It freezes the page.

@tintin10q
Author

Never mind, update alone is also really slow and freezes the page. How many updates are OK?

It shouldn't freeze the page, right? Because it returns a promise?

@cuebit
Member

cuebit commented Feb 4, 2022

@tintin10q how large is your data set?

@tintin10q
Author

tintin10q commented Feb 4, 2022

How would you find that out? The objects I insert at a time are around 10 kB, but some of that is not inserted.

At the moment there are 11,543 objects, but by far the most, around 8,000, are image objects. There are 8,000 because the store keeps the image links for various sizes; I could perhaps try to store only one size. Image has polymorphic relations with some other types. Does that maybe slow things down? I could also try to store images on fewer things.

What would you consider large? Because I think I will have a lot more data than this in the future.

@tintin10q
Author

Some things also seem to be inserted multiple times. There should currently only be around 2-3 human objects, but there are 101 of them. I don't know why that happens yet. Anyway, there are now around 11,500 objects. Is that a lot?

@tintin10q
Author

Every time I insert the 10 kB object, it inserts 20 more (empty) humans than needed. So that is also weird.

@tintin10q
Author

So far I have built the payload so that I insert the whole thing in one go, so that the relationships are created in one go as well. Is it faster to do smaller insertOrUpdate calls? At least the page might not freeze.

@tintin10q
Author

@cuebit What would you consider a large dataset?

@tintin10q
Author

Do you know of any way to keep the UI from freezing when inserting/updating data in the Vuex ORM? Maybe somehow with web workers.

@Tofandel

Tofandel commented Mar 10, 2022

Anything more than 1,000 records in JS is a very large dataset and might not be appropriate for frontend use. I have some apps with lazy loading of data that still use 4,000-5,000 items on a page load to display charts, and that's at the limit of what is acceptable in loading speed; it takes ~2 seconds to display.

@Tofandel

It all depends on how critical your data is and how you load it. You can insert it in intervalled chunks of 500 (see the sketch below); otherwise the CPU time needed to process all the data will for sure give you a big performance hit.
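A rough sketch of what that could look like (insertInChunks is an illustrative helper; the chunk size and delay are arbitrary starting points):

```js
// Insert records in chunks, yielding to the event loop between batches
// so rendering isn't starved for the whole duration.
async function insertInChunks (records, chunkSize = 500, delayMs = 50) {
  for (let i = 0; i < records.length; i += chunkSize) {
    await User.insert({ data: records.slice(i, i + chunkSize) })
    await new Promise(resolve => setTimeout(resolve, delayMs))
  }
}
```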

@tintin10q
Author

Thank you. I will split up the inserts themselves into chunks of around 500 things. But it seems like insertOrUpdate can be really slow. Do you happen to know whether things get slower the more data you have stored?

@tintin10q
Author

tintin10q commented Mar 10, 2022

It only seems to get slow with insertOrUpdate when I try to insertOrUpdate the same data again. If the data does not exist yet, it is super fast, but if the data already exists it takes minutes. So the updating seems to be super slow. I am going to see how much I can insert at a time.

@Tofandel

Tofandel commented Mar 10, 2022

Do you mean you're doing one insertOrUpdate of initial data with, say, 5000 rows and then another insertOrUpdate with those same 5000 rows?

Then it makes sense that it's slow: if a record is found (and it will be, for every record), a partial update runs on it, and changes have to be checked for even when nothing has changed. That's expensive, as the objects get merged into themselves. Instead, you should keep track of the updated records if you want to insertOrUpdate the same dataset and only update those, or just do a .create() to clear out all data and only create what you're passing; that will be more efficient if you pass the full dataset.
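For instance (fullDataset standing in for whatever you would otherwise pass to insertOrUpdate):

```js
// .create() replaces all existing records of the entity with the given data,
// so no per-record merge or change detection has to run.
await User.create({ data: fullDataset })
```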

@cuebit
Member

cuebit commented Mar 10, 2022

@tintin10q it depends on the complexity of your model definition and the number of relationships involved.

Where there are existing records, Vuex ORM will iterate through the fields in your model definition and map to those visible in your data structure. Vuex ORM only checks for existence, not whether the value has indeed changed, and updates accordingly.

Furthermore, if you have multiple relationships, depending on how deep those relations sit in your data structure and models, the operation becomes expensive, particularly for updates, since the iteration is n*r where n is the number of records for a single model and r is the number of relations applicable to that model. To put numbers on that: 5,000 records with 3 relations each is already 15,000 iterations at one level. This further multiplies the deeper the relations nest in your data structure.

It's expensive because large datasets coupled with complex model definitions with a range of relations require a number of iterations and round trips.

When dealing with large datasets it's often good practice to insert only the data that's needed. I can't imagine a real-world interface where someone would drop 5,000 records with nested relations into a component hierarchy to display without expecting delays.

@tintin10q
Author

OK, so I think I should basically save the person's data before inserting the relationships and then just update it back afterwards. That seems to be quite fast.

First do a .find and save the result, then do a bunch of inserts (maybe also cutting the data into smaller parts, though the number of records inserted does not seem to matter much), and then restore the non-relationship data from that first find with an update. Something like the sketch below.
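Roughly like this (Person, id and bigPayload are placeholders):

```js
// Save–insert–restore pattern described above.
const saved = Person.find(id).$toJson()          // 1. snapshot the current record
await Person.insert({ data: bigPayload })        // 2. insert the payload with its relations
await Person.update({ where: id, data: saved })  // 3. restore fields the insert reset
```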

What do you guys think?
