Effects of multiple concurrent runs against the same database #377

Open
moredhel opened this issue Jul 1, 2021 · 5 comments

Comments

@moredhel

moredhel commented Jul 1, 2021

Hi,

We have recently moved to managing our mongo schema with migrate-mongo. We want to automate the migrations, though, so we plan to make the process hands-off.

Our plan is to run the migration in a pre-start script for our service, then to start the service.

Our concern is what happens if N services (where N > 2) try to do this at the same time: what risk, if any, are we taking on, and what is the expected behaviour?

Questions

Given: Running migrate-mongo in parallel against the same database

  • Is there a potential for data-corruption/loss?
  • Is there a potential for failed/inconsistent migrations?
  • Is there a potential for other failures I haven't mentioned?

Thanks in advance.

Hamish

@Igor-Techsee

Same question

@ronen-laufer

+1

@daveboulard
Contributor

Hello @moredhel!

Yes, if you start another migrate-mongo run before the first one has finished, it will start the same migration script again.
And it will apply the changes a second time.

Might that lead to a problem?
I would say that it depends on the script you are running.
If your script appends a string to a property on every document, running it 4 times will append it 4 times, which is not what you want.
If your script replaces all empty values of a property with "true", for example, running it 4 times will yield the same result, which is okay.
But ultimately, it is a risk, yes.
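
To make the difference between those two kinds of script concrete, here is a minimal sketch that works on plain in-memory objects rather than a real collection. The field names (`name`, `flag`) and the `-v2` suffix are made up for illustration, and the functions only model what a migration body might do; they are not migrate-mongo code.

```javascript
// Toy stand-ins for two migration bodies, applied to plain objects.
// Field names and the "-v2" suffix are hypothetical.

// Not idempotent: every run appends the suffix again.
function appendSuffix(docs) {
  return docs.map((doc) => ({ ...doc, name: doc.name + "-v2" }));
}

// Idempotent: only empty values are touched, so a repeat run changes nothing.
function fillEmptyFlag(docs) {
  const isEmpty = (v) => v === undefined || v === null || v === "";
  return docs.map((doc) => (isEmpty(doc.flag) ? { ...doc, flag: "true" } : doc));
}

// Apply the same migration body `times` times in a row.
function runTimes(migration, docs, times) {
  let result = docs;
  for (let i = 0; i < times; i += 1) result = migration(result);
  return result;
}
```

Calling `runTimes(appendSuffix, docs, 4)` keeps growing the `name` value, while `runTimes(fillEmptyFlag, docs, 4)` gives the same documents as a single run. Repeated sequential runs are only an approximation of truly concurrent ones, but they show which scripts can tolerate being applied more than once.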

Since we encountered the same issue, here is what we did:
I made a PR which leverages TTL indexes: #262.
Everything is explained in the PR.
It is not ideal or perfect, but it should protect you from concurrent access.
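
For readers who have not used a TTL-guarded lock before, the general idea can be modelled in a few lines. This is an in-memory sketch only and is not taken from #262: the `store` object stands in for a lock collection, the expiry check stands in for MongoDB's TTL monitor removing old documents, and all names here are hypothetical.

```javascript
// In-memory model of a lock that expires after a time-to-live.
// Nothing here talks to MongoDB; it only illustrates the idea.

function createLockStore() {
  return { lock: null };
}

// Returns true if `owner` now holds the lock, false if a live lock exists.
function tryAcquire(store, owner, nowMs, ttlMs) {
  const current = store.lock;
  const expired = current !== null && nowMs - current.createdAt >= ttlMs;
  if (current !== null && !expired) return false; // someone else holds a live lock
  store.lock = { owner, createdAt: nowMs };
  return true;
}

// Only the current owner can release the lock.
function release(store, owner) {
  if (store.lock !== null && store.lock.owner === owner) store.lock = null;
}
```

The TTL matters when a runner crashes without releasing: the stale lock eventually expires instead of blocking every later run. In a real database the expiry is not exact, since MongoDB removes expired documents from a periodic background task, so the TTL should be comfortably longer than the slowest migration.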

This worked quite well, but we decided to go for another solution, which I think is far better but is heavily dependent on our infrastructure and CI/CD.

We decided that an application shouldn't migrate its own data. Another service should do that,
even if the migration code lives in the same code repo as the application.
So, when CI/CD is triggered, 2 Docker images are built: one that runs the repo's migrate-mongo migrations, and one that runs the service (or one of the services) of the repo.
The CD first deploys the migrate-mongo image, and we run a Kubernetes job that spins up the new migrate-mongo container.
If it succeeds, the CD spins up the new service image.
If it fails, the new version is not spun up.

That way, we are sure that the scripts are executed once and only once, and that each version always runs with its matching migration scripts.
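
The "migrate first, start the service only on success" gate can be sketched as a small Node script. This is a simplified local stand-in for the CD pipeline described above, not the Kubernetes setup itself: `service.js` is a hypothetical entry point, and the `spawn` parameter exists only so the process runner can be swapped out. The gate only guarantees a single run if this script itself is started once per deployment, not once per service replica.

```javascript
const { spawnSync } = require("child_process");

// Run one command and report whether it exited with status 0.
function runStep(command, args, spawn) {
  const result = spawn(command, args, { stdio: "inherit" });
  return result.status === 0;
}

// Run the migrations once; start the service only if they succeeded.
// "service.js" is a hypothetical entry point for the application.
function migrateThenStart(spawn = spawnSync) {
  if (!runStep("npx", ["migrate-mongo", "up"], spawn)) {
    return "migration-failed"; // the new version is never started
  }
  return runStep("node", ["service.js"], spawn) ? "service-exited" : "service-failed";
}
```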

I hope this helps.

@moredhel
Author

Hi Dave,
Thanks for the information & accompanying PR, this looks like it will be helpful.

We have actually gone for a similar approach to yours.

> We decided that an application shouldn't migrate its own data. Another service should do that,
> even if the migration code lives in the same code repo as the application.
> So, when CI/CD is triggered, 2 Docker images are built: one that runs the repo's migrate-mongo migrations, and one that runs the service (or one of the services) of the repo.
> The CD first deploys the migrate-mongo image, and we run a Kubernetes job that spins up the new migrate-mongo container.
> If it succeeds, the CD spins up the new service image.
> If it fails, the new version is not spun up.

We're pretty happy with it at the moment. This PR will still help in the case of multiple parallel CI runs, though, if they start stepping on one another's toes.

Thanks again for the PR. It does address the core concern of this issue, so we can close it once #262 has been successfully merged.

@zergeborg

Bump

@daveboulard Is this still the case with the latest library version? Are we still not supposed to run the library simultaneously from multiple server nodes?

I am asking because I was under the impression that migrate-mongo manages its own DB collection/document to track the state of migrations. So I wonder what's stopping the library from, say, creating a unique index on the fileName field in the changelog collection, and making late-arriving simultaneous processes abort the migration when the fileName already exists.

Was this approach considered? Or do I misunderstand how the migration library works?

BTW, does migrate-mongo run the changelog insert/update operation before or after the up command completes? Maybe that's the issue here? If the library changes the changelog collection after up completes, that makes parallelization unnecessarily complex IMO.
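
The ordering question can be illustrated with a toy model. The sketch below does not reflect how migrate-mongo is actually implemented, which this thread leaves unanswered. A `Set` plays the role of a changelog with a unique index on fileName, `FILE` is a made-up migration name, and the two functions compare "record after up" with "claim before up" when several runners start at the same moment.

```javascript
// Toy model: the Set stands in for a changelog with a unique fileName index.
const FILE = "20210701-example.js"; // hypothetical migration file name

// Each runner checks the changelog, runs `up`, and only then records the file.
function recordAfterUp(runnerCount) {
  const changelog = new Set();
  let timesApplied = 0;
  // Step 1: all runners read the changelog before any of them has written to it.
  const sawPending = [];
  for (let i = 0; i < runnerCount; i += 1) sawPending.push(!changelog.has(FILE));
  // Step 2: every runner that saw the file as pending runs `up`, then records it.
  for (const pending of sawPending) {
    if (!pending) continue;
    timesApplied += 1;
    changelog.add(FILE);
  }
  return timesApplied;
}

// Each runner first tries to claim the file; the unique key lets only one win.
function claimBeforeUp(runnerCount) {
  const changelog = new Set();
  let timesApplied = 0;
  for (let i = 0; i < runnerCount; i += 1) {
    if (changelog.has(FILE)) continue; // models an insert rejected as a duplicate
    changelog.add(FILE);
    timesApplied += 1; // only the runner that won the claim runs `up`
  }
  return timesApplied;
}
```

With three runners, `recordAfterUp(3)` applies the migration three times, while `claimBeforeUp(3)` applies it once. The trade-off of claiming first is that a runner which crashes mid-migration leaves an entry for a file that was never fully applied, so a real implementation would likely need a status field or a cleanup step.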


5 participants