
Copy a large table between servers, a couple wrong ways, maybe one right way #20

Open · utterances-bot opened this issue Oct 25, 2021 · 5 comments

Comments

@utterances-bot

Copy a large table between servers, a couple wrong ways, maybe one right way | Chad’s Blog

Have you ever needed to copy a giant 50 million record table from one server to another? Because I did…I failed a few times. But eventually I figured it out with the help of the SQL community.

https://chadbaldwin.net/2021/10/19/copy-large-table.html

@MattEHarrington

This is my first time on the site, but it seems odd that you were forced to jump through so many hoops to get this done, when your DBA probably could have done it a lot quicker without third-party tools. I'd never heard of dbatools before, though, so this was an interesting read for sure! (I only have a few years of professional database experience.)

@chadbaldwin (Owner)

@MattEHarrington it was a bit of an odd scenario. Our DBA was out on vacation, I didn't want to wait to get the project done, and this was the only thing holding me back. My initial assumption was that it would only take a few minutes...but it turned into one of those rabbit-hole situations where a 10-minute task became a couple of hours. I'm still happy I went through the process, though, because now I know how to use bcp as well as Copy-DbaDbTableData, which will definitely come in handy in the future.
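For anyone landing here who hasn't used bcp before, the basic out-then-in pattern looks roughly like this. This is a minimal sketch; the server, database, and table names are placeholders, and `-T` assumes Windows authentication:

```powershell
# Export the table in native format from the source server (hypothetical names).
bcp dbo.BigTable out .\BigTable.dat -S SourceServer -d SourceDb -T -n

# Import into the destination server, committing in 50k-row batches
# so a failure doesn't roll back the entire load.
bcp dbo.BigTable in .\BigTable.dat -S DestServer -d DestDb -T -n -b 50000
```

Native format (`-n`) avoids the character-conversion overhead of a plain text export, which matters at the 50-million-row scale the post describes.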

That said, while dbatools may be a third-party tool, I would consider it fairly standard in a lot of shops. I've even seen it listed on quite a few job postings for both DBA and developer roles. If you've never heard of it, I highly recommend getting familiar with it. It's a great set of tools, especially if you're working with lots of instances or need to automate recurring tasks.
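For reference, a Copy-DbaDbTableData call along the lines of what the post describes might look like this. The instance, database, and table names here are hypothetical stand-ins:

```powershell
# Stream dbo.BigTable from one instance to another using dbatools
# (wraps .NET SqlBulkCopy under the hood). Names are placeholders.
Copy-DbaDbTableData -SqlInstance SourceServer -Destination DestServer `
    -Database SourceDb -DestinationDatabase DestDb `
    -Table dbo.BigTable -DestinationTable dbo.BigTable `
    -BatchSize 50000 -AutoCreateTable
```

`-BatchSize` controls how many rows are committed per batch, and `-AutoCreateTable` creates the destination table if it doesn't already exist.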

@janszedc

Hey Chad! This is an excellent blog post. Something to consider when using dbatools to transfer data: even if you kill the process in the PowerShell ISE/IDE, the session can still exist on SQL Server's side. I was running this process on one of our secondaries to prevent blocking, and when I killed it I assumed it was cleared from SQL Server, but that was not the case.

Something else I would recommend is looking into the import/export wizard within SSMS. It's only a couple of steps, although you can't script it out.

Finally, I have never done this before, but building a Biml package might be another option, assuming all you are doing is moving data from one table to another. Both of these options will most likely be faster than dbatools.

@chadbaldwin (Owner)

@janszedc thanks for the feedback! I don't use the PowerShell ISE, but I'll keep that in mind if I ever do. Generally I use VS Code and monitor the run through Task Manager. If I want to kill it, I kill it directly from the terminal window in VS Code or from Task Manager, then check the SQL Server sessions to make sure there's no hanging session that needs killing due to lingering locks or open transactions.
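That session check can itself be done with dbatools. A sketch of the idea, with hypothetical instance/database names and an example spid:

```powershell
# List sessions on the destination instance so you can spot an orphaned
# bulk-copy session left behind after killing the client process.
Get-DbaProcess -SqlInstance DestServer -Database DestDb |
    Select-Object Spid, Login, Program, Status

# Once you've confirmed which spid is the orphan (123 is a placeholder):
Stop-DbaProcess -SqlInstance DestServer -Spid 123
```

Filtering on `Login` or `Program` usually narrows it down quickly if the instance is busy.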

You're not the first to suggest other methods, such as the SSMS data transfer wizard.

Here was my response to that if you're interested:

https://www.reddit.com/r/SQLServer/comments/qbg4zt/blog_copy_a_large_table_between_servers_a_couple/hhd5tfq/?utm_source=reddit&utm_medium=web2x&context=3

I tried testing it, but it was projected to take 89 minutes to complete, which is still much slower than dbatools. I will mention that it technically is scriptable, since you can export an SSIS package (a dtsx file) from the wizard and run it on its own, which I have done before as well.

I would be fairly surprised to find something faster than dbatools, since it uses the .NET SqlBulkCopy class, the same underlying mechanism bcp uses, which is also incredibly fast. I would guess SSIS uses SqlBulkCopy as well, but I generally don't like the overhead of building an SSIS package.
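For what it's worth, the SqlBulkCopy pattern dbatools builds on can be used directly from PowerShell. A minimal sketch, assuming integrated security and placeholder server/table names:

```powershell
# Open a reader on the source and stream rows straight into SqlBulkCopy.
# Connection strings and table names are hypothetical.
$src = New-Object System.Data.SqlClient.SqlConnection 'Server=SourceServer;Database=SourceDb;Integrated Security=True'
$dst = New-Object System.Data.SqlClient.SqlConnection 'Server=DestServer;Database=DestDb;Integrated Security=True'
$src.Open(); $dst.Open()

$cmd = $src.CreateCommand()
$cmd.CommandText = 'SELECT * FROM dbo.BigTable'
$reader = $cmd.ExecuteReader()

$bulk = New-Object System.Data.SqlClient.SqlBulkCopy $dst
$bulk.DestinationTableName = 'dbo.BigTable'
$bulk.BatchSize = 50000        # commit in batches, not one giant transaction
$bulk.BulkCopyTimeout = 0      # no timeout for a long-running copy
$bulk.WriteToServer($reader)   # streams rows without buffering the whole table

$reader.Close(); $src.Close(); $dst.Close()
```

Passing the open `SqlDataReader` to `WriteToServer` is what keeps memory flat even at tens of millions of rows.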

I'll check out the BIML thing, thanks!


Great post! For flexibility, and to distribute the data across several databases and servers, I use PoSh. We have a process that allocates multiple tables (each with a unique run number!), sometimes with millions of records per run. The data is later used by our analytics people as input for their trickery. The processing database was quickly growing very large (we knew it would), so I had to implement a way to distribute the data while keeping central control over where the output of each run is located. DBAs wouldn't be a big help in this scenario. A pragmatic solution was to add a distribution step at the end that moves the tables automatically to their destinations and keeps an inventory of where everything is. A nice side effect of this setup: the processing happens on a very powerful (costly) SQL Server instance, while the output finds its final place on a much cheaper setup. I have to check out Copy-DbaDbTableData though!
