Update Live migration docs #2925
@@ -9,10 +9,9 @@ import SourceTargetNote from "versionContent/_partials/_migrate_source_target_no

# Live migration

-Live migration is a migration strategy to move a large amount of data
-(100 GB-10 TB+) with low downtime (on the order of few minutes). It requires
-more steps to execute than a migration with downtime using [pg_dump/restore][pg-dump-and-restore],
-but supports more use-cases and has less requirements than the [dual-write and backfill] method.
+Live migration is a strategy used to move a large amount of data (100 GB-10 TB+) with minimal downtime (typically a few minutes). It involves copying existing data from the source to the target and supports change data capture to stream ongoing changes from the source during the migration process.
Review comment (suggested change): I think:

> When the initial data copy completes, live migration applies the recorded transactions ...

+In contrast, [pg_dump/restore][pg-dump-and-restore] only supports copying the database from the source to the target without capturing ongoing changes, which results in downtime. On the other hand, the [dual-write and backfill] method requires setting up dual writes in the application logic. That method is recommended only for append-only workloads, as it does not support updates and deletes during migration.
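For contrast, the downtime-based pg_dump/restore path mentioned above boils down to two commands. This is a minimal sketch, not a command from this PR; `$SOURCE` and `$TARGET` are the connection strings used elsewhere in these docs, and the dump file name is a placeholder:

```sh
# Downtime begins: stop writes before dumping so the target ends up complete.
pg_dump -d "$SOURCE" -Fc -f dump.bak          # custom-format dump of the source
pg_restore -d "$TARGET" --no-owner dump.bak   # restore into the Timescale target
```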
<SourceTargetNote />
@@ -23,8 +22,8 @@ Roughly, it consists of four steps:

1. Prepare and create replication slot in source database.
2. Copy schema from source to target, optionally enabling hypertables.
-3. Copy data from source to target while capturing changes.
-4. Apply captured changes from source to target.
+3. Copy data from source to target while capturing ongoing changes from source.
+4. Apply captured ongoing changes to target.
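Step 1 in the list above can be pictured as creating a logical replication slot on the source. A minimal sketch in plain SQL follows; the slot name and the `wal2json` output plugin are illustrative assumptions — the live-migration tooling manages this for you:

```sh
# Create a logical replication slot on the source; changes accumulate in it
# until the replay step applies them on the target.
psql -d "$SOURCE" -c \
  "SELECT pg_create_logical_replication_slot('live_migration', 'wal2json');"
```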

Live migration works well when:
- Large, busy tables have primary keys, or don't have many `UPDATE` or
@@ -38,7 +37,7 @@ For more information, refer to the step-by-step migration guide:
- [Live migration from PostgreSQL][from-postgres]
- [Live migration from TimescaleDB][from-timescaledb]

-If you want to manually migrate data from PostgreSQL, refer to
+If you want to have more control over the migration and prefer to manually migrate data from PostgreSQL, refer to
[Live migration from PostgreSQL manually][live-migration-manual].

If you are migrating from AWS RDS to Timescale, you can refer to [this][live-migration-playbook] playbook
Review comment: You should remove …
@@ -14,9 +14,9 @@ import DumpPreDataSourceSchema from "versionContent/_partials/_migrate_pre_data_
import DumpPostDataSourceSchema from "versionContent/_partials/_migrate_post_data_dump_source_schema.mdx";
import LiveMigrationStep2 from "versionContent/_partials/_migrate_live_migration_step2.mdx";

-# Live migration from PostgreSQL database with pgcopydb
+# Live migration from PostgreSQL database

-This document provides detailed instructions to migrate data from your
+This document provides instructions to migrate data from your
PostgreSQL database to a Timescale instance with minimal downtime (on the order
of a few minutes) of your production applications, using the [live migration]
strategy. To simplify the migration, we provide you with a docker image
@@ -26,10 +26,8 @@ migration.
You should provision a dedicated instance to run the migration steps from.
Ideally an AWS EC2 instance that's in the same region as the Timescale target
service. For an ingestion load of 10,000 transactions/s, and assuming that the
-historical data copy takes 2 days, we recommend 4 CPUs with 4 to 8 GiB of RAM
-and 1.2 TiB of storage.
-
-<SourceTargetNote />
+historical data is 2 TB in size, we recommend 4 CPUs with 4 to 8 GiB of RAM
+and 1.2 TiB of storage; this takes approximately 24 hours to complete the migration.

In detail, the migration process consists of the following steps:
@@ -46,7 +44,7 @@ In detail, the migration process consists of the following steps:

<LiveMigrationStep2 />

-Next, you need to ensure that your source tables and hypertables have either a primary key
+Next, you need to ensure that your source tables have either a primary key
or `REPLICA IDENTITY` set. This is important as it is a requirement for replicating `DELETE` and
`UPDATE` operations. Replica identity assists the replication process in identifying the rows
being modified. It defaults to using the table's primary key.
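A sketch of how you might find tables that lack a primary key and therefore need `REPLICA IDENTITY` set; the catalog query is illustrative, and the table name in the `ALTER` is a placeholder:

```sh
# List ordinary tables in the public schema without a primary key.
psql -d "$SOURCE" -c "
  SELECT c.relname
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE c.relkind = 'r'
    AND n.nspname = 'public'
    AND NOT EXISTS (
      SELECT 1 FROM pg_index i
      WHERE i.indrelid = c.oid AND i.indisprimary);"

# Fall back to full-row identity where no suitable key can be added.
psql -d "$SOURCE" -c "ALTER TABLE metrics REPLICA IDENTITY FULL;"
```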

@@ -120,19 +118,14 @@ it will start `ANALYZE` on the target database. This updates statistics in the
target database, which is necessary for optimal querying performance in the
target database. Wait for `ANALYZE` to complete.
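If you want to watch this phase, a minimal sketch (assuming PostgreSQL 13 or later for the progress view):

```sh
# Run ANALYZE manually if it has not been started for you ...
psql -d "$TARGET" -c "ANALYZE;"
# ... and, from a second session, check which tables are being processed.
psql -d "$TARGET" -c "SELECT relid::regclass, phase FROM pg_stat_progress_analyze;"
```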

-<Highlight type="important">
-Application downtime begins here.
-</Highlight>
+## 4. Validate the data in target database and use it as new primary

-Once the lag between the databases is below 30 megabytes, and you're ready to
-take your applications offline, stop all applications which are writing to the
-source database. This is the downtime phase and will last until you have
-completed the validation step (4). Be sure to go through the validation step
-before you enter the downtime phase to keep the overall downtime minimal.
+Once the lag between the databases is below 130 megabytes, we recommend performing data integrity checks. There are two ways to do this:
Review comment (suggested change): The code waits for lag to be 30MB.

-Stopping writes to the source database allows the live migration process to
-finish replicating data to the target database. This will be evident when the
-replication lag reduces to 0 megabytes.
+1. With downtime: Stop database operations from your application, which will result in downtime. This allows the live migration to catch up on the lag between the source and target databases, enabling the validation checks to be performed. The downtime will last until the lag is eliminated and the data integrity checks are completed.
+2. Without downtime: Since the difference between the source and target databases is less than 130 MB, you can perform data integrity checks, excluding the latest data that is still being written. This approach does not require taking your application down.
Review comment on lines -133 to +126: I don't think we should make this change. We should let the users have 100% confidence in data integrity and not promote "partial integrity" by comparing non-recent data, otherwise they may have doubts later on regarding integrity. Note, the users will always have to take momentary downtime to switch applications writing from source to target.
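The validation section removed further down suggests comparing row counts or column aggregates. A minimal sketch of such a check, assuming a hypothetical `metrics` table with a numeric `value` column:

```sh
# Run the same aggregate on both sides and compare the output.
psql -d "$SOURCE" -t -c "SELECT count(*), sum(value) FROM metrics;"
psql -d "$TARGET" -t -c "SELECT count(*), sum(value) FROM metrics;"
```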

+Now that the data integrity checks are complete, it's time to switch your target database to become the primary one. If you selected option 2 for data integrity checks, stop writing to the source database and immediately start writing to the target database from the application. This minimizes application downtime to as little as an application restart, and it allows the live migration process to finish replicating data to the target database, as the source will no longer receive any new transactions. You will know the process is complete when the replication lag reduces to 0 megabytes. If you chose option 1 for data integrity checks, start your application writing data to the target database.
Review comment: This is a bit confusing. When you select option 2, you have to wait till the lag becomes 0 and then change the application to switch to the target.

Review comment: Is that not mandatory? If the lag is, let's say, 5 minutes, the application can start writes to target right away and the 5-minute lag catches up in parallel as new writes end up in the target. The only limitation is that during the switchover, they cannot access the tail part of the data as it is in flight (catching up with the lag). By doing this, they wouldn't lose any write data or create gaps in their timeseries data. However, for a moment, until the lag catches up, the in-flight data cannot be accessed.

Review comment: This would break the transactional consistency & is not recommended. Additionally, you will miss updates/deletes targeting the non-migrated data.

Review comment: Got it, let's mention both as choices, i.e. if the user wants 100% transactional consistency he has to trade off taking the database down for some time; if the user wants zero downtime during cutover he has to trade off transactional consistency if any updates/deletes are happening to the latest data.

Review comment: I would rather refrain from suggesting anything that causes transactional inconsistencies. IMHO, the whole point of migration with logical decoding is "transactional consistency"; we shouldn't break that.

Review comment: I think we should not make this change. It complicates the matter. Let's have just 1 way to confirm data integrity and be 100% confident that all the data in my source/production database is migrated to Timescale.

Once the replication lag is 0, wait for a few minutes and then provide the
signal to proceed by pressing key `c`.
@@ -150,19 +143,5 @@ message if all the mentioned steps were successful.

```sh
Migration successfully completed
```

-## 4. Validate the data in target database and use it as new primary
-
-Now that all data has been migrated, the contents of both databases should
-be the same. How exactly this should best be validated is dependent on
-your application. You could compare the number of rows or an aggregate of
-columns to validate that the target database matches with the source.
-
-<Highlight type="important">
-Application downtime ends here.
-</Highlight>
-
-Once you are confident with the data validation, the final step is to configure
-your applications to use the target database.

[Hypertable docs]: /use-timescale/:currentVersion:/hypertables/
[live migration]: https://docs.timescale.com/migrate/latest/live-migration/
@@ -14,7 +14,7 @@ import DumpPreDataSourceSchema from "versionContent/_partials/_migrate_pre_data_
import DumpPostDataSourceSchema from "versionContent/_partials/_migrate_post_data_dump_source_schema.mdx";
import LiveMigrationStep2 from "versionContent/_partials/_migrate_live_migration_step2.mdx";

-# Live migration from TimescaleDB database with pgcopydb
+# Live migration from TimescaleDB database

This document provides detailed instructions to migrate data from your
TimescaleDB database (self-hosted or on [Managed Service for TimescaleDB]) to a
@@ -26,7 +26,8 @@ scripts that you need to perform the live migration.
You should provision a dedicated instance to run the migration steps from.
Ideally an AWS EC2 instance that's in the same region as the Timescale target service.
For an ingestion load of 10,000 transactions/s, and assuming that the historical
-data copy takes 2 days, we recommend 4 CPUs with 4 to 8 GiB of RAM and 1.2 TiB of storage.
+data copy takes 2 days, we recommend 4 CPUs with 4 to 8 GiB of RAM, and 1.5x the
+source database size of storage on the EC2 machine.

<SourceTargetNote />

@@ -114,49 +115,30 @@ start `ANALYZE` on the target database. This updates statistics in the target
which is necessary for optimal querying performance in the target database. Wait for
`ANALYZE` to complete.

-<Highlight type="important">
-Application downtime begins here.
-</Highlight>
+## 4. Validate the data in target database and use it as new primary

-Once the lag between the databases is below 30 megabytes, and you're ready to
-take your applications offline, stop all applications which are writing to the
-source database. This is the downtime phase and will last until you have
-completed the validation step (4). Be sure to go through the validation step
-(4) before you enter the downtime phase to keep the overall downtime minimal.
+Once the lag between the databases is below 130 megabytes, we recommend performing data integrity checks. There are two ways to do this:

-Stopping writes to the source database allows the live migration process to
-finish replicating data to the target database. This will be evident when the
-replication lag reduces to 0 megabytes.
+1. With downtime: Stop database operations from your application, which will result in downtime. This allows the live migration to catch up on the lag between the source and target databases, enabling the validation checks to be performed. The downtime will last until the lag is eliminated and the data integrity checks are completed.
+2. Without downtime: Since the difference between the source and target databases is less than 130 MB, you can perform data integrity checks, excluding the latest data that is still being written. This approach does not require taking your application down.

+Now that the data integrity checks are complete, it's time to switch your target database to become the primary one. If you selected option 2 for data integrity checks, stop writing to the source database and immediately start writing to the target database from the application. This minimizes application downtime to as little as an application restart, and it allows the live migration process to finish replicating data to the target database, as the source will no longer receive any new transactions. You will know the process is complete when the replication lag reduces to 0 megabytes. If you chose option 1 for data integrity checks, start your application writing data to the target database.
Review comment on lines -121 to +125: I don't think we should make this change. It complicates the validation work and promotes partial integrity checks. We should discuss this on a team call if you think it's important to mention.

Once the replication lag is 0, wait for a few minutes and then provide the
signal to proceed by pressing key `c`.

```sh
-[WATCH] Source DB - Target DB => 0MB. Press "c" (and ENTER) to stop live-replay
+[WATCH] Source DB - Target DB => 0MB. Press "c" (and ENTER) to proceed
Syncing last LSN in Source DB to Target DB ...
```
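Besides the `[WATCH]` output above, one way to approximate the remaining lag from the source side is to query the replication slot directly. A sketch, with the slot name as a placeholder — the live-migration image reports this figure for you:

```sh
# Bytes of WAL the slot's consumer has not yet confirmed.
psql -d "$SOURCE" -c "
  SELECT slot_name,
         pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag
  FROM pg_replication_slots
  WHERE slot_name = 'live_migration';"
```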

-The live migration image will continue the remaining work under live replay,
-copy TimescaleDB metadata, sequences, and run policies. You should see the
-following message if all the mentioned steps were successful.
+The live migration image will continue the remaining work, which includes
+migrating sequences and cleaning up resources. You should see the following
+message if all the mentioned steps were successful.

```sh
Migration successfully completed
```

-## 4. Validate the data in target database and use it as new primary
-
-Now that all data has been migrated, the contents of both databases should be the
-same. How exactly this should best be validated is dependent on your application.
-You could compare the number of rows or an aggregate of columns to validate that
-the target database matches with the source.
-
-<Highlight type="important">
-Application downtime ends here.
-</Highlight>
-
-Once you are confident with the data validation, the final step is to configure
-your applications to use the target database.

[Managed Service for TimescaleDB]: https://www.timescale.com/mst-signup/
[live migration]: https://docs.timescale.com/migrate/latest/live-migration/
@@ -36,19 +36,19 @@ module.exports = [
      excerpt: "Migrate a large database with low downtime",
      children: [
        {
-         title: "Live migration from PostgreSQL",
+         title: "From PostgreSQL",
Review comment: Please also apply this change to the dual-write and backfill pages.
          href: "live-migration-from-postgres",
          excerpt:
            "Migrate from PostgreSQL using live migration",
        },
        {
-         title: "Live migration from TimescaleDB",
+         title: "From TimescaleDB",
          href: "live-migration-from-timescaledb",
          excerpt:
            "Migrate from TimescaleDB using live migration",
        },
        {
-         title: "(Advanced) Live migration from PostgreSQL manually",
+         title: "(Advanced) From PostgreSQL manually",
          href: "live-migration-from-postgres-manually",
          excerpt:
            "Migrate from TimescaleDB using live migration manually",
@@ -346,6 +346,7 @@ ALTER TABLE {table_name} REPLICA IDENTITY FULL;

+## 3. Set up a replication slot and snapshot

Once you're sure that the tables which will be affected by `UPDATE` and `DELETE`
queries have `REPLICA IDENTITY` set, you will need to create a replication slot.
@@ -372,32 +373,12 @@ Additionally, `follow` command exports a snapshot ID to `/tmp/pgcopydb/snapsho
This ID can be utilized to migrate data that was in the database before the replication
slot was created.
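As an illustration of how such a snapshot ID can be used, here is a sketch with pg_dump's standard `--snapshot` option; `$SNAPSHOT_ID` stands in for the exported ID, and the output file name is a placeholder (note the exporting session must still be holding the snapshot open):

```sh
# Dump only table data, as of the exact snapshot taken at slot creation,
# so the initial copy lines up with the first change recorded in the slot.
pg_dump -d "$SOURCE" \
  --snapshot="$SNAPSHOT_ID" \
  --section=data -f initial-data.sql
```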

-## 4. Migrate roles and schema from source to target
-Before applying DML operations from the replication slot, the schema and data from
-the source database need to be migrated.
-The larger the size of the source database, the more time it takes to perform the
-initial migration, and the longer the buffered files need to be stored.
-
-### 4.a Migrate database roles from source database
-<LiveMigrationRoles />
-
-### 4.b Dump the database schema from the source database
-<DumpPreDataSourceSchema />
-
-### 4.c Load the roles and schema into the target database
-```sh
-psql -X -d "$TARGET" \
-  -v ON_ERROR_STOP=1 \
-  --echo-errors \
-  -f roles.sql \
-  -f pre-data-dump.sql
-```
+## 4. Perform Live Migration

-## 5. Perform "live migration"
The remaining steps for migrating data from a RDS Postgres instance to Timescale
with low-downtime are the same as the ones mentioned in "Live migration"
-documentation from [Step 5] onwards. You should follow the mentioned steps
+documentation from [Step 3] onwards. You should follow the mentioned steps
to successfully complete the migration process.

[live migration]: /migrate/:currentVersion:/live-migration/live-migration-from-postgres/
-[Step 5]: /migrate/:currentVersion:/live-migration/live-migration-from-postgres/#5-enable-hypertables
+[Step 3]: /migrate/:currentVersion:/live-migration/live-migration-from-postgres/#3-run-the-live-migration-docker-image
Review comment on lines -390 to +384: This is not right. We are asking users to continue from Step 3 in:

```sh
docker run --rm -dit --name live-migration \
  -e PGCOPYDB_SOURCE_PGURI=$SOURCE \
  -e PGCOPYDB_TARGET_PGURI=$TARGET \
  -v ~/live-migration:/opt/timescale/ts_cdc \
  timescale/live-migration:v0.0.1
```

Since the above command already has …, the correct way should be to ask users to continue from Step 3 in …
Review comment: The import line (Line 1) of this file can now be removed.