Update Live migration docs #2925

VineethReddy02 · 2024-01-05T10:27:16Z

Description

Update live migration docs with fixes and better explanation on database switchover.

Links

Fixes #[insert issue link, if any]

Writing help

For information about style and word usage, see the style guide

Review checklists

Reviewers: use this section to ensure you have checked everything before approving this PR:

Subject matter expert (SME) review checklist

Is the content technically accurate?
Is the content complete?
Is the content presented in a logical order?
Does the content use appropriate names for features and products?
Does the content provide relevant links to further information?

Documentation team review checklist

Is the content free from typos?
Does the content use plain English?
Does the content contain clear sections for concepts, tasks, and references?
Have any images been uploaded to the correct location, and are resolvable?
If the page index was updated, are redirects required
and have they been implemented?
Have you checked the built version of this content?

github-actions · 2024-01-05T10:27:28Z

Allow 10 minutes from last push for the staging site to build. If the link doesn't work, try using incognito mode instead. For internal reviewers, check web-documentation repo actions for staging build status. Link to build for this PR: http://docs-dev.timescale.com/docs-vineeth-nits-live-migration

arajkumar · 2024-01-08T11:53:48Z

migrate/live-migration/live-migration-from-postgres.md

+1. With downtime: Stop database operations from your application, which will result in downtime. This allows the live migration to catch up on the lag between the source and target databases, enabling the validation checks to be performed. The downtime will last until the lag is eliminated and the data integrity checks are completed.
+2. Without downtime: Since the difference between the source and target databases is less than 130 MB, you can perform data integrity checks, excluding the latest data that is still being written. This approach does not require taking your application down.
+
+Now that the data integrity checks are complete, it's time to switch your target database to become the primary one. If you have selected option 2 for data integrity checks, stop writing to the source database and immediately start writing to the target database from the application. This will minimize application downtime to as low as application restart. It allows the live migration process to complete replicating data to the target database, as the source will no longer receive any new transactions. You will know the process is complete when the replication lag reduces to 0 megabytes. If you have chosen option 1 for data integrity checks, start your application to write data to the target database.


If you have selected option 2 for data integrity checks, stop writing to the source database and immediately start writing to the target database from the application

This is bit confusing. When you select option 2, you have to wait till the lag to become 0 and then change the application to switch to the target.

Is that not mandatory? If the lag is, let's say, 5 minutes, the application can start writes to target right away and the 5-minute lag catches up in parallel as new writes end up in the target. The only limitation is that during the switchover, they cannot access the tail part of the data as it is in flight (catching up with the lag). By doing this, they wouldn't lose any write data or create gaps in their timeseries data. However, for a moment, until the lag catches up, the in-flight data cannot be accessed.

This would break the transactional consistency & not recommended. Additionally, you will miss update/deletes targeting the non migrated data.

Got it, lets mention both as choices i.e. if user wants 100% transactional consistency he has to trade off taking database down for sometime, if the user wants zero downtime during cutover he has to trade off transactional consistency if any updates/deletes are happening to latest data.

I would rather refrain from suggesting anything causes transactional inconsistencies. IMHO, the whole point of migration with logical decoding is "transactional consistency", we shouldn't break that.

I think we should not make this change. It complicates the matter. Let's have just 1 way to confirm data integrity and be 100% confident that all the data in my source/production database is migrated to Timescale.

JamesGuthrie · 2024-01-08T13:27:24Z

_partials/_migrate_dual_write_step1.md

The import line (Line 1) of this file can now be removed.

JamesGuthrie · 2024-01-08T13:28:38Z

migrate/page-index/page-index.js

@@ -36,19 +36,19 @@ module.exports = [
        excerpt: "Migrate a large database with low downtime",
        children: [
          {
-            title: "Live migration from PostgreSQL",
+            title: "From PostgreSQL",


Please also apply this change to the dual-write and backfill pages

JamesGuthrie · 2024-01-08T13:57:58Z

migrate/live-migration/index.md

@@ -38,7 +37,7 @@ For more information, refer to the step-by-step migration guide:
 - [Live migration from PostgreSQL][from-postgres]
 - [Live migration from TimescaleDB][from-timescaledb]

-If you want to manually migrate data from PostgreSQL, refer to
+If you want to have more control over migration and prefer to manually migrate data from PostgreSQL, refer to 


Suggested change

If you want to have more control over migration and prefer to manually migrate data from PostgreSQL, refer to

If you want to have more control over the migration and prefer to manually migrate data from PostgreSQL, refer to

JamesGuthrie · 2024-01-09T08:21:42Z

migrate/live-migration/index.md

-(100&nbsp;GB-10&nbsp;TB+) with low downtime (on the order of few minutes). It requires
-more steps to execute than a migration with downtime using [pg_dump/restore][pg-dump-and-restore],
-but supports more use-cases and has less requirements than the [dual-write and backfill] method.
+Live migration is a strategy used to move a large amount of data (100&nbsp;GB-10&nbsp;TB+) with minimal downtime (typically a few minutes). It involves copying existing data from the source to the target and supports change data capture to stream ongoing changes from the source during the migration process.


Suggested change

Live migration is a strategy used to move a large amount of data (100 GB-10 TB+) with minimal downtime (typically a few minutes). It involves copying existing data from the source to the target and supports change data capture to stream ongoing changes from the source during the migration process.

Live migration is a strategy to move a large amount of data

(100 GB-10 TB+) with minimal downtime (typically a few minutes). It

achieves low downtime by simultaneously 1) copying existing data from the

source database to the target database and 2) recording ongoing changes which

are made on the source. When the initial data copy completes, it continuously

applies the recorded transactions to the target database until the target

database is fully caught up with the source database. At this point the

application's database connection is switched to the target database (which may

result in a short downtime), and the migration is complete.

I think 1) & 2) is not normal in docs.

When the initial data copy completes, it continuously
applies the recorded transactions

When the initial data copy completes, live migration applies the recorded transactions ...

JamesGuthrie · 2024-01-09T08:22:01Z

migrate/live-migration/index.md

-but supports more use-cases and has less requirements than the [dual-write and backfill] method.
+Live migration is a strategy used to move a large amount of data (100&nbsp;GB-10&nbsp;TB+) with minimal downtime (typically a few minutes). It involves copying existing data from the source to the target and supports change data capture to stream ongoing changes from the source during the migration process.
+
+In contrast, [pg_dump/restore][pg-dump-and-restore] only supports copying the database from the source to the target without capturing ongoing changes, which results in downtime. On the other hand, the [dual-write and backfill] method requires setting up dual write in the application logic. This method is recommended only for append-only workloads as it does not support updates and deletes during migration.


Suggested change

In contrast, [pg_dump/restore][pg-dump-and-restore] only supports copying the database from the source to the target without capturing ongoing changes, which results in downtime. On the other hand, the [dual-write and backfill] method requires setting up dual write in the application logic. This method is recommended only for append-only workloads as it does not support updates and deletes during migration.

In contrast, [pg_dump/restore][pg-dump-and-restore] only supports copying data

from the source to the target without recording ongoing changes, so

applications which are writing must be stopped for the duration of the

migration. On the other hand, the [dual-write and backfill] method also

provides a way to migrate with low downtime, but requires modifying your

application to write to two databases simultaneously, and only works with

append-only workloads as it does not support updates and deletes during

migration.

Harkishen-Singh · 2024-01-09T09:30:19Z

migrate/live-migration/live-migration-from-postgres.md

You should remove import SourceTargetNote from "versionContent/_partials/_migrate_source_target_note.mdx"; from import list.

Harkishen-Singh · 2024-01-09T09:31:18Z

migrate/live-migration/live-migration-from-postgres.md

-source database. This is the downtime phase and will last until you have
-completed the validation step (4). Be sure to go through the validation step
-before you enter the downtime phase to keep the overall downtime minimal.
+Once the lag between the databases is below 130 megabytes, we recommend performing data integrity checks. There are two ways to do this:


Suggested change

Once the lag between the databases is below 130 megabytes, we recommend performing data integrity checks. There are two ways to do this:

Once the lag between the databases is below 30 megabytes, we recommend performing data integrity checks. There are two ways to do this:

The code waits for lag to be 30MB.

Harkishen-Singh · 2024-01-09T09:37:59Z

migrate/live-migration/live-migration-from-postgres.md

-Stopping writes to the source database allows the live migration process to
-finish replicating data to the target database. This will be evident when the
-replication lag reduces to 0 megabytes.
+1. With downtime: Stop database operations from your application, which will result in downtime. This allows the live migration to catch up on the lag between the source and target databases, enabling the validation checks to be performed. The downtime will last until the lag is eliminated and the data integrity checks are completed.
+2. Without downtime: Since the difference between the source and target databases is less than 130 MB, you can perform data integrity checks, excluding the latest data that is still being written. This approach does not require taking your application down.


I don't think we should make this change. We should let the users have 100% confidence on data integrity and not promote "partial integrity" by comparing non-recent data, otherwise he may have doubts later on regarding integrity. Note, the users will always have to take momentary downtime to switch applications writing from source to target.

Harkishen-Singh · 2024-01-09T09:40:48Z

migrate/live-migration/live-migration-from-postgres.md

+1. With downtime: Stop database operations from your application, which will result in downtime. This allows the live migration to catch up on the lag between the source and target databases, enabling the validation checks to be performed. The downtime will last until the lag is eliminated and the data integrity checks are completed.
+2. Without downtime: Since the difference between the source and target databases is less than 130 MB, you can perform data integrity checks, excluding the latest data that is still being written. This approach does not require taking your application down.
+
+Now that the data integrity checks are complete, it's time to switch your target database to become the primary one. If you have selected option 2 for data integrity checks, stop writing to the source database and immediately start writing to the target database from the application. This will minimize application downtime to as low as application restart. It allows the live migration process to complete replicating data to the target database, as the source will no longer receive any new transactions. You will know the process is complete when the replication lag reduces to 0 megabytes. If you have chosen option 1 for data integrity checks, start your application to write data to the target database.


I think we should not make this change. It complicates the matter. Let's have just 1 way to confirm data integrity and be 100% confident that all the data in my source/production database is migrated to Timescale.

Harkishen-Singh · 2024-01-09T09:43:49Z

migrate/live-migration/live-migration-from-timescaledb.md

-Once the lag between the databases is below 30 megabytes, and you're ready to
-take your applications offline, stop all applications which are writing to the
-source database. This is the downtime phase and will last until you have
-completed the validation step (4). Be sure to go through the validation step
-(4) before you enter the downtime phase to keep the overall downtime minimal.
+Once the lag between the databases is below 130 megabytes, we recommend performing data integrity checks. There are two ways to do this:

-Stopping writes to the source database allows the live migration process to
-finish replicating data to the target database. This will be evident when the
-replication lag reduces to 0 megabytes.
+1. With downtime: Stop database operations from your application, which will result in downtime. This allows the live migration to catch up on the lag between the source and target databases, enabling the validation checks to be performed. The downtime will last until the lag is eliminated and the data integrity checks are completed.
+2. Without downtime: Since the difference between the source and target databases is less than 130 MB, you can perform data integrity checks, excluding the latest data that is still being written. This approach does not require taking your application down.
+
+Now that the data integrity checks are complete, it's time to switch your target database to become the primary one. If you have selected option 2 for data integrity checks, stop writing to the source database and immediately start writing to the target database from the application. This will minimize application downtime to as low as application restart. It allows the live migration process to complete replicating data to the target database, as the source will no longer receive any new transactions. You will know the process is complete when the replication lag reduces to 0 megabytes. If you have chosen option 1 for data integrity checks, start your application to write data to the target database.


I don't think we should make this change. It complicates the validation work and promotes partial integrity checks. We should discuss this on team call if you think its important to mention without downtime option.

Harkishen-Singh · 2024-01-09T09:53:03Z

migrate/playbooks/rds-timescale-live-migration.md

-  -v ON_ERROR_STOP=1 \
-  --echo-errors \
-  -f roles.sql \
-  -f pre-data-dump.sql
-```
+## 4. Perform Live Migration

-## 5. Perform "live migration"
 The remaining steps for migrating data from a RDS Postgres instance to Timescale
 with low-downtime are the same as the ones mentioned in "Live migration"
-documentation from [Step 5] onwards. You should follow the mentioned steps
+documentation from [Step 3] onwards. You should follow the mentioned steps
 to successfully complete the migration process.

 [live migration]: /migrate/:currentVersion:/live-migration/live-migration-from-postgres/
-[Step 5]: /migrate/:currentVersion:/live-migration/live-migration-from-postgres/#5-enable-hypertables
+[Step 3]: /migrate/:currentVersion:/live-migration/live-migration-from-postgres/#3-run-the-live-migration-docker-image


This is not right. We are asking users to continue from Step 3 in live-migration-from-postgres page. The step 3 there is

docker run --rm -dit --name live-migration \ -e PGCOPYDB_SOURCE_PGURI=$SOURCE \ -e PGCOPYDB_TARGET_PGURI=$TARGET \ -v ~/live-migration:/opt/timescale/ts_cdc \ timescale/live-migration:v0.0.1

Since the above command already has pgcopydb follow internally, we should not ask users to perform step 3 in this page which also includes pgcopydb follow.

The correct way should be to ask users to continue from Step 3 in live-migration-from-postgres page right from step 3 on this page (by replacing step 3 on this page).

billy-the-fish · 2024-05-07T11:28:04Z

@VineethReddy02, would you like me to work on the comments made for this issue or shall we close it?

Harkishen-Singh · 2024-05-07T12:32:59Z

It seems like the PR has become outdated, and it might be best to close it and create a new one. There have been notable changes in the live migration documentation since this PR was initially submitted.

VineethReddy02 requested review from arajkumar and Harkishen-Singh January 5, 2024 10:27

Update Live migration docs

efdb20d

VineethReddy02 force-pushed the vineeth-nits-live-migration branch from b8d3b46 to efdb20d Compare January 5, 2024 10:48

JamesGuthrie self-requested a review January 8, 2024 09:09

arajkumar reviewed Jan 8, 2024

View reviewed changes

JamesGuthrie requested changes Jan 9, 2024

View reviewed changes

Harkishen-Singh requested changes Jan 9, 2024

View reviewed changes

billy-the-fish closed this May 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update Live migration docs #2925

Update Live migration docs #2925

VineethReddy02 commented Jan 5, 2024

github-actions bot commented Jan 5, 2024

arajkumar Jan 8, 2024

VineethReddy02 Jan 8, 2024

arajkumar Jan 8, 2024

VineethReddy02 Jan 9, 2024

arajkumar Jan 9, 2024

Harkishen-Singh Jan 9, 2024

JamesGuthrie Jan 8, 2024

JamesGuthrie Jan 8, 2024

JamesGuthrie Jan 8, 2024

JamesGuthrie Jan 9, 2024

Harkishen-Singh Jan 9, 2024

JamesGuthrie Jan 9, 2024

Harkishen-Singh Jan 9, 2024

Harkishen-Singh Jan 9, 2024

Harkishen-Singh Jan 9, 2024

Harkishen-Singh Jan 9, 2024

Harkishen-Singh Jan 9, 2024

Harkishen-Singh Jan 9, 2024

billy-the-fish commented May 7, 2024

Harkishen-Singh commented May 7, 2024

	If you want to have more control over migration and prefer to manually migrate data from PostgreSQL, refer to
	If you want to have more control over the migration and prefer to manually migrate data from PostgreSQL, refer to

-Live migration is a strategy used to move a large amount of data (100&nbsp;GB-10&nbsp;TB+) with minimal downtime (typically a few minutes). It involves copying existing data from the source to the target and supports change data capture to stream ongoing changes from the source during the migration process.
+Live migration is a strategy to move a large amount of data
+(100&nbsp;GB-10&nbsp;TB+) with minimal downtime (typically a few minutes). It
+achieves low downtime by simultaneously 1) copying existing data from the
+source database to the target database and 2) recording ongoing changes which
+are made on the source. When the initial data copy completes, it continuously
+applies the recorded transactions to the target database until the target
+database is fully caught up with the source database. At this point the
+application's database connection is switched to the target database (which may
+result in a short downtime), and the migration is complete.

-In contrast, [pg_dump/restore][pg-dump-and-restore] only supports copying the database from the source to the target without capturing ongoing changes, which results in downtime. On the other hand, the [dual-write and backfill] method requires setting up dual write in the application logic. This method is recommended only for append-only workloads as it does not support updates and deletes during migration.
+In contrast, [pg_dump/restore][pg-dump-and-restore] only supports copying data
+from the source to the target without recording ongoing changes, so
+applications which are writing must be stopped for the duration of the
+migration. On the other hand, the [dual-write and backfill] method also
+provides a way to migrate with low downtime, but requires modifying your
+application to write to two databases simultaneously, and only works with
+append-only workloads as it does not support updates and deletes during
+migration.

	Once the lag between the databases is below 130 megabytes, we recommend performing data integrity checks. There are two ways to do this:
	Once the lag between the databases is below 30 megabytes, we recommend performing data integrity checks. There are two ways to do this:

Update Live migration docs #2925

Update Live migration docs #2925

Conversation

VineethReddy02 commented Jan 5, 2024

Description

Links

Writing help

Review checklists

Subject matter expert (SME) review checklist

Documentation team review checklist

github-actions bot commented Jan 5, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

billy-the-fish commented May 7, 2024

Harkishen-Singh commented May 7, 2024