
Full resync always stuck in congested (behind) state after few days #46

Open
Lathanderjk opened this issue Aug 22, 2022 · 1 comment

Lathanderjk commented Aug 22, 2022

Hi,

When the on-congestion policy is "pull-ahead", the DRBD resource always gets stuck in the Behind state after 2~3 days, and the sync progress starts decreasing (98.20% -> 98.19% ... 98.12%). The kernel log has no entry saying that congestion-fill or congestion-extents was reached, as it normally does when the configured limit is actually hit.

I tried raising congestion-fill to absurdly high values (100M -> 200M, 500M, or 0 to disable it), raising congestion-extents (even above al-extents), and commenting both out of the configuration entirely, but nothing helped: the outcome was always the same.
Commenting out "on-congestion pull-ahead" (falling back to the default, block) does help, and the resync continues again.

While congested, the kernel log on the primary fills with thousands of identical entries in a loop:
[ +0.104537] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )
[ +1.026862] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source
[ +0.002791] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source exit code 0
[ +0.001076] drbd storage/0 drbd1 backup-dc: repl( Ahead -> PausedSyncS )
[ +0.000718] drbd storage/0 drbd1 backup-dc: Began resync as PausedSyncS (will sync 12844472428 KB [3211118107 bits set]).
[ +0.043059] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )
[ +1.040371] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source
[ +0.004944] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source exit code 0
[ +0.000894] drbd storage/0 drbd1 backup-dc: repl( Ahead -> PausedSyncS )
[ +0.000695] drbd storage/0 drbd1 backup-dc: Began resync as PausedSyncS (will sync 12844472428 KB [3211118107 bits set]).
[ +0.098964] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )
[ +1.046465] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source
[ +0.003419] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source exit code 0
[ +0.004561] drbd storage/0 drbd1 backup-dc: repl( Ahead -> PausedSyncS )
[ +0.010005] drbd storage/0 drbd1 backup-dc: Began resync as PausedSyncS (will sync 12844472428 KB [3211118107 bits set]).
[ +0.046983] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )
[ +1.022996] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source
[ +0.006174] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source exit code 0
[ +0.011331] drbd storage/0 drbd1 backup-dc: repl( Ahead -> PausedSyncS )
[ +0.009500] drbd storage/0 drbd1 backup-dc: Began resync as PausedSyncS (will sync 12844472428 KB [3211118107 bits set]).
[ +0.264396] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )
[ +1.052604] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source
[ +0.004883] drbd storage/0 drbd1 backup-dc: helper command: /sbin/drbdadm before-resync-source exit code 0
[ +0.008559] drbd storage/0 drbd1 backup-dc: repl( Ahead -> PausedSyncS )
[ +0.008532] drbd storage/0 drbd1 backup-dc: Began resync as PausedSyncS (will sync 12844472428 KB [3211118107 bits set]).
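The loop above can be detected mechanically. The following is a minimal Python sketch, assuming dmesg-style lines exactly as quoted: it counts the Ahead <-> PausedSyncS transitions and checks that the reported resync size equals the number of set bits times 4 KiB (DRBD's bitmap tracks one bit per 4 KiB block, and 3211118107 × 4 = 12844472428). The function names and the flapping threshold are hypothetical helpers, not part of DRBD.

```python
import re
from collections import Counter

# Matches replication-state transitions such as "repl( PausedSyncS -> Ahead )".
REPL_RE = re.compile(r"repl\(\s*(\w+)\s*->\s*(\w+)\s*\)")
# Matches "will sync N KB [M bits set]" from the "Began resync" lines.
SYNC_RE = re.compile(r"will sync (\d+) KB \[(\d+) bits set\]")

BLOCK_KIB = 4  # DRBD bitmap granularity: one bit per 4 KiB block


def summarize(log_lines):
    """Count state transitions and collect resync sizes from DRBD log lines."""
    transitions = Counter()
    sync_sizes = []
    for line in log_lines:
        m = REPL_RE.search(line)
        if m:
            transitions[(m.group(1), m.group(2))] += 1
        m = SYNC_RE.search(line)
        if m:
            kib, bits = int(m.group(1)), int(m.group(2))
            # Sanity check: the KB figure should equal bits * 4 KiB.
            sync_sizes.append((kib, bits, kib == bits * BLOCK_KIB))
    return transitions, sync_sizes


def is_flapping(transitions, threshold=3):
    """Hypothetical heuristic: report a loop if the resource bounced
    Ahead -> PausedSyncS and back at least `threshold` times."""
    there = transitions[("Ahead", "PausedSyncS")]
    back = transitions[("PausedSyncS", "Ahead")]
    return min(there, back) >= threshold


if __name__ == "__main__":
    sample = [
        "[ +0.104537] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )",
        "[ +0.001076] drbd storage/0 drbd1 backup-dc: repl( Ahead -> PausedSyncS )",
        "[ +0.000718] drbd storage/0 drbd1 backup-dc: Began resync as PausedSyncS "
        "(will sync 12844472428 KB [3211118107 bits set]).",
        "[ +0.043059] drbd storage/0 drbd1 backup-dc: repl( PausedSyncS -> Ahead )",
    ]
    t, sizes = summarize(sample)
    print(dict(t))
    print(sizes)
    print(is_flapping(t, threshold=1))
```

Feeding it the full `dmesg` output (for example via `sys.stdin`) would show the same full-size resync being restarted on every cycle, which matches the symptom described above.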

ENV: DRBD 9.1.7, Oracle Linux 8.6 (latest updates, but the same happens with packages a few months old).
The full configuration is attached. Congestion handling is configured only for the backup storage node (backup-dc) because it is much slower.
storage.txt

Lathanderjk changed the title "full resync always stuck in congested (behind) state after few days" to "Full resync always stuck in congested (behind) state after few days" Aug 22, 2022
Lathanderjk (author) commented:

The zero progress was actually caused by the dynamic resync-rate controller: after switching off congestion control, the resync started but never finished either. After setting a fixed rate ("c-plan-ahead 0" and "resync-rate 50M"), everything works as expected. A fixed rate is still better than no rate at all.
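Setting c-plan-ahead to 0 disables DRBD's dynamic resync controller, so resync-rate becomes a fixed target. To get a feel for what a fixed rate means for a resync of the size shown in the log, here is a rough Python estimate. It assumes that "50M" means 50 MiB/s and that the log's "KB" is KiB; both are assumptions, and the helper name is hypothetical. Real throughput will also vary with application I/O.

```python
# Rough resync-time estimate for a fixed DRBD resync rate.
# Assumptions (not from the issue): "resync-rate 50M" means 50 MiB/s,
# and the "KB" figure in the kernel log is KiB.

KIB_PER_MIB = 1024


def resync_eta_hours(total_kib: int, rate_kib_s: float, done_fraction: float = 0.0) -> float:
    """Hours needed to copy the not-yet-synced part of total_kib at rate_kib_s."""
    if rate_kib_s <= 0:
        raise ValueError("rate must be positive")
    remaining_kib = total_kib * (1.0 - done_fraction)
    return remaining_kib / rate_kib_s / 3600.0


if __name__ == "__main__":
    total = 12844472428          # "will sync 12844472428 KB" from the log
    rate = 50 * KIB_PER_MIB      # resync-rate 50M, assumed to be MiB/s
    full = resync_eta_hours(total, rate)
    print(f"full resync: {full:.1f} h ({full / 24:.1f} days)")
    print(f"from 98.12%: {resync_eta_hours(total, rate, 0.9812):.1f} h")
```

Under these assumptions a full resync of about 12 TiB takes roughly 70 hours (about 2.9 days) at a steady 50 MiB/s.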
