Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: avoid initial 1s delay and use exponential backoff on spec / status checks for LogicalVolume #903

Merged
merged 1 commit into from May 15, 2024

Conversation

jakobmoellerdev
Copy link
Contributor

@jakobmoellerdev jakobmoellerdev commented Apr 25, 2024

fix #904

@jakobmoellerdev jakobmoellerdev requested a review from a team as a code owner April 25, 2024 15:25
@jakobmoellerdev jakobmoellerdev changed the title fix: avoid initial 1s delay and use exponential backoff fix: avoid initial 1s delay and use exponential backoff on spec / status checks for LogicalVolume Apr 25, 2024
@jakobmoellerdev jakobmoellerdev force-pushed the avoid-1s-wait-on-loop branch 4 times, most recently from 1ac3277 to f4af94b Compare April 25, 2024 19:30
internal/driver/internal/k8s/logicalvolume_service.go Outdated Show resolved Hide resolved
internal/driver/internal/k8s/logicalvolume_service.go Outdated Show resolved Hide resolved
internal/driver/internal/k8s/logicalvolume_service.go Outdated Show resolved Hide resolved
internal/driver/internal/k8s/logicalvolume_service.go Outdated Show resolved Hide resolved
internal/driver/internal/k8s/logicalvolume_service.go Outdated Show resolved Hide resolved
internal/driver/internal/k8s/logicalvolume_service.go Outdated Show resolved Hide resolved
internal/driver/internal/k8s/logicalvolume_service.go Outdated Show resolved Hide resolved
internal/driver/internal/k8s/logicalvolume_service.go Outdated Show resolved Hide resolved
internal/driver/internal/k8s/logicalvolume_service.go Outdated Show resolved Hide resolved
internal/driver/internal/k8s/logicalvolume_service.go Outdated Show resolved Hide resolved
@jakobmoellerdev jakobmoellerdev force-pushed the avoid-1s-wait-on-loop branch 2 times, most recently from 27a1887 to be92bdb Compare April 26, 2024 07:54
@jakobmoellerdev
Copy link
Contributor Author

I believe I spotted a side effect of this change which is kind of strange. There is currently a failure in the resize test while checking for failure events. However, when isolating this test to run alone the failure doesnt occur, making me believe this is actually a regression from a different test.

@jakobmoellerdev
Copy link
Contributor Author

I think something is completely off here. The tests flake hard and im trying to understand why. This looks like a race condition somewhere, still trying to identify whats happening

@toshipp
Copy link
Contributor

toshipp commented Apr 26, 2024

I'll check it next week, but responses may be delayed due to the holiday period.

@jakobmoellerdev
Copy link
Contributor Author

jakobmoellerdev commented Apr 26, 2024

I have noticed an interesting error message that is concerning to me as it leads me to believe that there might be an issue on the external-resizer patch:

│ Name:             topo-pvc.17c9d3a398b23494                                                                                                                                                                                                                                   │
│ Namespace:        e2etest-xgwhounmef                                                                                                                                                                                                                                          │
│ Labels:           <none>                                                                                                                                                                                                                                                      │
│ Annotations:      <none>                                                                                                                                                                                                                                                      │
│ API Version:      v1                                                                                                                                                                                                                                                          │
│ Count:            1                                                                                                                                                                                                                                                           │
│ Event Time:       <nil>                                                                                                                                                                                                                                                       │
│ First Timestamp:  2024-04-26T12:17:12Z                                                                                                                                                                                                                                        │
│ Involved Object:                                                                                                                                                                                                                                                              │
│   API Version:       v1                                                                                                                                                                                                                                                       │
│   Kind:              PersistentVolumeClaim                                                                                                                                                                                                                                    │
│   Name:              topo-pvc                                                                                                                                                                                                                                                 │
│   Namespace:         e2etest-xgwhounmef                                                                                                                                                                                                                                       │
│   Resource Version:  1509                                                                                                                                                                                                                                                     │
│   UID:               d447f77e-fcb4-49f7-acea-73d26b524755                                                                                                                                                                                                                     │
│ Kind:                Event                                                                                                                                                                                                                                                    │
│ Last Timestamp:      2024-04-26T12:17:12Z                                                                                                                                                                                                                                     │
│ Message:             Mark PVC "e2etest-xgwhounmef/topo-pvc" as file system resize required failed: can't patch status of  PVC e2etest-xgwhounmef/topo-pvc with Operation cannot be fulfilled on persistentvolumeclaims "topo-pvc": the object has been modified; please apply │
│  your changes to the latest version and try again                                                                                                                                                                                                                             │
│ Metadata:                                                                                                                                                                                                                                                                     │
│   Creation Timestamp:  2024-04-26T12:17:12Z                                                                                                                                                                                                                                   │
│   Resource Version:    1524                                                                                                                                                                                                                                                   │
│   UID:                 0199e840-7f9a-422e-b44e-c3014f8f5c0e                                                                                                                                                                                                                   │
│ Reason:                VolumeResizeFailed 

This comes from the external-resizer PVC update for a FileSystemResizeRequired condition failing with a collision and is an issue in our tests because we test for VolumeExpansionFailed events. In this case the PVC is updated quite fast so the Event is triggered but we have an immediate success after.

test/e2e/e2e_test.go Outdated Show resolved Hide resolved
charts/topolvm/templates/controller/deployment.yaml Outdated Show resolved Hide resolved
internal/driver/internal/k8s/logicalvolume_service.go Outdated Show resolved Hide resolved
Copy link
Contributor

@toshipp toshipp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@toshipp
Copy link
Contributor

toshipp commented May 13, 2024

@jakobmoellerdev
If you wish to squash some commits, please tell us.

@jakobmoellerdev
Copy link
Contributor Author

I will squash

Signed-off-by: Jakob Möller <jmoller@redhat.com>
@jakobmoellerdev
Copy link
Contributor Author

I have the fix for external resizer not sending events in merged now, so once we update to the 1.30 release of resizer, the event test will no longer have to contain the special check.

@toshipp
Copy link
Contributor

toshipp commented May 14, 2024

I'm glad to hear that. I assume this is yours. Great work! Thank you @jakobmoellerdev

@daichimukai daichimukai merged commit 8eec8e5 into topolvm:main May 15, 2024
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

LogicalVolumeService updates wait for at least 1s and do not use backoffs
3 participants