rbd: prevent calling rbdVolume.Destroy() after an error #4564

Open
wants to merge 1 commit into devel

Conversation

@nixpanic (Member) commented Apr 17, 2024

It seems possible that the .Destroy() function is called on a nil
pointer. This would cause a panic in the node-plugin.

Depending on how far GenVolFromVolID() gets, the returned rbdVolume can
be connected. If an error occurs, the rbdVolume should not be connected
at all, so call the .Destroy() function in those cases too.

Fixes: #4562
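
For illustration, a minimal self-contained sketch of the cleanup pattern this change introduces; the volume type and lookup function below are stand-ins for rbdVolume and GenVolFromVolID(), not the real ceph-csi code:

package main

import (
	"errors"
	"fmt"
)

// volume is a stand-in for ceph-csi's rbdVolume.
type volume struct{ connected bool }

// Destroy releases the connection; calling it again is a no-op.
func (v *volume) Destroy() { v.connected = false }

// lookup stands in for GenVolFromVolID(): on error it must not hand
// back a connected volume, so it destroys before returning.
func lookup(fail bool) (*volume, error) {
	v := &volume{connected: true}
	if fail {
		v.Destroy() // do not leak the connection on the error path
		return nil, errors.New("volume not found")
	}
	return v, nil
}

// handle shows the defer-on-error pattern from the patch: once a
// connected volume exists, every later error path must destroy it.
func handle(fail bool) (err error) {
	vol, err := lookup(fail)
	if err != nil {
		return err // vol is nil here, nothing to Destroy()
	}
	defer func() {
		if err != nil {
			vol.Destroy()
		}
	}()
	// ... further processing that may set err; on success the
	// caller keeps using the connected volume ...
	return nil
}

func main() {
	fmt.Println(handle(true))  // volume not found
	fmt.Println(handle(false)) // <nil>
}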

@mergify bot added the component/rbd (Issues related to RBD) label Apr 17, 2024
@nixpanic added the bug (Something isn't working) label Apr 17, 2024
@Madhu-1 (Collaborator) left a comment

@nixpanic sadly we don't have any CI for the RDR workflow; we need to test this code manually to ensure nothing breaks.

Comment on lines +1199 to +1203
defer func() {
if err != nil {
vol.Destroy()
}
}()
Collaborator:

In case of an error the volume is already destroyed above; do we need to destroy it again? If yes, can you please add a check for vol != nil?

Member Author:

generateVolumeFromVolumeID() should now only return a connected volume if no error occurred. But it is possible that this function itself detects an error below. In that case, the .Destroy() function should be called too.

Destroying a volume that is already destroyed is fine; extra checks for that just make the code more complex.
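
For context, a sketch of why the repeated call is harmless, assuming Destroy() guards internally roughly like this (the struct and field names are assumptions, not the real ceph-csi code):

// Hypothetical shape of an idempotent Destroy(): releasing the
// connection only while one is still held makes a second call a
// harmless no-op.
type connection struct{}

func (c *connection) Close() {}

type rbdVol struct{ conn *connection }

func (v *rbdVol) Destroy() {
	if v.conn == nil {
		return // already destroyed
	}
	v.conn.Close()
	v.conn = nil
}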

Comment on lines +1212 to +1214
if vol != nil {
vol.Destroy()
}
Collaborator:

When will we hit this one? In case of an error we are already doing the Destroy, and in the non-error case the vol is returned at line 1197.

Member Author:

It is not clear to me if this can be hit; better to destroy something that could have been connected at line 1194.

If the vol was connected and generateVolumeFromMapping() overwrites that variable, the first vol would leak and no .Destroy() would ever be called on it.
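
Reusing the stand-in volume type from the sketch above, the overwrite hazard and its guard look roughly like this; byID and byMapping are hypothetical stand-ins for generateVolumeFromVolumeID() and generateVolumeFromMapping(), not the real signatures:

// If the first lookup set vol and a fallback lookup reassigns the
// same variable, the first connection leaks unless it is destroyed
// before the reassignment drops the last reference to it.
func resolve(id string, byID, byMapping func(string) (*volume, error)) (*volume, error) {
	vol, err := byID(id) // may have been connected at some point
	if err != nil {
		if vol != nil {
			vol.Destroy() // release before losing the reference
		}
		vol, err = byMapping(id) // overwrites vol
	}
	return vol, err
}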

@nixpanic (Member Author):

/test ci/centos/mini-e2e-helm/k8s-1.29

@nixpanic (Member Author):

/test ci/centos/mini-e2e-helm/k8s-1.29

Failed with a panic:

  I0417 16:06:16.202797       1 utils.go:198] ID: 6 Req-ID: 0001-0024-d4674b32-7792-4ac0-980b-f38fba668959-0000000000000002-d51f314b-dd4c-4047-bd31-1e66123f821e GRPC call: /csi.v1.Controller/DeleteVolume
  I0417 16:06:16.202977       1 utils.go:199] ID: 6 Req-ID: 0001-0024-d4674b32-7792-4ac0-980b-f38fba668959-0000000000000002-d51f314b-dd4c-4047-bd31-1e66123f821e GRPC request: {"secrets":"***stripped***","volume_id":"0001-0024-d4674b32-7792-4ac0-980b-f38fba668959-0000000000000002-d51f314b-dd4c-4047-bd31-1e66123f821e"}
  E0417 16:06:16.228637       1 omap.go:80] ID: 6 Req-ID: 0001-0024-d4674b32-7792-4ac0-980b-f38fba668959-0000000000000002-d51f314b-dd4c-4047-bd31-1e66123f821e omap not found (pool="replicapool", namespace="", name="csi.volume.d51f314b-dd4c-4047-bd31-1e66123f821e"): rados: ret=-2, No such file or directory
  W0417 16:06:16.228693       1 voljournal.go:729] ID: 6 Req-ID: 0001-0024-d4674b32-7792-4ac0-980b-f38fba668959-0000000000000002-d51f314b-dd4c-4047-bd31-1e66123f821e unable to read omap keys: pool or key missing: key not found: rados: ret=-2, No such file or directory
  E0417 16:06:16.234548       1 rbd_journal.go:690] ID: 6 Req-ID: 0001-0024-d4674b32-7792-4ac0-980b-f38fba668959-0000000000000002-d51f314b-dd4c-4047-bd31-1e66123f821e failed to get image id replicapool/csi-vol-d51f314b-dd4c-4047-bd31-1e66123f821e: image not found: RBD image not found
  SIGSEGV: segmentation violation
  PC=0x7f92b28feb5b m=5 sigcode=128
  signal arrived during cgo execution

  goroutine 76 [syscall]:
  runtime.cgocall(0x1c4d850, 0xc0009a4fa8)
  	/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc0009a4f80 sp=0xc0009a4f48 pc=0x416e8b
  github.com/ceph/go-ceph/rbd._Cfunc_rbd_open(0x7f925010a550, 0x7f92501a15c0, 0xc00008a788, 0x0)
  	_cgo_gotypes.go:2266 +0x4b fp=0xc0009a4fa8 sp=0xc0009a4f80 pc=0x1bba6eb
  github.com/ceph/go-ceph/rbd.OpenImage.func3(0xc00005ac30?, 0x2c?, 0x27a9080?, 0x7f92ab9203c8?)
  	/go/src/github.com/ceph/ceph-csi/vendor/github.com/ceph/go-ceph/rbd/rbd.go:1096 +0x9b fp=0xc0009a5000 sp=0xc0009a4fa8 pc=0x1bc035b
  github.com/ceph/go-ceph/rbd.OpenImage(0xc00071f8e0, {0xc00005ac30, 0x2c}, {0x0, 0x0})
  	/go/src/github.com/ceph/ceph-csi/vendor/github.com/ceph/go-ceph/rbd/rbd.go:1096 +0x13e fp=0xc0009a5098 sp=0xc0009a5000 pc=0x1bc005e
  github.com/ceph/ceph-csi/internal/rbd.(*rbdImage).open(0xc0000e6240)
  	/go/src/github.com/ceph/ceph-csi/internal/rbd/rbd_util.go:508 +0x45 fp=0xc0009a5100 sp=0xc0009a5098 pc=0x1bf3865
  github.com/ceph/ceph-csi/internal/rbd.(*rbdImage).GetImageMirroringInfo(0xc0000e6240)
  	/go/src/github.com/ceph/ceph-csi/internal/rbd/mirror.go:62 +0x3f fp=0xc0009a5190 sp=0xc0009a5100 pc=0x1bdacdf
  github.com/ceph/ceph-csi/internal/rbd.cleanupRBDImage({0x27c9b10, 0xc0009bc240}, 0xc0000e6240, 0x64?)
  	/go/src/github.com/ceph/ceph-csi/internal/rbd/controllerserver.go:970 +0x47 fp=0xc0009a5388 sp=0xc0009a5190 pc=0x1bce7c7
  github.com/ceph/ceph-csi/internal/rbd.(*ControllerServer).DeleteVolume(0xc0009182c0, {0x27c9b10, 0xc0009bc240}, 0xc000882900)
  	/go/src/github.com/ceph/ceph-csi/internal/rbd/controllerserver.go:963 +0x68f fp=0xc0009a56a0 sp=0xc0009a5388 pc=0x1bcdccf
  github.com/container-storage-interface/spec/lib/go/csi._Controller_DeleteVolume_Handler.func1({0x27c9b10, 0xc0009bc240}, {0x208ef40?, 0xc000882900})
  	/go/src/github.com/ceph/ceph-csi/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:6605 +0x72 fp=0xc0009a56e0 sp=0xc0009a56a0 pc=0x1a2fe32
  github.com/ceph/ceph-csi/internal/csi-common.panicHandler({0x27c9b10?, 0xc0009bc240?}, {0x208ef40?, 0xc000882900?}, 0xc0009be120?, 0xc00071f550?)
  	/go/src/github.com/ceph/ceph-csi/internal/csi-common/utils.go:226 +0x77 fp=0xc0009a5750 sp=0xc0009a56e0 pc=0x1af4197
  github.com/ceph/ceph-csi/internal/csi-common.NewMiddlewareServerOption.ChainUnaryServer.func2.1({0x27c9b10?, 0xc0009bc240?}, {0x208ef40?, 0xc000882900?})
  	/go/src/github.com/ceph/ceph-csi/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:48 +0x45 fp=0xc0009a5790 sp=0xc0009a5750 pc=0x1af3705
  github.com/ceph/ceph-csi/internal/csi-common.logGRPC({0x27c9b10, 0xc0009bc240}, {0x208ef40?, 0xc000882900?}, 0x1d74d20?, 0xc000882940)
  	/go/src/github.com/ceph/ceph-csi/internal/csi-common/utils.go:201 +0x183 fp=0xc0009a5840 sp=0xc0009a5790 pc=0x1af3f23
  github.com/ceph/ceph-csi/internal/csi-common.NewMiddlewareServerOption.ChainUnaryServer.func2.1({0x27c9b10?, 0xc0009bc240?}, {0x208ef40?, 0xc000882900?})
  	/go/src/github.com/ceph/ceph-csi/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:48 +0x45 fp=0xc0009a5880 sp=0xc0009a5840 pc=0x1af3705
  github.com/ceph/ceph-csi/internal/csi-common.contextIDInjector({0x27c9b10, 0xc000995e30}, {0x208ef40, 0xc000882900}, 0xc0005d7818?, 0xc000882980)
  	/go/src/github.com/ceph/ceph-csi/internal/csi-common/utils.go:189 +0x132 fp=0xc0009a58e8 sp=0xc0009a5880 pc=0x1af3d52
  github.com/ceph/ceph-csi/internal/csi-common.NewMiddlewareServerOption.ChainUnaryServer.func2({0x27c9b10, 0xc000995e30}, {0x208ef40, 0xc000882900}, 0xc0000e2940, 0x1ea7b40?)
  	/go/src/github.com/ceph/ceph-csi/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:53 +0x135 fp=0xc0009a5950 sp=0xc0009a58e8 pc=0x1af3655
  github.com/container-storage-interface/spec/lib/go/csi._Controller_DeleteVolume_Handler({0x2146ce0?, 0xc0009182c0}, {0x27c9b10, 0xc000995e30}, 0xc00013c400, 0xc00050e090)
  	/go/src/github.com/ceph/ceph-csi/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:6607 +0x135 fp=0xc0009a59a8 sp=0xc0009a5950 pc=0x1a2fd15
  google.golang.org/grpc.(*Server).processUnaryRPC(0xc0008da400, {0x27c9b10, 0xc000995da0}, {0x27d2460, 0xc00075f380}, 0xc00015e5a0, 0xc00050e240, 0x3a98a58, 0x0)
  	/go/src/github.com/ceph/ceph-csi/vendor/google.golang.org/grpc/server.go:1369 +0xe23 fp=0xc0009a5d98 sp=0xc0009a59a8 pc=0x1a06243
  google.golang.org/grpc.(*Server).handleStream(0xc0008da400, {0x27d2460, 0xc00075f380}, 0xc00015e5a0)
  	/go/src/github.com/ceph/ceph-csi/vendor/google.golang.org/grpc/server.go:1780 +0x1016 fp=0xc0009a5f78 sp=0xc0009a5d98 pc=0x1a0b456
  google.golang.org/grpc.(*Server).serveStreams.func2.1()
  	/go/src/github.com/ceph/ceph-csi/vendor/google.golang.org/grpc/server.go:1019 +0x8b fp=0xc0009a5fe0 sp=0xc0009a5f78 pc=0x1a0424b
  runtime.goexit()
  	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0009a5fe8 sp=0xc0009a5fe0 pc=0x4813c1
  created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 82
  	/go/src/github.com/ceph/ceph-csi/vendor/google.golang.org/grpc/server.go:1030 +0x135
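
Reading the trace: cleanupRBDImage() calls GetImageMirroringInfo() on an image whose connection was already torn down (the image lookup above had failed), so the cgo rbd_open call runs against stale state and segfaults instead of returning an error. Assuming Destroy() clears the connection fields as in the idempotent sketch earlier, a guard along these lines would surface this as a normal error; the ioctx field name is an assumption about ceph-csi's internal struct, while rados.IOContext, librbd.OpenImage, and librbd.NoSnapshot are the real go-ceph API:

package rbdsketch

import (
	"errors"

	"github.com/ceph/go-ceph/rados"
	librbd "github.com/ceph/go-ceph/rbd"
)

// rbdImage is a stand-in with only the fields this sketch needs.
type rbdImage struct {
	ioctx        *rados.IOContext
	RbdImageName string
}

// open refuses to pass a nil ioctx into cgo (which is what produced
// the SIGSEGV above) and returns a plain error instead.
func (ri *rbdImage) open() (*librbd.Image, error) {
	if ri.ioctx == nil {
		return nil, errors.New("rbdImage is not connected")
	}
	return librbd.OpenImage(ri.ioctx, ri.RbdImageName, librbd.NoSnapshot)
}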

Labels
bug (Something isn't working), component/rbd (Issues related to RBD)

Successfully merging this pull request may close these issues:
Incorrect Error Message when Volume Mount Fails