Unable to delete all the accounts in a loop; got internal error while deleting a few of them #7920

Open
rkomandu opened this issue Mar 21, 2024 · 14 comments

@rkomandu
Collaborator

rkomandu commented Mar 21, 2024

Environment info

  • NooBaa Version: noobaa-core-5.15.0-20240313.el9.x86_64
  • Platform: Standalone NooBaa with ODF 4.15.0 (d/s build 0313)

Actual behavior

  1. Created accounts and then deleted them in a loop:
  2. for i in $(seq 8100 9000); do mms3 account delete s3user-$i; done

Account s3user-8369 deleted successfully
Account s3user-8370 deleted successfully
Account s3user-8371 deleted successfully
Account s3user-8372 deleted successfully
Account s3user-8373 deleted successfully
Account s3user-8374 deleted successfully
Account s3user-8375 deleted successfully
Account s3user-8376 deleted successfully
Failed to execute command for Account s3user-8377: The server encountered an internal error. Please retry the request --> here
Account s3user-8378 deleted successfully
Account s3user-8379 deleted successfully
Account s3user-8380 deleted successfully
Account s3user-8381 deleted successfully
Account s3user-8382 deleted successfully

[root@rkomandu-ip-cls-x-worker2 ~]# mms3 account list | grep 8377
s3user-8377 /mnt/fs1/s3user-8377-dir 8377 8377

[root@rkomandu-ip-cls-x-worker2 ~]# mms3 account list | grep 8377
s3user-8377 /mnt/fs1/s3user-8377-dir 8377 8377
[root@rkomandu-ip-cls-x-worker2 ~]# mms3 account list | grep 8376
[root@rkomandu-ip-cls-x-worker2 ~]# mms3 account list | grep 8378

[root@rkomandu-ip-cls-x-worker2 ~]# ls -ld /mnt/fs1/s3user-8377-dir
drwxrwx--- 2 8377 8377 4096 Mar 14 10:27 /mnt/fs1/s3user-8377-dir

[root@rkomandu-ip-cls-x-worker2 ~]# ls -ltr /mnt/fs1/s3user-8377-dir
total 0

Account s3user-8522 deleted successfully
Account s3user-8523 deleted successfully
Account s3user-8524 deleted successfully
Account s3user-8525 deleted successfully
Account s3user-8526 deleted successfully
Account s3user-8527 deleted successfully
Account s3user-8528 deleted successfully
Failed to execute command for Account s3user-8529: The server encountered an internal error. Please retry the request --> here
Account s3user-8530 deleted successfully
Account s3user-8531 deleted successfully
Account s3user-8532 deleted successfully
..
Account s3user-8966 deleted successfully
Account s3user-8967 deleted successfully
Account s3user-8968 deleted successfully
Account s3user-8969 deleted successfully
Account s3user-8970 deleted successfully
Account s3user-8971 deleted successfully
Failed to execute command for Account s3user-8972: The server encountered an internal error. Please retry the request --> here
Account s3user-8973 deleted successfully
Account s3user-8974 deleted successfully
Account s3user-8975 deleted successfully
Account s3user-8976 deleted successfully
Failed to execute command for Account s3user-8977: The server encountered an internal error. Please retry the request --> here
Account s3user-8978 deleted successfully

[root@rkomandu-ip-cls-x-worker2 ~]# mms3 account list | grep 8972
s3user-8972 /mnt/fs1/s3user-8972-dir 8972 8972

[root@rkomandu-ip-cls-x-worker2 ~]# mms3 account list | grep 8977
s3user-8977 /mnt/fs1/s3user-8977-dir 8977 8977

We opened a defect internally (RTC 327700), and Ramya determined that it is due to a NooBaa problem.

noobaa_events.log
-------------------------
{"timestamp":"2024-03-19T03:35:46.501712-07:00","message":"{\"code\":\"noobaa_account_deleted\",\"message\":\"Account deleted\",\"description\":\"Noobaa Acc
ount deleted\",\"entity_type\":\"NODE\",\"event_type\":\"INFO\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"arguments\":{\"account\":\"
s3user-8970\"},\"pid\":515114}","host":"rkomandu-ip-cls-x-worker1","severity":"notice","facility":"local2","syslog-tag":"node[515114]:","source":"node"}
{"timestamp":"2024-03-19T03:35:50.833533-07:00","message":"{\"code\":\"noobaa_account_deleted\",\"message\":\"Account deleted\",\"description\":\"Noobaa Account deleted\",\"entity_type\":\"NODE\",\"event_type\":\"INFO\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"arguments\":{\"account\":\"s3user-8971\"},\"pid\":515690}","host":"rkomandu-ip-cls-x-worker1","severity":"notice","facility":"local2","syslog-tag":"node[515690]:","source":"node"}
{"timestamp":"2024-03-19T03:35:58.484860-07:00","message":"{\"code\":\"noobaa_account_deleted\",\"message\":\"Account deleted\",\"description\":\"Noobaa Account deleted\",\"entity_type\":\"NODE\",\"event_type\":\"INFO\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"arguments\":{\"account\":\"s3user-8973\"},\"pid\":516371}","host":"rkomandu-ip-cls-x-worker1","severity":"notice","facility":"local2","syslog-tag":"node[516371]:","source":"node"}
{"timestamp":"2024-03-19T03:36:02.950806-07:00","message":"{\"code\":\"noobaa_account_deleted\",\"message\":\"Account deleted\",\"description\":\"Noobaa Account deleted\",\"entity_type\":\"NODE\",\"event_type\":\"INFO\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"arguments\":{\"account\":\"s3user-8974\"},\"pid\":516690}","host":"rkomandu-ip-cls-x-worker1","severity":"notice","facility":"local2","syslog-tag":"node[516690]:","source":"node"}
{"timestamp":"2024-03-19T03:36:07.077782-07:00","message":"{\"code\":\"noobaa_account_deleted\",\"message\":\"Account deleted\",\"description\":\"Noobaa Account deleted\",\"entity_type\":\"NODE\",\"event_type\":\"INFO\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"arguments\":{\"account\":\"s3user-8975\"},\"pid\":517290}","host":"rkomandu-ip-cls-x-worker1","severity":"notice","facility":"local2","syslog-tag":"node[517290]:","source":"node"}

As you can observe, there is no entry for the s3user-8972 delete request.

Expected behavior

Account deletion should not fail with an internal error; all accounts in the loop should be deleted.

Steps to reproduce

Create accounts in a loop (say 2K) and then try to delete them, possibly along with buckets as well. Try deleting them from the CES nodes.
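
A minimal sketch of the delete loop that also records failures for later inspection (assuming mms3 returns a nonzero exit status when a delete fails; the range matches the example above):

#!/bin/bash
# Delete the accounts in a loop and record any failures so the affected
# accounts can be re-checked afterwards (e.g. with "mms3 account list").
# Assumes mms3 exits nonzero on "Failed to execute command".
for i in $(seq 8100 9000); do
    if ! mms3 account delete "s3user-$i"; then
        echo "s3user-$i" >> /tmp/failed_account_deletes.txt
    fi
done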

More information - Screenshots / Logs / Other output

@rkomandu rkomandu added the NS-FS label Mar 21, 2024
@naveenpaul1 naveenpaul1 self-assigned this Mar 21, 2024
@naveenpaul1 naveenpaul1 added the Non Containerized label Mar 21, 2024
@naveenpaul1
Contributor

naveenpaul1 commented Mar 21, 2024

@rkomandu Please run the test after updating the log level, and share the actual error log.
You can search for the string NSFS Manage command: exit on error.
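
For example, a grep along these lines can pull the relevant entries (a sketch; the /var/log/noobaa.log path is the one shown later in this thread):

# Search the NooBaa log for the failure string mentioned above,
# with line numbers and a little surrounding context.
grep -n -B 2 -A 2 "NSFS Manage command: exit on error" /var/log/noobaa.log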

@rkomandu
Collaborator Author

Yes, it is reporting exactly that error.

Deleted the 2K accounts in a loop with a sleep-10 interval and observed the "Failed to execute command" error multiple times.


Account s3user-12394 deleted successfully
Account s3user-12395 deleted successfully
Account s3user-12396 deleted successfully
Account s3user-12397 deleted successfully
Failed to execute command for Account s3user-12398: The server encountered an internal error. Please retry the request
Account s3user-12399 deleted successfully
Failed to execute command for Account s3user-12400: The server encountered an internal error. Please retry the request
Failed to execute command for Account s3user-12401: The server encountered an internal error. Please retry the request
Account s3user-12402 deleted successfully
Account s3user-12403 deleted successfully


Account s3user-12397 deleted successfully
Failed to execute command for Account s3user-12398: The server encountered an internal error. Please retry the request
Account s3user-12399 deleted successfully
Failed to execute command for Account s3user-12400: The server encountered an internal error. Please retry the request
Failed to execute command for Account s3user-12401: The server encountered an internal error. Please retry the request

Now let us look at the noobaa.log for the 12397 and 12399 deletions; for 12398, the log shows the "NSFS Manage command: exit on error" message.

in noobaa_event.log

{"timestamp":"2024-03-22T00:26:20.271279-07:00","message":"{\"code\":\"noobaa_account_deleted\",\"message\":\"Account deleted\",\"description\":\"Noobaa Accou
nt deleted\",\"entity_type\":\"NODE\",\"event_type\":\"INFO\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"arguments\":{\"account\":\"s3us
er-12397\"},\"pid\":3271588}","host":"rkomandu-ip-cls-x-worker1","severity":"notice","facility":"local2","syslog-tag":"node[3271588]:","source":"node"}

Now, if we check the noobaa.log below around the above timestamp for the s3user-12397, s3user-12398 and s3user-12399 account deletions:


2024-03-22T00:26:20.258310-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] [L1] core.util.native_fs_utils:: NamespaceFS._open_file: mode=r /mnt/ces-shared-root/ces/s3-config/accounts
2024-03-22T00:26:20.259054-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] [L1] core.util.native_fs_utils:: NamespaceFS._open_file: mode=r /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json
2024-03-22T00:26:20.259709-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file config_path: /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json is_gpfs: true
2024-03-22T00:26:20.259808-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file unlinking: /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json is_gpfs= true
2024-03-22T00:26:20.260664-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] [L1] core.util.native_fs_utils:: Namespace_fs._delete_version_id unlink: File {} 20 /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json File {} 21
2024-03-22T00:26:20.264526-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file done /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json
2024-03-22T00:26:20.464058-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] [L1] core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:20.464601-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0
...
...
2024-03-22T00:26:20.867639-07:00 rkomandu-ip-cls-x-worker1 node[3062942]: [nsfs/3062942] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0
2024-03-22T00:26:20.867906-07:00 rkomandu-ip-cls-x-worker1 node[3062942]: [nsfs/3062942] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 20971520 waiting_value 0 buffers length 0
2024-03-22T00:26:20.867943-07:00 rkomandu-ip-cls-x-worker1 node[3062942]: [nsfs/3062942] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 33554432 waiting_value 0 buffers length 0
2024-03-22T00:26:20.867968-07:00 rkomandu-ip-cls-x-worker1 node[3062942]: [nsfs/3062942] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 2097152 waiting_value 0 buffers length 0
2024-03-22T00:26:20.956759-07:00 rkomandu-ip-cls-x-worker1 node[3271531]: [/3271531] [L1] core.cmd.health:: Error: Config file path should be a valid path /mnt/ces-shared-root/ces/s3-config/buckets/bucket-12890.json [Error: No such file or directory] { code: 'ENOENT' }
2024-03-22T00:26:21.309702-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948] [L1] core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:21.310118-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0
2024-03-22T00:26:21.310274-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 20971520 waiting_value 0 buffers length 0
2024-03-22T00:26:21.310301-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 33554432 waiting_value 0 buffers length 0
2024-03-22T00:26:21.310324-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 2097152 waiting_value 0 buffers length 0
2024-03-22T00:26:21.493171-07:00 rkomandu-ip-cls-x-worker1 node[3062959]: [nsfs/3062959] [L1] core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:21.493778-07:00 rkomandu-ip-cls-x-worker1 node[3062959]: [nsfs/3062959] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0

..

2024-03-22T00:26:28.675289-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/random, total 32 ...
2024-03-22T00:26:28.675459-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] [LOG] CONSOLE:: read_rand_seed: closing fd ...
2024-03-22T00:26:28.675954-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
2024-03-22T00:26:28.677312-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] [L1] core.cmd.manage_nsfs:: nsfs.check_and_create_config_dirs: config dir exists: /mnt/ces-shared-root/ces/s3-config/buckets
2024-03-22T00:26:28.678024-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] [L1] core.cmd.manage_nsfs:: nsfs.check_and_create_config_dirs: config dir exists: /mnt/ces-shared-root/ces/s3-config/accounts
2024-03-22T00:26:28.678507-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] [L1] core.cmd.manage_nsfs:: nsfs.check_and_create_config_dirs: config dir exists: /mnt/ces-shared-root/ces/s3-config/access_keys
**2024-03-22T00:26:28.944651-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] [L1] core.cmd.manage_nsfs:: NSFS Manage command: exit on error Error: No such file or directory**
2024-03-22T00:26:30.464455-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] [L1] core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:30.464796-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0



2024-03-22T00:26:37.530367-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] [LOG] CONSOLE:: generate_entropy: adding entropy: dd if=/dev/vda bs=1048576 count=32 skip=255731 | md5sum
2024-03-22T00:26:37.538563-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] [L1] core.util.native_fs_utils:: NamespaceFS._open_file: mode=r /mnt/ces-shared-root/ces/s3-config/accounts
2024-03-22T00:26:37.539654-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] [L1] core.util.native_fs_utils:: NamespaceFS._open_file: mode=r /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json
2024-03-22T00:26:37.540691-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file config_path: /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json is_gpfs: true
2024-03-22T00:26:37.540797-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file unlinking: /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json is_gpfs= true
2024-03-22T00:26:37.541637-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] [L1] core.util.native_fs_utils:: Namespace_fs._delete_version_id unlink: File {} 21 /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json File {} 23
2024-03-22T00:26:37.552608-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file done /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json
2024-03-22T00:26:40.464133-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] [L1] core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:40.464511-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0
2024-03-22T00:26:40.464650-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 20971520 waiting_value 0 buffers length 0
2024-03-22T00:26:40.464676-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 33554432 waiting_value 0 buffers length 0
2024-03-22T00:26:40.464700-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/30629

Whereas the account is still there:

[root@rkomandu-ip-cls-x-worker1 log]# mms3 account list |grep 12398
s3user-12398            /mnt/fs1/s3user-12398-dir                       12398   12398

[root@rkomandu-ip-cls-x-worker1 log]# mms3 account list |grep 12399

[root@rkomandu-ip-cls-x-worker1 log]# ls -ld /mnt/ces-shared-root/ces/s3-config
s3-config/                s3-config-backup.tar.bz2
[root@rkomandu-ip-cls-x-worker1 log]# ls -ld /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12398.json
-rw------- 1 root root 376 Mar 21 09:12 /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12398.json
[root@rkomandu-ip-cls-x-worker1 log]# less /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12398.json
[root@rkomandu-ip-cls-x-worker1 log]# cat /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12398.json
{"name":"s3user-12398","creation_date":"2024-03-21T16:12:04.264Z","access_keys":[{"access_key":"EdOlObT16fevKyqcDIY0","secret_key":"73Iako95fyfdaMvFySHxq//BfimhW3+D6LLG+4rv"}],"nsfs_account_config":{"uid":12398,"gid":12398,"new_buckets_path":"/mnt/fs1/s3user-12398-dir","fs_backend":"GPFS"},"email":"s3user-12398","allow_bucket_creation":true,"_id":"65fc5c54a19a460e1e094ab4"}

The noobaa.log is about 700 MB:

[root@rkomandu-ip-cls-x-worker1 log]# ls -lh /var/log/noobaa.log
-rw-r--r-- 1 root root 733M Mar 22 01:37 /var/log/noobaa.log

For now, you can check the above logs, @naveenpaul1. Uploading to GitHub is not possible, and Box also has this size restriction. I need to delete the older noobaa.log content and keep only the current day to see if that reduces the size. However, I think you can continue from the above log snippets.
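
As an alternative to trimming the whole file, a sketch for extracting just the time window around a failed delete so it can be shared (the timestamp prefix below matches this run; adjust as needed):

# Extract only the minute around the failed s3user-12398 delete from the
# 700 MB noobaa.log and compress the result for sharing.
grep '2024-03-22T00:26' /var/log/noobaa.log > /tmp/noobaa-delete-window.log
gzip /tmp/noobaa-delete-window.log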

@guymguym guymguym added this to the 5.15.3 milestone Apr 21, 2024
@guymguym guymguym modified the milestones: 5.15.3, 5.15.4 May 8, 2024
@romayalon romayalon assigned shirady and unassigned naveenpaul1 May 20, 2024
@shirady
Contributor

shirady commented May 21, 2024

Hi @rkomandu,

  1. From what I understand, you are creating accounts in a loop and then deleting them in a loop.
    In the details of the issue we can see the example of the account s3user-12398 that was not deleted:
    The server encountered an internal error. Please retry the request
    Since it was not deleted, we can still see its config file.

  2. I would like to add the full printout of the Internal Error to this issue; it might look like this:
    stdout: '{\n "error": {\n "code": "InternalError",\n "message": "The server encountered an internal error. Please retry the request",\n "detail":

  3. I was not able to reproduce this. I created and deleted over 1,000 accounts with a sleep of 1,000 milliseconds.
    You can see that they were all created (and the accounts config directory contains no config files after the deletions).

➜  grep -i AccountCreated run_test_2048.txt | wc -l
    2048
➜  grep -i AccountDeleted run_test_2048.txt | wc -l
    2048
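
For reference, a reproduction loop along these lines could be used (a sketch only; the noobaa-cli account add flags, names and paths shown here are assumptions to be adjusted per environment):

#!/bin/bash
# Sketch of a create/delete reproduction with a 1000 ms pause between calls.
# Account names, uid/gid values and new_buckets_path are illustrative assumptions.
for i in $(seq 1 2048); do
    noobaa-cli account add --name "acc-$i" --uid "$i" --gid "$i" \
        --new_buckets_path "/mnt/fs1/acc-$i-dir"
    sleep 1
done
for i in $(seq 1 2048); do
    noobaa-cli account delete --name "acc-$i"
    sleep 1
done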

@rkomandu
Collaborator Author

This is a recent NooBaa 5.15.3 d/s build (0514).


[root@c83f1-app2 ~]# mms3 account delete s3user5001
Failed to execute command for Account s3user5001: The server encountered an internal error. Please retry the request
[root@c83f1-app2 ~]# mms3 account list s3user5001
Failed to list Account s3user5001: Account does not exist
[root@c83f1-app2 ~]# rpm -qa |grep mms3
gpfs.mms3-5.2.1-0.240521.104312.el9.x86_64

[root@c83f1-app2 ~]# mmhealth node show CES S3 |grep s3user5001
S3                 DEGRADED      16 hours ago      s3_access_denied(s3user5001), s3_storage_not_exist(newbucket-5001)
  s3user5001       DEGRADED      16 hours ago      s3_access_denied(s3user5001)
s3_access_denied      s3user5001      WARNING     16 hours ago         Account does not have access to the storage path mentioned in schema.

However, mmhealth still shows the account as existing.

@rkomandu
Collaborator Author

@shirady

I am deleting all the accounts and buckets via the mms3 CLI and came across a similar problem, as shown above.

The bucket delete passed, but the account delete failed with an internal error:

Bucket bucket-10335 deleted successfully
Failed to execute command for Account s3user-10335: The server encountered an internal error. Please retry the request
10335 is done

The noobaa-cli status and list output for the account is shown below.

noobaa-cli account status --name s3user-10335
load_nsfs_nc_config.setting config.NSFS_NC_CONF_DIR /ibm/cesSharedRoot/ces/s3-config
nsfs: config_dir_path=/ibm/cesSharedRoot/ces/s3-config config.json= {
  ENDPOINT_PORT: 6001,
  ENDPOINT_SSL_PORT: 6443,
  GPFS_DL_PATH: '/usr/lpp/mmfs/lib/libgpfs.so',
  NOOBAA_LOG_LEVEL: 'default',
  ENDPOINT_FORKS: 2,
  UV_THREADPOOL_SIZE: 16,
  NSFS_CALCULATE_MD5: false,
  ALLOW_HTTP: false,
  NSFS_NC_STORAGE_BACKEND: 'GPFS',
  NSFS_NC_CONFIG_DIR_BACKEND: 'GPFS',
  NSFS_DIR_CACHE_MAX_DIR_SIZE: 536870912,
  NSFS_DIR_CACHE_MAX_TOTAL_SIZE: 1073741824,
  NC_MASTER_KEYS_STORE_TYPE: 'executable',
  NC_MASTER_KEYS_GET_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_get',
  NC_MASTER_KEYS_PUT_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_put'
}
2024-05-23 08:55:13.218253 [PID-3055247/TID-3055247] FS::GPFS GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
2024-05-23 08:55:13.218355 [PID-3055247/TID-3055247] FS::GPFS found GPFS lib file GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
May-23 8:55:13.548 [/3055247]   [LOG] CONSOLE:: detect_fips_mode: found /proc/sys/crypto/fips_enabled with value 0
(node:3055247) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
May-23 8:55:13.579 [/3055247]   [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
May-23 8:55:13.583 [/3055247]   [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/urandom, total 32 ...
May-23 8:55:13.583 [/3055247]   [LOG] CONSOLE:: read_rand_seed: closing fd ...
May-23 8:55:13.583 [/3055247]   [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
{
  "response": {
    "code": "AccountStatus",
    "reply": {
      "_id": "664e0aa633201f0668d9ac6b",
      "name": "s3user-10335",
      "email": "s3user-10335",
      "creation_date": "2024-05-22T15:09:26.881Z",
      "nsfs_account_config": {
        "uid": 10335,
        "gid": 10335,
        "new_buckets_path": "/gpfs/remote_fvt_fs/s3user-10335-dir",
        "fs_backend": "GPFS"
      },
      "allow_bucket_creation": true,
      "master_key_id": "664e04d7f8bda475703e483b"
    }
  }
}

]# noobaa-cli account list --name s3user-10335
load_nsfs_nc_config.setting config.NSFS_NC_CONF_DIR /ibm/cesSharedRoot/ces/s3-config
nsfs: config_dir_path=/ibm/cesSharedRoot/ces/s3-config config.json= {
  ENDPOINT_PORT: 6001,
  ENDPOINT_SSL_PORT: 6443,
  GPFS_DL_PATH: '/usr/lpp/mmfs/lib/libgpfs.so',
  NOOBAA_LOG_LEVEL: 'default',
  ENDPOINT_FORKS: 2,
  UV_THREADPOOL_SIZE: 16,
  NSFS_CALCULATE_MD5: false,
  ALLOW_HTTP: false,
  NSFS_NC_STORAGE_BACKEND: 'GPFS',
  NSFS_NC_CONFIG_DIR_BACKEND: 'GPFS',
  NSFS_DIR_CACHE_MAX_DIR_SIZE: 536870912,
  NSFS_DIR_CACHE_MAX_TOTAL_SIZE: 1073741824,
  NC_MASTER_KEYS_STORE_TYPE: 'executable',
  NC_MASTER_KEYS_GET_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_get',
  NC_MASTER_KEYS_PUT_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_put'
}
2024-05-23 08:55:23.493482 [PID-3055620/TID-3055620] FS::GPFS GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
2024-05-23 08:55:23.493593 [PID-3055620/TID-3055620] FS::GPFS found GPFS lib file GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
May-23 8:55:23.825 [/3055620]   [LOG] CONSOLE:: detect_fips_mode: found /proc/sys/crypto/fips_enabled with value 0
(node:3055620) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
May-23 8:55:23.856 [/3055620]   [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
May-23 8:55:23.861 [/3055620]   [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/urandom, total 32 ...
May-23 8:55:23.861 [/3055620]   [LOG] CONSOLE:: read_rand_seed: closing fd ...
May-23 8:55:23.862 [/3055620]   [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
May-23 8:55:24.452 [/3055620]    [L0] core.manage_nsfs.nc_master_key_manager:: init_from_exec: get master keys response status=OK, version=12
{
  "error": {
    "code": "InternalError",
    "message": "The server encountered an internal error. Please retry the request",
    "detail": "Error: master key id is missing in master_keys_by_id\n    at NCMasterKeysManager.decryptSync (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:305:59)\n    at NCMasterKeysManager.decrypt (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:294:21)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async /usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:335:29\n    at async Promise.all (index 0)\n    at async NCMasterKeysManager.decrypt_access_keys (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:333:39)\n    at async /usr/local/noobaa-core/src/cmd/manage_nsfs.js:665:58\n    at async Semaphore.surround (/usr/local/noobaa-core/src/util/semaphore.js:71:84)\n    at async Promise.all (index 9)\n    at async list_config_files (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:660:29)"
  }
}

@shirady
Contributor

shirady commented May 23, 2024

Thank you @rkomandu. We can understand from the attached error that the reason for the internal error is related to encryption (master key id is missing in master_keys_by_id). Full edited message:

"detail": "Error: master key id is missing in master_keys_by_id
at NCMasterKeysManager.decryptSync (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:305:59)
at NCMasterKeysManager.decrypt (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:294:21)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async /usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:335:29
at async Promise.all (index 0)
at async NCMasterKeysManager.decrypt_access_keys (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:333:39)
at async /usr/local/noobaa-core/src/cmd/manage_nsfs.js:665:58
async Semaphore.surround (/usr/local/noobaa-core/src/util/semaphore.js:71:84)
async Promise.all (index 9)
async list_config_files (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:660:29)"

This internal error printout came from the account list command.

I would verify that when manually deleting the account we see the same details in the error (please run account delete on that account).
Please attach the output so we will have the full information.

cc: @romayalon

@rkomandu
Collaborator Author

# noobaa-cli account delete --name s3user-10335
load_nsfs_nc_config.setting config.NSFS_NC_CONF_DIR /ibm/cesSharedRoot/ces/s3-config
nsfs: config_dir_path=/ibm/cesSharedRoot/ces/s3-config config.json= {
  ENDPOINT_PORT: 6001,
  ENDPOINT_SSL_PORT: 6443,
  GPFS_DL_PATH: '/usr/lpp/mmfs/lib/libgpfs.so',
  NOOBAA_LOG_LEVEL: 'default',
  ENDPOINT_FORKS: 2,
  UV_THREADPOOL_SIZE: 16,
  NSFS_CALCULATE_MD5: false,
  ALLOW_HTTP: false,
  NSFS_NC_STORAGE_BACKEND: 'GPFS',
  NSFS_NC_CONFIG_DIR_BACKEND: 'GPFS',
  NSFS_DIR_CACHE_MAX_DIR_SIZE: 536870912,
  NSFS_DIR_CACHE_MAX_TOTAL_SIZE: 1073741824,
  NC_MASTER_KEYS_STORE_TYPE: 'executable',
  NC_MASTER_KEYS_GET_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_get',
  NC_MASTER_KEYS_PUT_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_put'
}
2024-05-23 10:19:24.584673 [PID-3183486/TID-3183486] FS::GPFS GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
2024-05-23 10:19:24.584759 [PID-3183486/TID-3183486] FS::GPFS found GPFS lib file GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
May-23 10:19:24.904 [/3183486]   [LOG] CONSOLE:: detect_fips_mode: found /proc/sys/crypto/fips_enabled with value 0
(node:3183486) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
May-23 10:19:24.933 [/3183486]   [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
May-23 10:19:24.938 [/3183486]   [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/urandom, total 32 ...
May-23 10:19:24.938 [/3183486]   [LOG] CONSOLE:: read_rand_seed: closing fd ...
May-23 10:19:24.938 [/3183486]   [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
May-23 10:19:25.531 [/3183486]    [L0] core.manage_nsfs.nc_master_key_manager:: init_from_exec: get master keys response status=OK, version=15
{
  "error": {
    "code": "InternalError",
    "message": "The server encountered an internal error. Please retry the request",
    "detail": "Error: master key id is missing in master_keys_by_id\n    at NCMasterKeysManager.decryptSync (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:305:59)\n    at NCMasterKeysManager.decrypt (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:294:21)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async /usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:335:29\n    at async Promise.all (index 0)\n    at async NCMasterKeysManager.decrypt_access_keys (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:333:39)\n    at async fetch_existing_account_data (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:416:30)\n    at async fetch_account_data (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:385:16)\n    at async account_management (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:336:18)\n    at async main (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:112:13)"
  }
}

In fact, I would say that all account deletes have the same problem. This is a blocker and a high priority for us now.


@rkomandu
Collaborator Author

Every account delete fails now:

Bucket bucket-10001 deleted successfully
Failed to execute command for Account s3user-10001: The server encountered an internal error. Please retry the request
10001 is done
Bucket bucket-10002 deleted successfully
Failed to execute command for Account s3user-10002: The server encountered an internal error. Please retry the request
10002 is done
Bucket bucket-10003 deleted successfully
Failed to execute command for Account s3user-10003: The server encountered an internal error. Please retry the request
10003 is done
Bucket bucket-10004 deleted successfully
Failed to execute command for Account s3user-10004: The server encountered an internal error. Please retry the request
10004 is done
Bucket bucket-10005 deleted successfully
Failed to execute command for Account s3user-10005: The server encountered an internal error. Please retry the request
10005 is done
Bucket bucket-10006 deleted successfully
Failed to execute command for Account s3user-10006: The server encountered an internal error. Please retry the request
10006 is done
Bucket bucket-10007 deleted successfully
Failed to execute command for Account s3user-10007: The server encountered an internal error. Please retry the request
10007 is done
Bucket bucket-10008 deleted successfully
Failed to execute command for Account s3user-10008: The server encountered an internal error. Please retry the request
10008 is done
Bucket bucket-10009 deleted successfully
Failed to execute command for Account s3user-10009: The server encountered an internal error. Please retry the request
10009 is done
Bucket bucket-10010 deleted successfully
Failed to execute command for Account s3user-10010: The server encountered an internal error. Please retry the request
10010 is done
Bucket bucket-10011 deleted successfully
Failed to execute command for Account s3user-10011: The server encountered an internal error. Please retry the request
10011 is done
Bucket bucket-10012 deleted successfully
Failed to execute command for Account s3user-10012: The server encountered an internal error. Please retry the request
10012 is done

@rkomandu
Collaborator Author

Quoting @shirady's comment above:
"Thank you @rkomandu, so we can understand from the error attached that the reason for the internal error is related to encryption (master key id is missing in master_keys_by_id) ..."

@shirady, this "internal error" is the latest problem w.r.t. the master_key, as posted in the above comments. However, when the defect was opened in the third week of March, encryption was not enabled in d/s ODF 4.15.0. At that time the error was the same as now, but I don't have the noobaa-cli command output.

@rkomandu
Collaborator Author

This encryption issue is a new problem with ODF 4.15.3; however, the original problem could still be there, since deleting in a loop is what was reported in all of my first few updates.

@rkomandu
Collaborator Author

rkomandu commented May 24, 2024

For the latest problem, @romayalon, it is the master_key problem in the CCR. I have updated the RTC defect https://jazz07.rchland.ibm.com:21443/jazz/web/projects/GPFS#action=com.ibm.team.workitem.viewWorkItem&id=330894 in comment 2.

@shirady
The master_key problem can be worked around manually, as shown below.

For example, the CCR file list shows the _ces_s3.master_keys file:

  1  gpfs.ganesha.statdargs.conf
        1  idmapd.conf
       18  _ces_s3.master_keys

# mmccr fget _ces_s3.master_keys /tmp/_ces_s3.master_keys
fget:18

# cat /tmp/_ces_s3.master_keys
{"timestemp":1716479808994,"active_master_key":"664f6740471f94033f84c97b","master_keys_by_id":{"664f6740471f94033f84c97b":{"id":"664f6740471f94033f84c97b","cipher_key":"7uboJzYGucsCimII30BbFCUdgS3zBv/oobwg9TXG0V8=","cipher_iv":"uipRG4v0jVQg8jnR2pcxsA==","encryption_type":"aes-256-gcm"}}}[root@c83f1-app3 s3]#


 cp /ibm/cesSharedRoot/ces/s3-config/accounts/s3user-10001.json /tmp/s3user-10001.json

less /tmp/s3user-10001.json

{"_id":"664dffe41b9d01307a23e986","name":"s3user-10001","email":"s3user-10001","creation_date":"2024-05-22T14:23:32.560Z","access_keys":[{"access_key":"8ca0bWSRFCScwdd4s8JJ","encrypted_secret_key":"k07Bhd5oQ2tphVljeV9Y6pUIjyqQLFoEG9nHqFTWTHmwagrgEh/fBg=="}],"nsfs_account_config":{"uid":10001,"gid":10001,"new_buckets_path":"/gpfs/remote_fvt_fs/s3user-10001-dir","fs_backend":"GPFS"},"allow_bucket_creation":true,"master_key_id":"664dd5d4b9b23ffb8378c6da"}




Edit the s3user-10001.json file, replacing its master_key_id with the active master key from the CCR (as per /tmp/_ces_s3.master_keys):

 vi /ibm/cesSharedRoot/ces/s3-config/accounts/s3user-10001.json (update the master_key_id to the CCR value)

 mms3 account delete s3user-10001 --debug
Running Command /usr/lpp/mmfs/bin/mmlsconfig cesSharedRoot
Running Command /usr/lpp/mmfs/bin/mmces service list
Running Command: env LC_ALL=C /usr/local/bin/noobaa-cli account delete --name s3user-10001 2>/dev/null
Execute command failed
info:
{
  "response": {
    "code": "AccountDeleted"
  }
}

error: <nil>
Account s3user-10001 deleted successfully
The command then reported that the account was deleted successfully.
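
To find every account config file that still references a stale master key id, rather than editing them one by one, a diagnostic sketch like the following could be used (assumes jq is available; paths are the ones shown above):

#!/bin/bash
# List account configs whose master_key_id differs from the active CCR key.
CONF_DIR=/ibm/cesSharedRoot/ces/s3-config/accounts
mmccr fget _ces_s3.master_keys /tmp/_ces_s3.master_keys
ACTIVE_KEY=$(jq -r '.active_master_key' /tmp/_ces_s3.master_keys)
for f in "$CONF_DIR"/*.json; do
    key=$(jq -r '.master_key_id' "$f")
    [ "$key" != "$ACTIVE_KEY" ] && echo "stale master_key_id $key in $f"
done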

We need to get this fixed ASAP; otherwise, the 5.15.3-0415 build with encryption is not going to work for the basic functionality of account delete.

@romayalon
Contributor

@rkomandu
According to the timestamps, this account (creation_date "2024-05-22T14:23:32.560Z") is older than the master key ("timestemp": 1716479808994 = Thursday, May 23, 2024 3:56:48.994 PM). This is not a valid situation: you had an old account in this config_dir, which is why you see the missing master key error.
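
For reference, the master key timestamp (epoch milliseconds) can be converted with GNU date (a sketch):

# "timestemp":1716479808994 is epoch milliseconds; compare it with the account
# creation_date of 2024-05-22T14:23:32.560Z.
date -u -d @1716479808.994
# -> Thu May 23 15:56:48 UTC 2024, i.e. the master key is newer than the account.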

@rkomandu
Collaborator Author

@romayalon,
The updates in the comments below occurred on the 5.15.3 d/s build, when 2K accounts and 2K buckets were created. The master_key_id values differ in the *.json files for many accounts; why they differ is not yet clear, and Ramya is analyzing why the master_keys were regenerated when we ran in a loop.
Bottom line: the accounts cannot be deleted, and this is happening because the master_key value is different.

#7920 (comment)
#7920 (comment)
#7920 (comment)
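
To confirm whether the master_key_id really differs across the account config files, a quick check like this could be run (a sketch; assumes jq and the config path shown earlier in the thread):

# Count how many account config files reference each master_key_id value.
jq -r '.master_key_id' /ibm/cesSharedRoot/ces/s3-config/accounts/*.json | sort | uniq -c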

@romayalon
Contributor

@rkomandu @shirady I posted a fix for the encryption issue yesterday; according to @ramya-c3 it is working now. @rkomandu, can you try to reproduce with the new code and share whether you still see the original internal error?
