ECSSYNC keeps crashing #93

Open
Faizel2 opened this issue Nov 3, 2022 · 8 comments

@Faizel2

Faizel2 commented Nov 3, 2022

ECSSync keeps crashing with the error below during a CAS migration. We have upgraded to the latest version. Any setting above one thread seems to crash the session. Sufficient memory and resources are available.

Please advise.

*** Error in `java': free(): invalid next size (normal): 0x00007f8518004490 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7c619)[0x7f864e7a3619]
/usr/local/Centera_SDK/lib/64/libFPStreams64.so(_ZN20FPBasicGenericStream13prepareBufferEv+0x1c2)[0x7f85e4f1cd42]
/usr/local/Centera_SDK/lib/64/libFPCore64.so(_ZN22HPPReadBlobTransaction3runEv+0x4a1)[0x7f85e46beab1]
/usr/local/Centera_SDK/lib/64/libFPCore64.so(_ZN7Cluster8readBlobEP20FPBasicGenericStreamR10FPClipGuidR10FPBlobGuidllllR13FPClipContextR12FPTagContext+0x1f7)[0x7f85e465ebb7]
/usr/local/Centera_SDK/lib/64/libFPCore64.so(_ZN12ClusterCloud8readBlobEP20FPBasicGenericStreamR10FPClipGuidR10FPBlobGuidR13FPClipContextR12FPTagContextllll+0x1de)[0x7f85e466a19e]
/usr/local/Centera_SDK/lib/64/libFPCore64.so(_ZN5FPTag8BlobReadER20FPBasicGenericStreamlli+0x9f0)[0x7f85e46ab810]
/usr/local/Centera_SDK/lib/64/libFPCore64.so(_Z22_FPTag_BlobReadPartialP5FPTagP20FPBasicGenericStreamlll+0x15)[0x7f85e4688ec5]
/usr/local/Centera_SDK/lib/64/libFPLibrary64.so.3.4.757(FPTag_BlobReadPartial+0xb0)[0x7f85e5176640]
/usr/local/Centera_SDK/lib/64/libFPLibrary64.so.3.4.757(Java_com_filepool_natives_FPLibraryNative_FPTag_1BlobReadPartial+0x94)[0x7f85e51a06a4]
[0x7f8639f2bdd9]
======= Memory map: ========
00400000-00401000 r-xp 00000000 fd:00 269292636 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/bin/java
00600000-00601000 r--p 00000000 fd:00 269292636 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/bin/java
00601000-00602000 rw-p 00001000 fd:00 269292636 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/bin/java
00f61000-00f82000 rw-p 00000000 00:00 0 [heap]

@xiaoxin-ren
Contributor

Hi @Faizel2, the crash above is inside the CAS lib, and there is not much to indicate whether it's an ecssync issue, a CAS lib issue, or a CAS platform issue.
Did it work before you upgraded ecssync? There have been no CAS-related code changes in recent releases, so it's not straightforward to tell whether this could be a regression. If you look at the errors in /var/log/ecs-sync/ecs-sync.log, you should see which objects failed. Does it work if you use the CAS tool alone? You'll need to identify whether it's a read issue on the source storage or a write issue on the target storage.
If you are using professional services (PS) for the migration, please ask PS to follow up on the troubleshooting. Otherwise, please open a service ticket against the migration platform so that they can help narrow down the issue and route it to the right person for further help.

@Faizel2
Author

Faizel2 commented Nov 9, 2022

Hi Ren
We had the same issue before the upgrade from 3.2.7 to 3.5.2.
How can we use the CAS tool alone to verify?
Professional services is involved. We opened a service ticket with ECS Support, but they cannot assist. There is no option for the migration platform; please advise how to do this.

@xiaoxin-ren
Contributor

xiaoxin-ren commented Nov 9, 2022

So this is not a regression in Ecssync. A few options for how to proceed:

  1. PS should know how to use the CAS tool. What is the result of their investigation? Is it a CAS lib deployment issue or a storage server issue (Centera, ECS)?
  2. The CAS lib is provided by Centera, so we recommend opening a service ticket with Centera if the issue is related to the CAS lib or to CAS reads from the source storage.
  3. ECS support will be able to help you identify any access issue related to the CAS bucket in ECS. Please note that Ecssync is an open source project and is outside the scope of ECS customer support.
  4. We'll help here if the issue is narrowed down to ecssync.

@xiaoxin-ren
Contributor

@Faizel2, I see PS is currently investigating the issue. The provided ecs-sync log shows that the CAS SDK and ECSSync work fine. It's a random crash in the CAS SDK, not related to a specific object. Please wait for a further update from PS.

@holgerjakob

Is there an update on this? We have some large, multi-billion CAS migrations upcoming from Gen3 to EXF900.
We are asking so that we can prepare an ECSSync VM and hopefully avoid this issue.

@xiaoxin-ren
Contributor

@holgerjakob, can you please reach out to PS or customer support for an update? An engineer investigated the issue and found that the crash was caused by syncing huge blobs (4-5 GB) with the default 16 threads. The workaround is to sync again with increased memory and fewer threads. You can restore the thread setting after the huge blobs have been copied successfully.
The investigation was done on Nov 10, 2022. I'm curious what caused the communication gap.
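
For anyone sizing a job around this workaround, here is a rough rule-of-thumb sketch in Python. It assumes the worst case in which every sync thread holds one full blob in memory at the same time. That is an assumption made for illustration only, not a description of how ecssync or the CAS SDK actually allocate memory. The function name and the memory budgets are hypothetical; only the 16-thread default and the 4-5 GB blob size come from this thread.

```python
GIB = 1024 ** 3


def conservative_thread_count(memory_budget_bytes, largest_blob_bytes, max_threads=16):
    """Return a cautious thread count for a sync that includes huge blobs.

    Assumption (not confirmed by ecssync or the CAS SDK): each thread may
    need to hold one complete blob in memory, so the budget is divided by
    the largest blob size. The result is capped at max_threads (16 is the
    default mentioned in this thread) and never drops below 1.
    """
    if memory_budget_bytes <= 0 or largest_blob_bytes <= 0:
        raise ValueError("memory budget and blob size must be positive")
    blobs_that_fit = memory_budget_bytes // largest_blob_bytes
    return max(1, min(max_threads, blobs_that_fit))


if __name__ == "__main__":
    # Hypothetical budgets; 5 GiB stands in for the "4-5 GB" blobs.
    for budget_gib in (32, 64):
        threads = conservative_thread_count(budget_gib * GIB, 5 * GIB)
        print(f"{budget_gib} GiB budget, 5 GiB blobs: try at most {threads} threads")
```

Treat the result as a starting point for the reduced-thread run only. Once the huge blobs are copied, the thread setting can go back to its usual value as described above.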

@holgerjakob

Hi Ren
That's good to hear. We configure VMs with at least 32 GB of memory, more often 64, and set the Xmx memory parameter accordingly.
It was not me who opened the ticket. I am just sorting things out before using ECSSync on some upcoming large migrations to EXF900.

Thanks for responding
Holger

@holgerjakob

Dear all

We invested some time in putting together a new install guide, in case you are interested:
https://www.backup.ch/wp-content/uploads/sites/5/ECS-Sync-Installation-V1.0.pdf

Adjusting the memory parameter is covered in it. If access to any of the files does not work, we can provide links to them.

Take care, Holger
