Error when starting the second SN (SMLETHNode: Transferring 100 ETH to local account failed) - MNIST example #181

Open
joaquingarciaatos opened this issue Jun 22, 2023 · 4 comments

Comments

@joaquingarciaatos

Issue description

  • issue description: error obtained when starting the second Swarm Network Node.
  • occurrence - consistent or rare:
  • error messages: SMLETHNode: Transferring 100 ETH to local account failed
  • commands used for starting containers: the ones provided in the MNIST example (https://github.com/HewlettPackard/swarm-learning/blob/master/examples/mnist/README.md)
  • docker logs [APLS, SPIRE, SN, SL, SWCI]:
    ######################################################################

HPE SWARM LEARNING SN NODE

######################################################################

© Copyright 2019-2022 Hewlett Packard Enterprise Development LP

######################################################################
2023-06-22 10:19:21,321 : swarm.blCnt : INFO : Setting up blockchain layer for the swarm node: START
2023-06-22 10:19:22,628 : swarm.blCnt : INFO : Creating Autopass License Provider
2023-06-22 10:19:23,407 : swarm.blCnt : INFO : Creating license server
2023-06-22 10:19:23,407 : swarm.blCnt : INFO : Setting license servers
2023-06-22 10:19:23,421 : swarm.blCnt : INFO : Acquiring floating license 1100000380:1
2023-06-22 10:19:24,047 : swarm.SN : INFO : Using URL : https://213.227.143.136:30304/is_up
2023-06-22 10:19:24,170 : swarm.SN : INFO : Sentinel Node is UP!
2023-06-22 10:19:43,727 : swarm.SN : INFO : SMLETHNode: Starting GETH ...
2023-06-22 10:22:16,547 : swarm.SN : ERROR : SMLETHNode: Transferring 100 ETH to local account failed
Traceback (most recent call last):
  File "", line 1, in
  File "start_swarm_sn.py", line 196, in start_swarm_sn.main
  File "swarmfactory.py", line 615, in swarmfactory.createBCFullNodeForContainer
  File "swarmbcnode.py", line 739, in swarmbcnode.smlethnode.initialize
  File "swarmutils.py", line 678, in swarmutils.swarmlogger.emitError
RuntimeError: SMLETHNode: Transferring 100 ETH to local account failed
2023-06-22 10:22:16,556 : swarm.blCnt : WARNING : Releasing license

Swarm Learning Version:

  • Find the docker tag of the Swarm images ( $ docker images | grep hub.myenterpriselicense.hpe.com/hpe_eval/swarm-learning ): Version 2.0.0

OS and ML Platform

  • details of host OS: Ubuntu 20.04.6 LTS
  • details of ML platform used:
  • details of Swarm learning Cluster (Number of machines, SL nodes, SN nodes): 2 hosts, exactly the same as MNIST example

Quick Checklist: Respond [Yes/No]

  • APLS server web GUI shows available Licenses? Yes
  • If Multiple systems are used, can each system access every other system? Yes
  • Is Password-less SSH configuration setup for all the systems? Yes
  • If GPU or other protected resources are used, does the account have sufficient privileges to access and use them?
  • Is the user id a member of the docker group? Yes

Additional notes

  • Are you running the documented example without any modification? Yes, apart from changing the IPs of host 1 and host 2.
  • Add any additional information about the use case or any notes that support issue investigation: Steps 1-9 and 11 from the README (https://github.com/HewlettPackard/swarm-learning/blob/master/examples/mnist/README.md) were followed correctly, but the error appears at step 10. I think the issue is related to Ethereum and the creation of the blockchain layer, but I have no further information about the error.
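
As a hedged aid to narrowing this down, the timestamps in the SN log above can be compared to see how long the node waited between starting GETH and reporting the transfer failure. The sketch below parses the two relevant log lines; the helper names `parse_ts` and `elapsed_seconds` are hypothetical and not part of Swarm Learning. A gap of a couple of minutes could suggest that a wait ran out rather than an immediate rejection, but that reading is an assumption the log itself does not confirm.

```python
from datetime import datetime

# Two lines copied verbatim from the SN log in this issue.
LOG_LINES = [
    "2023-06-22 10:19:43,727 : swarm.SN : INFO : SMLETHNode: Starting GETH ...",
    "2023-06-22 10:22:16,547 : swarm.SN : ERROR : SMLETHNode: Transferring 100 ETH to local account failed",
]


def parse_ts(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a log line."""
    stamp = line.split(" : ", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def elapsed_seconds(start_line: str, end_line: str) -> float:
    """Return the time between two log lines, in seconds."""
    return (parse_ts(end_line) - parse_ts(start_line)).total_seconds()


if __name__ == "__main__":
    gap = elapsed_seconds(LOG_LINES[0], LOG_LINES[1])
    print(f"{gap:.2f} s between GETH start and the transfer failure")
```

Running the same comparison on the first SN's log, if available, would show whether the second node is unusually slow at this step.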
@Deepthiappasani

Can you check whether the systems on which the nodes are running are time-synchronized?

@iArpanPatel
Collaborator

@joaquingarciaatos please confirm whether the issue is resolved after synchronizing time using NTP.

@joaquingarciaatos
Author

Can you check whether the systems on which the nodes are running are time-synchronized?


Hi! I checked whether the nodes are time-synchronized, and they are, but that didn't resolve the issue. Do you have any idea what else could be causing it?

@Deepthiappasani

The error might be caused by unsynchronized clocks between the nodes; even a time difference of a few milliseconds can cause the issue. To resolve this, synchronize the nodes using NTP (Network Time Protocol), then restart the Docker service and run the example again.
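
To make "a few milliseconds" concrete, here is a minimal sketch of how the clock offset between two hosts can be estimated, assuming the standard NTP four-timestamp exchange (request sent, request received, reply sent, reply received). The function names and the 5 ms tolerance are illustrative assumptions; the thread does not state an exact threshold, and in practice an NTP daemon performs this measurement itself.

```python
def clock_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """Estimate the remote clock's offset from the local clock, in seconds.

    t0: local time when the request was sent
    t1: remote time when the request was received
    t2: remote time when the reply was sent
    t3: local time when the reply was received
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0


def round_trip_delay(t0: float, t1: float, t2: float, t3: float) -> float:
    """Network round-trip time, excluding the remote processing time."""
    return (t3 - t0) - (t2 - t1)


def within_tolerance(offset: float, tolerance: float = 0.005) -> bool:
    """True if the offset is within the tolerance (5 ms is an assumed value)."""
    return abs(offset) <= tolerance


if __name__ == "__main__":
    # Made-up timestamps for illustration only, not measured on any host.
    sample = (100.000, 100.060, 100.061, 100.021)
    offset = clock_offset(*sample)
    print(f"offset={offset:.3f} s, delay={round_trip_delay(*sample):.3f} s")
    print("within tolerance:", within_tolerance(offset))
```

Averaging the two one-way differences cancels the network delay when the path is roughly symmetric, which is why a plain comparison of `date` output over SSH is too coarse to rule out millisecond-level drift.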
