Blockchain Engineering - class of 2024 - Team Democracy-4: networking of crypto core #7914

Open · synctext opened this issue Feb 20, 2024 · 18 comments

@synctext (Member) commented Feb 20, 2024

Project assignment for the Delft University of Technology master course "Blockchain Engineering".
Custom assignment: combining Democracy-1 and Democracy-2 (overlap == networking part of crypto core)

Task: You have completed this course successfully if you can scale the "networking part of the crypto core" to 100 emulators/peers. Bonus goal: 500 peers! (The ultimate goal is to seriously scale this to the 15 million users of WallStreetBets on Reddit.)

  • The maximum number of connections in IPv8 is hard-coded (around 30).
  • It is difficult to gather sufficient signatures.
  • Deep dive into the code.
  • Threshold voting (e.g. a democratic process) is constrained by signature collection.
  • With binary transfer you could easily collect and sign 10 MByte of partial signature data.
  • Question: with 100 peers and a 60% threshold, how big is the minimal signature data expressed in IPv8 UDP packets? (A back-of-envelope sketch follows this list.)
  • First solution: randomly gossip any received signature to 1 random connected peer. Push blindly every 1 second: 1 outgoing IPv8 UDP packet to a single random peer. (ToDo: DAO random music gossip example)
  • Final solution: push all collected signatures to 1 random peer every 1 second by opening a fresh binary transfer.
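
A back-of-envelope sketch for the packet question above. All constants are assumptions (64-byte signatures, 32-byte public keys, ~1472 bytes of usable UDP payload on a 1500-byte MTU), not IPv8 specifics:

```kotlin
// Rough estimate only; signature, key, and MTU sizes are assumptions.
fun main() {
    val peers = 100
    val threshold = (peers * 0.60).toInt()          // 60 signatures required
    val bytesPerSigner = 64 + 32                    // signature + public key
    val payloadBytes = threshold * bytesPerSigner   // 5760 bytes
    val udpPayload = 1472                           // usable payload per UDP packet
    val packets = (payloadBytes + udpPayload - 1) / udpPayload
    println("$payloadBytes bytes of signature data, roughly $packets IPv8 UDP packets")
}
```

Under these assumptions the minimal signature data is roughly 5.8 KB, i.e. about 4 UDP packets.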

Blockchain networking part @mateichirita

  • Binary transfer of bulk votes using QUIC or uTP.
  • Get this code running as an example, on a single PC.
  • Get two processes on one PC to talk to each other (see the sketch after this list).
  • Next sprint goal: get 2 Android phones/emulators to transfer using this library.
  • TODO (?): stress testing of the networking (LEDBAT, packet loss, initial handshake loss, etc.).
  • You are allowed to look at the work of the 2 other teams and copy their code and ideas. HOWEVER: you need to document this. That's all. The goal is to make maximum progress, like in industry.
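
As a stepping stone before QUIC/uTP, a minimal sketch of the "two processes on one PC" step using plain java.net.DatagramSocket (the port and payload are arbitrary placeholders):

```kotlin
import java.net.DatagramPacket
import java.net.DatagramSocket
import java.net.InetAddress

// Start one terminal with argument "recv", then another with "send".
fun main(args: Array<String>) {
    val port = 9000  // arbitrary local port
    if (args.firstOrNull() == "recv") {
        DatagramSocket(port).use { socket ->
            val buf = ByteArray(1472)
            val packet = DatagramPacket(buf, buf.size)
            socket.receive(packet)  // blocks until the sender fires
            println("received ${packet.length} bytes: ${String(buf, 0, packet.length)}")
        }
    } else {
        DatagramSocket().use { socket ->
            val payload = "bulk-votes-placeholder".toByteArray()
            socket.send(DatagramPacket(payload, payload.size, InetAddress.getLoopbackAddress(), port))
        }
    }
}
```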

Crypto Core Part

  • Focus: multi-sig networking within the crypto core @MichallHuTuDelft @Kheoss @Wouter2080
  • Objective: remove the single coordinator from the signature collection process.
  • This coordinator creates the Bitcoin transaction and becomes a bottleneck.
  • Next sprint goal: have 3 peers exchange information and all race to create a competing Bitcoin transaction (a toy model of this race follows below).
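
A toy model of the race, assuming the signatures are already collected; on Bitcoin only the first of several conflicting transactions is accepted, so "losing" peers simply discard their copy. All names are illustrative:

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Three peers race to broadcast; the AtomicBoolean stands in for the
// network-wide "first transaction wins" rule.
fun main() {
    val broadcast = AtomicBoolean(false)
    val peers = (1..3).map { id ->
        Thread {
            if (broadcast.compareAndSet(false, true)) {
                println("peer $id broadcast the transaction")
            } else {
                println("peer $id lost the race and discards its copy")
            }
        }
    }
    peers.forEach { it.start() }
    peers.forEach { it.join() }
}
```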

BONUS: networking dashboard. Improve the activity-grid view showing the status of each of the 25 connected IPv8 peers.

@synctext (Member Author) commented Feb 26, 2024

Minimal sprint goal: all your laptops inside 1 IPv8 community, ×4 (start multiple IPv8 clients). Also try to do crypto work, not just sockets.

@Kheoss commented Mar 5, 2024

Demo Week 4:
[screenshot: week-4 demo]

@InvictusRMC (Member) commented Mar 5, 2024

Pointers for next week:

  • Focus on pure JVM clients / ignore phones --> look at serviceId
  • Make the code more flexible --> no code adjustments needed when changing clients
  • Make a single script that launches x clients and performs the default flow (a launcher sketch follows this comment):
    • All clients make a BTC wallet and request BTC from a central node
    • The first client makes a shared wallet
    • All clients join the shared wallet --> voting is performed automatically
  • End goal: create a graph that shows the timings for the joining of all clients
    • Don't forget about the connection-limit parameter!

{added} Demo: regtest block download is operational. Getting BTC inside the superapp needs to work; reproduce the stand-alone cmdline flow. An empty wallet means an error when creating a DAO.
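
A sketch of the launcher script in Kotlin; "client.jar" and the "--peer-id" flag are hypothetical placeholders for whatever the client build actually produces:

```kotlin
import java.util.concurrent.TimeUnit

// Launches n client JVMs that each run the default flow, then reaps them.
fun main() {
    val n = 10
    val clients = (0 until n).map { id ->
        ProcessBuilder("java", "-jar", "client.jar", "--peer-id", id.toString())
            .inheritIO()
            .start()
    }
    clients.forEach { it.waitFor(10, TimeUnit.MINUTES) }  // default flow runs inside each client
    clients.forEach { it.destroy() }
}
```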

@Kheoss commented Mar 12, 2024

12-03-2024 update:

  • WebSocket-based server to control peers
  • Adding proposals and voting to the JVM environment
  • Making the JVM environment work for multiple peers
  • (ongoing) Removing the blocking while-loop that waits for votes when joining a DAO

@MichallHuTuDelft commented:

Cleaning up the GitHub repository.

@mateichirita commented Mar 12, 2024

12-03-2024: Node.js server to trigger events on instances:
[screenshot, 2024-03-12]

@mateichirita commented:

This sprint's update:

  • Created a visual interface for the coordinating server so that we can test functionality
  • JoinDAO works
  • Voting is done automatically
  • We gathered measurements of how fast all the instances join the same DAO, using 5 instances in our tests. The main problem is that the time (in ms) depends on how fast the voting happens (how quickly the instances that have to vote realize that they have to vote):
    [{"id":0,"time":0},{"id":1,"time":5526},{"id":2,"time":5500},{"id":3,"time":5550},{"id":4,"time":5493}]


@Kheoss commented Mar 18, 2024

We are also working on removing the single coordinator from the signature collection process by replacing the blocking while-loop that waits for signatures on the client side with a new event-based system (sketched below):
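A minimal sketch of the event-based replacement (the names are illustrative, not the actual currencyii API):

```kotlin
// Instead of a blocking while-loop polling for votes, every incoming
// signature triggers a threshold check; the callback fires exactly once.
class SignatureCollector(
    private val threshold: Int,
    private val onThresholdReached: (List<ByteArray>) -> Unit,
) {
    private val signatures = mutableListOf<ByteArray>()

    @Synchronized
    fun onSignatureReceived(signature: ByteArray) {
        signatures += signature
        if (signatures.size == threshold) {
            onThresholdReached(signatures.toList())
        }
    }
}
```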

Future plans

  • Finish the removal of the single coordinator.
  • Test against the current implementation.
  • Networking dashboard: activity-grid view showing the status of each of the 25 connected IPv8 peers.
  • Random gossip with all partial signatures in binary format.

@synctext (Member Author) commented Mar 18, 2024

  • Push the JVM instances to 100 if possible.
  • Fully concurrent, but linear growth of the DAO: join in order of sequence number; remove all magic (5-second) waits.
  • In the current code the joining client waits for the required votes and is the blocking element for conducting the join transaction. The blocking element for any proposed transaction is likewise the original proposing node.
  • The prior superapp code waits for the threshold of DAO-join approvals; then it moves forward in the protocol state.
  • Usage of an unsafe custom multi-sig for both threshold votes and threshold transactions.
  • Test with a baseline where everybody is required to vote, and another test with a threshold.
  • Mid-term evaluation:
    • "a bit on the edge", orange alert
    • no figures, no running baseline test
    • no changes yet to the protocol, no performance boosting

@MichallHuTuDelft removed their assignment Mar 18, 2024
@Kheoss commented Mar 25, 2024

Progress check 25-03-2024

  • Removed the single coordinator; the transaction is now executed by the last peer who interacted with the proposal.
  • Refactored the JVM project to run on Docker and (not yet tested) Kubernetes.
  • Basic netty-incubator-codec-http3 implementation + a discussion about it.
  • Ran the baseline with 30 peers: too slow, inconclusive results.

@InvictusRMC (Member) commented Mar 25, 2024

Solid progress last sprint! 🎉

  • Work on graph
  • Fix synchronization bug
  • PoC sharding concept
  • Do a deep dive into the IPv8 stack --> you can change anything you want to improve performance.

@Kheoss commented Apr 2, 2024

Progress check week 8:

Short description:
Focus on testing multiple techniques to increase the speed and/or the number of connecting peers.
Detected possible improvements regarding the networking in TrustChain (preliminary results below).

  • Sharding approach (join shards vs. transfer shards, to be discussed).
  • Ran into hardware problems (one laptop is not enough even for 20 peers; possible solutions to expand the experimental setup are under discussion) [Docker images vs. a Bash script running x instances (running shadowJar to compile a JAR and run it)].
  • Problems with the Bitcoin server.

Findings:

  • Bandwidth => the real bottleneck. Experimental setup: peers broadcast 1 KB packets using the current broadcast algorithm (25-fanout broadcast) and using a "persistent gossip" (TTL = until every peer receives the message; a sketch of this loop follows below). We ran multiple trials with different numbers of peers and measured the bandwidth usage of the Docker network and the time for every peer to receive the messages.
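
A sketch of the persistent-gossip loop used in this experiment; the bookkeeping and message IDs are assumptions, not the actual test code:

```kotlin
import kotlin.concurrent.fixedRateTimer

// Once per second, push every still-undelivered message to one random peer
// that has not acknowledged it yet (TTL = until every peer has the message).
class Gossiper(private val send: (peer: String, messageId: String) -> Unit) {
    private val pending = mutableMapOf<String, MutableSet<String>>()  // messageId -> peers missing it

    fun track(messageId: String, peers: Collection<String>) = synchronized(pending) {
        pending[messageId] = peers.toMutableSet()
    }

    fun onAck(peer: String, messageId: String) = synchronized(pending) {
        pending[messageId]?.remove(peer)
    }

    fun start() = fixedRateTimer(period = 1_000L) {
        synchronized(pending) {
            pending.values.removeIf { it.isEmpty() }              // fully delivered
            pending.forEach { (id, missing) -> send(missing.random(), id) }
        }
    }
}
```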

Results (charts below; X-axis = number of peers, Y-axis = bandwidth usage in KB and delivery time in milliseconds, respectively):
As expected, the bandwidth usage of the gossip mechanism is drastically lower than that of the current approach. For 15 peers the current algorithm peaked at around 1.2 MB of bandwidth usage, a lot compared to the only 228 KB used by the gossip.

Unexpectedly, for low numbers of peers the gossip algorithm also performed better speed-wise.
This result can be interpreted in two ways:

  • By the nature of the gossip mechanism, we can expect this advantage to dissipate as the number of peers increases.
  • Alternatively, the high bandwidth consumption of the 25-fanout broadcast congests the network so much that peer messages are delayed, so the gossip would keep its advantage as the number of peers increases.

[Chart_Bandwidth: bandwidth usage vs. number of peers]
[Chart_Speed: delivery time vs. number of peers]

Future improvement proposals:

  • Use a DHT algorithm like Kademlia for a more efficient topology.
  • An archival mechanism for the TrustChain example (old nodes become "axioms" and can be archived so the chain size remains manageable over time).

GIT: https://github.com/Kheoss/CSE4110_jre

@synctext (Member Author) commented Apr 5, 2024

  • Remember: 99% of performance is lost if you switch from native Kotlin multithreading for communication to a network of Docker containers. Plus it won't scale to 10k instances.
  • Please laser-focus on having 1 bash script on "bare metal" running compiled Kotlin (native).
  • First get an example running 100x sockets (see the sketch after this list).
  • Then use bash to create 100x full IPv8 processes (or even try 1000!) (test when the laptop runs out of memory).
  • No sharding; IPv8 should form a solid random network.
  • Final sprint: DAO code.
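
A minimal sketch of the "100x sockets" warm-up step (plain UDP sockets on OS-assigned ports, not yet full IPv8 processes):

```kotlin
import java.net.DatagramSocket

// Open 100 UDP sockets in one process and report their ports; step two
// would be a bash loop spawning full IPv8 processes instead.
fun main() {
    val sockets = (1..100).map { DatagramSocket(0) }  // 0 = ephemeral port
    println("open UDP ports: ${sockets.map { it.localPort }}")
    sockets.forEach { it.close() }
}
```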

@Kheoss commented Apr 12, 2024

Update 12/04/2024

  • Managed to get to 100 peers.
  • Tried compiling with kotlin-compiler-1.9.23 (no IPv8 linkage).
  • Ran the existing experiments (with > 22 peers).

Current setup:

  • Export the JAR
  • Run multiple instances
  • Stop all Java instances

Observation:
Discovery is slow; it takes a while for each peer to discover the other peers.
Performance increase on Windows (?)

@synctext (Member Author) commented Apr 12, 2024

  • Running code is a requirement of this course.
  • Extension of 1 week to wrap up this course:
    • the "laser focus" on Kotlin sadly failed
    • still on the JVM route, which is less useful for our lab and lacks scalability
  • Objective for running code:
    • infrastructure for DAO scalability research
    • 1 script to grow the DAO until it breaks
    • build upon the completed task of concurrent voting on a new member
      • that means a faster join
    • DAO growing from 1 member, to 2 members, 3 members... up to 100 members (but it will likely break somewhere: memory, UDP signature size)
    • 100-peer overlay, grown one-by-one in a Bash script
  • No superapp modification; laser focus on the DAO scalability test infrastructure.
  • Performance expectation: starting an IPv8 overlay or the joining of a single peer should take only 30 seconds.

Grading session --- 26/04

  • Running code required for a passing grade
    • JAR file (running natively)
  • Finalize the implementation
    • Create tests
  • Make the discussed graphs
  • Write a README-style report - check e.g. currencyii for inspiration.
  • Make a pull request with the code
  • Demo of your functionality

@Kheoss commented Apr 19, 2024

Progress check 19-04-2024

  • Compiled the orchestrator into a CLI app.
  • Improved the UI.
  • Rolled back to the old voting.
  • Fixed the shell script for Linux (50 peers on Linux).
  • Added timers for "SYNC": when a peer joins a wallet, all the other peers are "DESYNC" until they receive the new data and become "SYNC" (a bookkeeping sketch follows below).
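
A minimal bookkeeping sketch for the SYNC/DESYNC timers (names and structure are illustrative, not the actual orchestrator code):

```kotlin
// When a peer joins a wallet, mark every other peer DESYNC; when a peer
// reports the new data, mark it SYNC and return its time-to-sync in ms.
enum class SyncState { SYNC, DESYNC }

class SyncTracker {
    private val states = mutableMapOf<String, SyncState>()
    private val desyncSince = mutableMapOf<String, Long>()

    @Synchronized
    fun markDesync(peer: String) {
        states[peer] = SyncState.DESYNC
        desyncSince[peer] = System.currentTimeMillis()
    }

    @Synchronized
    fun markSync(peer: String): Long {
        states[peer] = SyncState.SYNC
        return System.currentTimeMillis() - (desyncSince.remove(peer) ?: 0L)
    }
}
```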

@InvictusRMC (Member) commented Apr 19, 2024

Final notes
