"ideal" server spec per node. #289

Open
hiqsociety opened this issue Mar 3, 2023 · 4 comments
@hiqsociety

I've been thinking about this Raft implementation a lot and wondering what ratio of CPU cores to disk space (and RAM) is the "ideal" optimum for ingestion, throughput, etc.

We all know Go's multithreading is not that great compared with C++ (Seastar) or Rust (monoio/glommio).
Based on your experience, what do you think is the best ratio of cores to disk space (and maybe RAM)?

Your benchmark is measured using this spec:
https://github.com/lni/dragonboat/blob/master/docs/test.md

But real-world situations don't run so many Raft groups per server; it's typically only one node per server. So my question is: what's the ideal server spec per node?

I'm asking because I need to design my application logic around CPU processing, memory allocation, the choice of database, etc.

P.S.: I'm using this Raft library mostly for resilient storage.

For storage, in my experience:

  1. Go mostly shows performance degradation beyond 4-8 cores; it's not ideal for multithreaded applications (in most practical scenarios, even after good-to-advanced code optimization).
  2. I'm going to use TerarkDB as the backend data store (the log will use Tan) because it is more optimized than Pebble or RocksDB: faster, and takes up less storage. 4-8 cores is ideal here (since Go can't take full advantage of processing beyond that). I guesstimate usability at 60TB per core (the maximum stretch for a commodity server), so around 480TB per server with 8 cores; see the sketch after this list.
  3. 4 × 40Gbps NICs will be able to serve this nicely: 160Gbps of aggregate bandwidth.
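For concreteness, a back-of-the-envelope sketch of the arithmetic behind points 2 and 3 (the per-core figure, core count, and NIC count are my guesstimates from above, not measured values):

```go
// Capacity math behind the guesstimate above; all inputs are assumptions
// from the list, not benchmarked numbers.
package main

import "fmt"

func main() {
	const (
		cores      = 8    // usable cores before Go's scaling tails off (assumed)
		tbPerCore  = 60.0 // guesstimated "warm" storage per core
		nics       = 4    // NIC count (assumed)
		gbpsPerNIC = 40.0 // per-NIC line rate
	)
	totalTB := cores * tbPerCore   // 480 TB per server
	totalGbps := nics * gbpsPerNIC // 160 Gbps aggregate
	// Time to stream the entire dataset at full line rate, ignoring overhead:
	hours := totalTB * 1e12 * 8 / (totalGbps * 1e9) / 3600
	fmt.Printf("%.0f TB per server, %.0f Gbps aggregate, %.1f h full transfer\n",
		totalTB, totalGbps, hours)
	// Output: 480 TB per server, 160 Gbps aggregate, 6.7 h full transfer
}
```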

For simple Raft coordination, I wonder if it has been tested on a Raspberry Pi 4 or something smaller.

In conclusion, what's my problem / question?

  1. What's the "ideal" server spec per node for a real-world application?
  2. What's the maximum scale of Raft, and what caveats should we watch for in huge deployments (say, a 60k-server cluster)? Maybe you can talk about gossip traffic per transmission in bytes (how many bytes are sent here overall in total?).
  3. What is the generally advisable deployment scenario and usage? (Very vague, since we mostly build our own applications on top.) Maybe use cases could be mentioned here: how it's being used, at what scale, etc. Case studies are missing.

That's all I'm asking. Hope to get some clarification. Thanks.

@kevburnsjr (Contributor) commented Mar 4, 2023

Your estimate of 60TB of storage per core is several orders of magnitude too high unless you're running a cold-storage backup service. I don't know what your use case is, but given that you've referenced ScyllaDB several times, I recommend you take a moment to look at their recommendations.
[image: ScyllaDB system requirements table showing recommended disk-to-RAM ratios]

So if your CPU:RAM ratio is 1:8 (which is on the high end), then at a 100:1 disk:RAM ratio the maximum disk size per core supported by ScyllaDB would be 800GB. However, they recommend a 30:1 ratio, which works out to 240GB per core; that sounds to me like a reasonable starting point.

These are just rough numbers. You will need to test your specific workload to determine the right amount based on the data volume, indexing requirements, and access patterns you expect.
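For concreteness, the same arithmetic in a few lines of Go (the ratios are ScyllaDB's quoted guidance, nothing dragonboat-specific):

```go
package main

import "fmt"

func main() {
	ramPerCoreGB := 8.0 // the 1:8 cpu:ram ratio mentioned above
	for _, diskToRAM := range []float64{100, 30} {
		fmt.Printf("disk:ram %3.0f:1 -> %4.0f GB of disk per core\n",
			diskToRAM, ramPerCoreGB*diskToRAM)
	}
	// Output:
	// disk:ram 100:1 ->  800 GB of disk per core
	// disk:ram  30:1 ->  240 GB of disk per core
}
```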

@lni (Owner) commented Mar 5, 2023

> I've been thinking about this Raft implementation a lot and wondering what ratio of CPU cores to disk space (and RAM) is the "ideal" optimum for ingestion, throughput, etc.

This heavily depends on your application's workload, e.g. how many proposals you need to perform every second, how expensive it is to execute a proposal against your state machine, and whether you really need the proposal's execution result or just need to make sure it is correctly ordered and stored.
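A minimal sketch of that last distinction, using the v3 API (`nh` and `shardID` are placeholders for an already-running NodeHost and shard; exact RequestState accessors may vary between releases):

```go
package example

import (
	"context"
	"time"

	"github.com/lni/dragonboat/v3"
)

// proposeExamples contrasts the two workload shapes: waiting for the state
// machine's result vs. only waiting for the entry to be ordered and stored.
// nh must be a running NodeHost that has already started shard shardID.
func proposeExamples(nh *dragonboat.NodeHost, shardID uint64) {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	session := nh.GetNoOPSession(shardID)

	// Need the execution result: block until the entry is applied and the
	// state machine's sm.Result comes back.
	if result, err := nh.SyncPropose(ctx, session, []byte("put k1 v1")); err == nil {
		_ = result
	}

	// Only need ordering/durability: propose asynchronously and collect the
	// outcome whenever convenient.
	if rs, err := nh.Propose(session, []byte("put k2 v2"), 3*time.Second); err == nil {
		r := <-rs.ResultC()
		if r.Completed() {
			// committed and applied; r.GetResult() holds the sm.Result
		}
		rs.Release()
	}
}
```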

> But real-world situations don't run so many Raft groups per server; it's typically only one node per server. So my question is: what's the ideal server spec per node?

Using only one node (replica) per server is going to cause you trouble down the road.

The main purpose of having more nodes (replicas) per server is to allow a portion of them to be migrated to other servers when the load gets too high. It also allows for high parallelism, and you get the benefit of only needing to snapshot the busy replicas more often.

Spanner, TiDB and CockroachDB all follow this approach for good reasons.
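To make the many-replicas-per-server shape concrete, here is a hedged sketch with the dragonboat v3 API: one NodeHost per server starting replicas of many shards, so busy shards can later be migrated or snapshotted independently. The shard count, addresses, directory, and the no-op state machine are all illustrative.

```go
package main

import (
	"io"
	"log"

	"github.com/lni/dragonboat/v3"
	"github.com/lni/dragonboat/v3/config"
	sm "github.com/lni/dragonboat/v3/statemachine"
)

// noopSM is a placeholder state machine; a real application would apply
// proposals to its own storage here (e.g. the TerarkDB store above).
type noopSM struct{}

func newSM(clusterID, nodeID uint64) sm.IStateMachine { return &noopSM{} }

func (s *noopSM) Update(data []byte) (sm.Result, error) {
	return sm.Result{Value: uint64(len(data))}, nil
}
func (s *noopSM) Lookup(q interface{}) (interface{}, error) { return nil, nil }
func (s *noopSM) SaveSnapshot(w io.Writer, fc sm.ISnapshotFileCollection, done <-chan struct{}) error {
	return nil
}
func (s *noopSM) RecoverFromSnapshot(r io.Reader, fs []sm.SnapshotFile, done <-chan struct{}) error {
	return nil
}
func (s *noopSM) Close() error { return nil }

func main() {
	nh, err := dragonboat.NewNodeHost(config.NodeHostConfig{
		NodeHostDir:    "/data/dragonboat", // illustrative path
		RTTMillisecond: 200,
		RaftAddress:    "10.0.0.1:63000", // this server
	})
	if err != nil {
		log.Fatal(err)
	}
	// Three replicas per shard spread across three servers; each server
	// hosts replicas of many shards instead of one big Raft group.
	members := map[uint64]string{
		1: "10.0.0.1:63000",
		2: "10.0.0.2:63000",
		3: "10.0.0.3:63000",
	}
	for shardID := uint64(1); shardID <= 64; shardID++ {
		rc := config.Config{
			NodeID:             1, // this server's replica ID in every shard
			ClusterID:          shardID,
			ElectionRTT:        10,
			HeartbeatRTT:       1,
			CheckQuorum:        true,
			SnapshotEntries:    10000,
			CompactionOverhead: 5000,
		}
		if err := nh.StartCluster(members, false, newSM, rc); err != nil {
			log.Fatalf("shard %d: %v", shardID, err)
		}
	}
	select {} // keep serving
}
```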

@lni (Owner) commented Mar 5, 2023

Also note that dragonboat uses a very limited number of CPUs. To get the best performance, make sure you use NVMe SSDs with fast fsync performance.

If you really decide to use one replica per server, please understand that dragonboat is not optimized for that use case; it targets many replicas per server with a large number of concurrent requests spread across those replicas. I haven't worked on any project that uses only one replica per server.

@hiqsociety (Author) commented

@lni I meant 1 Raft group per server with 3 replicas.
I had actually assumed 1 node = 1 Raft group, since that's the most common use case from a database perspective.

The benchmark "looks good".

At the end of the day, I guess I'm complaining because there are no actual production case studies to use as a reference.

@kevburnsjr Yes, I know about ScyllaDB's limitations, but I'll be using TerarkDB, so I guess I can go higher. 60TB per CPU core leans towards "warm" storage, which is how I intend to use it: as a data store with a CDN in front.
