Reasonable to change hard-coded cluster size? #7129
Comments
@lonnietc Is this 64 the maximum number of shards for a table? Changing the limit would not affect any users other than those who choose to use that many shards. (The limit doesn't involve pre-allocated arrays or anything; there is just a runtime check that produces an error message.)

I think there is some sharding logic, or automatic sharding logic, with runtime proportional to the square of the number of shards, or worse; instead of n^2, it may be more like n^2 log n, with other parameters involved, such as the number of replicas. It looks like 64 was picked as an "unrealistically" big number; it was also increased from 32 at some point, without explanation. For all I know, there may also be pathological algorithms in the replication and backfilling logic that could suffer with too many shards.

I think what we should do is the following:
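To put that guess in perspective, here is a rough back-of-the-envelope comparison. This is only a sketch under the assumption that the dominant cost really grows like n^2 log n in the shard count; constant factors and other parameters (like replica count) are ignored:

```python
import math

def relative_cost(shards: int) -> float:
    """Hypothetical cost model: n^2 * log2(n), per the guess above."""
    return shards ** 2 * math.log2(shards)

# Ratio of the hypothetical cost at 65,535 shards vs. the current 64 limit.
ratio = relative_cost(65535) / relative_cost(64)
print(f"{ratio:.2e}")  # on the order of a few million times more work
```

If the model is even roughly right, raising the limit from 64 to 65,535 would mean millions of times more work in that logic, which is why simply bumping the constant may not be enough on its own.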
For what it's worth, the server-side code for this limit is this line: So, you could set that to a larger value, recompile RethinkDB, update your cluster (or, I think, run one proxy node with the modified version), and use that with e.g.

Also, I was able to bypass the 64-shard limit without recompiling by updating the table_config system table. I took a table with 3 shards and ran this query a few times to increase the number of shards to 96. I should mention this was all with a toy example on a single server.
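The exact query isn't quoted above, but the shape of a table_config document and the kind of edit described — duplicating shard entries to grow the shard count — can be sketched without a server. Field names follow the table_config system table; the function and the toy config here are illustrative, not the actual query that was run:

```python
def grow_shards(table_config: dict, factor: int = 2) -> dict:
    """Return a copy of a table_config document with its shard list
    repeated `factor` times -- the kind of edit described above, which
    apparently bypasses the compiled-in 64-shard check.
    Illustration only; not the actual query from the comment."""
    new_config = dict(table_config)
    new_config["shards"] = table_config["shards"] * factor
    return new_config

# A toy single-server config with 3 shards, as in the comment above.
config = {
    "name": "test",
    "shards": [{"primary_replica": "s1", "replicas": ["s1"]}] * 3,
}
bigger = grow_shards(grow_shards(config, 4), 8)  # 3 -> 12 -> 96 shards
print(len(bigger["shards"]))
```

Against a real cluster, a document like this would be written back with something along the lines of `r.db('rethinkdb').table('table_config').update(...)`, repeated a few times as the comment describes.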
@srh Thanks for taking the time to discuss this more with me. I am building a large P2P database project intended to support many thousands of nodes, and I have always liked RethinkDB a lot, but I was not sure that it could scale to very large clusters where nodes may join or leave the network at various times, as can happen in a P2P topology. In general, I was thinking of something like this.
Basically, by sharding and replicating, the data should remain consistent should a node go offline for some reason, and data queries, updates, etc. could come from any of the thousands of nodes in the mesh network. Of course, this is probably just a huge wish list that cannot happen as of yet, but maybe something to work towards in bringing RethinkDB back to mainstream popularity, since a lot of this seems to be what people are seeking, yet many databases do not offer it, in my opinion. Just my humble thoughts :)
I am so unfamiliar with what the database market in 2023 is like that I don't know what to suggest.
Yeah, it is pretty crazy out there. Since NoSQL, some relational database things are happening, but I also see a lot of interest in graph databases and a LOT of interest in working towards true P2P solutions that allow for easy horizontal scaling. That gave me the thought that maybe RethinkDB could consider going down that path to get users excited about it again.
Hello,
Hope that your day is well.
Would it be possible and reasonable to change the RethinkDB hard-coded cluster size limit from 64 to 65,535?
My guess is that this would drastically affect database performance, but I wanted to ask.
Thanks in advance