-
For data integrity, data repair speed is essential. I believe the following
would be true, but I don't have metrics to prove it.
a) replication with 3 copies
- this would be the fastest data repair. Chunks from one failed drive also
exist on many other drives in the cluster, so many drives are *read from*
to recover the missing chunks and many drives are *written to* while
re-writing them.
b) EC 4+2
- this would be faster than 8+2, since only 4 shards (any 4) need to be read
in order to rewrite missing shards. Many disks in the whole pool are used
to read/write shards, so it is also faster than standard RAID.
- caveat: network speed and chunk server CPU speed affect shard rebuild
time.
c) EC 8+2
- this would be faster than RAID 6+2, since many disks from the whole
cluster are used to read/write (similar to EC 4+2)
- caveat: network speed and chunk server CPU speed affect shard rebuild
time.
d) RAID 6+2
- standard RAID will have the slowest repair time, since only the disks that
are part of the pool take part in the rebuild, and that limits rebuild I/O.
- assumes you have a hot spare. If you don't, then you need to add "human
time" to the repair time, since a tech will need to do the drive swap
before the repair begins.
- cluster load/RAID pool I/O load will determine rebuild times, along with
disk speeds.
e) Another option: ZFS dRAID
- if you have a large pool of disks on one chunk server, rebuild times can
be very fast:
https://openzfs.github.io/openzfs-docs/Basic%20Concepts/dRAID%20Howto.html#rebuilding-to-a-distributed-spare
Caveat for all RAID setups: if you layer goal1 on top of chunk servers that
use RAID and a chunk server goes down, all the chunks for each file on that
server will be unavailable until the server is operational again.
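To put rough numbers on the ordering above, here is a back-of-the-envelope sketch. It is not a benchmark: the function names, the 10 TB of lost data, the 60 drives and the 150 MB/s per drive are all made-up assumptions, and it ignores the network and CPU caveats already mentioned. It only models "read k units and write 1 unit per lost unit, spread over the whole cluster" against "a single hot spare is the write bottleneck".

```python
# Back-of-the-envelope rebuild-time model. All numbers are illustrative
# assumptions, not measurements; network and CPU limits are ignored.

def distributed_rebuild_hours(lost_tb, units_read, drives, drive_mb_s):
    """Rebuild spread over `drives` disks: for every unit lost,
    `units_read` units are read and 1 unit is written."""
    total_bytes = lost_tb * 1e12 * (units_read + 1)
    return total_bytes / (drives * drive_mb_s * 1e6) / 3600


def raid_rebuild_hours(lost_tb, drive_mb_s):
    """Classic RAID with a hot spare: the single spare's write speed is
    the bottleneck, however many disks are read in parallel."""
    return lost_tb * 1e12 / (drive_mb_s * 1e6) / 3600


LOST_TB, DRIVES, MB_S = 10, 60, 150  # hypothetical cluster

estimates = {
    "replication, 3 copies": distributed_rebuild_hours(LOST_TB, 1, DRIVES, MB_S),
    "EC 4+2": distributed_rebuild_hours(LOST_TB, 4, DRIVES, MB_S),
    "EC 8+2": distributed_rebuild_hours(LOST_TB, 8, DRIVES, MB_S),
    "RAID 6+2 with hot spare": raid_rebuild_hours(LOST_TB, MB_S),
}

for scheme, hours in estimates.items():
    print(f"{scheme:26s} ~{hours:5.1f} h")
```

With these assumed inputs the model reproduces the ordering a) < b) < c) < d). Real clusters usually throttle rebuild traffic, so the absolute times should not be trusted, only the relative shape.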
-
Thanks for the useful information, but how do I enable EC with the 3.x community edition? I have tried several options with both master and chunk configs, but nothing seems to work.
Enable erasure coding
use_ec = on
Define erasure coding parameters
ec_k = 4
-
Hi guys,
This is a theoretical question that has started to bother me, and I cannot find any info online.
Imagine the following setups (chosen arbitrarily):
a) replication with 3 copies (1+2)
b) EC 4+2
c) EC 8+2
d) RAID 6+2
In each case, I will lose my data with a third failure.
But which one should I use if my priority is data safety?
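For reference, the plain arithmetic behind the four options can be tabulated with a small sketch. The `SCHEMES` table and `summarize` helper are names made up for this illustration, and each scheme is treated as "data units + redundancy units", following the (1+2) notation above. All four tolerate two failures; what differs is the raw-to-usable ratio and how many devices one stripe touches.

```python
# Each scheme as (data units, redundancy units), following the
# "1+2" notation used for replication in the question.
SCHEMES = {
    "replication, 3 copies": (1, 2),
    "EC 4+2": (4, 2),
    "EC 8+2": (8, 2),
    "RAID 6+2": (6, 2),
}


def summarize(data, redundancy):
    """Basic properties that follow directly from the data+redundancy split."""
    width = data + redundancy
    return {
        "tolerated_failures": redundancy,  # a further failure loses data
        "raw_per_usable": width / data,    # raw capacity per usable byte
        "devices_per_stripe": width,       # devices that one stripe touches
    }


summary = {name: summarize(k, m) for name, (k, m) in SCHEMES.items()}

for name, s in summary.items():
    print(f"{name:22s} tolerates {s['tolerated_failures']} failures, "
          f"{s['raw_per_usable']:.2f}x raw space, "
          f"{s['devices_per_stripe']} devices per stripe")
```

Since the tolerated failure count is the same everywhere, the comparison comes down to how quickly redundancy is restored after a failure and how many devices have to survive together for each stripe.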