Add distance penalty to chunk server configuration #525
base: master
Conversation
Hi, useful, but with some limitations?
This is a very interesting patch, thank you. I hope something like it can be officially implemented.
@Blackpaw, I haven't really looked at writes yet. If they are sent in parallel to all chunkservers, then yes, it will be at the speed of the slowest chunkserver. This patch-set does not require all chunkservers to be SSD, though. In our use case, we have three SSD chunkservers and three spinning disk chunkservers. Our VMs are set up to use the SSD chunkservers with three replicas, but every night I make a snapshot and set the goal of the snapshot to be spinning disk. LizardFS will then make six replicas (three SSD, three spinning disk), and then free the SSD replicas as those chunks are no longer used by the VMs. My problem was that the virtual hosts kept reading from the spinning disk, even though all new writes were going to the SSDs. This patch-set fixed that problem. I would like to look at the writes, but this is the first step.
@MPaszkiewicz, I think I've addressed the reasons that this patch failed the sanity check in my last commit. Is there anything I need to do to have it resubmitted?
@jdieter, your first patch introducing distance calculation between racks is quite likely to be merged soon.
Thanks so much for the comment. I'll look at adding in a legacy data structure. As for the more complex topology mechanism, would there be a way to indicate that two chunkservers at the same IP would have different distances? This was the main thing that convinced me to do a separate chunkserver configuration option, because our SSD chunkservers are running on the same server as our HDD chunkservers.
Yes, in order to differentiate two chunkservers sharing the same IP, we can take ports into account.
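A minimal sketch of what keying the topology by (IP, port) could look like; the pair type, map, and fallback value are illustrative assumptions, not the actual LizardFS data structures:

```cpp
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical sketch: keying topology entries by (IP, port) instead of IP
// alone lets two chunkservers on the same machine (e.g. an SSD daemon and
// an HDD daemon) carry different distances.
using ChunkserverAddress = std::pair<uint32_t, uint16_t>;  // (IPv4, port)

std::map<ChunkserverAddress, uint32_t> gDistanceByAddress;

uint32_t lookupDistance(uint32_t ip, uint16_t port) {
	auto it = gDistanceByAddress.find({ip, port});
	// Fall back to a large distance for unknown addresses (illustrative choice).
	return it != gDistanceByAddress.end() ? it->second : UINT32_MAX;
}
```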
Ok, just to make sure this is clear: is there any point in my addressing the issues in http://cr.skytechnology.pl:8081/#/c/2863, or do you feel strongly enough about not changing the chunkserver logic that I just shouldn't bother?
http://cr.skytechnology.pl:8081/#/c/2861 will most likely be merged after we apply the amended version. I think the best way is to leave this issue open for future reference (when we decide to rewrite topology in a more sane way). Then, patches 2862-2865 could be reconsidered as merge candidates.
Ok, thanks so much. I'll hold off polishing this patch until you're ready for it or we have something else that does essentially the same thing.
This commit fixes high CPU usage in fs_periodic_file_test.
Fixes lizardfs#547
Change-Id: Ia93173dd0f358f3ff606c7ebb5848f2b786b2158
This commit adds missing initialization of the load_factor member to avoid valgrind warnings.
Change-Id: Ifca5ad0afd781c6fc23090206750a6fe66573f10
Currently distance is calculated as 0 (same machine), 1 (same rack), and 2 (different racks). This doesn't allow any way of saying that rack 1 is closer to rack 2 than it is to rack 42.
This patch changes the topology distance calculation so that the distance between different racks is the absolute difference between the rack numbers plus 1 (since a distance of 1 already means two machines are on the same rack). Rack 2 now has a distance of 2 from rack 1 (abs(2-1)+1=2) and a distance of 41 from rack 42 (abs(2-42)+1=41), which means clients on rack 2 will strongly prefer chunkservers on rack 1.
Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
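A minimal sketch of the revised calculation, using illustrative names rather than the actual LizardFS function:

```cpp
#include <cstdint>
#include <cstdlib>

// Illustrative helper (not the actual LizardFS code) showing the revised
// distance: 0 for the same machine, 1 for the same rack, and otherwise
// |rackA - rackB| + 1, so nearby racks beat far-away racks.
uint32_t topologyDistance(bool sameMachine, int64_t rackA, int64_t rackB) {
	if (sameMachine) {
		return 0;
	}
	if (rackA == rackB) {
		return 1;
	}
	return static_cast<uint32_t>(std::llabs(rackA - rackB) + 1);
}
```

With the rack numbers above, topologyDistance(false, 2, 1) returns 2 and topologyDistance(false, 2, 42) returns 41, matching the examples in the commit message.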
We use the 64-bit version of std::min so the comparison doesn't overflow, clamp the result to the maximum value of uint32_t, and then cast to uint32_t.
Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
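A sketch of that guard under the same assumptions (illustrative name, not the patched function):

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Illustrative sketch of the overflow guard described above: compute in
// 64 bits, clamp to the uint32_t range with the 64-bit std::min, then narrow.
uint32_t clampToUint32(int64_t distance) {
	int64_t clamped = std::min<int64_t>(
	    distance, std::numeric_limits<uint32_t>::max());
	return static_cast<uint32_t>(clamped);
}
```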
Currently when reading chunks from chunkservers, there is no way to prioritize fast chunkservers running on SSDs over slow chunkservers running on spinning disk.
This patch-set adds a new configuration option that allows the admin to set a distance penalty for each chunkserver. This penalty will be added to *any* distance calculations made for the chunkserver, which (depending on its value and whether my previous distance calculation patch has been applied) will make the chunkserver a less likely candidate for reads.
This patch-set is now in production at our school, where I've configured all our spinning disks to have a penalty of 100, while all our SSDs have a penalty of 0 (the default). Our VMs (which have some chunks on spinning disk due to changed goals of snapshots) are now *much* faster, because they only read from the spinning disks if the SSDs are unavailable.
This patch-set does include a network protocol change that makes it incompatible with older LizardFS installs. If there's any way for the chunkserver to tell older masters that they can ignore the LIZ_CSTOMA_REGISTER_PENALTY packet, or for the chunkserver to detect that the master doesn't recognize that packet, I would happily implement it.
This first patch contains the chunkserver code to read the distance penalty and send it to the master.
Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
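As a rough illustration of the chunkserver side, assuming a cfg_getuint32-style config helper like the ones used elsewhere in the codebase; all names here are illustrative, not the actual patch:

```cpp
#include <cstdint>

// Hedged sketch of the chunkserver side described above. cfg_getuint32 is
// assumed to be the chunkserver's existing config lookup; the function and
// variable names are illustrative.
extern uint32_t cfg_getuint32(const char* name, uint32_t defaultValue);

static uint32_t gDistancePenalty = 0;

void loadDistancePenalty() {
	// A default of 0 keeps unconfigured chunkservers behaving exactly as before.
	gDistancePenalty = cfg_getuint32("DISTANCE_PENALTY", 0);
	// The value would then be announced to the master in a
	// LIZ_CSTOMA_REGISTER_PENALTY message during registration.
}
```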
This patch-set adds a new configuration option that allows the admin to set a distance penalty for each chunkserver. This penalty will be added to *any* distance calculations made for the chunkserver, which (depending on its value and whether my previous distance calculation patch has been applied) will make the chunkserver a less likely candidate for reads.
With this patch, the master can read and apply the distance penalties sent by the chunkservers.
Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
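A hedged sketch of how the master might fold the reported penalty into read ordering; the struct and function are assumptions for illustration, not the real LizardFS structures:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative sketch: the penalty reported by each chunkserver is simply
// added to its topology distance before read candidates are sorted, so a
// penalized (spinning disk) chunkserver loses ties against an SSD one.
struct ChunkserverCandidate {
	uint32_t topologyDistance;  // from the rack/machine calculation
	uint32_t distancePenalty;   // reported via LIZ_CSTOMA_REGISTER_PENALTY
};

void sortByEffectiveDistance(std::vector<ChunkserverCandidate>& candidates) {
	std::sort(candidates.begin(), candidates.end(),
	          [](const ChunkserverCandidate& a, const ChunkserverCandidate& b) {
		// Sum in 64 bits so penalty + distance cannot wrap around.
		return static_cast<uint64_t>(a.topologyDistance) + a.distancePenalty <
		       static_cast<uint64_t>(b.topologyDistance) + b.distancePenalty;
	});
}
```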
This patch-set adds a new configuration option that allows the admin to set a distance penalty for each chunkserver. This penalty will be added to *any* distance calculations made for the chunkserver, which (depending on its value and whether my previous distance calculation patch has been applied) will make the chunkserver a less likely candidate for reads.
With this patch, the web interface now shows the distance penalty in the servers tab.
Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
This patch-set adds a new configuration option that allows the admin to set a distance penalty for each chunkserver. This penalty will be added to *any* distance calculations made for the chunkserver, which (depending on its value and whether my previous distance calculation patch has been applied) will make the chunkserver a less likely candidate for reads.
This patch updates the documentation to show the DISTANCE_PENALTY configuration option.
Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
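For illustration, the documented option might be set like this in mfschunkserver.cfg (the value 100 matches the spinning-disk setup described in this patch-set; the exact man-page wording is in the patch itself):

```
# mfschunkserver.cfg (illustrative excerpt)
# Penalty added to every distance calculation involving this chunkserver.
# Higher values make it a less likely read candidate; 0 (the default)
# leaves behavior unchanged.
DISTANCE_PENALTY = 100
```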
Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
Currently when reading chunks from chunkservers, there is no way to
prioritize fast chunkservers running on SSDs over slow chunkservers
running on spinning disk.
This patch-set adds a new configuration option (DISTANCE_PENALTY)
that allows the admin to set a distance penalty for each chunkserver.
This penalty will be added to any distance calculations made for the
chunkserver, which (depending on its value, and on whether my previous
distance calculation patch, #523, included here as the first patch in the
patch-set, has been applied) will make the chunkserver a less likely
candidate for reads.
This patch-set is now in production at our school, where I've configured
all our spinning disks to have a penalty of 100, while all our SSDs have
a penalty of 0 (the default). Our VMs (which have some chunks on spinning
disk due to changed goals of snapshots) are now much faster, because
they only read from the spinning disks if the SSDs are unavailable.
This patch-set does include a network protocol change that makes it
incompatible with older LizardFS installs. If there's any way for the
chunkserver to tell older masters that they can ignore the
LIZ_CSTOMA_REGISTER_PENALTY packet, or for the chunkserver to detect that
the master doesn't recognize that packet, I would happily implement it.
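One possible shape for that compatibility check, sketched as an assumption; the constant and the handshake details are made up for illustration, not part of the actual protocol:

```cpp
#include <cstdint>

// Hedged sketch of one compatibility approach mentioned above: only send the
// new packet when the master's announced version is recent enough to
// understand it. kFirstVersionWithPenalty is a made-up threshold; the real
// handshake would define the actual version encoding.
constexpr uint32_t kFirstVersionWithPenalty = 0x031000;  // illustrative

bool shouldSendRegisterPenalty(uint32_t masterVersion) {
	// Old masters never see LIZ_CSTOMA_REGISTER_PENALTY, so they keep working.
	return masterVersion >= kFirstVersionWithPenalty;
}
```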