
Add distance penalty to chunk server configuration #525

Open

jdieter wants to merge 9 commits into master
Conversation

@jdieter (Contributor) commented Feb 27, 2017

Currently when reading chunks from chunkservers, there is no way to
prioritize fast chunkservers running on SSDs over slow chunkservers
running on spinning disk.

This patch-set adds a new configuration option (DISTANCE_PENALTY)
that allows the admin to set a distance penalty for each chunkserver.
The penalty is added to every distance calculation made for the
chunkserver, making it a less likely candidate for reads (how much
less likely depends on the penalty's value and on whether my previous
distance calculation patch, #523, included as the first patch in this
patch-set, has been applied).

This patch-set is now in production at our school, where I've configured
all our spinning disks to have a penalty of 100, while all our SSDs have
a penalty of 0 (the default). Our VMs (which have some chunks on spinning
disk due to changed goals of snapshots) are now much faster, because
they only read from the spinning disks if the SSDs are unavailable.
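
For reference, the relevant chunkserver configuration is a single line. A minimal sketch, assuming the stock chunkserver config file (the exact path varies by install):

```
# mfschunkserver.cfg on a spinning-disk chunkserver: add 100 to every
# distance calculation, so clients read from it only when lower-distance
# replicas are unavailable.
DISTANCE_PENALTY = 100

# SSD chunkservers simply omit the option; the default penalty is 0.
```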

This patch-set does include a network protocol change that makes it
incompatible with older LizardFS installs. If there's any way for the
chunkserver to tell older masters that they can ignore the
LIZ_CSTOMA_REGISTER_PENALTY packet, or for the chunkserver to detect that
the master doesn't recognize that packet, I would happily implement it.

@MPaszkiewicz (Contributor)

Hi,
Review link:
http://cr.skytechnology.pl:8081/#/c/2865/

@Blackpaw

Useful, but with some limitations?

  • Writes will still run at the rate of the slowest chunkserver, though, won't they?
  • It requires all chunkservers to be SSD.

@Zorlin commented Feb 28, 2017

This is a very interesting patch, thank you. I hope something like it can be officially implemented.

@jdieter (Contributor, Author) commented Feb 28, 2017

@Blackpaw, I haven't really looked at writes yet. If writes are sent in parallel to all chunkservers, then yes, they will run at the speed of the slowest chunkserver.

This patch-set does not require all chunkservers to be SSD, though. In our use case, we have three SSD chunkservers and three spinning disk chunkservers. Our VMs are set up to use the SSD chunkservers with three replicas, but every night I make a snapshot and set the goal of the snapshot to be spinning disk. LizardFS will then make six replicas (three SSD, three spinning disk), and then free the SSD replicas as those chunks are no longer used by the VMs.
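
A rough sketch of that workflow, with hypothetical goal names and mount paths (custom goals with chunkserver labels would be defined in mfsgoals.cfg):

```
# mfsgoals.cfg (hypothetical entries):
#   10 vm_ssd   : ssd ssd ssd
#   11 snap_hdd : hdd hdd hdd

# VMs live on the SSD-labelled chunkservers:
lizardfs setgoal -r vm_ssd /mnt/lizardfs/vms

# Nightly: snapshot the VMs, then retarget the snapshot to spinning disk:
lizardfs makesnapshot /mnt/lizardfs/vms /mnt/lizardfs/snapshots/nightly
lizardfs setgoal -r snap_hdd /mnt/lizardfs/snapshots/nightly
```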

My problem was that the virtual hosts kept reading from the spinning disks, even though all new writes were going to the SSDs. This patch-set fixed that problem.

I would like to look at the writes, but this is the first step.

@jdieter (Contributor, Author) commented Mar 1, 2017

@MPaszkiewicz, I think I've addressed the reasons that this patch failed the sanity check in my last commit. Is there anything I need to do to have it resubmitted?

@MPaszkiewicz (Contributor)

@jdieter, your first patch, introducing distance calculation between racks, is quite likely to be merged soon.
The other patches cannot be merged at the moment. You can take a look at the comment we made here: http://cr.skytechnology.pl:8081/#/c/2863
Our current plan for handling this kind of situation is to implement a more complex topology mechanism based on graphs. In particular, we would like this change not to affect chunkserver logic.

@jdieter (Contributor, Author) commented Mar 1, 2017

Thanks so much for the comment. I'll look at adding in a legacy data structure. As for the more complex topology mechanism, would there be a way to indicate that two chunkservers at the same IP have different distances? This was the main thing that convinced me to add a separate chunkserver configuration option, because our SSD chunkservers run on the same servers as our HDD chunkservers.

@MPaszkiewicz (Contributor)

Yes, in order to differentiate two chunkservers sharing the same IP, we can take ports into account.

@jdieter (Contributor, Author) commented Mar 27, 2017

OK, just to make sure this is clear: is there any point in my addressing the issues in http://cr.skytechnology.pl:8081/#/c/2863, or do you feel strongly enough about not changing the chunkserver logic that I just shouldn't bother?

@psarna (Member) commented Mar 27, 2017

http://cr.skytechnology.pl:8081/#/c/2861 will most likely be merged after we apply the amended version,
but as for http://cr.skytechnology.pl:8081/#/c/2863, right now it is too complicated to be considered for a merge, even after addressing existing issues.

I think the best way is to leave this issue open for future reference (when we decide to rewrite topology in a more sane way). Then, patches 2862-2865 could be reconsidered as merge candidates.

@jdieter (Contributor, Author) commented Mar 27, 2017

OK, thanks so much. I'll hold off on polishing this patch until you're ready for it or until we have something else that does essentially the same thing.

DarkHaze and others added 9 commits on May 11, 2017
This commit fixes high CPU usage in fs_periodic_file_test.

Fixes lizardfs#547

Change-Id: Ia93173dd0f358f3ff606c7ebb5848f2b786b2158
This commit adds missing initialization of the load_factor member
to avoid Valgrind warnings.

Change-Id: Ifca5ad0afd781c6fc23090206750a6fe66573f10
Currently, distance is calculated as 0 (same machine), 1 (same rack), or 2
(different racks).  This doesn't allow any way of saying that rack 1 is
closer to rack 2 than it is to rack 42.

This patch changes the topology distance calculation so that the distance
between different racks is the absolute difference between the rack
numbers plus 1 (since a distance of 1 already means two chunkservers are
on the same rack).

Rack 2 now has a distance of 2 from rack 1 (abs(2-1)+1=2) and a distance
of 41 from rack 42 (abs(2-42)+1=41), which means clients on rack 2 will
strongly prefer chunkservers on rack 1.

Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
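A minimal sketch of the calculation described above (function and parameter names are mine, not the actual LizardFS code):

```cpp
#include <cstdint>
#include <cstdlib>

// 0 = same machine, 1 = same rack; otherwise |rackA - rackB| + 1,
// so that numerically close racks are preferred over distant ones.
uint32_t topologyDistance(int rackA, int rackB, bool sameMachine) {
    if (sameMachine) {
        return 0;
    }
    if (rackA == rackB) {
        return 1;
    }
    return static_cast<uint32_t>(std::abs(rackA - rackB)) + 1;
}
```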
We use the 64-bit version of std::min to make sure we don't overflow,
then make sure the result is <= the maximum value of uint32_t before
casting back to uint32_t.

Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
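In code form, something along these lines (names hypothetical):

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Add the penalty in 64 bits so the sum can't wrap around, clamp it to
// the largest value a uint32_t can hold, then cast back down.
uint32_t penalizedDistance(uint32_t distance, uint32_t penalty) {
    uint64_t total = std::min<uint64_t>(
        static_cast<uint64_t>(distance) + penalty,
        std::numeric_limits<uint32_t>::max());
    return static_cast<uint32_t>(total);
}
```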
This first patch contains the chunkserver code to read the distance
penalty and send it to the master.

Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
With this patch, the master can read and apply the distance penalties sent
by the chunkservers.

Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
With this patch, the web interface now shows the distance penalty in the
servers tab.

Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
This patch updates the documentation to show the DISTANCE_PENALTY
configuration option.
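
The documented entry might read roughly as follows (wording assumed; only the option's behavior comes from this patch-set):

```
DISTANCE_PENALTY
       Penalty added to every topology distance calculation made for this
       chunkserver (default: 0). Higher values make the chunkserver a less
       likely candidate for reads.
```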

Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>