
Add distance penalty to chunk server configuration #525

Open

jdieter wants to merge 9 commits into master
Conversation

@jdieter (Contributor) commented Feb 27, 2017

Currently when reading chunks from chunkservers, there is no way to
prioritize fast chunkservers running on SSDs over slow chunkservers
running on spinning disk.

This patch-set adds a new configuration option (DISTANCE_PENALTY)
that allows the admin to set a distance penalty for each chunkserver.
The penalty is added to every distance calculation made for the
chunkserver, making it a less likely candidate for reads (how much
less likely depends on the penalty's value and on whether my previous
distance calculation patch, #523, included as the first patch in this
patch-set, has been applied).

This patch-set is now in production at our school, where I've configured
all our spinning disks to have a penalty of 100, while all our SSDs have
a penalty of 0 (the default). Our VMs (which have some chunks on spinning
disk due to changed goals of snapshots) are now much faster, because
they only read from the spinning disks if the SSDs are unavailable.
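
For reference, the relevant chunkserver configuration is a single line. A minimal sketch, assuming the stock chunkserver config file (the exact path varies by install):

```
# mfschunkserver.cfg on a spinning-disk chunkserver: add 100 to every
# distance calculation, so clients read from it only when lower-distance
# replicas are unavailable.
DISTANCE_PENALTY = 100

# SSD chunkservers simply omit the option; the default penalty is 0.
```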

This patch-set does include a network protocol change that makes it
incompatible with older LizardFS installs. If there's any way for the
chunkserver to tell older masters that they can ignore the
LIZ_CSTOMA_REGISTER_PENALTY packet, or for the chunkserver to detect that
the master doesn't recognize that packet, I would happily implement it.

@MPaszkiewicz (Contributor)

Hi,
Review link:
http://cr.skytechnology.pl:8081/#/c/2865/

@Blackpaw

Useful, but with some limitations?

  • Writes will still run at the rate of the slowest chunkserver, though, won't they?
  • It requires all chunkservers to be SSD.

@Zorlin commented Feb 28, 2017

This is a very interesting patch, thank you. I hope something like it can be officially implemented.

@jdieter (Contributor, Author) commented Feb 28, 2017

@Blackpaw, I haven't really looked at writes yet. If writes are sent in parallel to all chunkservers, then yes, they will run at the speed of the slowest chunkserver.

This patch-set does not require all chunkservers to be SSD, though. In our use case, we have three SSD chunkservers and three spinning disk chunkservers. Our VMs are set up to use the SSD chunkservers with three replicas, but every night I make a snapshot and set the goal of the snapshot to be spinning disk. LizardFS will then make six replicas (three SSD, three spinning disk), and then free the SSD replicas as those chunks are no longer used by the VMs.
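
A rough sketch of that workflow, with hypothetical goal names and mount paths (custom goals with chunkserver labels would be defined in mfsgoals.cfg):

```
# mfsgoals.cfg (hypothetical entries):
#   10 vm_ssd   : ssd ssd ssd
#   11 snap_hdd : hdd hdd hdd

# VMs live on the SSD-labelled chunkservers:
lizardfs setgoal -r vm_ssd /mnt/lizardfs/vms

# Nightly: snapshot the VMs, then retarget the snapshot to spinning disk:
lizardfs makesnapshot /mnt/lizardfs/vms /mnt/lizardfs/snapshots/nightly
lizardfs setgoal -r snap_hdd /mnt/lizardfs/snapshots/nightly
```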

My problem was that the virtual hosts kept reading from the spinning disks, even though all new writes were going to the SSDs. This patch-set fixed that problem.

I would like to look at the writes, but this is the first step.

@jdieter (Contributor, Author) commented Mar 1, 2017

@MPaszkiewicz, I think I've addressed the reasons that this patch failed the sanity check in my last commit. Is there anything I need to do to have it resubmitted?

@MPaszkiewicz (Contributor)

@jdieter, your first patch, introducing distance calculation between racks, is quite likely to be merged soon.
The other patches cannot be merged at the moment. You can take a look at the comment we made here: http://cr.skytechnology.pl:8081/#/c/2863
Our current plan for handling this kind of situation is to implement a more complex topology mechanism based on graphs. In particular, we would like this change not to affect chunkserver logic.

@jdieter (Contributor, Author) commented Mar 1, 2017

Thanks so much for the comment. I'll look at adding in a legacy data structure. As for the more complex topology mechanism, would there be a way to indicate that two chunkservers at the same IP have different distances? This was the main thing that convinced me to add a separate chunkserver configuration option, because our SSD chunkservers run on the same servers as our HDD chunkservers.

@MPaszkiewicz (Contributor)

Yes, in order to differentiate two chunkservers sharing the same IP, we can take ports into account.

@jdieter (Contributor, Author) commented Mar 27, 2017

OK, just to make sure this is clear: is there any point in my addressing the issues in http://cr.skytechnology.pl:8081/#/c/2863, or do you feel strongly enough about not changing the chunkserver logic that I just shouldn't bother?

@psarna (Member) commented Mar 27, 2017

http://cr.skytechnology.pl:8081/#/c/2861 will most likely be merged after we apply the amended version,
but as for http://cr.skytechnology.pl:8081/#/c/2863, right now it is too complicated to be considered for a merge, even after addressing existing issues.

I think the best way is to leave this issue open for future reference (when we decide to rewrite topology in a more sane way). Then, patches 2862-2865 could be reconsidered as merge candidates.

@jdieter (Contributor, Author) commented Mar 27, 2017

OK, thanks so much. I'll hold off on polishing this patch until you're ready for it or until we have something else that does essentially the same thing.

DarkHaze and others added 9 commits on May 11, 2017
This commit fixes high CPU usage in fs_periodic_file_test.

Fixes lizardfs#547

Change-Id: Ia93173dd0f358f3ff606c7ebb5848f2b786b2158
This commit adds missing initialization of the load_factor member
to avoid Valgrind warnings.

Change-Id: Ifca5ad0afd781c6fc23090206750a6fe66573f10
Currently, distance is calculated as 0 (same machine), 1 (same rack), or 2
(different racks).  This doesn't allow any way of saying that rack 1 is
closer to rack 2 than it is to rack 42.

This patch changes the topology distance calculation so that the distance
between different racks is the absolute difference between the rack
numbers plus 1 (since a distance of 1 already means two chunkservers are
on the same rack).

Rack 2 now has a distance of 2 from rack 1 (abs(2-1)+1=2) and a distance
of 41 from rack 42 (abs(2-42)+1=41), which means clients on rack 2 will
strongly prefer chunkservers on rack 1.

Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
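A minimal sketch of the calculation described above (function and parameter names are mine, not the actual LizardFS code):

```cpp
#include <cstdint>
#include <cstdlib>

// 0 = same machine, 1 = same rack; otherwise |rackA - rackB| + 1,
// so that numerically close racks are preferred over distant ones.
uint32_t topologyDistance(int rackA, int rackB, bool sameMachine) {
    if (sameMachine) {
        return 0;
    }
    if (rackA == rackB) {
        return 1;
    }
    return static_cast<uint32_t>(std::abs(rackA - rackB)) + 1;
}
```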
We use the 64-bit version of std::min to make sure we don't overflow,
then make sure the result is <= the maximum value of uint32_t before
casting back to uint32_t.

Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
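In code form, something along these lines (names hypothetical):

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Add the penalty in 64 bits so the sum can't wrap around, clamp it to
// the largest value a uint32_t can hold, then cast back down.
uint32_t penalizedDistance(uint32_t distance, uint32_t penalty) {
    uint64_t total = std::min<uint64_t>(
        static_cast<uint64_t>(distance) + penalty,
        std::numeric_limits<uint32_t>::max());
    return static_cast<uint32_t>(total);
}
```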
This first patch contains the chunkserver code to read the distance
penalty and send it to the master.

Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
With this patch, the master can read and apply the distance penalties sent
by the chunkservers.

Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
With this patch, the web interface now shows the distance penalty in the
servers tab.

Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
This patch updates the documentation to show the DISTANCE_PENALTY
configuration option.
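
The documented entry might read roughly as follows (wording assumed; only the option's behavior comes from this patch-set):

```
DISTANCE_PENALTY
       Penalty added to every topology distance calculation made for this
       chunkserver (default: 0). Higher values make the chunkserver a less
       likely candidate for reads.
```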

Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>
Signed-off-by: Jonathan Dieter <jdieter@lesbg.com>