Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] add request monitor for client #1863

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

M1eyu2018
Copy link
Collaborator

@M1eyu2018 M1eyu2018 commented Mar 27, 2023

Introduction:
In order to report abnormal request(e.g timeout, fail request) in fuse client to user, add request monitor for client in this pr.
This monitor works for all requests by one routine. When detecting a timeout request, it can report the process id and the operation type correctly.

Advantage:
1、less resource consumption because of only one routine
2、locate problem quickly by its reported process id and operation type

Monitor algorithm:
monitor_op_timeout
1、Time is split to three time window and the time window size is equal to timeout which is set in client config.
2、If timeout is one second, and If a request starts at time window called time window0 whose remainder is zero divided by three in second, the counter called counter0 will add one. If the request ends and doesn't time out, counter0 will subtract one.
3、For counter0, when this time window ends(this second passes in the case), the monitor waits one more time window size and then checks the counter if it's greater than zero. If yes, it means that there are timeout request which starts at time window0. So the monitor report them with their process id and operation type.
4、Similarly, counter1 works at time window1 whose remainder is one divided by three in second, and counter2 works at time window1 whose remainder is two divided by three in second. Thus, the three counter takes turns working and covers all time.

@netlify
Copy link

netlify bot commented Mar 27, 2023

Deploy Preview for cubefs-check ready!

Name Link
🔨 Latest commit 0e87281
🔍 Latest deploy log https://app.netlify.com/sites/cubefs-check/deploys/6459ca74c886900008eeb605
😎 Deploy Preview https://deploy-preview-1863--cubefs-check.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@M1eyu2018 M1eyu2018 force-pushed the add-request-monitor-for-client branch from f1b6521 to 6817ad7 Compare March 28, 2023 02:54
@leonrayang leonrayang force-pushed the add-request-monitor-for-client branch from 6817ad7 to 21083e3 Compare March 29, 2023 05:02
@M1eyu2018 M1eyu2018 force-pushed the add-request-monitor-for-client branch from 21083e3 to 78a804d Compare March 30, 2023 06:57
@Victor1319
Copy link
Member

I konw what you want to do, but it's hard to understand your running_mountor logic, is there any way to simplify your code?

@Victor1319 Victor1319 added the discuss Need to discuss label Apr 14, 2023
@@ -158,10 +158,12 @@ func (d *Dir) Create(ctx context.Context, req *fuse.CreateRequest, resp *fuse.Cr
var err error
var newInode uint64
metric := exporter.NewTPCnt("filecreate")
runningStat := d.super.runningMonitor.AddClientOp("create", req.Hdr().Pid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stat.BeginStat and stat.EndStat already make the time cost statitcs, which duplicate with process in AddClientOp

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function duplicated with stat.BeginStat and stat.EndStat

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AddClientOp will record not only the time cost, but also the pid, operator type and make related counter add one.
Similarly, SubClientOp will handle no only error, but also check and make related counter subtract one if time cost is greater than the timeout set in config.

@M1eyu2018
Copy link
Collaborator Author

I konw what you want to do, but it's hard to understand your running_mountor logic, is there any way to simplify your code?

@Victor1319
The advantage in this algorithm is less space and less resource consumption because only one routine can monitor all client op. If one routine monitors one client op, it's impossible because of large resource consumption.

In general,The algorithm is that put the op into the related counter by hash, and then check the op timeout when the timeout reach. if timeout, report it.

@M1eyu2018 M1eyu2018 force-pushed the add-request-monitor-for-client branch from 78a804d to a6a2213 Compare April 20, 2023 06:06
@codecov-commenter
Copy link

codecov-commenter commented Apr 20, 2023

Codecov Report

Merging #1863 (0e87281) into master (4c17460) will decrease coverage by 26.20%.
The diff coverage is 76.05%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff             @@
##           master    #1863       +/-   ##
===========================================
- Coverage   74.33%   48.14%   -26.20%     
===========================================
  Files         296      456      +160     
  Lines       36086    79664    +43578     
===========================================
+ Hits        26825    38355    +11530     
- Misses       7539    38259    +30720     
- Partials     1722     3050     +1328     
Impacted Files Coverage Δ
blobstore/access/stream_alloc.go 58.82% <0.00%> (ø)
blobstore/api/blobnode/client.go 85.71% <ø> (ø)
blobstore/api/clustermgr/client.go 0.00% <0.00%> (ø)
blobstore/api/clustermgr/config.go 0.00% <0.00%> (ø)
blobstore/api/clustermgr/disk.go 0.00% <0.00%> (ø)
blobstore/api/clustermgr/kv.go 0.00% <0.00%> (ø)
blobstore/api/clustermgr/proto.go 0.00% <ø> (ø)
blobstore/blobnode/base/qos/qos.go 49.20% <0.00%> (ø)
blobstore/blobnode/base/workutils/stripe_utils.go 85.71% <ø> (-2.39%) ⬇️
blobstore/blobnode/config.go 22.51% <0.00%> (-26.76%) ⬇️
... and 79 more

... and 173 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@M1eyu2018 M1eyu2018 force-pushed the add-request-monitor-for-client branch from a6a2213 to a1ad93f Compare May 9, 2023 04:21
Signed-off-by: M1eyu2018 <857037797@qq.com>
@M1eyu2018 M1eyu2018 force-pushed the add-request-monitor-for-client branch from a1ad93f to 0e87281 Compare May 9, 2023 04:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Need to discuss
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants