[Enhancement] add request monitor for client #1863
I know what you want to do, but your running_monitor logic is hard to follow. Is there any way to simplify the code?
@@ -158,10 +158,12 @@ func (d *Dir) Create(ctx context.Context, req *fuse.CreateRequest, resp *fuse.Cr
var err error
var newInode uint64
metric := exporter.NewTPCnt("filecreate")
runningStat := d.super.runningMonitor.AddClientOp("create", req.Hdr().Pid)
stat.BeginStat and stat.EndStat already collect time-cost statistics, which duplicates the work done in AddClientOp.
Does this function duplicate stat.BeginStat and stat.EndStat?
AddClientOp records not only the time cost but also the pid and the operation type, and it increments the related counter.
Similarly, SubClientOp handles not only the error: it also checks the time cost and decrements the related counter if the time cost is greater than the timeout set in the config.
@Victor1319 In general, the algorithm puts each op into the related counter by hash and then checks for timed-out ops once the timeout elapses. If an op has timed out, it is reported.
Codecov Report
@@ Coverage Diff @@
## master #1863 +/- ##
===========================================
- Coverage 74.33% 48.14% -26.20%
===========================================
Files 296 456 +160
Lines 36086 79664 +43578
===========================================
+ Hits 26825 38355 +11530
- Misses 7539 38259 +30720
- Partials 1722 3050 +1328
... and 173 files with indirect coverage changes
Signed-off-by: M1eyu2018 <857037797@qq.com>
Introduction:
To report abnormal requests (e.g. timed-out or failed requests) in the FUSE client to the user, this PR adds a request monitor to the client.
The monitor covers all requests with a single goroutine. When it detects a timed-out request, it reports the process ID and the operation type.
Advantages:
1. Low resource consumption, because only one goroutine is used.
2. Problems can be located quickly from the reported process ID and operation type.
Monitor algorithm:
1. Time is split into three rotating time windows. Each window is as long as the timeout set in the client config.
2. Suppose the timeout is one second. If a request starts in time window0, that is, in a second whose value modulo three is zero, the counter called counter0 is incremented. If the request ends without timing out, counter0 is decremented.
3. When time window0 ends (once that second has passed, in this example), the monitor waits for one more window and then checks whether counter0 is greater than zero. If it is, some requests that started in time window0 have timed out, so the monitor reports them with their process IDs and operation types.
4. Similarly, counter1 serves time window1 (seconds whose value modulo three is one), and counter2 serves time window2 (seconds whose value modulo three is two). The three counters thus take turns and together cover all time.