Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

B re-use last session if it passes latency score threshold check #2666

Merged
merged 4 commits into from
Nov 28, 2022

Conversation

yondonfu
Copy link
Member

@yondonfu yondonfu commented Nov 24, 2022

What does this pull request do? Explain your changes. (required)

This PR implements the solution described in #2660.

Outside of this PR I think it would be a good idea to do a refactor to move as much selection logic (i.e. last session tracking, segment in flight tracking, etc.) into MinLSSelector in selection.go to keep that logic encapsulated in a single place. This felt like a hefty change though so I opted not to do that refactor in this PR.

We may also want to separately consider tweaking the value of the constant SELECTOR_LATENCY_SCORE_THRESHOLD to be slightly higher i.e. 1.1, 1.2 to be more lenient with the latency score threshold check, at least until we get to implementing #1232, if we still end up seeing a lot of swapping between Os during selection - I think the probability of this happening will be higher in scenarios with short segments (i.e. 1s) and lots of rendition profiles which would increase transcode + download time.

Specific updates (required)

See commit history.

8dcfaf4 and 5be5fe8 contain small code refactors.

3fc43f3 contains the main update.

How did you test each of these updates (required)

Updated unit tests.

Does this pull request close any open issues?

Fixes #2660

Checklist:

@yondonfu yondonfu changed the title Yf/eager selection fix B re-use last session if it passes latency score threshold check Nov 24, 2022
@codecov
Copy link

codecov bot commented Nov 24, 2022

Codecov Report

Merging #2666 (10afef8) into master (67c5829) will increase coverage by 0.05535%.
The diff coverage is 95.45455%.

Impacted file tree graph

@@                 Coverage Diff                 @@
##              master       #2666         +/-   ##
===================================================
+ Coverage   56.30905%   56.36440%   +0.05535%     
===================================================
  Files             88          88                 
  Lines          19052       19067         +15     
===================================================
+ Hits           10728       10747         +19     
+ Misses          7741        7737          -4     
  Partials         583         583                 
Impacted Files Coverage Δ
server/mediaserver.go 67.56135% <0.00000%> (ø)
server/selection.go 99.31034% <ø> (ø)
server/broadcast.go 79.15590% <100.00000%> (+0.27283%) ⬆️
core/orchestrator.go 77.86999% <0.00000%> (+0.13832%) ⬆️
discovery/db_discovery.go 70.75812% <0.00000%> (+1.08303%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 67c5829...10afef8. Read the comment docs.

Impacted Files Coverage Δ
server/mediaserver.go 67.56135% <0.00000%> (ø)
server/selection.go 99.31034% <ø> (ø)
server/broadcast.go 79.15590% <100.00000%> (+0.27283%) ⬆️
core/orchestrator.go 77.86999% <0.00000%> (+0.13832%) ⬆️
discovery/db_discovery.go 70.75812% <0.00000%> (+1.08303%) ⬆️

Copy link
Contributor

@leszko leszko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one comment, other than that LGTM.

And I'm also in favor of doing some refactoring, but as a separate PR.

@@ -704,7 +733,9 @@ func TestTranscodeSegment_CompleteSession(t *testing.T) {
// Create stub server
ts, mux := stubTLSServer()
defer ts.Close()
transcodeDelay := 100 * time.Millisecond
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd try to make unit tests time-dependent, because the test is inherently flaky. Especially that 100ms is very low time, so this test may fail in many environments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that throwing in these types of sleeps should be avoided, but in this case...

  1. I needed to manipulate the value of the LatencyScore field calculated here. I can directly control the value of seg.Duration, but if tookAllDur is determined by the amount of time that elapses during the request so if tookAllDur is 0 or a value really close to 0 regardless of what I set as seg.Duration I could end up with a 0 value for LatencyScore. So, AFAIK adding a sleep in the /segment handler that is triggered during the request was the easiest way to also manipulate the value of tookAllDur.
  2. I actually don't think the test will be flaky with the updates (see the snippet below) to the tests in 8cbd6a2 (I rebased + force-pushed) because the values of tookAllDur and seg.Duration are set so that we can always guarantee that the calculated LatencyScore is what we want for the assertions we are making (i.e. a score < SELECTOR_LATENCY_SCORE_THRESHOLD and a score > SELECTOR_LATENCY_SCORE_THRESHOLD).
segDurMultiplier := 100.0
// The seg duration that meets the latency score threshold
// A larger seg duration gives a smaller/better latency score
// A smaller seg duration gives a larger/worse latency score
segDurLatencyScoreThreshold := transcodeDelay.Seconds() / SELECTOR_LATENCY_SCORE_THRESHOLD
// The seg duration that is better than the threshold (i.e. "fast")
segDurFastLatencyScore := segDurLatencyScoreThreshold * segDurMultiplier
// The seg duration that is worse than the threshold (i.e "slow")
segDurSlowLatencyScore := segDurLatencyScoreThreshold / segDurMultiplier

// Re-use last session if oldest segment is in-flight for < durMult * segDur
return session
// A session in the exclusion list is not selectable
if includesSession(exclude, session) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though it's a side effect of your change, I really like the readability of having these broken out into distinct checks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad you agree that this is more readable! TBH I was having trouble parsing the one-liner with multiple logical operators so felt compelled to do the small refactor.

@yondonfu
Copy link
Member Author

Outside of this PR I think it would be a good idea to do a refactor to move as much selection logic (i.e. last session tracking, segment in flight tracking, etc.) into MinLSSelector in selection.go to keep that logic encapsulated in a single place. This felt like a hefty change though so I opted not to do that refactor in this PR.

Tracking in #2669

@thomshutt thomshutt mentioned this pull request Dec 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants