Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WithClock TracerProviderOption #2052

Open
wants to merge 41 commits into
base: main
Choose a base branch
from

Conversation

Frefreak
Copy link
Member

@Frefreak Frefreak commented Jul 3, 2021

details please see #2043

@codecov
Copy link

codecov bot commented Jul 3, 2021

Codecov Report

Merging #2052 (e5c647d) into main (0c1f156) will increase coverage by 0.0%.
The diff coverage is 100.0%.

Impacted file tree graph

@@          Coverage Diff          @@
##            main   #2052   +/-   ##
=====================================
  Coverage   76.0%   76.0%           
=====================================
  Files        174     174           
  Lines      12088   12105   +17     
=====================================
+ Hits        9187    9204   +17     
  Misses      2656    2656           
  Partials     245     245           
Impacted Files Coverage Δ
sdk/trace/provider.go 82.6% <100.0%> (+0.7%) ⬆️
sdk/trace/span.go 82.2% <100.0%> (+0.1%) ⬆️
sdk/trace/time.go 100.0% <100.0%> (ø)
sdk/trace/tracer.go 100.0% <100.0%> (ø)

sdk/trace/provider.go Outdated Show resolved Hide resolved
sdk/trace/time/time.go Outdated Show resolved Hide resolved
sdk/trace/provider.go Outdated Show resolved Hide resolved
- change `SetClock` to `WithClock` style option
- `Since` fallback to use `Now().Sub` if not implemented
@Frefreak
Copy link
Member Author

Frefreak commented Jul 6, 2021

Thanks for all the suggestions & comments. Code has been updated.

- move MonotonicEndTime from sdk/internal to sdk/trace, and unexport it
- use a struct instead of interface as traceproviders's real clock field
Copy link
Member

@pellared pellared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good if someone takes another look at the "godoc" comments.

Nice work 👍

Nice to have: it might be also good to create a benchmark to see how much time+memory we lose when creating start/end span timestamps because of the new clock abstractions.

sdk/trace/time.go Outdated Show resolved Hide resolved
sdk/trace/provider.go Outdated Show resolved Hide resolved
sdk/trace/provider.go Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
sdk/trace/provider.go Outdated Show resolved Hide resolved
sdk/trace/provider.go Outdated Show resolved Hide resolved
sdk/trace/trace_test.go Outdated Show resolved Hide resolved
@Frefreak
Copy link
Member Author

Frefreak commented Jul 8, 2021

godoc & others updated.
For the benchmark, during development I ran 2 comparison check to see the performance impact:

default clock

One is to compare the default clock against original implementation (git checkout main). The difference seems negligible or hard to differentiate (obtained by running go test -run=xxxxxMatchNothingxxxxx -test.benchtime=1s -bench=. under sdk/trace). Result of one run is below:
new clock (NeverSample and DotString parts are omitted to make it shorter):

pkg: go.opentelemetry.io/otel/sdk/trace                                                                                              
cpu: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz                                                                                         
BenchmarkAttributesMapToKeyValue-8        532833              2194 ns/op                                                             
BenchmarkStartEndSpan/AlwaysSample-8     1380988               869.2 ns/op           848 B/op          9 allocs/op                            
BenchmarkSpanWithAttributes_4/AlwaysSample-8              824190              1444 ns/op            1584 B/op         17 allocs/op     
BenchmarkSpanWithAttributes_8/AlwaysSample-8              618087              1936 ns/op            2112 B/op         23 allocs/op    
BenchmarkSpanWithAttributes_all/AlwaysSample-8            784983              1794 ns/op            1936 B/op         21 allocs/op   
BenchmarkSpanWithAttributes_all_2x/AlwaysSample-8         395304              3125 ns/op            3236 B/op         32 allocs/op                                           
PASS                                                                                                                                 
ok      go.opentelemetry.io/otel/sdk/trace      21.784s

original (main branch):

pkg: go.opentelemetry.io/otel/sdk/trace
cpu: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
BenchmarkAttributesMapToKeyValue-8        501423              2167 ns/op
BenchmarkStartEndSpan/AlwaysSample-8     1429974               859.0 ns/op           848 B/op          9 allocs/op
BenchmarkSpanWithAttributes_4/AlwaysSample-8              965439              1473 ns/op            1584 B/op         17 allocs/op
BenchmarkSpanWithAttributes_8/AlwaysSample-8              594805              1935 ns/op            2112 B/op         23 allocs/op
BenchmarkSpanWithAttributes_all/AlwaysSample-8            638281              1812 ns/op            1936 B/op         21 allocs/op
BenchmarkSpanWithAttributes_all_2x/AlwaysSample-8         399723              3103 ns/op            3236 B/op         32 allocs/op
PASS
ok      go.opentelemetry.io/otel/sdk/trace      20.962s

custom clock which add "offset"

The other check is to run against a custom clock in another repo (which is the original motivation that we want to add this clock abstraction). This clock has an clockOffset that will be added to every time.Now, which is updated periodically (several minutes or hours) in another goroutine so the overhead of updating offset can be ignore. In this clock implementation the "ns/op" stat does increase a bit, and I think this "offset" clock is a typical implementation, but after all that is custom logic. Should we include this in the benchmark to compare the default clock with an "offset" clock?

Here is a snippet of the test logic (modified from benchmark_test.go):

func tracerWithNTPClock(b *testing.B, name string, sampler sdktrace.Sampler) trace.Tracer {                                          
    // clock initialization
    clock := ...                                                                                                                                                                                                
    tp := sdktrace.NewTracerProvider(sdktrace.WithSampler(sampler), sdktrace.WithClock(&clock))                                      
    return tp.Tracer(name)                                                                                                           
}

func traceBenchmark(b *testing.B, name string, fn func(*testing.B, trace.Tracer)) {                                                  
    b.Run("AlwaysSample", func(b *testing.B) {                                                                                       
        b.ReportAllocs()                                                                                                             
        fn(b, tracer(b, name, sdktrace.AlwaysSample()))                                                                              
    })                                                                                                                               
    b.Run("AlwaysSampleWithNTPClock", func(b *testing.B) {                                                                           
        b.ReportAllocs()                                                                                                             
        fn(b, tracerWithNTPClock(b, name, sdktrace.AlwaysSample()))                                                                  
    })                                                                                                                                                                                                                                                             
}

@Frefreak Frefreak changed the title add clock/SetClock to trace provider (fix #2043) add clock/WithClock to trace provider (fix #2043) Jul 8, 2021
Aneurysm9
Aneurysm9 previously approved these changes Jul 8, 2021
Copy link
Member

@pellared pellared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leaving a few nit comments
Approving ✔️

CHANGELOG.md Outdated Show resolved Hide resolved
sdk/trace/provider.go Outdated Show resolved Hide resolved
sdk/trace/time.go Outdated Show resolved Hide resolved
pellared
pellared previously approved these changes Jul 8, 2021
sdk/trace/time.go Outdated Show resolved Hide resolved
sdk/trace/span.go Show resolved Hide resolved
Copy link
Member

@pellared pellared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is already in a very nice shape 👍

sdk/trace/time.go Outdated Show resolved Hide resolved
sdk/trace/trace_test.go Outdated Show resolved Hide resolved
sdk/trace/span.go Outdated Show resolved Hide resolved
Frefreak and others added 3 commits August 6, 2021 21:32
Co-authored-by: Robert Pająk <pellared@hotmail.com>
sdk/trace/trace_test.go Outdated Show resolved Hide resolved
sdk/trace/trace_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@MrAlias MrAlias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking close.

sdk/trace/time.go Show resolved Hide resolved
tracer := tp.Tracer("custom-clock")

_, span := tracer.Start(context.Background(), "test-frozen-clock")
time.Sleep(time.Microsecond * 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look needed. Is there a reason to pause here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC in windows if a span ends too fast it can result in the same start time/end time. If that happens then the Equal test would appears not that useful here.

Comment on lines 339 to 344
// to generate span start/end time. Clock.Stopwatch should start and
// return a Stopwatch instance. For Stopwatch implementation,
// Stopwatch.Started should return the time.Time when the stopwatch
// started. Stopwatch.Elapsed should return the time.Duration measuring
// the elapsed time from when the stopwatch started. Its value should be
// positive to ensure monotonic start/end time of the span.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It seems like this portion of the docs could be moved to their respective types/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. I feels like the existing types' comment basically says the same thing so I remove most of the docs here.

startTime = time.Now()
span.stopwatch = tr.provider.clock.Stopwatch()
} else {
span.stopwatch = standardStopwatch(startTime)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user supplies their own start time it shouldn't mean the user supplied clock is not used. This seems like a design flaw.

Should we updated the Clock interface's Stopwatch method to accept a start time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess a given startTime might not always be useful or reasonable for some clock implementations (but I cannot really come up with any realistic examples). And I feels like this would complicate many implementations that could have been straight forward. Besides if the user provides a startTime and an endTime then the custom clock would not be used anyways.

Considering a "offset" clock, if user provide startTime, should the clock giving out the endTime based on the given startTime?

@Frefreak
Copy link
Member Author

I added a parameter to clock's Stopwatch method (for a standard default clock this basically moves the IsZero check from the caller into the clock's Stopwatch method). For a custom clock the behavior would depends on the implementation (since for some custom clock it may decide that a passed in time makes not sense).

I also thought about adding a similar parameter to Stopwatch.Stop too, but from the current End function it would need to move some logic after the IsRecoding check to the beginning of the function, which kinds of defeat the purpose of calculating an end time as soon as possible. Besides the current logic already achieves what we want: use the user provided end time if its provided. So I just leaves it as is.

Current benchmark (3 rounds each branch):
main branch:

BenchmarkStartEndSpan/AlwaysSample-8     1564064               785.6 ns/op           848 B/op          9 allocs/op
BenchmarkSpanWithAttributes_4/AlwaysSample-8              924451              1330 ns/op            1584 B/op          17 allocs/op
BenchmarkSpanWithAttributes_8/AlwaysSample-8              643656              1769 ns/op            2112 B/op          23 allocs/op
BenchmarkSpanWithAttributes_all/AlwaysSample-8            794772              1537 ns/op            1936 B/op          21 allocs/op
BenchmarkSpanWithAttributes_all_2x/AlwaysSample-8         452740              2678 ns/op            3236 B/op          32 allocs/op

BenchmarkStartEndSpan/AlwaysSample-8     1542901               760.7 ns/op           848 B/op          9 allocs/op
BenchmarkSpanWithAttributes_4/AlwaysSample-8              900636              1265 ns/op            1584 B/op          17 allocs/op
BenchmarkSpanWithAttributes_8/AlwaysSample-8              720046              1697 ns/op            2112 B/op          23 allocs/op
BenchmarkSpanWithAttributes_all/AlwaysSample-8            774172              1558 ns/op            1936 B/op          21 allocs/op
BenchmarkSpanWithAttributes_all_2x/AlwaysSample-8         453163              2698 ns/op            3236 B/op          32 allocs/op

BenchmarkStartEndSpan/AlwaysSample-8     1562756               759.7 ns/op           848 B/op          9 allocs/op
BenchmarkSpanWithAttributes_4/AlwaysSample-8              898989              1280 ns/op            1584 B/op          17 allocs/op
BenchmarkSpanWithAttributes_8/AlwaysSample-8              718977              1697 ns/op            2112 B/op          23 allocs/op
BenchmarkSpanWithAttributes_all/AlwaysSample-8            800170              1561 ns/op            1936 B/op          21 allocs/op
BenchmarkSpanWithAttributes_all_2x/AlwaysSample-8         449101              2678 ns/op            3236 B/op          32 allocs/op

clock branch (which is the default clock):

BenchmarkStartEndSpan/AlwaysSample-8     1483533               799.9 ns/op           872 B/op         10 allocs/op
BenchmarkSpanWithAttributes_4/AlwaysSample-8              844624              1335 ns/op            1608 B/op           18 allocs/op
BenchmarkSpanWithAttributes_8/AlwaysSample-8              692430              1753 ns/op            2136 B/op           24 allocs/op
BenchmarkSpanWithAttributes_all/AlwaysSample-8            737242              1601 ns/op            1960 B/op           22 allocs/op
BenchmarkSpanWithAttributes_all_2x/AlwaysSample-8         442994              2743 ns/op            3260 B/op           33 allocs/op

BenchmarkStartEndSpan/AlwaysSample-8     1494285               800.3 ns/op           872 B/op         10 allocs/op
BenchmarkSpanWithAttributes_4/AlwaysSample-8              920941              1306 ns/op            1608 B/op           18 allocs/op
BenchmarkSpanWithAttributes_8/AlwaysSample-8              698163              1752 ns/op            2136 B/op           24 allocs/op
BenchmarkSpanWithAttributes_all/AlwaysSample-8            719253              1599 ns/op            1960 B/op           22 allocs/op
BenchmarkSpanWithAttributes_all_2x/AlwaysSample-8         448485              2763 ns/op            3260 B/op           33 allocs/op

BenchmarkStartEndSpan/AlwaysSample-8     1491642               805.5 ns/op           872 B/op         10 allocs/op
BenchmarkSpanWithAttributes_4/AlwaysSample-8              941373              1319 ns/op            1608 B/op           18 allocs/op
BenchmarkSpanWithAttributes_8/AlwaysSample-8              698836              1784 ns/op            2136 B/op           24 allocs/op
BenchmarkSpanWithAttributes_all/AlwaysSample-8            767616              1595 ns/op            1960 B/op           22 allocs/op
BenchmarkSpanWithAttributes_all_2x/AlwaysSample-8         446974              2746 ns/op            3260 B/op           33 allocs/op

Since there were some refactors happened on the main branch, a more thorough review might be necessary in case I missed something important.

@Frefreak
Copy link
Member Author

Sorry for the long pause from last update. We still think this feature might be useful in some cases (although to be honest not very useful to most people).
I just merged the current main into my branch. But since the discussion of this PR has been already long and its a long time since the first time it submitted, should I reopen a PR for a clean review or is it okay to continue here?

BTW this is the current benchmark for a comparison before & after this PR:

main:

BenchmarkAttributesMapToKeyValue-8                598554              1932 ns/op
BenchmarkSpanProcessor-8                          105206             11924 ns/op           12592 B/op        105 allocs/op
BenchmarkSpanProcessorVerboseLogging-8             99156             11508 ns/op           13736 B/op        111 allocs/op
BenchmarkStartEndSpan/AlwaysSample-8             1447701               858.8 ns/op           848 B/op          9 allocs/op
BenchmarkSpanWithAttributes_4/AlwaysSample-8              822363              1461 ns/op            1760 B/op         19 allocs/op
BenchmarkSpanWithAttributes_8/AlwaysSample-8              592827              2077 ns/op            2464 B/op         27 allocs/op
BenchmarkSpanWithAttributes_all/AlwaysSample-8            791739              1563 ns/op            1936 B/op         21 allocs/op
BenchmarkSpanWithAttributes_all_2x/AlwaysSample-8         451161              2720 ns/op            3236 B/op         32 allocs/op
BenchmarkSpanWithEvents_4/AlwaysSample-8                  737056              1666 ns/op            1536 B/op         20 allocs/op
BenchmarkSpanWithEvents_8/AlwaysSample-8                  503560              2387 ns/op            2240 B/op         29 allocs/op
BenchmarkSpanWithEvents_WithStackTrace/AlwaysSample-8            1000000              1045 ns/op            1024 B/op         13 allocs/op
BenchmarkSpanWithEvents_WithTimestamp/AlwaysSample-8             1000000              1032 ns/op            1048 B/op         14 allocs/op

clock:

BenchmarkAttributesMapToKeyValue-8                597698              1813 ns/op
BenchmarkSpanProcessor-8                           99366             11682 ns/op           12832 B/op        115 allocs/op
BenchmarkSpanProcessorVerboseLogging-8            101935             11701 ns/op           13976 B/op        121 allocs/op
BenchmarkStartEndSpan/AlwaysSample-8             1479482               809.1 ns/op           872 B/op         10 allocs/op
BenchmarkSpanWithAttributes_4/AlwaysSample-8              794362              1754 ns/op            1784 B/op         20 allocs/op
BenchmarkSpanWithAttributes_8/AlwaysSample-8              571761              2154 ns/op            2488 B/op         28 allocs/op
BenchmarkSpanWithAttributes_all/AlwaysSample-8            738685              1627 ns/op            1960 B/op         22 allocs/op
BenchmarkSpanWithAttributes_all_2x/AlwaysSample-8         438031              2765 ns/op            3260 B/op         33 allocs/op
BenchmarkSpanWithEvents_4/AlwaysSample-8                  718692              1666 ns/op            1560 B/op         21 allocs/op
BenchmarkSpanWithEvents_8/AlwaysSample-8                  500578              2427 ns/op            2264 B/op         30 allocs/op
BenchmarkSpanWithEvents_WithStackTrace/AlwaysSample-8            1000000              1055 ns/op            1048 B/op         14 allocs/op
BenchmarkSpanWithEvents_WithTimestamp/AlwaysSample-8             1000000              1067 ns/op            1072 B/op         15 allocs/op

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants