Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Hasheable Float for use in metrics #1780

Closed
wants to merge 13 commits into from

Conversation

cijothomas
Copy link
Member

@cijothomas cijothomas commented May 16, 2024

Towards #1761
This adds Hash implementation for KeyValue, by providing a wrapper around F64. This is only planned to be used in Metrics aggregation.

HashKeyValue is replaces with KeyValue in sdk.

Alternatively, we could just state that float values in attributes for Metrics are not supported. It was never tested before either!

No perf regression, but slight gains, probably due to one less indirection?.

     Running benches/attribute_set.rs (/home/cijothomas/repos/opentelemetry-rust/target/release/deps/attribute_set-c5774977e4c1461a)
Gnuplot not found, using plotters backend
AttributeSet_without_duplicates
                        time:   [399.63 ns 400.22 ns 401.05 ns]
                        change: [-0.7145% -0.4216% -0.1156%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  6 (6.00%) high severe

AttributeSet_with_duplicates
                        time:   [326.62 ns 326.88 ns 327.18 ns]
                        change: [-1.9789% -1.4644% -1.0568%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

@cijothomas cijothomas requested a review from a team as a code owner May 16, 2024 00:29
Copy link

codecov bot commented May 16, 2024

Codecov Report

Attention: Patch coverage is 90.72165% with 9 lines in your changes are missing coverage. Please review.

Project coverage is 73.1%. Comparing base (66b00d6) to head (bc204d5).

Files Patch % Lines
opentelemetry/src/metrics/mod.rs 90.2% 9 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main   #1780   +/-   ##
=====================================
  Coverage   73.0%   73.1%           
=====================================
  Files        121     121           
  Lines      18891   18923   +32     
=====================================
+ Hits       13808   13842   +34     
+ Misses      5083    5081    -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

opentelemetry/src/common.rs Outdated Show resolved Hide resolved
@TommyCpp
Copy link
Contributor

Sorry haven't get a chance to look at it. Will do it today

@TommyCpp TommyCpp self-assigned this May 20, 2024
@cijothomas cijothomas marked this pull request as draft May 20, 2024 23:19
@cijothomas
Copy link
Member Author

Sorry haven't get a chance to look at it. Will do it today

Moved to draft for now, as other changes are likely conflicting with this PR. and I'd prefer to finish those first and comeback to this. Also, I modified this PR so that the hash/eq implementations are restricted to "metrics" feature flag only in api.

@cijothomas
Copy link
Member Author

@utpilla and I discussed another alternative option - for Metrics, if the Attribute's value is f64, convert to string (to_string()) and use string's built-in hash/eq capabilities. And add a doc warning saying, Attributes with values of type f64 will be stringified for metric aggregation.

@lalitb @TommyCpp any objections for doing this?

@TommyCpp
Copy link
Contributor

for Metrics, if the Attribute's value is f64, convert to string (to_string()) and use string's built-in hash/eq capabilities. And add a doc warning saying, Attributes with values of type f64 will be stringified for metric aggregation.

Creating a String requires heap allocation and can be expensive. Also using string representation also means (key, 1.00)'s hash will be differnt from (key, 1.0)'s hash, which is counterintuitive IMO

@utpilla
Copy link
Contributor

utpilla commented May 22, 2024

Creating a String requires heap allocation and can be expensive.

That should be a one-time allocation (only when storing the key-value pair on the Hashmap), right?

Also using string representation also means (key, 1.00)'s hash will be differnt from (key, 1.0)'s hash, which is counterintuitive IMO

Is it even possible to have a float value stringified to "1.0" or "1.00"? Shouldn't it be just converted to "1"?

let num1: f64 = 1.0;
let num2: f64 = 1.00;
    
assert_eq!("1", num1.to_string());
assert_eq!("1", num2.to_string());

@lalitb
Copy link
Member

lalitb commented May 22, 2024

@lalitb @TommyCpp any objections for doing this?

what is the issue with having bitwise hashing? Apart from performance, it would be good to test that there is no precision loss in conversion from the float to string. to_string() uses the default precision of Display implementation, and can't be changed.

@TommyCpp
Copy link
Contributor

TommyCpp commented May 22, 2024

Is it even possible to have a float value stringified to "1.0" or "1.00"? Shouldn't it be just converted to "1"?

let num1: f64 = 1.0;
let num2: f64 = 1.00;
    
assert_eq!("1", num1.to_string());
assert_eq!("1", num2.to_string());

Good point you are right. Display implementation will do some conversion before stringify it

@TommyCpp
Copy link
Contributor

@lalitb @TommyCpp any objections for doing this?

why not to directly do bitwise hashing. Apart from performance, it would be good to test that there is no precision loss in conversion from the float to string. to_string() uses the default precision of Display implementation, and can't be changed.

+1 It's not clear to me what benifit the stringify brings compare with bitwise hashing?

@TommyCpp
Copy link
Contributor

That should be a one-time allocation (only when storing the key-value pair on the Hashmap), right?

The overhead is on f64 -> string, which happens for every hash of AttributeSet. We calculate the hash whenever there is a &[KeyValue] to AttributeSet conversion so the hash cost will incur on every measurement right?

measure.call(val, AttributeSet::from(attrs))

@cijothomas
Copy link
Member Author

@lalitb @TommyCpp any objections for doing this?

why not to directly do bitwise hashing. Apart from performance, it would be good to test that there is no precision loss in conversion from the float to string. to_string() uses the default precision of Display implementation, and can't be changed.

+1 It's not clear to me what benifit the stringify brings compare with bitwise hashing?

🤣 Now I am not sure. I was trying to avoid having to own the hashing logic ourselves, and steer users away from using float for dimension values in the first place! (No metric backend that I am aware of support float dimension values, it'll be anyway stringified at exporter/ingestion level)

I'll stick the bitwise hashing for now, and also include a note/warning for users to avoid f64 values for Metric KeyValues.

@cijothomas
Copy link
Member Author

That should be a one-time allocation (only when storing the key-value pair on the Hashmap), right?

The overhead is on f64 -> string, which happens for every hash of AttributeSet. We calculate the hash whenever there is a &[KeyValue] to AttributeSet conversion so the hash cost will incur on every measurement right?

measure.call(val, AttributeSet::from(attrs))

yes the cost if not one time, it is ongoing for all measurements.

@utpilla
Copy link
Contributor

utpilla commented May 22, 2024

Good point you are right. Display implementation will do some conversion before stringify it

@TommyCpp I know we are not considering this anymore but just for my own learning, I think it probably doesn't have to do stringification. I believe the in-memory representation of the float values 1.0 and 1.00 is the same:

let num1: f64 = 1.0;
let num2: f64 = 1.00;
    
println!("{}", num1.to_bits()); // 4607182418800017408
println!("{}", num2.to_bits()); // 4607182418800017408
    
assert_eq!(num1.to_bits(), num2.to_bits());

@cijothomas
Copy link
Member Author

Closing in favor of #1806

@cijothomas cijothomas closed this May 22, 2024
@TommyCpp
Copy link
Contributor

Good point you are right. Display implementation will do some conversion before stringify it

@TommyCpp I know we are not considering this anymore but just for my own learning, I think it probably doesn't have to do stringification. I believe the in-memory representation of the float values 1.0 and 1.00 is the same:

let num1: f64 = 1.0;
let num2: f64 = 1.00;
    
println!("{}", num1.to_bits()); // 4607182418800017408
println!("{}", num2.to_bits()); // 4607182418800017408
    
assert_eq!(num1.to_bits(), num2.to_bits());

Yep they should have the same underlying representation right after compiling from literal value to f64

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants