flaky test: potentially flaky distance::dot::tests::test_dot_f32 #2243

Open
chebbyChefNEQ opened this issue Apr 23, 2024 · 1 comment
Labels
flaky-test rust Rust related tasks

Comments

@chebbyChefNEQ
Contributor

https://github.com/lancedb/lance/actions/runs/8794212752/job/24133303595

@chebbyChefNEQ chebbyChefNEQ added flaky-test rust Rust related tasks labels Apr 23, 2024
@broccoliSpicy
Contributor

broccoliSpicy commented May 14, 2024

  // Accuracy of dot product depends on the size of the components
  // of the vector.
  // Imagine that each `x_i` can vary by `є * |x_i|`. Similarly for `y_i`.
  // (Basically, its magnitude lies within `(1 ± є) * |x_i|`.)
  // Error for `sum(x, y)` is `є_x + є_y`. Error for multiplication is `є_x * |y| + є_y * |x|`.
  // See: https://www.geol.lsu.edu/jlorenzo/geophysics/uncertainties/Uncertaintiespart2.html
  // The multiplication of `x_i` and `y_i` can vary by `(є * |x_i|) * |y_i| + (є * |y_i|) * |x_i|`.
  // This simplifies to `2 * є * |x_i| * |y_i|`.
  // So the error for the sum of all the multiplications is `2 * є * sum(|x_i| * |y_i|)`.
  fn max_error<T: Float + AsPrimitive<f64>>(x: &[f64], y: &[f64]) -> f32 {
      let dot = x
          .iter()
          .cloned()
          .zip(y.iter().cloned())
          .map(|(x, y)| x.abs() * y.abs())
          .sum::<f64>();
      (2.0 * T::epsilon().as_() * dot) as f32
  }

source link
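To make the bound concrete, here is a minimal, self-contained sketch that restates `max_error` for `f32` only and checks one small dot product against it. The names `max_error_f32`, `dot_f32`, and `dot_f64` are hypothetical. The sketch uses `f32::EPSILON` directly instead of the generic `T: Float + AsPrimitive<f64>` bound (so it needs no `num_traits`), and the naive `dot_f32` loop is an assumption standing in for the real kernel, which may sum in a different order.

```rust
// Hypothetical f32-only restatement of the quoted `max_error` helper.
// Uses f32::EPSILON directly instead of a generic `T: Float` bound.
fn max_error_f32(x: &[f64], y: &[f64]) -> f32 {
    // Sum of |x_i| * |y_i|, accumulated in f64.
    let abs_dot: f64 = x.iter().zip(y).map(|(a, b)| a.abs() * b.abs()).sum();
    (2.0 * f32::EPSILON as f64 * abs_dot) as f32
}

// Naive sequential f32 dot product (assumption: stands in for the kernel under test).
fn dot_f32(x: &[f32], y: &[f32]) -> f32 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

// f64 reference computed from the same f32 inputs.
fn dot_f64(x: &[f32], y: &[f32]) -> f64 {
    x.iter().zip(y).map(|(&a, &b)| a as f64 * b as f64).sum()
}

fn main() {
    let x: Vec<f32> = vec![1.0, 2.0, 3.0];
    let y: Vec<f32> = vec![4.0, 5.0, 6.0];
    let x64: Vec<f64> = x.iter().map(|&v| v as f64).collect();
    let y64: Vec<f64> = y.iter().map(|&v| v as f64).collect();

    let result = dot_f32(&x, &y);
    let expected = dot_f64(&x, &y);
    let error = (result as f64 - expected).abs() as f32;
    let bound = max_error_f32(&x64, &y64);
    println!("result = {}, expected = {}, error = {:e}, bound = {:e}", result, expected, error, bound);

    // The test passes only if the observed error stays within the first-order bound.
    assert!(error <= bound);
}
```

With these small integer inputs every intermediate value is exact, so the error is zero; the interesting cases are larger or higher-dimensional inputs, where rounding in the summation also contributes.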

Actually, `T::epsilon()` is the difference between 1.0 and the next larger representable number:
https://doc.rust-lang.org/std/f64/constant.EPSILON.html
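A quick way to confirm that definition is to step the bit pattern of 1.0 up by one and subtract. This sketch assumes positive, finite inputs (the hypothetical helper `next_up_positive` does not handle zero, negatives, or infinity).

```rust
// Returns the next representable f32 above `x`.
// Only valid for positive, finite `x` (simplifying assumption for illustration).
fn next_up_positive(x: f32) -> f32 {
    f32::from_bits(x.to_bits() + 1)
}

fn main() {
    // The gap between 1.0 and the next larger f32 is exactly f32::EPSILON (2^-23).
    let gap_f32 = next_up_positive(1.0) - 1.0;
    assert_eq!(gap_f32, f32::EPSILON);

    // Same check for f64: the gap above 1.0 is f64::EPSILON (2^-52).
    let gap_f64 = f64::from_bits(1.0f64.to_bits() + 1) - 1.0;
    assert_eq!(gap_f64, f64::EPSILON);

    println!("f32 gap = {:e}, f64 gap = {:e}", gap_f32, gap_f64);
}
```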

However, in IEEE 754 the spacing between consecutive representable floating-point numbers is not constant, and it does not grow linearly with the value either: it stays fixed across each power-of-two range and jumps at the next power of two. For values close to 1 the spacing is small (high precision); for very large values the gap between consecutive representable numbers can be quite large (lower precision). The reason is that the fixed-width fraction is scaled by the exponent, and larger values have larger exponents.
So `T::epsilon().as_() * dot` might not be an appropriate error bound here.
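The growth of the gap with magnitude can be measured directly. This is a minimal sketch assuming positive, finite, normal `f32` inputs; the helper name `ulp` is hypothetical, not a standard library function.

```rust
// Spacing between `x` and the next larger f32 (one unit in the last place).
// Only valid for positive, finite, normal `x` (assumption for illustration).
fn ulp(x: f32) -> f32 {
    f32::from_bits(x.to_bits() + 1) - x
}

fn main() {
    for x in [1.0f32, 1.5, 1024.0, 1.0e6, 16_777_216.0, 1.0e30] {
        println!("x = {:e}: gap to next f32 = {:e}", x, ulp(x));
    }
    // Within one power-of-two range [2^e, 2^(e+1)) the gap is constant...
    assert_eq!(ulp(1.0), ulp(1.5));
    // ...and it doubles at each power of two: 1024 = 2^10 has a gap of 2^10 * EPSILON,
    assert_eq!(ulp(1024.0), 1024.0 * f32::EPSILON);
    // and at 2^24 consecutive f32 values are already 2.0 apart.
    assert_eq!(ulp(16_777_216.0), 2.0);
}
```

The step-shaped growth is why a bound scaled by a single constant `є` is only an approximation of the actual spacing at each magnitude.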

// The multiplication of `x_i` and `y_i` can vary by `(є * |x_i|) * |y_i| + (є * |y_i|) * |x_i|`.
This may also have implications: the second-order `є * є` term, which this first-order estimate drops, may accumulate in high-dimensional vectors.
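Under the quoted model, where each component can be off by up to `є` times its magnitude, expanding the worst-case product exactly shows the term that the first-order formula drops, and how it sums over all `n` dimensions:

```latex
\begin{aligned}
|x_i|(1+\epsilon)\,|y_i|(1+\epsilon) - |x_i|\,|y_i| &= (2\epsilon + \epsilon^2)\,|x_i|\,|y_i| \\
\sum_{i=1}^{n} (2\epsilon + \epsilon^2)\,|x_i|\,|y_i| &= 2\epsilon \sum_{i=1}^{n} |x_i|\,|y_i| \;+\; \epsilon^2 \sum_{i=1}^{n} |x_i|\,|y_i|
\end{aligned}
```

The first term on the right is what `max_error` returns; the second is the `є * є` contribution summed across the vector.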
