Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

math:sin, cos, sincos: Inconsistencies between documentation and implementation #166

Open
mateunho opened this issue Jun 19, 2015 · 9 comments

Comments

@mateunho
Copy link
Contributor

According to Questions and functions' docs, there's a clearly defined input value range of 0:2Pi. Since pal is meant to be super fast, one should not check nor normalize input (inside of function) which is out of the range (GIGO rule). My questions arises:

  1. Why there is a M_NORMALIZE_RADIANS macro used?
  2. Why test cases for these functions are using values from range -Pi:Pi?
@kfitch
Copy link
Contributor

kfitch commented Jun 22, 2015

I would argue this is a case where the documentation should be changed, and M_NORMALIZE_RADIANS should be cleaned/fixed. Having a function called sin(or cos) that has a different input range than the normal mathematical function is just asking for trouble. I violates the "principle of least surprise" and is likely to lead to problems for users of the library.

Having a faster version of sin(or cos) that has a distinct name to be used in those cases where the input is known to be constrained might be a good idea, If we did have such a function I think either +/-Pi or +/-2Pi would be more reasonable restrictions than 0:2Pi. Off the top of my head, here are some possible names:

  • fastsin
  • sin_restricted
  • sin_PiOrLess
  • sin_???
  • sin_profit

P.S.

Should M_NORMALIZE_RADIANS a part of the public API of PAL? Currently it lives in one of the public headers and as a result could be used by users of PAL. I suspect M_NORMALIZE_RADIANS is something that should be internal. If you look at the implementations of trig function in the msun code it takes a different approach to "normalizing." We probably we want the flexibility to tweak this macro in the future for performance reasons without risking breaking user code.

@lfochamon
Copy link
Contributor

Given the whole idea of the library, wouldn't it make sense to define a range---be it [0,2pi) or [-pi,pi)---and leave the generation/normalization of the range to the user. In other words, just p_sin and p_cos be the "fast" functions. That would seem to be more in line with the "no belt/suspender" and GIGO ideas. Specially given the benchmarks @kfitch reported in #117. Otherwise, +1 for making M_NORMALIZE_RADIANS internal.

@aolofsson
Copy link
Member

@kfitch Yes, agree that we may be breaking one principle here for the sake of another. There are many embedded libraries that have used this before. I will add links to them in the README.md to clarify. We seem to be having a culture clash here between resource embedded and desktop/server developer viewpoints?:-)

@mansourmoufid
Copy link
Contributor

I agree that the periodic trigonometric functions should assume their argument is in the correct range. But in this case it would be nice to provide an fmod function which developers can use to ensure that it is.

For example:

p_fmod_f32(x, n, 2.f * M_PI); // optional
p_cos_f32(x, y, n);

@kfitch
Copy link
Contributor

kfitch commented Jul 2, 2015

@eliteraspberries That is definitely one reasonable approach, but it brings up a few more questions:

For the case where the library user does not know for sure the range of his inputs and would use the example code you presented, could we do better performance wise? If the "fmod" operation was inside the cos function then we would have a chance to have better cache usage (or even avoid the cache by using registers). Also, the compiler might have more wiggle room to schedule things when unrolling loops and looping to fill in potential pipeline holes. And finally there are potential efficiencies to be had when combining the fmod with the cos (or other trig function). Look at the implementation under the msun directory. They actually wrap into +/-pi/2 (instead of +/-2pi), and keep track of the multiple of pi/2 needed. This lets them use various symmetries when evaluating the function. By limiting to a range closer to 0 it might be possible to use fewer terms in the Taylor series expansion approximation, which again might improve performance. Of course, the proof is in the pudding. The only way to know how much impact these potential optimizations would have would be to try it out. And, probably try it out on a few different vector lengths on a few different platforms.

For the case where the library user does know the input range to be limited to +/- 2Pi we can be very fast.

I would argue that having two versions of each trig function would be the ideal case; one with the full input range, and another limited to +/- 2Pi. This lets the library user express their intent/expectations clearly. Of course a larger API means a larger maintenance burden.

@mansourmoufid
Copy link
Contributor

@kfitch That sounds like it would please everyone, with a little more work.

In that case, I would humbly suggest to implement a static inline version of the fmod function in a header file, so it could be used inline by several functions. So the compiler does all the work.

Something like this for example:

fmod.h:

static inline void _fmod(const float *a, float *c, size_t n, float x)
{
    size_t i;
    for (i = 0; i < n; i++) {
        ...
    }
}
static inline void _fmod_2pi(const float *a, float *c, size_t n)
{
    _fmod(a, c, n, 2.f * M_PI);
}
void fmod(const float *, float *, size_t, float);
void fmod_2pi(const float *, float *, size_t);

fmod.c:

#include "fmod.h"
void fmod(const float *a, float *c, size_t n, float x)
{
    _fmod(a, c, n, x);
}
void fmod_2pi(const float *a, float *c, size_t n)
{
    _fmod_2pi(a, c, n);
}

cos.c:

#include "fmod.h"
static inline _cos(const float *a, float *c, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        ...
    }
}
void cos(const float *a, float *c, size_t n)
{
    _cos(a, c, n);
}
void cos_norm(const float *a, float *c, size_t n)
{
    _fmod_2pi(a, c, n);
    _cos(a, c, n);
}

The compiler would inline _fmod and _cos into the cos_norm function, perhaps combine the two for loops, and do whatever instruction scheduling.

The library would end up being slightly bigger though. If everyone likes the idea, I can work on it tomorrow. In the end, I'm happy with whatever the project decides.

@mateunho
Copy link
Contributor Author

mateunho commented Jul 2, 2015

@eliteraspberries 👍 LGTM

@lfochamon
Copy link
Contributor

Is implementing the fmods in a header file a good idea (it's effectively what's happening)? Also, if you're providing cos_norm, do you need a public fmod_2pi (@kfitch raised that issue earlier)? Some implementations of trigonometric functions might use different ranges in the future.

@mansourmoufid
Copy link
Contributor

I think there are two separate issues here. First is the unit testing data. It should have input which is bound to [0, 2pi]. That's easy to fix, just generate new data (gold).

The second is that the implementations may not properly handle input on [0, 2pi] -- perhaps only on [-pi, pi] or [0, pi/2]. This issue has to be handled by the individual function implementation. For example the tangent implementation uses a core algorithm for input in [0, pi/4] and then uses identities and symmetries to extend that to [0, 2pi].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants