[Feature Request] NaN safety, we probably need something more than doc strings. #311

Open
smoothdeveloper opened this issue Nov 15, 2023 · 2 comments

Comments

@smoothdeveloper
Contributor

In the context of machine learning, many optimization algorithms rightly assume that their inputs contain no NaN values.

The documentation of a function sometimes mentions whether it can return NaN and how it processes NaN inputs, and sometimes does not.

Alas, this is not described systematically, and people doing exploratory feature engineering will just try functions left and right.

The first focus would be to make sure the library comes with batteries included for those who don't want to find out "too late" in the pipeline (pipelines take a long time to set up, adjust, run, troubleshoot, etc.).

Without trying to make everything perfect and maximally sophisticated everywhere, here is a plan that could bring some safety and long-term maintainability:

  • Offer an FSharp.Stats.NumericallySafe module (people open it after FSharp.Stats and it shadows the unchecked variants); there could also be a module with assertions that throw defensively
  • the module would call the existing APIs but wrap the values in a type that enforces inspection via pattern matching or helper functions, borrowing idioms from F# core around option or result
  • the existing API should have CLR attributes on the functions / methods signalling "emits NaN", "accepts NaN"
  • there would be property-based tests, possibly guided by code coverage, that validate behaviour against the presence of those attributes
  • there would be a page in the documentation that lists all the functions, with filters for "emits NaN" and the other attributes
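To make the first three bullets concrete, here is a rough sketch of how the shadowing module, the wrapper type and a marker attribute could fit together. Everything in it is hypothetical: `EmitsNaNAttribute`, `NaNChecked`, `Raw` and this `NumericallySafe` module do not exist in FSharp.Stats, and `Raw.meanGeometric` is a log-based stand-in written for the example, not the library's implementation.

```fsharp
open System

/// Hypothetical marker attribute; nothing like this exists in FSharp.Stats today.
[<AttributeUsage(AttributeTargets.Method)>]
type EmitsNaNAttribute() =
    inherit Attribute()

/// Hypothetical wrapper that forces the caller to inspect the result.
type NaNChecked =
    | Finite of float
    | NotANumber

module NaNChecked =
    /// Wraps a raw float, turning NaN into an explicit case.
    let ofFloat (x: float) =
        if Double.IsNaN x then NotANumber else Finite x

    /// Bridges to the familiar option idiom.
    let toOption (c: NaNChecked) =
        match c with
        | Finite x -> Some x
        | NotANumber -> None

/// Stand-in for the unchecked API (assumption: log-based geometric mean).
module Raw =
    [<EmitsNaN>]
    let meanGeometric (xs: seq<float>) : float =
        let mutable logSum = 0.0
        let mutable count = 0
        for x in xs do
            logSum <- logSum + log x
            count <- count + 1
        // An empty input gives 0.0 / 0.0, which is NaN.
        exp (logSum / float count)

/// Opened after the unchecked module, this shadows the raw variant.
module NumericallySafe =
    let meanGeometric (xs: seq<float>) : NaNChecked =
        Raw.meanGeometric xs |> NaNChecked.ofFloat

// Usage: the compiler now warns if the NaN case is not handled.
match NumericallySafe.meanGeometric [ 1.0; 4.0 ] with
| Finite m -> printfn "geometric mean: %f" m
| NotANumber -> printfn "result was NaN"
```

A single-case-per-outcome union keeps the call site close to `option`, while still naming the failure explicitly; `Result<float, _>` would work equally well if an error payload is wanted.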

One can dream :)

In the meantime:

  • I wanted to point out that meanGeometric can emit NaN but the documentation says nothing about this, and it is not exposed under FSharp.Stats.NumericallySafe.
  • In the documentation pages, we'd want to display warning sections after the formula, logic and sample code, styled to catch the reader's attention.

related: #280

@smoothdeveloper
Contributor Author

<remarks>Returns NaN if data is empty or if any entry is NaN.</remarks>

I think we can ensure consistency based on the presence of this remark, which seems to be in place (but it is not really discoverable in code, nor in the documentation pages).

We can also define an F# analyzer that looks for calls to functions like sqrt, which can produce NaN.

If someone who groks maths (not me) could list here the F# and BCL functions that produce NaN that are used in this library, it would help with the implementation of such analyzer.
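As a starting point for that list, here is a small script with operations that return NaN for finite or infinite (non-NaN) inputs under IEEE 754 as implemented by .NET. It is not exhaustive, and it is not a survey of what this library actually calls; the name `nanSources` is just for the example.

```fsharp
open System

/// (description, value) pairs; every value here evaluates to NaN.
let nanSources : (string * float) list =
    [ "sqrt (-1.0)", sqrt (-1.0)
      "log (-1.0)", log (-1.0)
      "log10 (-1.0)", log10 (-1.0)
      "asin 2.0", asin 2.0
      "acos 2.0", acos 2.0
      "0.0 / 0.0", 0.0 / 0.0
      "infinity - infinity", infinity - infinity
      "infinity / infinity", infinity / infinity
      "0.0 * infinity", 0.0 * infinity
      "1.0 % 0.0", 1.0 % 0.0
      "Math.Pow(-8.0, 1.0 / 3.0)", Math.Pow(-8.0, 1.0 / 3.0) ]

// Print each expression with the result of Double.IsNaN.
for (description, value) in nanSources do
    printfn "%-28s IsNaN = %b" description (Double.IsNaN value)

// Near misses: these return an infinity, not NaN, but a later
// subtraction or division of infinities will turn into NaN.
let nearMisses = [ log 0.0; 1.0 / 0.0; exp 1000.0 ]
```

Note that NaN never compares equal to anything, including itself, so an analyzer or assertion has to use `Double.IsNaN` rather than `x = nan`.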

@smoothdeveloper
Contributor Author

One issue with the open FSharp.Stats.NumericallySafe approach is that you can only switch between variants in your code using #if compiler directives; otherwise, you need to pass references to functions or rebind them in your own module based on some context.

There are scenarios where I'd want this to be done without recompiling and without having to rebind each function of interest.
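One possible shape for this, sketched under the assumption that library functions would route their results through a shared guard: a policy that is read once from the environment and can also be flipped at runtime. `NaNPolicy`, `NaNGuard`, the `FSHARP_STATS_NAN_POLICY` variable and the `meanGeometric` stand-in are all made up for illustration; none of them exist in FSharp.Stats.

```fsharp
open System

/// Hypothetical runtime policy for NaN results.
type NaNPolicy =
    | Propagate
    | Throw

module NaNGuard =
    /// Reads the initial policy from a hypothetical environment variable.
    let private fromEnvironment () =
        match Environment.GetEnvironmentVariable "FSHARP_STATS_NAN_POLICY" with
        | "throw" -> Throw
        | _ -> Propagate

    /// Current policy; can be changed while the program runs.
    let mutable policy = fromEnvironment ()

    /// Returns the value unchanged, or throws if it is NaN and the policy says so.
    let check (name: string) (value: float) : float =
        match policy with
        | Throw when Double.IsNaN value ->
            invalidOp (sprintf "%s produced NaN" name)
        | _ -> value

/// Stand-in for a library function that opts into the guard.
let meanGeometric (xs: float list) : float =
    let logSum = xs |> List.sumBy log
    exp (logSum / float (List.length xs))
    |> NaNGuard.check "meanGeometric"

// Usage: same call site, different behaviour depending on the policy.
NaNGuard.policy <- Propagate
printfn "propagate: IsNaN = %b" (Double.IsNaN (meanGeometric [ -1.0 ]))

NaNGuard.policy <- Throw
try
    meanGeometric [ -1.0 ] |> ignore
with :? InvalidOperationException as e ->
    printfn "throw: %s" e.Message
```

The trade-off is global mutable state and a check on every call, so this would probably complement the typed wrapper rather than replace it.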
