Option to require median to return a value from passed collection #98

Open
sebsebmc opened this issue Dec 7, 2021 · 4 comments

@sebsebmc

sebsebmc commented Dec 7, 2021

Python's statistics module has median_low and median_high, which return the lower and the higher of the two middle values when the passed array has an even number of elements. I do not see an analogous method in Julia, or any way to request behavior that guarantees the result of median comes from the underlying array.
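For readers unfamiliar with the Python functions mentioned above, here is a small illustration using only the standard library. The sample data is made up for the example.

```python
from statistics import median, median_low, median_high

# Even number of elements: the two middle values are 3 and 5.
data = [1, 3, 5, 7]

# median averages the two middle values, so the result need not be in data.
print(median(data))

# median_low and median_high always return an element of data.
print(median_low(data))
print(median_high(data))

# Odd number of elements: all three agree on the single middle value.
odd = [1, 3, 5]
print(median(odd), median_low(odd), median_high(odd))
```

With an even-length input, only median_low and median_high are guaranteed to return a value that is present in the input.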

Of course, it is possible to check the length of the array to see if it's even, but that doesn't solve the problem of efficiently getting the two middle values of the array that define the median.
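One way to get both middle values without a full sort is a selection algorithm followed by a single extra pass. The following is a minimal Python sketch of that idea, not how Julia's Statistics package works; `quickselect` and `middle_pair` are hypothetical names introduced for illustration.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element (0-based) in expected linear time."""
    items = list(values)
    while True:
        pivot = random.choice(items)
        below = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        above = [x for x in items if x > pivot]
        if k < len(below):
            items = below
        elif k < len(below) + len(equal):
            return pivot
        else:
            k -= len(below) + len(equal)
            items = above


def middle_pair(values):
    """Return (low, high), the two middle elements of values.

    For an odd number of elements both entries are the single middle value.
    """
    items = list(values)
    if not items:
        raise ValueError("no median for empty data")
    n = len(items)
    high = quickselect(items, n // 2)
    if n % 2 == 1:
        return high, high
    # One more linear pass: the lower middle value is either the largest
    # element strictly below high, or high itself when high is duplicated.
    below = [x for x in items if x < high]
    low = max(below) if len(below) == n // 2 else high
    return low, high


print(middle_pair([7, 1, 5, 3]))
```

Both returned values are elements of the input, and the caller can take either one or average them.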

It would be useful to be able to specify that I want the return value from median to be a value from the passed collection.

fredrikekre transferred this issue from JuliaLang/julia on Dec 8, 2021
@mschauer
Member

mschauer commented Dec 8, 2021

Yes, we should expose that. A possible function name would be medians, returning a range or a pair. Or is this too subtle?

@sebsebmc
Author

sebsebmc commented Dec 8, 2021

median_low and median_high are nice in that they will always return 1 item, regardless of whether the collection has an even or odd number of elements. On the other hand returning both in one call is of course faster than having to make 2 calls.

@nalimilan
Member

See also the previous discussion at JuliaLang/julia#19359. I'd rather add arguments to median (like tie=first or middle=first) than add new functions. The newer PR is JuliaLang/julia#30329, but I had some remaining objections at JuliaLang/julia#30329 (comment), which will probably require some design discussion again. Help welcome to revive it!
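To make the keyword-argument idea concrete, here is a rough Python sketch of a median that takes a callable deciding how to combine the two middle values. This is an assumption about what such an API could look like, not the design from either linked PR; `median_by`, `first`, and `last` are hypothetical names, and `middle` only mirrors the keyword suggested above.

```python
from statistics import mean


def first(pair):
    """Pick the lower of the two middle values."""
    return pair[0]


def last(pair):
    """Pick the higher of the two middle values."""
    return pair[1]


def median_by(values, middle=mean):
    """Median where `middle` combines the two middle values (hypothetical API).

    The default, mean, gives the usual averaged median; first or last
    guarantee that the result is an element of the input.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no median for empty data")
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]
    return middle((ordered[n // 2 - 1], ordered[n // 2]))


data = [1, 3, 5, 7]
print(median_by(data), median_by(data, middle=first), median_by(data, middle=last))
```

A single function with one argument covers the averaged, low, and high variants, which is the appeal of this approach over adding separate functions.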

@aplavin
Contributor

aplavin commented Dec 19, 2022

The "median, but necessarily from the original dataset" is also called a medoid. So this could be the name of a new function.

Compared to median_low and _high, medoid directly generalizes to multiple dimensions.
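As a rough illustration of that generalization, a medoid can be taken as the element of the dataset with the smallest total distance to all other elements. The Python sketch below uses a brute-force quadratic search; the function name `medoid` and its `distance` parameter are hypothetical, and Euclidean distance via `math.dist` (Python 3.8+) is an assumed default.

```python
import math


def medoid(points, distance=math.dist):
    """Return the element of points minimizing the summed distance to all points.

    Brute force, O(n^2) distance evaluations. Ties are resolved in favor of
    the element that appears first in the input.
    """
    points = list(points)
    if not points:
        raise ValueError("no medoid for empty data")
    return min(points, key=lambda p: sum(distance(p, q) for q in points))


# Two dimensions: the result is always one of the input points.
pts = [(0, 0), (2, 0), (1, 1), (10, 10)]
print(medoid(pts))

# One dimension with absolute difference as the distance. For an even number
# of elements both middle values tie, so the first one encountered is returned.
print(medoid([1, 3, 5, 7], distance=lambda a, b: abs(a - b)))
```

In one dimension the tie between the two middle values means a medoid function still needs a rule for choosing between them, much like median_low and median_high.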
