Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve corspearman performance (Matrix & Matrix-Matrix cases) #850

Closed
wants to merge 9 commits into from

Conversation

PGS62
Copy link
Contributor

@PGS62 PGS62 commented Feb 20, 2023

if n = size(x,2) then this PR reduces the number of calls made by corspearman(x) to tiedrank from n + n(n-1)/2 to n. Likewise, if m = size(y,2) , the number of calls made by corspearman(x,y) falls from n + n*m to n + m. Hence a big speedup for multi-column input.

In the tests below, the new version of corspearman is in a package KendallTau.

julia> using StatsBase, KendallTau, BenchmarkTools

julia> x = rand(10,10);KendallTau.corspearman(x)==StatsBase.corspearman(x)#compile
true

julia> x = rand(1000,1000);

julia> @btime res_old = StatsBase.corspearman(x);
  18.005 s (2002003 allocations: 7.63 GiB)

julia> @btime res_new = KendallTau.corspearman(x);
  43.331 ms (4014 allocations: 38.50 MiB)

julia> res_new == res_old
true

corspearman improvements

After the change, the tests in test/rankcorr.jl all pass. The desired handling of NaN values is maintained too.

if n = size(x,2) then this release reduces the number of calls made by corspearman(x) to tiedrank from n + n(n-1)/2  to n. Hence better speed.
src/rankcorr.jl Outdated Show resolved Hide resolved
src/rankcorr.jl Outdated Show resolved Hide resolved
src/rankcorr.jl Outdated Show resolved Hide resolved
PGS62 and others added 3 commits February 21, 2023 10:06
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
My testing showed that this suggested change does indeed reduce allocations and results in the method `tiedrank_nan` being a bit faster.

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
@PGS62
Copy link
Contributor Author

PGS62 commented Feb 21, 2023

Thank you @devmotion for looking at this PR.

My real interest is to improve the performance of corkendall, though whilst working on that I happened to notice the inefficiency in corspearman - hence this PR. If by any chance you have time to look at issue #849 that would be a great help.

src/rankcorr.jl Outdated Show resolved Hide resolved
src/rankcorr.jl Outdated Show resolved Hide resolved
test/rankcorr.jl Outdated Show resolved Hide resolved
PGS62 and others added 2 commits February 13, 2024 10:29
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@PGS62
Copy link
Contributor Author

PGS62 commented Feb 16, 2024

Hello @nalimilan and other maintainers of StatsBase,

Is it possible for this PR to be accepted while there are failures against Julia nightly? I'd like to make a PR for changes to corkendall and it would be simpler to do that if this PR had gone through, since the relevant code is in the same file.

Thanks!

@PGS62
Copy link
Contributor Author

PGS62 commented Mar 29, 2024

I plan to submit an alternative PR in which corspearman has better performance than the version in this PR. So I'm closing this PR.

@PGS62 PGS62 closed this Mar 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants