Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve performance of recode! for array dest #355

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ahnlabb
Copy link

@ahnlabb ahnlabb commented May 19, 2021

Performance improvements to fix #354.

Before:

  178.212 μs (47 allocations: 1.15 MiB)
  130.184 ms (1098374 allocations: 32.00 MiB)
  155.115 ms (998374 allocations: 28.95 MiB)
  154.162 ms (998378 allocations: 30.48 MiB)

After:

  191.479 μs (47 allocations: 1.15 MiB)
  187.252 μs (60 allocations: 394.39 KiB)
  1.384 ms (6 allocations: 288 bytes)
  1.689 ms (10 allocations: 1.53 MiB)

Todo:

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, and sorry for the delay. I think if we add a special method, better use the most efficient implementation (see my comment).

Comment on lines +90 to 98
for p in opt_pairs
if x ≅ p.first
return p.second
end
end
for p in opt_pairs
if recode_in(x, p.first)
return p.second
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could change the behavior in case of overlap between pairs. Why did you change this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this was a while a go I'll need to take some time with it and rerun my (micro-)benchmarks to be sure but unless my memory fails me recode_in was a performance bottleneck and splitting the checks (in addition to switching to map!) made a noticeable difference for highly optimizable cases. You're absolutely right that it is a breaking change, and should have been highlighted in the PR since it warrants discussion. I spent some time trying to get recode_in to optimize away but was not satisfied with the result. The most troublesome part is of course the any(x ≅ y for y in collection) for the case when collection is a primitive. I'll get back to you with data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Maybe better do this in a separate PR since it's a bit more tricky.

src/recode.jl Show resolved Hide resolved
Comment on lines +68 to +76
pairs = map(pairs) do p
p.first => convert(T, p.second)
end
recoded = recode(src, default, pairs...)
if T >: Missing
dest .= unwrap.(recoded)
else
dest .= missing_check.(unwrap.(recoded))
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than doing this, to avoid making a copy and two passes over the data, we should call recode on levels(src), and then do something like:

@inbounds for i in eachindex(dest, src)
    dest[i] = newlevels[src.refs[i]+1]
end

The actual implementation needs to be a bit more complex so that the first entry in newlevels is missing (to handle the case when src.refs is 0).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than doing this, to avoid making a copy and two passes over the data

By copy do you mean copy of the src.levels? In this implementation no copy of the actual array (or refs) is made which is the main reason why it is so much faster (as outlined in my StackOverflow answer) is because all the actual copying of the refs happens only once at the last line dest .= unwrap.(recoded) the recoded variable shares the refs with src.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recode(src, default, pairs...) allocates a new vector, right? That's relatively fast, but it's even better to avoid it.

Copy link
Author

@ahnlabb ahnlabb Jul 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I remembered the details wrong. In the StackOverflow example I did:

mapping = Dict("X"=>1, "Y"=>2, "Z"=>3)
b = CategoricalArray{Int64,1,UInt32}(undef, 0)
b.refs = a.refs
levels!(b.pool, [mapping[l] for l in levels(a.pool)])

which is similar to what you're suggesting. However, in this PR we initialize the CategoricalArray that will be put in the recoded variable with something like CategoricalArray{S, N, R}(undef, size(a)) so the refs are not shared between src and recoded.

EDIT: Like I noted on SO using levels! does not work in the general case

@nalimilan
Copy link
Member

Any news here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

recode! slow when src is AbstractArray
2 participants