Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow PartialRadixSort for Partial Array #692

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

kf-vikas
Copy link

SUMMARY

  • The method PartialRadixSort sorted the complete array. But there could be scenarios where sort is needed for a part of the array.

Automated Checks

  • I have run ./gradlew test and made sure that my PR does not break any unit test.
  • I have run ./gradlew checkstyleMain or the equivalent and corrected the formatting warnings reported.

@lemire
Copy link
Member

lemire commented Dec 20, 2023

That's reasonable, but it would be best to have a use case. I am not sure that generalizing code that is only used in a specific manner is necessarily helpful.

@kf-vikas
Copy link
Author

kf-vikas commented Dec 20, 2023

That's reasonable, but it would be best to have a use case. I am not sure that generalizing code that is only used in a specific manner is necessarily helpful.

So, the use case that lead to this effort is as following:

  1. RoaringBitMap (RBM) library has an API add(). This was initially used. When we profiled the library, add seemed to be taking the most amount of time. So, we explored other possible APIs - addN came to the forefront.
  2. Now, to use addN we have two options - pre create a fixed size array. Keep filling it. Once filled - sort it and do addN, other option is to create a dynamically growing List but this results in a copy while growing, which we didn't want to.
  3. So, we went with sort and do addN. While sorting, partialRadixSort came to our use case. But this always assumed that the sort needs to happen for the entire array. But since we had pre-created that array, partialRadixSort as is, couldn't be used. That is why we modified the routine here.

@lemire
Copy link
Member

lemire commented Dec 20, 2023

@kf-vikas Can you give me an example of code where you'd use this function?

My concern is that your PR adds functionality that is not used, not tested, not benchmarked. If we could at least illustrate the use case..

@richardstartin
Copy link
Member

This method was never intended to be a public sorting utility, and I don’t object to copying it (I wrote it so I think I can grant permission to copy it.)

Can I ask about the use case though? Are you inserting values into the bitmap in ascending order? If so, you can use the bitmap writer API which avoids the binary searches which make the add method expensive.

@lemire
Copy link
Member

lemire commented Dec 21, 2023

@kf-vikas Let me elaborate on my concern. As @richardstartin stated, this is not part of the public API. This means, in particular, that it could change or disappear in a future release. Now, we can make it part of our public API, but we would want something like a test that illustrates the use case.

We also want to promote good usage of what is already available. To be clear, increasing the surface area of our public API is not a goal. In some instances, that can be a negative.

@kf-vikas
Copy link
Author

kf-vikas commented Dec 22, 2023

@kf-vikas Can you give me an example of code where you'd use this function?

My concern is that your PR adds functionality that is not used, not tested, not benchmarked. If we could at least illustrate the use case..

@lemire - we tried it in one of our internal implementations. I am not sure if I can share the code.

Let me explain what lead to this.

One of our use case happened to insert a lot of random values in the Roaring Bitmap. On profiling, it was found that add call was quite CPU intensive. So we started exploring alternatives. One of the option was to do batch insert (using addN) API. But then, based on previous knowledge from @ashishkf, it was suggested to sort the array before insertion.
However, the array (to sort) was pre-allocated and was not guaranteed that it will be completely filled. Hence we looked at the method partialRadixSort and thought to augment it.

Orthogonal to this, I would like to know your thoughts/recommendations on how to insert random numbers?

Thank you.

@kf-vikas
Copy link
Author

This method was never intended to be a public sorting utility, and I don’t object to copying it (I wrote it so I think I can grant permission to copy it.)

Can I ask about the use case though? Are you inserting values into the bitmap in ascending order? If so, you can use the bitmap writer API which avoids the binary searches which make the add method expensive.

@richardstartin - our use case is insertion of random numbers in the RoaringBitmap. Options:

  • add
  • addN - we haven't verified this yet
  • sort + addN

Your thoughts are welcome!

@kf-vikas
Copy link
Author

@kf-vikas Let me elaborate on my concern. As @richardstartin stated, this is not part of the public API. This means, in particular, that it could change or disappear in a future release. Now, we can make it part of our public API, but we would want something like a test that illustrates the use case.

We also want to promote good usage of what is already available. To be clear, increasing the surface area of our public API is not a goal. In some instances, that can be a negative.

@lemire - if the concern is of a missing unit test, I am happy to add it to the diff.

The use case is something I described earlier. Please let me know if the explanation is not clear.

Thank you.

@lemire
Copy link
Member

lemire commented Dec 23, 2023

Adding a test would help a lot.

Copy link
Member

@lemire lemire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a test of the new functionality before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants