Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safety implications of unsorted indices #232

Open
mulimoen opened this issue Sep 26, 2020 · 5 comments
Open

Safety implications of unsorted indices #232

mulimoen opened this issue Sep 26, 2020 · 5 comments

Comments

@mulimoen
Copy link
Collaborator

mulimoen commented Sep 26, 2020

This is a discussion issue for how to handle unsorted indices in arrays and matrices. The situation as it is today is relaxed, we try to avoid it, having unsorted indices won't cause safety issues, but might produce unexpected results/panics in some algorithms.

The primary motivation for bringing this up is to allow/disallow future optimisations to take advantage of the sorted property, and how the library should behave when these invariants are broken.

Should we formalise this behaviour and document it or should we disallow creation of unsorted structures?

Ref #231, #35, #16, #136

@vbarrielle
Copy link
Collaborator

As you pointed out, there are two separate concerns here:

  • actual memory safety. We want to be able to rely on CsMat invariants to accelerate the more critical algorithms with eg unchecked array accesses. For the moment, I have not seen an algorithm that relies on the sorted indices property to be memory safe, the important property is having the indices in bounds.
  • correctness. Some algorithms rely on the sorted property to work, for instance finding an existing non-zero using dichotomic search.
  • performance. I've always assumed the sorted property could help some algorithms to perform better, but I'm not sure I've seen
    that many algorithms.

Here are the possible ways forward in my opinion, including the options you mentioned:

  • Allow unsorted indices, but document algorithms that won't work in that case. Pros: less checks when creating a sparse matrix. Cons: unexpected failures, hard to debug.
  • Disallow the creation of unsorted structures. Pros: this is the closest to the current situation, except bugs like serde might deserialize into invalid structures #231. Cons: makes the library less flexible, adds some runtime checks that can be costly. Though as you showed in Check if needs to be sorted #228 it is way less expensive to check than to sort (O(n) vs O(nln(n)), and we already need O(n) checks for safety).
  • Track the sorted state, either dynamically, or statically (Type checked invariants  #16). Pros: maximum flexibility. Cons: less ergonomic library, particularly if we use the type system to track this state. Maybe adding an has_sorted_indices: bool field to the matrix would be enough. Then algorithms that need sorted indices would be able to fail, or sort, when they get unsorted indices.

Do you see other alternatives?

@mulimoen
Copy link
Collaborator Author

mulimoen commented Sep 26, 2020

The usage of unsafe in this crate is minimal, which is great for memory safety. We should not have problems with this, as long as all indices are in bounds when passing to external bindings.

The sorted indices are very nice for so many algorithms, dot-product, kronecker product, arrays from a matrix slice*, more efficient use of cache.

I think option 2 would be preferable, as most algorithms that works on sorted indices will probably also produce sorted indices. We can avoid the runtime checks by allowing an unsafe constructor. This would also be the least amount of breakage and API change as you mention.

In #35 you mention the need for unsorted indices. Has this changed?

[*]: If we introduce a new slicetype with an offset field

@vbarrielle
Copy link
Collaborator

The sorted indices are very nice for so many algorithms, dot-product, kronecker product, arrays from a matrix slice*, more efficient use of cache.
I think option 2 would be preferable, as most algorithms that works on sorted indices will probably also produce sorted indices. We can avoid the runtime checks by allowing an unsafe constructor. This would also be the least amount of breakage and API change as you mention.

Yes I think I agree, the sorted indices is a very convenient property in many algorithms. The unsafe constructor is the way to go.

In #35 you mention the need for unsorted indices. Has this changed?

There are some decompositions that can only produce unsorted indices as far as I know. But the case is quite rare, and they don't need that many operations, so when these decompositions get implemented we can define a limited sparse matrix type with unsorted indices to help in these decompositions.

@maboesanman
Copy link
Contributor

It seems to me like some algorithms requiring sorted indices and some not could be handled with a private is_sorted Field, which is assumed to be correctly set. When creating a matrix from raw, there could be three options:

  • from_raw Which checks if the array is sorted and sets is_sorted accordingly.
  • from_unsorted_raw which performs no check and sets is_sorted to false
  • unsafe from_sorted_raw_unchecked which performs no check and sets is_sorted to true

Additionally, a method to sort if needed would be useful. Possibly sorted_view And sort Methods?

@vbarrielle
Copy link
Collaborator

Just for reference the C++ lib eigen has similar questions: https://gitlab.com/libeigen/eigen/-/issues/694#note_254554694

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants