Safety implications of unsorted indices #232

mulimoen · 2020-09-26T10:06:14Z

This is a discussion issue for how to handle unsorted indices in arrays and matrices. The situation as it is today is relaxed, we try to avoid it, having unsorted indices won't cause safety issues, but might produce unexpected results/panics in some algorithms.

The primary motivation for bringing this up is to allow/disallow future optimisations to take advantage of the sorted property, and how the library should behave when these invariants are broken.

Should we formalise this behaviour and document it or should we disallow creation of unsorted structures?

Ref #231, #35, #16, #136

vbarrielle · 2020-09-26T11:48:42Z

As you pointed out, there are two separate concerns here:

actual memory safety. We want to be able to rely on CsMat invariants to accelerate the more critical algorithms with eg unchecked array accesses. For the moment, I have not seen an algorithm that relies on the sorted indices property to be memory safe, the important property is having the indices in bounds.
correctness. Some algorithms rely on the sorted property to work, for instance finding an existing non-zero using dichotomic search.
performance. I've always assumed the sorted property could help some algorithms to perform better, but I'm not sure I've seen
that many algorithms.

Here are the possible ways forward in my opinion, including the options you mentioned:

Allow unsorted indices, but document algorithms that won't work in that case. Pros: less checks when creating a sparse matrix. Cons: unexpected failures, hard to debug.
Disallow the creation of unsorted structures. Pros: this is the closest to the current situation, except bugs like serde might deserialize into invalid structures #231. Cons: makes the library less flexible, adds some runtime checks that can be costly. Though as you showed in Check if needs to be sorted #228 it is way less expensive to check than to sort (O(n) vs O(nln(n)), and we already need O(n) checks for safety).
Track the sorted state, either dynamically, or statically (Type checked invariants #16). Pros: maximum flexibility. Cons: less ergonomic library, particularly if we use the type system to track this state. Maybe adding an has_sorted_indices: bool field to the matrix would be enough. Then algorithms that need sorted indices would be able to fail, or sort, when they get unsorted indices.

Do you see other alternatives?

mulimoen · 2020-09-26T12:16:33Z

The usage of unsafe in this crate is minimal, which is great for memory safety. We should not have problems with this, as long as all indices are in bounds when passing to external bindings.

The sorted indices are very nice for so many algorithms, dot-product, kronecker product, arrays from a matrix slice*, more efficient use of cache.

I think option 2 would be preferable, as most algorithms that works on sorted indices will probably also produce sorted indices. We can avoid the runtime checks by allowing an unsafe constructor. This would also be the least amount of breakage and API change as you mention.

In #35 you mention the need for unsorted indices. Has this changed?

[*]: If we introduce a new slicetype with an offset field

vbarrielle · 2020-09-27T20:14:53Z

The sorted indices are very nice for so many algorithms, dot-product, kronecker product, arrays from a matrix slice*, more efficient use of cache.
I think option 2 would be preferable, as most algorithms that works on sorted indices will probably also produce sorted indices. We can avoid the runtime checks by allowing an unsafe constructor. This would also be the least amount of breakage and API change as you mention.

Yes I think I agree, the sorted indices is a very convenient property in many algorithms. The unsafe constructor is the way to go.

In #35 you mention the need for unsorted indices. Has this changed?

There are some decompositions that can only produce unsorted indices as far as I know. But the case is quite rare, and they don't need that many operations, so when these decompositions get implemented we can define a limited sparse matrix type with unsorted indices to help in these decompositions.

maboesanman · 2020-10-01T12:53:33Z

It seems to me like some algorithms requiring sorted indices and some not could be handled with a private is_sorted Field, which is assumed to be correctly set. When creating a matrix from raw, there could be three options:

from_raw Which checks if the array is sorted and sets is_sorted accordingly.
from_unsorted_raw which performs no check and sets is_sorted to false
unsafe from_sorted_raw_unchecked which performs no check and sets is_sorted to true

Additionally, a method to sort if needed would be useful. Possibly sorted_view And sort Methods?

vbarrielle · 2020-10-06T14:46:09Z

Just for reference the C++ lib eigen has similar questions: https://gitlab.com/libeigen/eigen/-/issues/694#note_254554694

vbarrielle mentioned this issue Jan 21, 2021

Relax Copy bound to Clone #121

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Safety implications of unsorted indices #232

Safety implications of unsorted indices #232

mulimoen commented Sep 26, 2020 •

edited

vbarrielle commented Sep 26, 2020

mulimoen commented Sep 26, 2020 •

edited

vbarrielle commented Sep 27, 2020

maboesanman commented Oct 1, 2020

vbarrielle commented Oct 6, 2020

Safety implications of unsorted indices #232

Safety implications of unsorted indices #232

Comments

mulimoen commented Sep 26, 2020 • edited

vbarrielle commented Sep 26, 2020

mulimoen commented Sep 26, 2020 • edited

vbarrielle commented Sep 27, 2020

maboesanman commented Oct 1, 2020

vbarrielle commented Oct 6, 2020

mulimoen commented Sep 26, 2020 •

edited

mulimoen commented Sep 26, 2020 •

edited