Optimal way to resize matrix #31

Open
lube opened this issue Jun 26, 2022 · 1 comment

Comments

lube commented Jun 26, 2022

Hi, is there a recommended way to resize sparse matrices? Thinking about the backing storage, I would expect COO matrices to be easier to resize, right?

@ckingdev

I'm assuming that by resize you mean dropping some number of the last rows and/or columns from the matrix, along with any data they contain (or adding empty ones). Enlarging any of the usual sparse formats is simple: only the variables storing the size of the matrix need to change. For compressed formats, if the compressed axis is enlarged, the last value of the indptr array also needs to be copied and appended once for each row/column added.
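As a rough illustration of the enlarging case, here is a minimal sketch in plain Python. It assumes a CSR layout held in ordinary lists; the function name `grow_csr` and the `(rows, cols)` shape tuple are made up for the example and are not this library's API.

```python
def grow_csr(shape, indptr, new_rows, new_cols):
    """Return the new shape and indptr for a CSR matrix enlarged to
    new_rows x new_cols. The indices and data arrays are not touched,
    because the added rows and columns hold no nonzero values."""
    rows, cols = shape
    if new_rows < rows or new_cols < cols:
        raise ValueError("grow_csr only enlarges a matrix")
    # Every added row is empty, so it starts and ends at the current end
    # of the data: repeat the last indptr value once per added row.
    new_indptr = indptr + [indptr[-1]] * (new_rows - rows)
    return (new_rows, new_cols), new_indptr


# A 2x3 matrix with 3 nonzeros, grown to 4x5.
shape, indptr = grow_csr((2, 3), [0, 2, 3], new_rows=4, new_cols=5)
```

Adding columns only changes the stored shape here, since columns are the uncompressed axis in CSR.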

For removing rows/columns, there is an additional consideration: nonzero values in the dropped rows/columns need to be removed. In general, this requires iterating over the nonzero values and either setting the value in the data array to zero (and pruning zeros at the end) or copying the kept values to another array. Unfortunately, it's not possible in general to know how many values will be kept before iterating over them, since the values will not be sorted by both axes.
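To make the "copy the kept values to another array" option concrete, here is a small sketch for a COO matrix stored as three parallel lists. The name `shrink_coo` and the list-based layout are assumptions for illustration only.

```python
def shrink_coo(rows, cols, vals, new_rows, new_cols):
    """Keep only the entries that fall inside a new_rows x new_cols matrix.

    The entries are in no particular order, so every one has to be checked."""
    out_rows, out_cols, out_vals = [], [], []
    for r, c, v in zip(rows, cols, vals):
        if r < new_rows and c < new_cols:
            out_rows.append(r)
            out_cols.append(c)
            out_vals.append(v)
    return out_rows, out_cols, out_vals


# A 4x4 matrix with four nonzeros, shrunk to 3x3.
kept = shrink_coo([0, 2, 1, 3], [0, 1, 3, 2], [1.0, 2.0, 3.0, 4.0], 3, 3)
```

The entry at (1, 3) is lost with the dropped column and the entry at (3, 2) with the dropped row, leaving two entries.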

Once that is handled, for COO/DOK matrices the work is done. For compressed formats, the indptr array needs to have the last values dropped, but this can be done with a slice if memory allocation is not an issue.
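For a compressed format, the same filtering can be done one row at a time. The sketch below assumes CSR stored in plain lists and uses a hypothetical `shrink_csr` helper. It rebuilds indptr while filtering columns, a step this thread doesn't mention but which is needed once entries are removed from the middle of the data array.

```python
def shrink_csr(indptr, indices, data, new_rows, new_cols):
    """Drop trailing rows and columns from a CSR matrix held in lists."""
    # Dropping trailing rows: only the first new_rows + 1 pointers matter.
    indptr = indptr[: new_rows + 1]
    out_indptr, out_indices, out_data = [0], [], []
    for r in range(new_rows):
        # Copy the entries of row r whose column survives the resize.
        for k in range(indptr[r], indptr[r + 1]):
            if indices[k] < new_cols:
                out_indices.append(indices[k])
                out_data.append(data[k])
        # The next row starts where the kept data currently ends.
        out_indptr.append(len(out_data))
    return out_indptr, out_indices, out_data


# 3x4 matrix: row 0 has columns 0 and 3, row 1 has column 1, row 2 has column 2.
result = shrink_csr([0, 2, 3, 4], [0, 3, 1, 2], [1.0, 2.0, 3.0, 4.0], 2, 3)
```

If only rows are dropped (the compressed axis), the loop copies everything up to `indptr[new_rows]`, so plain slices of `indices` and `data` at that offset would give the same result without filtering.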

Doing this may require careful handling of arrays and slices if there are memory constraints. Simply slicing the underlying storage arrays to drop rows/columns will not reduce memory usage, because the slice is just a view into the original data array. Copying to a new, correctly sized array allows the original array to be released if necessary, but it needs more memory in the interim and likely two passes over the data: one to determine the correct size and one to copy the appropriate values. Which approach you take will depend on your constraints.
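The view-versus-copy trade-off can be sketched with Python's standard `array` and `memoryview` types, which stand in here for whatever storage the library really uses. The helper name `compact_two_pass` and the COO layout are assumptions for the example.

```python
from array import array


def compact_two_pass(rows, cols, vals, new_rows, new_cols):
    """Copy the kept COO entries into new, exactly sized arrays."""
    # Pass 1: count the surviving entries so the output can be sized exactly.
    kept = sum(1 for r, c in zip(rows, cols) if r < new_rows and c < new_cols)
    out_rows = array("q", [0]) * kept
    out_cols = array("q", [0]) * kept
    out_vals = array("d", [0.0]) * kept
    # Pass 2: copy the surviving entries into the new arrays.
    i = 0
    for r, c, v in zip(rows, cols, vals):
        if r < new_rows and c < new_cols:
            out_rows[i], out_cols[i], out_vals[i] = r, c, v
            i += 1
    return out_rows, out_cols, out_vals


rows = array("q", [0, 2, 1, 3])
cols = array("q", [0, 1, 3, 2])
vals = array("d", [1.0, 2.0, 3.0, 4.0])

# A slice of a memoryview is only a view: it still refers to the full
# original buffer, so the original's memory cannot be released.
view = memoryview(vals)[:2]

# The two-pass copy owns its own, smaller storage.
small_rows, small_cols, small_vals = compact_two_pass(rows, cols, vals, 3, 3)
```

Once nothing refers to the original arrays any more, their memory can be reclaimed, whereas `view` keeps the whole of `vals` alive for as long as it exists.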
