Sparse BlockOperator #1808

Open · wants to merge 18 commits into main

Conversation

@TiKeil commented Nov 29, 2022

This PR revisits ideas from #889.

In #889, @pmli proposed to use pydata/sparse for implementing a sparse BlockOperator. One advantage of this is that we do not need a specialized class and can just incorporate it into BlockOperatorBase.

In this PR, I implemented a first idea of how to do this. It actually worked out pretty well, and there is less code duplication than I expected.

Basically, if requested via the make_sparse argument of __init__, the BlockOperator does not create a ZeroOperator for every None entry. Instead, it creates a COO matrix from sparse and stores the coordinates of the non-zero entries. The same bookkeeping is now done in the dense case, so that the sparse and dense paths can share the same for loops.
If the dense version is needed, the to_dense() method can be used.
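For illustration, here is a minimal usage sketch; note that the make_sparse keyword and the to_dense() method are the additions proposed in this PR and are not part of a released pyMOR version:

```python
import numpy as np

from pymor.operators.block import BlockOperator
from pymor.operators.numpy import NumpyMatrixOperator

A = NumpyMatrixOperator(np.eye(2))
C = NumpyMatrixOperator(2 * np.eye(3))

# Current (dense) behaviour: every None block is replaced by a ZeroOperator.
op_dense = BlockOperator([[A, None], [None, C]])

# Proposed (sparse) behaviour: only the non-None blocks and their coordinates are stored.
op_sparse = BlockOperator([[A, None], [None, C]], make_sparse=True)  # keyword proposed in this PR

# Convert back if an explicit block for every position is required.
op_back = op_sparse.to_dense()  # method proposed in this PR
```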

Apart from the tests that I pushed, the code needs further testing.

Another point of debate is the default value of make_sparse. For now, it is probably better to stick with False. However, I do not see a reason why True should not become the default in the future, especially since the to_dense() mechanism is available whenever the dense version is strictly required.

The BlockDiagonalOperator already uses the sparse structure, so that the apply, apply_adjoint, and assemble overrides for this class can immediately be dropped.

Another thing would be to think about whether some methods from pydata/sparse can be used to simplify things even further.

Anything that looks strange to you?

@TiKeil TiKeil marked this pull request as draft November 29, 2022 19:19
@TiKeil TiKeil added the pr:new-feature Introduces a new feature label Nov 29, 2022
@github-actions github-actions bot added the infrastructure Buildsystem, packages and CI label Nov 29, 2022
@pmli (Member) commented Nov 30, 2022

Thanks for working on this @TiKeil.

Could a COO format (e.g., a triple of row indices, column indices, and Operators) also be accepted as input to __init__? I suppose this would be beneficial for very large sparse block operators. Also, this might be an alternative to using make_sparse.

@TiKeil (Author) commented Dec 6, 2022

Thanks for working on this @TiKeil.

Could a COO format (e.g., a triple of row indices, column indices, and Operators) also be accepted as input to __init__? I suppose this would be beneficial for very large sparse block operators. Also, this might be an alternative to using make_sparse.

Thanks for the suggestion. Would this mean forcing the user to provide the blocks in sparse form, or would you make it optional?

I have been thinking about this as well but I thought that it might be easier to basically hide the sparse functionality entirely in the Operator. Also, having it as it is makes it easier to do the assertion checks in the beginning and to find the respective source and range spaces.

@pmli (Member) commented Dec 6, 2022

Thanks for working on this @TiKeil.
Could a COO format (e.g., a triple of row indices, column indices, and Operators) also be accepted as input to __init__? I suppose this would be beneficial for very large sparse block operators. Also, this might be an alternative to using make_sparse.

Thanks for the suggestion. Would this mean forcing the user to provide the blocks in sparse form, or would you make it optional?

I meant doing it similarly to coo_matrix, which accepts different types as the first argument (in particular, a NumPy array and a COO format).
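For reference, a small runnable illustration of the two coo_matrix input styles referred to here (plain SciPy, nothing pyMOR-specific):

```python
import numpy as np
import scipy.sparse as sps

# Style 1: construct from a dense array.
dense = np.array([[0., 1.],
                  [2., 0.]])
from_dense = sps.coo_matrix(dense)

# Style 2: construct from a (data, (row, col)) triple.
data, row, col = [1., 2.], [0, 1], [1, 0]
from_triple = sps.coo_matrix((data, (row, col)), shape=(2, 2))

assert (from_dense.toarray() == from_triple.toarray()).all()
```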

I have been thinking about this as well but I thought that it might be easier to basically hide the sparse functionality entirely in the Operator. Also, having it as it is makes it easier to do the assertion checks in the beginning and to find the respective source and range spaces.

I guess it depends on whether there are applications (which I don't know about) with very large sparse block operators where forming zero blocks is too costly.

@TiKeil (Author) commented Dec 6, 2022

I meant doing it similarly to coo_matrix, which accepts different types as the first argument (in particular, a NumPy array and a COO format).

Ah ok, I see. That would mean that it would no longer be possible to use the "old" way, where no sparsity is used and the ZeroOperator blocks are constructed instead, right? Of course, the old functionality where these ZeroOperators are constructed could then be moved to the to_dense() method.

I have been thinking about this as well but I thought that it might be easier to basically hide the sparse functionality entirely in the Operator. Also, having it as it is makes it easier to do the assertion checks in the beginning and to find the respective source and range spaces.

I guess it depends on whether there are applications (which I don't know about) with very large sparse block operators where forming zero blocks is too costly.

You mean zero blocks in the sense of None in an np.ndarray? I have not actually tried it for my application, but you are raising a good point here. Of course, the (rows, cols, data) format is much better than constructing an np.ndarray with a lot of None elements.

I will work on it, thanks!

@pmli (Member) commented Dec 6, 2022

I meant doing it similarly to coo_matrix, which accepts different types as the first argument (in particular, a NumPy array and a COO format).

Ah ok, I see. That would mean that it would no longer be possible to use the "old" way, where no sparsity is used and the ZeroOperator blocks are constructed instead, right? Of course, the old functionality where these ZeroOperators are constructed could then be moved to the to_dense() method.

I would say that it would be good to keep the old input style (list of lists of Operators and Nones), but things can change internally. I'm not sure what would be the use of the to_dense method.

@TiKeil (Author) commented Dec 6, 2022

I meant doing it similarly to coo_matrix, which accepts different types as the first argument (in particular, a NumPy array and a COO format).

Ah ok, I see. That would mean that it would no longer be possible to use the "old" way, where no sparsity is used and the ZeroOperator blocks are constructed instead, right? Of course, the old functionality where these ZeroOperators are constructed could then be moved to the to_dense() method.

I would say that it would be good to keep the old input style (list of lists of Operators and Nones), but things can change internally. I'm not sure what would be the use of the to_dense method.

I agree that the input style should stay the same, but if we handle it like coo_matrix does, there would be no choice of whether a None is turned into a ZeroOperator or not. I think we should do it like this.

But: user code might assume that every block in a BlockOperator is an Operator and operate on it, regardless of a potential sparsity structure. I am not saying that such user code makes much sense, but at least we can still support it via the to_dense method, which then returns a BlockOperator that has an Operator in every block, without exception.

@TiKeil (Author) commented Dec 7, 2022

This worked out quite nicely. The only bad technical aspect is that the sparse.COO matrix does not allow None as its zero element; the zero element is instead 0, as set here. Thus, we currently need to transform all Nones into 0. In the future, it is therefore always better to define a BlockOperator by starting from np.zeros((N, N), dtype=object) instead of np.empty((N, N), dtype=object) (as was usually done in the past).
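A small runnable illustration of why np.zeros is the friendlier starting point here: with dtype=object, np.zeros fills the array with 0, while np.empty fills it with None.

```python
import numpy as np

N = 2
blocks_zeros = np.zeros((N, N), dtype=object)
blocks_empty = np.empty((N, N), dtype=object)

print(blocks_zeros[0, 0])  # 0    -> matches the fill value expected by sparse.COO
print(blocks_empty[0, 0])  # None -> would first have to be converted to 0
```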

@TiKeil (Author) commented Dec 7, 2022

I obviously failed to get the CI running with the sparse package. Perhaps someone can help me with that, so that I can see what other errors might occur with this new implementation of BlockOperator.

@renefritze (Member) commented:

The developer documentation talks a little bit about how to get new dependencies into the CI. I think it would be great if you looked into that and told me what's unclear, so I can update it now.
I already saw that the Conda part is a little outdated.

@TiKeil (Author) commented Dec 13, 2022

to_dense is probably needed for the to_matrix rule.

project_to_subbasis probably does not work yet.

@renefritze (Member) commented:

The developer documentation talks a little bit about how to get new dependencies into the CI. I think it would be great if you looked into that and told me what's unclear, so I can update it now. I already saw that the Conda part is a little outdated.

I automated most of the process in #1845

@pmli pmli added this to the 2023.1 milestone Dec 16, 2022
@sdrave sdrave force-pushed the main branch 4 times, most recently from 18e66f2 to e80291b on March 7, 2023 20:03
@sdrave sdrave removed the autoupdate label May 12, 2023
@github-actions github-actions bot removed the infrastructure Buildsystem, packages and CI label May 16, 2023
@github-actions bot (Contributor) commented Jun 7, 2023

This pull request modifies pyproject.toml or requirements-ci-oldest-pins.in.
In case dependencies were changed, make sure to call

make ci_requirements

and commit the changed files to ensure that CI runs with the updated dependencies.

@github-actions github-actions bot added the infrastructure Buildsystem, packages and CI label Jun 7, 2023
@sdrave (Member) commented Jun 7, 2023

@TiKeil, good and bad news: the good news is that sparse can now apparently be installed again via conda-forge. However, current sparse now hard-depends on numba. So adding sparse as a hard dependency would make pyMOR depend on numba and, thus, llvm. I don't think that this would be a wise decision for pyMOR.

Thus, overall I think we should see whether it is feasible to roll our own sparse data structure. @TiKeil, do you want to look into this in the near future? Otherwise, I'd remove the release milestone.

@TiKeil (Author) commented Jun 7, 2023

Sad to hear. Would it instead be possible to have it as an optional requirement and keep sparse BlockOperators as an optional feature, as I intended to do in the first place?

For instance, have a make_sparse method that can only be used if the user has installed the optional package. By default, we could stay with our current dense implementation.

Or is the hard numba dependency also an issue for optional requirements?

Regarding providing our own sparse implementation: that seems possible, but it is probably nothing I will do in the next few days.
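A rough sketch of what such an optional-dependency guard could look like (purely illustrative; pyMOR may already have its own mechanism for handling optional packages):

```python
# The sparse code path is only available if pydata/sparse can be imported.
try:
    import sparse  # pydata/sparse, optional dependency
    HAVE_SPARSE = True
except ImportError:
    HAVE_SPARSE = False

def require_sparse():
    """Raise a helpful error when the optional package is missing."""
    if not HAVE_SPARSE:
        raise ImportError('Sparse BlockOperators require the optional "sparse" package.')
```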

@TiKeil (Author) commented Jun 12, 2023

Thanks @pmli for looking into this. I recently tried CSC and did not think about trying other formats. I propose that I revert my latest changes and use COO from scipy.sparse instead.

@TiKeil (Author) commented Jun 13, 2023

Very annoying observation:

>>> import numpy as np
>>> import scipy.sparse as sps
>>> A = np.array([['', 'a'], ['b', '']], dtype=object)
>>> B = sps.coo_array(A)
>>> print(repr(B))
<2x2 sparse array of type '<class 'numpy.object_'>'
	with 2 stored elements in COOrdinate format>
>>> B = sps.coo_matrix((B.data, (B.row, B.col)))
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/Users/admin/Projects/pymor-dev/venv/lib/python3.10/site-packages/scipy/sparse/_coo.py", line 160, in __init__
    self.data = getdata(obj, copy=copy, dtype=dtype)
  File "/Users/admin/Projects/pymor-dev/venv/lib/python3.10/site-packages/scipy/sparse/_sputils.py", line 141, in getdata
    getdtype(data.dtype)
  File "/Users/admin/Projects/pymor-dev/venv/lib/python3.10/site-packages/scipy/sparse/_sputils.py", line 126, in getdtype
    raise ValueError(
ValueError: object dtype is not supported by sparse matrices

@TiKeil (Author) commented Jun 16, 2023

pre-commit.ci autofix

@TiKeil (Author) commented Jun 16, 2023

This worked half-way well. Current problems are:

  • scipy.sparse.coo_matrix does not accept init values of type (data, (row, col)) when the data has object dtype (as shown in the comment above). Thus, we sometimes need to construct the sparse object from scratch.
  • scipy.sparse.coo_matrix does not support slicing, which is a problem in particular for the assemble_lincomb and project rules. So far, I have implemented a probably inefficient slicing (see the sketch below); it assumes that every coordinate appears only once, which is always the case for us.

I appreciate comments in that direction, but I still think that we have a chance to merge this soon, even with these workarounds.
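The sketch below shows one way such a slicing workaround could be emulated on COO-style block data. It is a simplified stand-in, not the actual code in this PR, and it relies on the assumption that every (row, col) coordinate appears at most once:

```python
def coo_subblocks(blocks, rows, cols, row_ids, col_ids):
    """Restrict COO-style block data to the given block rows/columns.

    `blocks`, `rows`, `cols` are parallel lists describing the stored blocks;
    indices in the result are renumbered to the new (smaller) block shape.
    """
    row_map = {r: i for i, r in enumerate(row_ids)}
    col_map = {c: j for j, c in enumerate(col_ids)}
    sub_blocks, sub_rows, sub_cols = [], [], []
    for block, r, c in zip(blocks, rows, cols):
        if r in row_map and c in col_map:
            sub_blocks.append(block)
            sub_rows.append(row_map[r])
            sub_cols.append(col_map[c])
    return sub_blocks, sub_rows, sub_cols
```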

@TiKeil TiKeil requested review from sdrave and pmli June 16, 2023 14:06
@TiKeil (Author) commented Jun 16, 2023

For me: check if project_to_subbasis works.

@sdrave (Member) commented Jun 19, 2023

@TiKeil, @pmli, if there is no constructor for sparse arrays that allows constructing them from sparse data, then I'd assume that a lot of other stuff does not work / is broken as well. In that case, I'd go with the roll-your-own-sparse-format approach. (@TiKeil, otherwise, you should take a look at the CI failures.)

@TiKeil (Author) commented Jun 19, 2023

The CI failures are all due to the fact that the coo_matrix format does not support indexing... I agree that the missing sparse-array construction is somewhat worrying. It could be that the fact that a coo_matrix can be constructed from a (dense) array of dtype=object is rather a bug and not intended anyway. I would also tend to say that our own sparse format might actually be the best choice (if the strong package dependencies of sparse stay like this).

@pmli (Member) commented Jun 27, 2023

I agree, it seems that there is a bug in coo_matrix (scipy/scipy#15828 changed the warning into an error). Using a special sparse format seems necessary.
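To make the "roll your own sparse format" option concrete, here is a minimal sketch of what such a home-grown container could look like (purely illustrative, not part of this PR): non-None blocks are simply kept in a dict keyed by their (row, col) coordinates.

```python
class SparseBlockStorage:
    """Toy sparse storage for block operators: only non-None blocks are kept."""

    def __init__(self, num_range_blocks, num_source_blocks):
        self.shape = (num_range_blocks, num_source_blocks)
        self._blocks = {}  # (i, j) -> block (e.g., a pyMOR Operator)

    def __setitem__(self, key, block):
        if block is not None:
            self._blocks[key] = block

    def get(self, i, j):
        """Return the stored block, or None for an empty position."""
        return self._blocks.get((i, j))

    def nonzero(self):
        """Iterate over (i, j, block) triples of the stored blocks."""
        for (i, j), block in self._blocks.items():
            yield i, j, block
```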
