Sparse BlockOperator #1808

Open · wants to merge 18 commits into main

Conversation

@TiKeil commented Nov 29, 2022

This PR revisits ideas from #889.

In #889, @pmli proposed to use pydata/sparse for implementing a sparse BlockOperator. One advantage of this is that we do not need a specialized class and can just incorporate it into BlockOperatorBase.

In this PR, I implemented a first idea of how to do this. It actually worked out pretty well, and there is less code duplication than I expected.

Basically, if requested via the make_sparse argument of __init__, the BlockOperator does not create a ZeroOperator for every None entry. Instead, it creates a COO matrix from sparse and stores the coordinates of the non-zero entries. The same bookkeeping is now done in the dense case, so that the sparse and dense paths can share the same for loops.
If the dense version is needed, the to_dense() method can be used.
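For illustration, here is a minimal usage sketch; note that the make_sparse keyword and the to_dense() method are the additions proposed in this PR and are not part of a released pyMOR version:

```python
import numpy as np

from pymor.operators.block import BlockOperator
from pymor.operators.numpy import NumpyMatrixOperator

A = NumpyMatrixOperator(np.eye(2))
C = NumpyMatrixOperator(2 * np.eye(3))

# Current (dense) behaviour: every None block is replaced by a ZeroOperator.
op_dense = BlockOperator([[A, None], [None, C]])

# Proposed (sparse) behaviour: only the non-None blocks and their coordinates are stored.
op_sparse = BlockOperator([[A, None], [None, C]], make_sparse=True)  # keyword proposed in this PR

# Convert back if an explicit block for every position is required.
op_back = op_sparse.to_dense()  # method proposed in this PR
```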

Apart from the tests that I pushed, the code needs further testing.

Another point of debate is the default value of make_sparse. For now, it is probably better to stick with False. However, I do not see a reason why True should not become the default in the future, especially since the to_dense() mechanism is available whenever the dense version is strictly required.

The BlockDiagonalOperator already uses the sparse structure, so that the apply, apply_adjoint, and assemble overrides for this class can immediately be dropped.

Another thing would be to think about whether some methods from pydata/sparse can be used to simplify things even further.

Anything that looks strange to you?

@TiKeil TiKeil marked this pull request as draft November 29, 2022 19:19
@TiKeil TiKeil added the pr:new-feature Introduces a new feature label Nov 29, 2022
@github-actions github-actions bot added the infrastructure Buildsystem, packages and CI label Nov 29, 2022
@pmli (Member) commented Nov 30, 2022

Thanks for working on this @TiKeil.

Could a COO format (e.g., a triple of row indices, column indices, and Operators) also be accepted as input to __init__? I suppose this would be beneficial for very large sparse block operators. Also, this might be an alternative to using make_sparse.

@TiKeil (Author) commented Dec 6, 2022

Thanks for working on this @TiKeil.

Could a COO format (e.g., a triple of row indices, column indices, and Operators) also be accepted as input to __init__? I suppose this would be beneficial for very large sparse block operators. Also, this might be an alternative to using make_sparse.

Thanks for the suggestion. Would this mean forcing the user to provide the blocks in sparse form, or would you make it optional?

I have been thinking about this as well but I thought that it might be easier to basically hide the sparse functionality entirely in the Operator. Also, having it as it is makes it easier to do the assertion checks in the beginning and to find the respective source and range spaces.

@pmli (Member) commented Dec 6, 2022

Thanks for working on this @TiKeil.
Could a COO format (e.g., a triple of row indices, column indices, and Operators) also be accepted as input to __init__? I suppose this would be beneficial for very large sparse block operators. Also, this might be an alternative to using make_sparse.

Thanks for the suggestion. Would this mean forcing the user to provide the blocks in sparse form, or would you make it optional?

I meant doing it similarly to coo_matrix, which accepts different types as the first argument (in particular, a NumPy array and a COO format).
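For reference, a small runnable illustration of the two coo_matrix input styles referred to here (plain SciPy, nothing pyMOR-specific):

```python
import numpy as np
import scipy.sparse as sps

# Style 1: construct from a dense array.
dense = np.array([[0., 1.],
                  [2., 0.]])
from_dense = sps.coo_matrix(dense)

# Style 2: construct from a (data, (row, col)) triple.
data, row, col = [1., 2.], [0, 1], [1, 0]
from_triple = sps.coo_matrix((data, (row, col)), shape=(2, 2))

assert (from_dense.toarray() == from_triple.toarray()).all()
```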

I have been thinking about this as well but I thought that it might be easier to basically hide the sparse functionality entirely in the Operator. Also, having it as it is makes it easier to do the assertion checks in the beginning and to find the respective source and range spaces.

I guess it depends on whether there are applications (which I don't know about) with very large sparse block operators where forming zero blocks is too costly.

@TiKeil (Author) commented Dec 6, 2022

I meant doing it similarly to coo_matrix, which accepts different types as the first argument (in particular, a NumPy array and a COO format).

Ah ok, I see. That would mean that it would no longer be possible to use the "old" way, where no sparsity is used and the ZeroOperator blocks are constructed instead, right? Of course, the old functionality where these ZeroOperators are constructed could then be moved to the to_dense() method.

I have been thinking about this as well but I thought that it might be easier to basically hide the sparse functionality entirely in the Operator. Also, having it as it is makes it easier to do the assertion checks in the beginning and to find the respective source and range spaces.

I guess it depends on whether there are applications (which I don't know about) with very large sparse block operators where forming zero blocks is too costly.

You mean zero blocks in the sense of None in an np.ndarray? I have not actually tried it for my application, but you are raising a good point here. Of course, the (rows, cols, data) format is much better than constructing an np.ndarray with a lot of None elements.

I will work on it, thanks!

@pmli (Member) commented Dec 6, 2022

I meant doing it similarly to coo_matrix, which accepts different types as the first argument (in particular, a NumPy array and a COO format).

Ah ok, I see. That would mean that it would no longer be possible to use the "old" way, where no sparsity is used and the ZeroOperator blocks are constructed instead, right? Of course, the old functionality where these ZeroOperators are constructed could then be moved to the to_dense() method.

I would say that it would be good to keep the old input style (list of lists of Operators and Nones), but things can change internally. I'm not sure what would be the use of the to_dense method.

@TiKeil (Author) commented Dec 6, 2022

I meant doing it similarly to coo_matrix, which accepts different types as the first argument (in particular, a NumPy array and a COO format).

Ah ok, I see. That would mean that it would no longer be possible to use the "old" way, where no sparsity is used and the ZeroOperator blocks are constructed instead, right? Of course, the old functionality where these ZeroOperators are constructed could then be moved to the to_dense() method.

I would say that it would be good to keep the old input style (list of lists of Operators and Nones), but things can change internally. I'm not sure what would be the use of the to_dense method.

I agree that the input style should stay the same, but if we handle it like coo_matrix does, there would be no choice of whether a None is turned into a ZeroOperator or not. I think we should do it like this.

But: user code might assume that every block in a BlockOperator is an Operator and operate on it, regardless of a potential sparsity structure. I am not saying that such user code makes much sense, but at least we can still support it via the to_dense method, which then returns a BlockOperator that has an Operator in every block, without exception.

@TiKeil (Author) commented Dec 7, 2022

This worked out quite nicely. The only bad technical aspect is that the sparse.COO matrix does not allow None as its zero element; the zero element is instead 0, as set here. Thus, we currently need to transform all Nones into 0. In the future, it is therefore always better to define a BlockOperator by starting from np.zeros((N, N), dtype=object) instead of np.empty((N, N), dtype=object) (as was usually done in the past).
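A small runnable illustration of why np.zeros is the friendlier starting point here: with dtype=object, np.zeros fills the array with 0, while np.empty fills it with None.

```python
import numpy as np

N = 2
blocks_zeros = np.zeros((N, N), dtype=object)
blocks_empty = np.empty((N, N), dtype=object)

print(blocks_zeros[0, 0])  # 0    -> matches the fill value expected by sparse.COO
print(blocks_empty[0, 0])  # None -> would first have to be converted to 0
```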

@TiKeil (Author) commented Dec 7, 2022

I obviously failed to get the CI running with the sparse package. Perhaps someone can help me with that, so that I can see what other errors might occur with this new implementation of BlockOperator.

@renefritze (Member) commented:

The developer documentation talks a little bit about how to get new dependencies into the CI. I think it would be great if you looked into that and told me what's unclear, so I can update it now.
I already saw that the Conda part is a little outdated.

@TiKeil (Author) commented Dec 13, 2022

to_dense is probably needed for the to_matrix rule.

project_to_subbasis probably does not work yet.

@renefritze (Member) commented:

The developer documentation talks a little bit about how to get new dependencies into the CI. I think it would be great if you looked into that and told me what's unclear, so I can update it now. I already saw that the Conda part is a little outdated.

I automated most of the process in #1845

@pmli pmli added this to the 2023.1 milestone Dec 16, 2022
@sdrave sdrave force-pushed the main branch 4 times, most recently from 18e66f2 to e80291b on March 7, 2023 20:03
@sdrave sdrave removed the autoupdate label May 12, 2023
@github-actions github-actions bot removed the infrastructure Buildsystem, packages and CI label May 16, 2023
@github-actions bot (Contributor) commented Jun 7, 2023

This pull request modifies pyproject.toml or requirements-ci-oldest-pins.in.
In case dependencies were changed, make sure to call

make ci_requirements

and commit the changed files to ensure that CI runs with the updated dependencies.

@github-actions github-actions bot added the infrastructure Buildsystem, packages and CI label Jun 7, 2023
@sdrave (Member) commented Jun 7, 2023

@TiKeil, good and bad news: the good news is that sparse can now apparently be installed again via conda-forge. However, current sparse now hard-depends on numba. So adding sparse as a hard dependency would make pyMOR depend on numba and, thus, llvm. I don't think that this would be a wise decision for pyMOR.

Thus, overall I think we should see whether it is feasible to roll our own sparse data structure. @TiKeil, do you want to look into this in the near future? Otherwise, I'd remove the release milestone.

@TiKeil (Author) commented Jun 7, 2023

Sad to hear. Would it instead be possible to have it as an optional requirement and keep sparse BlockOperators as an optional feature, as I intended to do in the first place?

For instance, have a make_sparse method that can only be used if the user has installed the optional package. By default, we could stay with our current dense implementation.

Or is the hard numba dependency also an issue for optional requirements?

Regarding providing our own sparse implementation: that seems possible, but it is probably nothing I will do in the next few days.
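A rough sketch of what such an optional-dependency guard could look like (purely illustrative; pyMOR may already have its own mechanism for handling optional packages):

```python
# The sparse code path is only available if pydata/sparse can be imported.
try:
    import sparse  # pydata/sparse, optional dependency
    HAVE_SPARSE = True
except ImportError:
    HAVE_SPARSE = False

def require_sparse():
    """Raise a helpful error when the optional package is missing."""
    if not HAVE_SPARSE:
        raise ImportError('Sparse BlockOperators require the optional "sparse" package.')
```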

@TiKeil (Author) commented Jun 12, 2023

Thanks @pmli for looking into this. I recently tried CSC and did not think about trying other formats. I propose that I revert my latest changes and use COO from scipy.sparse instead.

@TiKeil (Author) commented Jun 13, 2023

Very annoying observation:

>>> import numpy as np
>>> import scipy.sparse as sps
>>> A = np.array([['', 'a'], ['b', '']], dtype=object)
>>> B = sps.coo_array(A)
>>> print(repr(B))
<2x2 sparse array of type '<class 'numpy.object_'>'
	with 2 stored elements in COOrdinate format>
>>> B = sps.coo_matrix((B.data, (B.row, B.col)))
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/Users/admin/Projects/pymor-dev/venv/lib/python3.10/site-packages/scipy/sparse/_coo.py", line 160, in __init__
    self.data = getdata(obj, copy=copy, dtype=dtype)
  File "/Users/admin/Projects/pymor-dev/venv/lib/python3.10/site-packages/scipy/sparse/_sputils.py", line 141, in getdata
    getdtype(data.dtype)
  File "/Users/admin/Projects/pymor-dev/venv/lib/python3.10/site-packages/scipy/sparse/_sputils.py", line 126, in getdtype
    raise ValueError(
ValueError: object dtype is not supported by sparse matrices

@TiKeil (Author) commented Jun 16, 2023

pre-commit.ci autofix

@TiKeil (Author) commented Jun 16, 2023

This worked half-way well. Current problems are:

  • scipy.sparse.coo_matrix does not accept init values of type (data, (row, col)) when the data has object dtype (as shown in the comment above). Thus, we sometimes need to construct the sparse object from scratch.
  • scipy.sparse.coo_matrix does not support slicing, which is a problem in particular for the assemble_lincomb and project rules. So far, I have implemented a probably inefficient slicing (see the sketch below); it assumes that every coordinate appears only once, which is always the case for us.

I appreciate comments in that direction, but I still think that we have a chance to merge this soon, even with these workarounds.
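The sketch below shows one way such a slicing workaround could be emulated on COO-style block data. It is a simplified stand-in, not the actual code in this PR, and it relies on the assumption that every (row, col) coordinate appears at most once:

```python
def coo_subblocks(blocks, rows, cols, row_ids, col_ids):
    """Restrict COO-style block data to the given block rows/columns.

    `blocks`, `rows`, `cols` are parallel lists describing the stored blocks;
    indices in the result are renumbered to the new (smaller) block shape.
    """
    row_map = {r: i for i, r in enumerate(row_ids)}
    col_map = {c: j for j, c in enumerate(col_ids)}
    sub_blocks, sub_rows, sub_cols = [], [], []
    for block, r, c in zip(blocks, rows, cols):
        if r in row_map and c in col_map:
            sub_blocks.append(block)
            sub_rows.append(row_map[r])
            sub_cols.append(col_map[c])
    return sub_blocks, sub_rows, sub_cols
```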

@TiKeil TiKeil requested review from sdrave and pmli June 16, 2023 14:06
@TiKeil (Author) commented Jun 16, 2023

For me: check if project_to_subbasis works.

@sdrave (Member) commented Jun 19, 2023

@TiKeil, @pmli, if there is no constructor for sparse arrays that allows constructing them from sparse data, then I'd assume that a lot of other stuff does not work / is broken as well. In that case, I'd go with the roll-your-own-sparse-format approach. (@TiKeil, otherwise, you should take a look at the CI failures.)

@TiKeil (Author) commented Jun 19, 2023

The CI failures are all due to the fact that the coo_matrix format does not support indexing... I agree that the missing sparse-array construction is somewhat worrying. It could be that the fact that a coo_matrix can be constructed from a (dense) array of dtype=object is rather a bug and not intended anyway. I would also tend to say that our own sparse format might actually be the best choice (if the strong package dependencies of sparse stay like this).

@pmli (Member) commented Jun 27, 2023

I agree, it seems that there is a bug in coo_matrix (scipy/scipy#15828 changed the warning into an error). Using a special sparse format seems necessary.
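To make the "roll your own sparse format" option concrete, here is a minimal sketch of what such a home-grown container could look like (purely illustrative, not part of this PR): non-None blocks are simply kept in a dict keyed by their (row, col) coordinates.

```python
class SparseBlockStorage:
    """Toy sparse storage for block operators: only non-None blocks are kept."""

    def __init__(self, num_range_blocks, num_source_blocks):
        self.shape = (num_range_blocks, num_source_blocks)
        self._blocks = {}  # (i, j) -> block (e.g., a pyMOR Operator)

    def __setitem__(self, key, block):
        if block is not None:
            self._blocks[key] = block

    def get(self, i, j):
        """Return the stored block, or None for an empty position."""
        return self._blocks.get((i, j))

    def nonzero(self):
        """Iterate over (i, j, block) triples of the stored blocks."""
        for (i, j), block in self._blocks.items():
            yield i, j, block
```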
