symmetric backend very slow for PEPS tensors #947

Open
saeedjahromi opened this issue Oct 3, 2021 · 4 comments

@saeedjahromi

Hey dev team.
I have been working on a PEPS algorithm for simulating 2D fermionic systems with the symmetric backend, and it turns out the symmetric backend is very slow for PEPS tensors. I did some profiling and found that the bottleneck is the slow performance of the functions in blocksparse_utils.py, mainly _find_diagonal_sparse_blocks() and _find_transposed_diagonal_sparse_blocks().
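For reference, the profiling was nothing more than a sketch like the following (run_simple_update_step is a placeholder for one iteration of my own update loop):

```python
import cProfile
import pstats

# run_simple_update_step() is a placeholder for one iteration of my update loop.
cProfile.run("run_simple_update_step()", "peps_profile.stats")

# Sorting by cumulative time puts _find_diagonal_sparse_blocks and
# _find_transposed_diagonal_sparse_blocks (from block_sparse/blocksparse_utils.py)
# at the top of the list.
pstats.Stats("peps_profile.stats").sort_stats("cumulative").print_stats(20)
```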

I was wondering if there is any way to speed up the symmetric backend, or if you have any advice or suggestions from your side?

@mganahl
Collaborator

mganahl commented Oct 4, 2021

Hi @saeedjahromi, thank you for the message! Great to hear that you are using the library for PEPS.

You are saying that the code is very slow. What is the baseline you are comparing it to (plain numpy, other symmetric tensor code)? Also, are you doing variational optimization, CTM, or something else? How large are your bond dimensions?

The functions you are mentioning compute the symmetry blocks of a given tensor (when reshaped into a matrix). This step can indeed become a bottleneck if your bond dimensions are small and your tensors have only a few (roughly three or fewer) legs.

To give some context: our code uses an approach that is somewhat different from many other publicly available packages. We don't store the individual symmetry blocks of a tensor separately. Instead, all non-zero elements are stored in a single 1d data array (a numpy array). If we want to perform contractions, decompositions, and so on, we use the charge information of each tensor leg to work out which elements of this data array go into which symmetry block. Working out this mapping can become the bottleneck for small bond dimensions. That said, this approach can have significant advantages if the tensors have higher ranks (>=4) and/or more than one simultaneous symmetry (e.g. two or more species of fermions, charge conservation + Z2 conservation, and so on).
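To make this concrete, here is a minimal sketch of what a block-sparse tensor looks like in our code (class and module names as in the current tensornetwork.block_sparse API; the charges and dimensions are made up for illustration, and details may differ between versions):

```python
import numpy as np
import tensornetwork as tn
from tensornetwork.block_sparse import U1Charge, Index, BlockSparseTensor

# U(1) charges on a leg of dimension 10; `flow` marks the leg as in- or out-going.
charges = U1Charge(np.random.randint(-2, 3, 10))
i = Index(charges, flow=False)
j = Index(charges, flow=True)

# All non-zero elements live in a single flat 1d numpy array, A.data;
# there is no per-block storage.
A = BlockSparseTensor.random([i, j])
print(A.data.shape)

# A contraction has to work out which entries of the data array belong to which
# symmetry block from the leg charges; this mapping is what
# _find_diagonal_sparse_blocks / _find_transposed_diagonal_sparse_blocks compute.
B = tn.ncon([A, A], [[-1, 1], [1, -2]], backend="symmetric")
```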

One thing you can do to increase speed is to turn on a "caching" option within the block-sparse code. You can do this with tensornetwork.block_sparse.enable_caching(). If this is turned on, the functions you mentioned above are cached on the inputs. This works very well if there is no truncation step in your algorithm, e.g. via an SVD. If there is an SVD you can still use caching, but the chances of getting a cache hit are significantly reduced. This is because SVDs/truncations often involve more than one tensor in your network, and lead to redistribution of charges across the involved tensors. This makes caching less efficient. Furthermore, due to the constant charge redistribution, the cache may fill up and use a lot of memory. If this happens you can clear the cache using tensornetwork.block_sparse.clear_cache(). You can also just disable caching for the SVD with tensornetwork.block_sparse.disable_caching().
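In code, the pattern would look roughly like this (a sketch; do_contraction_step and do_svd_truncation_step stand in for your own routines):

```python
import tensornetwork as tn

# Cache the block-finding routines during the contraction-heavy parts ...
tn.block_sparse.enable_caching()
env = do_contraction_step(peps_tensors)      # placeholder for your own code

# ... and switch caching off around truncating SVDs, where cache hits are
# unlikely and the cache would otherwise keep growing.
tn.block_sparse.disable_caching()
truncated = do_svd_truncation_step(env)      # placeholder for your own code
tn.block_sparse.enable_caching()

# If memory usage becomes a problem, clear the cache explicitly.
tn.block_sparse.clear_cache()
```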

Let me know if this helps!

@saeedjahromi
Author

saeedjahromi commented Oct 6, 2021

Hi @mganahl, thanks for your response and the helpful remarks. I have written a fermionic (symmetric) iPEPS code which currently uses a simple update based on local SVDs to update the PEPS tensors, followed by a CTMRG algorithm to approximate the contraction of the whole network and to compute expectation values variationally.

The bottleneck in the simple update comes from the fact that it is an iterative process in which the local PEPS tensors, the Hamiltonian gate, and the surrounding lambdas are repeatedly joined and then split again by a local SVD. In this process the order of the quantum numbers changes, and to get a consistent update and good convergence one has to call the contiguous() method to reorder the charges. This method in turn calls the _find_diagonal_sparse_blocks() and _find_transposed_diagonal_sparse_blocks() functions.
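Schematically, the step I mean looks like this (a sketch with a stand-in rank-5 tensor; the leg ordering and charges are just for illustration, and I assume the transpose()/contiguous() API of BlockSparseTensor as I use it in my code):

```python
import numpy as np
from tensornetwork.block_sparse import U1Charge, Index, BlockSparseTensor

# Stand-in PEPS tensor: four virtual legs of dimension D=4 and one physical
# leg of dimension 2.
D, d = 4, 2
virt_charges = U1Charge(np.random.randint(-1, 2, D))
legs = [Index(virt_charges, flow=False) for _ in range(4)]
legs.append(Index(U1Charge(np.arange(d)), flow=True))
A = BlockSparseTensor.random(legs)

# After absorbing the gate and splitting with a local SVD the leg order changes;
# transposing back and calling .contiguous() re-sorts the charges, and this is
# where _find_diagonal_sparse_blocks / _find_transposed_diagonal_sparse_blocks
# get called on every iteration.
A = A.transpose((4, 0, 1, 2, 3)).contiguous()
```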

On top of that, in CTMRG one needs to build the reduced tensors by joining a PEPS tensor and its conjugate along the physical leg and then fusing the corresponding virtual legs to perform a double-layer CTM. Building the reduced tensors is also a bottleneck for the CTMRG, since the code again has to call the _find_diagonal_sparse_blocks() and _find_transposed_diagonal_sparse_blocks() functions to find the correct charges for the legs that are joined.
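The reduced-tensor construction I mean is essentially this (a sketch reusing the stand-in tensor A from above, and assuming conj() and reshape() for fusing legs behave as I use them in my code):

```python
import tensornetwork as tn

# Contract A with its conjugate over the physical leg (label 1); the ket/bra
# virtual legs come out interleaved.
E = tn.ncon([A, A.conj()],
            [[-1, -3, -5, -7, 1], [-2, -4, -6, -8, 1]],
            backend="symmetric")

# Fuse each (ket, bra) pair of virtual legs into a single leg of dimension D**2.
# Working out the fused charges is again where the block-finding functions
# show up in the profile.
E = E.reshape((D**2, D**2, D**2, D**2))
```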

I tried to run the code for different bond dimensions, up to D=8. It seems that while ncon() can be faster than the numpy backend for contracting single-layer tensors, it is much slower for contracting tensors whose legs have been constructed by joining other legs, e.g. the reduced tensors of the CTMRG.
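The kind of comparison I did is roughly the following (a sketch; A_np is just a dense random array of the same shape, so the numbers are only indicative):

```python
import timeit
import numpy as np
import tensornetwork as tn

# Dense tensor of the same shape for the numpy backend.
A_np = np.random.randn(*A.shape)

contraction = [[-1, -3, -5, -7, 1], [-2, -4, -6, -8, 1]]
t_np = timeit.timeit(
    lambda: tn.ncon([A_np, A_np], contraction, backend="numpy"), number=10)
t_sym = timeit.timeit(
    lambda: tn.ncon([A, A.conj()], contraction, backend="symmetric"), number=10)
print(f"numpy: {t_np:.3f}s    symmetric: {t_sym:.3f}s")
```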

I have another implementation of fermionic iPEPS based on https://github.com/mhauru/abeliantensors, which stores symmetric tensors block-wise, and your symmetric backend is slower than that implementation for joining and splitting tensor legs.
However, I like your code much better, especially since it allows mixed symmetries such as U1xZ2, etc.

@mganahl
Collaborator

mganahl commented Oct 6, 2021 via email

@saeedjahromi
Author

Sure, I will prepare a script to benchmark the two codes against each other, say for the CTMRG algorithm.
