Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Methods] Undesired conversion to a dense matrix #261

Open
nulam opened this issue Jan 29, 2024 · 3 comments
Open

[Methods] Undesired conversion to a dense matrix #261

nulam opened this issue Jan 29, 2024 · 3 comments
Labels

Comments

@nulam
Copy link

nulam commented Jan 29, 2024

What happened + What you expected to happen

Current implementation of BottomUpSparse reconciliation method causes an undesired conversion to a dense matrix, potentially leading to an OOM on large datasets. The lack of speedup was already observed before as noted in makoren's comment in the class definition.

The culprit is the idx_bottom indexed assignment:

P[idx_bottom] = S[idx_bottom]

Locally I managed to fix it by replacing this line with the following

for idx in idx_bottom:
    P[idx] = S.getrow(idx)

However, the performance is quite suboptimal, but it's sufficient for my use case, hence I would rather raise this issue than create a PR :)

Versions / Dependencies

hierarchicalforecast==0.4.1

Reproduction script

using an S matrix with n>200k leads to OOM with BottomUpSparse reconciliation

Issue Severity

Medium: It is a significant difficulty but I can work around it.

@nulam nulam added the bug label Jan 29, 2024
@nulam nulam changed the title [BottomUpSparse] Undesired conversion to a dense matrix [Methods] Undesired conversion to a dense matrix Jan 29, 2024
@nulam
Copy link
Author

nulam commented Feb 6, 2024

Eventually I managed to achieve a respectable performance through

    def _get_PW_matrices(self, S, idx_bottom):
        n_hiers, n_bottom = S.shape
        P = sparse.lil_matrix(S.shape, dtype=np.float32)
        rows = np.array(idx_bottom)
        cols = np.arange(n_bottom)
        data = np.ones_like(rows, dtype=np.float32)
        P = sparse.csr_matrix((data, (rows, cols)), shape=(n_hiers, n_bottom)).T
        W = sparse.identity(n_hiers, dtype=np.float32)
        return P, W

However I noticed a note about possibly deprecating the idx_bottom argument in the future, should I open a PR?

@mcsqr
Copy link
Contributor

mcsqr commented Feb 20, 2024

@nulam Thanks for using this! Why don't you open a PR, I haven't worked on this recently but I'll have a look (I now got my hands on 200k dataset as well), and then hopefully someone from nixtla finds the time to check if they have any problem with idx_bottom or anything else.

@AzulGarza
Copy link
Member

hey @nulam and @mcsqr! currently, the efforts to remove the idx_bottom are paused. would you like to open a PR with your improvements, @nulam?❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants