Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance impact #38

Open
corbett5 opened this issue Apr 29, 2023 · 2 comments
Open

Performance impact #38

corbett5 opened this issue Apr 29, 2023 · 2 comments

Comments

@corbett5
Copy link

I just came across this repository after looking for a solution to this exact problem, very excited! However when I was trying it out every line now goes through get_ipython().run_cell_magic('space', ...). This adds a few layers to every exception backtrace. Not a huge deal. However, this also comes with a pretty severe performance hit. See the following example

%%time
# %%space <first>

result = 0
for i in range(10000000):
    string = f'{10 * i / (3 * i + 7)}{87 * i * i / 258}'
    result += ord(string[i % len(string)])

print(result)

Without the %%space <first> the results are

CPU times: user 19.9 s, sys: 296 ms, total: 20.2 s
Wall time: 22.5 s

With %%space <first> the results are

CPU times: user 51.9 s, sys: 546 ms, total: 52.4 s
Wall time: 55.4 s
@davidesarra
Copy link
Owner

Hi Ben! Thanks for opening this issue 👍

The performance hit comes from the capability of the code in the space to access not only their own private namespace but also the user namespace, i.e. the notebook namespace. Usually the performance hit isn't noticeable. However, in snippets like the example above it becomes very noticeable as it performs tens of millions of lookups.

At the moment I see three options:

  1. Speed up the current approach

    At the cost of being less Pythonic, we could speed up the execution by about 20-25% in the case of the example above.

  2. Isolated Space

    We could consider introducing in addition to the existing nested space type an isolated space type that doesn't have access to the user namespace. While it may be a bit fiddly to extend the current API, isolated spaces would have nearly the same performance as code outside spaces.

  3. Leave it as it is

    Depending on the workaround available and the rarity of the affected use cases, it may be appropriate to leave it as it. I appreciate that the example provided above may contrived as it is generally avoided having Python loops with very many iterations (in favour of using libraries that move these loops outside Python).

I'd love to hear your opinion especially on 2) and 3). Namely:

  • When you have a need like in the example above do you need access to the user namespace?
  • What are the typical use cases where the performance penalty of using a space becomes noticeable?
  • How often do these use cases come up?
  • What's your current workaround? Not isolating the code in a space?

Thank you in advance for your feedback! 😄

@corbett5
Copy link
Author

Sorry for the long delay.

I think that leaving be is fine. For example if you put the code into a function and then define the function in a cell without a space magic then you get no performance hit. For my use cases, this is pretty much how I would use this magic anyways, define a bunch of global functions, then call them from some namespaces.

That said, I think the documentation should make note of this somewhere, both in terms of the performance hit and the added frames to backtraces.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants