
update venv scripts #1074

Draft · wants to merge 4 commits into master

Conversation

@thehesiod (Collaborator) commented Jan 17, 2024

Ensures the compiled requirements file always matches the sources used to generate it.

@thehesiod (Collaborator, Author):

@jakob-keller thoughts? This uses another aio-libs module that seems to do what we want.

@thehesiod (Collaborator, Author), review comment:

should make local debugging less painful

@thehesiod (Collaborator, Author):

For some reason I can't add you as a reviewer.

@jakob-keller: comment marked as resolved.

@jakob-keller (Collaborator):

> @jakob-keller thoughts? this uses another aio-libs module which seems to do what we want

Good idea. I did not know about that library.

Do we understand the performance implications? AFAIK, S3 Express is all about lower latencies. It would be a shame if that cache added more overhead than necessary. A benchmark would be great.

As a user, I would be slightly concerned about adding another dependency: fewer is better, IMO. In particular, if there is another fairly straightforward solution available, as in #1073.

@jakob-keller (Collaborator):

> Do we understand the performance implications? AFAIK, S3 Express is all about lower latencies. It would be a shame, if that cache added more overhead than necessary. A benchmark would be great.

Here's a simple benchmark.py:

```python
import asyncio
import functools

import async_lru


async def main(acache, n: int = 1000) -> None:
    """Async main coroutine."""
    await asyncio.gather(*(acache("test") for _ in range(n)))


async def noop(_arg: str) -> None:
    """Trivial noop coroutine."""


@functools.lru_cache(maxsize=100)
def _cache_1073(_arg: str) -> asyncio.Task[None]:
    return asyncio.create_task(noop(_arg))


async def cache_1073(_arg: str) -> None:
    """functools.lru_cache wraps asyncio.Tasks - #1073."""
    return await _cache_1073(_arg)


@async_lru.alru_cache(maxsize=100)
async def cache_1074(_arg: str) -> None:
    """async-lru based - #1074."""
    return await noop(_arg)
```

Run with

```shell
python -m timeit -s "import asyncio; from benchmark import main, cache_1073" "asyncio.run(main(cache_1073))"
```

vs.

```shell
python -m timeit -s "import asyncio; from benchmark import main, cache_1074" "asyncio.run(main(cache_1074))"
```

Under CPython 3.11.7 on macOS, #1073 turns out to be slightly faster than #1074, by roughly 10%. IMO that is not significant and should not determine the decision.

@thehesiod (Collaborator, Author):

@jakob-keller I don't think #1073 is correct: it avoids a lock that is needed to ensure each LRU entry is only awaited once.

@thehesiod (Collaborator, Author):

IMO any extra effort should be put into speeding up or correcting https://github.com/aio-libs/async-lru. Looking at it, I think there are many potential options. External deps aren't great, indeed, but at least this one is from the same ecosystem (aio-libs).

@thehesiod (Collaborator, Author):

If I get some time I'll look into that module; perhaps we can inline something simpler, too.

@jakob-keller (Collaborator):

> @jakob-keller I don't think #1073 is correct, it's avoiding a needed lock to ensure each lru entry is only awaited once

I'm still convinced that both approaches are valid fixes for #1072: see #1073 (comment)

@jakob-keller (Collaborator) left a review:

Thank you for the PR and all its contents. I have added a few comments.

IMO, the "piping"-related changes should be in a separate PR. Would you mind splitting the PR up?

Review threads (outdated, resolved): aiobotocore/utils.py, install-requires.txt, setup.py (two threads).
@jakob-keller (Collaborator):

> for some reason I can't add you as a reviewer

I guess that's because I'm not a collaborator.

Now I am. Thank you! 🎉

@thehesiod thehesiod marked this pull request as draft January 17, 2024 22:18
@thehesiod thehesiod changed the title from "fix for lru cache" to "update venv scripts" Jan 17, 2024
@thehesiod (Collaborator, Author):

Thanks for all the investigative work and instructive comments!

@thehesiod thehesiod marked this pull request as ready for review January 17, 2024 22:29
@thehesiod (Collaborator, Author):

@jakob-keller OK, re-engineered based on the discussions.


codecov bot commented Jan 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (110b0a2) 86.13% compared to head (5749e2c) 86.13%.
Report is 1 commit behind head on master.

❗ Current head 5749e2c differs from pull request most recent head e5f5608. Consider uploading reports for the commit e5f5608 to get more accurate results

Additional details and impacted files
```
@@           Coverage Diff           @@
##           master    #1074   +/-   ##
=======================================
  Coverage   86.13%   86.13%
=======================================
  Files          60       60
  Lines        5863     5863
=======================================
  Hits         5050     5050
  Misses        813      813
```
Flag: unittests. Coverage Δ: 86.13% <ø> (ø).

Flags with carried forward coverage won't be shown.

```shell
envsubst < requirements-dev.in > ${scratch_file}

# also hash setup.py as it has the install_requires and extras
SHA_SUM=$(sha1sum ${scratch_file} setup.py | head -c 40)
```
@jakob-keller (Collaborator), review comment:

sha1sum produces two lines with two items each, which are piped to head, which keeps only the first 40 bytes. Looking at CI runs, this appears fragile, and it completely ignores the hash of setup.py.

May I propose this alternative?

```shell
cat ${scratch_file} setup.py | sha1sum | awk '{print $1}'
```

1. Concatenate both inputs.
2. Pipe into sha1sum.
3. Extract the first word (the 40-character hash).
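The difference between the two pipelines can be checked directly. A minimal sketch, assuming a Linux shell with GNU coreutils; the file names /tmp/req_demo.txt and /tmp/setup_demo.py are hypothetical stand-ins for the real inputs:

```shell
# Create two small stand-in files (hypothetical names, for illustration only).
printf 'aiohttp\n' > /tmp/req_demo.txt
printf 'name="aiobotocore"\n' > /tmp/setup_demo.py

# Fragile: sha1sum emits one "<hash>  <file>" line per input, so `head -c 40`
# keeps exactly the first file's 40-character hash and ignores the second file.
fragile=$(sha1sum /tmp/req_demo.txt /tmp/setup_demo.py | head -c 40)

# Robust: hash the concatenated stream, then print only the hash field.
robust=$(cat /tmp/req_demo.txt /tmp/setup_demo.py | sha1sum | awk '{print $1}')

# The fragile variant equals the hash of the first file alone.
first_only=$(sha1sum /tmp/req_demo.txt | awk '{print $1}')
echo "fragile=${fragile}"
echo "robust=${robust}"
echo "first_only=${first_only}"
```

Editing /tmp/setup_demo.py leaves `fragile` unchanged but changes `robust`, which is precisely the bug being pointed out: setup.py never influences the fragile hash.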

@thehesiod (Collaborator, Author):

🤦🏼‍♂️ Thanks, that is a very odd default behavior of sha1sum. I'll re-work it.

@thehesiod thehesiod marked this pull request as draft January 27, 2024 02:30
Labels: none yet. Projects: none yet. Linked issues: none yet. 2 participants.