-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move to async network requests #349
Comments
Allowing async requests would be a great improvement; thanks for bringing this up!
From the top of my mind (I may be overlooking stuff):
Please let us know what you think about the above considerations. If you need a short-term workaround for your particular problem, you may consider using "plain" multithreading, see e.g. https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor. |
Thanks Jonatan - all good questions, and I don't have an answer yet as I'm new to this part of Python. But I'm willing to give it a try and do some research to see how other libraries have done similar things. You're right, for now I can work around the issue as you suggested but it would be nicer to make my code cleaner the future 😄 |
It would be great if you have the time and resources to research this, as it is not entirely trivial. I'm happy to help if you have questions about the stravalib code base. |
Hi folks -- I'll offer my $0.02 here, since I've spent a fair bit of time with both blocking (i.e. @futureshape , you're right that the requests library is a blocking library. The way to achieve higher-throughput / responsiveness would be to use multiple threads. You're also right, though, that for an application like this an asyncio framework would make a lot of sense. I do not know, though, whether an asyncio framework will actually be faster than a well-designed multi-threaded application; however, the concurrency syntax provided by asyncio would make it easier to implement this. (But there are also some caveats and it's easy for people who don't understand asyncio to essentially make everything single-threaded.) The way this would typically work is that you would use class Client:
async def get_activity(self, id: str) -> Activity:
...
activity = await get_activity(id) # resolves Activity before proceding
# Or, gather activities concurrently:
coro1 = get_activity(id1)
coro2 = get_activity(id2)
activity1, activity2 = await asyncio.gather(coro1, coro2) The challenges here are:
So, I don't think it's a bad idea, but it's not a simple choice -- and last I checked there weren't great options for libraries that wanted to support both paradigms. I do like asyncio and much of my work currently uses async frameworks, so that fits well. And I'd say in the last few years the level of support in python has reached a threshold where it's becoming a good default choice for many frameworks like this. |
If we change to |
I guess so. I wouldn't want to force users to deal with the added complexity of async requests, they should be able to choose. |
if we did try to support both, to hans' point above, the implementation
(our api) would potentially become more complex? As would testing.
Unless we wrapped some functionality around a plain multithread solution as
jonatan suggested above. Is that an option too or is that approach inferior
?
|
Yeah, I haven't seen an elegant way to support both sync and async. (I've seen libraries that support multiple asyncio frameworks, but even that can get fairly complex.) Probably the path to supporting both means two different client classes with different underpinnings. Unfortunately if models are set up to support lazily loading things over the request, they also would need to be specific. That sounds tangled. (This, incidentally, is similar to why SqlAlchemy has such challenges w/ asyncio, despite the db drivers supporting asyncio for awhile -- the lazy loading of attrs means that the entire ORM framework needs to pick to be either sync or async and that's a tall order.) Obviously this is not my project anymore (I'm eternally grateful to you all for breathing new life into this and for taking this on!) , but personally, I don't think there's any clear mandate to support asyncio: I think it's easy enough to fix any performance issues by using multiple threads. And folks that want to use in context of asyncio have options like |
I wasn't entirely clear, but my suggestion was to use multithreading on the user application side, not as part of stravalib. E.g., when you have a bunch of activity ids, and you want to fetch their streams as fast as possible (within your rate limits), you could use something like with ThreadPoolExecutor(max_workers=10) as executor:
streams = executor.map(client.get_activity_streams, ids) which would prevent each call of |
Currently network requests in https://github.com/stravalib/stravalib/blob/master/stravalib/protocol.py are done using the
requests
library whihch blocks while data is being downloaded.This isn't a problem when the libray is being used to bulk-download data but is more of an issue when I've attempted to use stravalib to build interactive applications and want to keep the rest of the UI responsive (e.g. showing a progress display, having a way to cancel a long request and move somewhere else).
It looks like there's a library that supports async requests and is pretty much a drop-in replacement for
requests
https://www.python-httpx.org/compatibility/I'm happy to give it a go in a branch and make a PR, but I'm opening this first as requested in the contribution guideline, and to get feedback from the rest of the maintainers. Does anyone think it's a bad idea to make this library use async network requests? Anything that could trip us up while trying to do it?
The text was updated successfully, but these errors were encountered: