
Client should *not* un-paginate large results. Should return a generator that does this for you. #433

tnixon opened this issue Apr 4, 2024 · 1 comment


tnixon commented Apr 4, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

When fetching historical data, even simple queries (fetching all trades / quotes for a single symbol on a single day) can have very large result sets, which the API paginates. The data client attempts to un-paginate these and load them all into a single return structure. This is very slow (probably the main cause of #204). It also means the client consumer has no choice but to let this process run (single-threaded) until it completes, or potentially fails with an OOM or similar.

Describe the solution you'd like.

The client should return a data structure that gives easy access to the paginated results, without actually loading them. The consumer can then decide how to access these results - possibly by looping through them in a single-threaded manner, but potentially also by parallelizing this data-loading to make it more efficient. It would also give the consumer the option of serializing each page of results and so avoid the OOM issue of building very large data structures in memory.

A Python generator seems a natural way to provide this functionality. The client can return an object that contains a generator which will (when accessed) fetch the appropriate pages of data from the API. This might look something like:

client = StockHistoricalDataClient(...)

trades_request = StockTradesRequest(symbol_or_symbols='NVDA')
trades_resultset = client.get_stock_trades_resultset(trades_request)

for page_data in trades_resultset:
    # do something with the data (summarize it, serialize it, etc.)
    ...

Note: here I'm assuming that trades_resultset is a generator.
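A minimal sketch of how such a generator could work, assuming a hypothetical page-fetching helper that returns one page of trades plus a next-page token (all names here are illustrative, not the client's actual internals; the stub stands in for one HTTP call per page):

```python
from typing import Iterator, Optional

# Hypothetical stand-in for one HTTP request to the paginated trades
# endpoint; returns (page_of_trades, next_page_token). Here it fakes
# exactly two pages of data so the sketch is runnable.
def fetch_trades_page(
    symbol: str, page_token: Optional[str]
) -> tuple[list[dict], Optional[str]]:
    if page_token is None:
        return [{"symbol": symbol, "price": 100.0}], "page-2"
    return [{"symbol": symbol, "price": 101.0}], None

def iter_stock_trades(symbol: str) -> Iterator[list[dict]]:
    """Yield one page of trades at a time instead of un-paginating everything."""
    token: Optional[str] = None
    while True:
        page, token = fetch_trades_page(symbol, token)
        yield page  # consumer processes this page before the next fetch happens
        if token is None:
            break

# The consumer decides what to do per page: summarize, serialize, etc.
pages = list(iter_stock_trades("NVDA"))
```

Because the generator only fetches a page when the consumer asks for the next one, memory stays bounded at one page at a time unless the consumer explicitly accumulates results.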

Describe an alternate solution.

Another way to address this is to provide a client method for fetching an individual page of results, something like:

client = StockHistoricalDataClient(...)

trades_request = StockTradesRequest(symbol_or_symbols='NVDA')
trades_resultset = client.get_stock_trades_resultset(trades_request)

for page in trades_resultset.pages:
    page_data = client.get_stock_trades_data(trades_request, page)
    # do something with the data (summarize it, serialize it, etc.)

Note: in this example I'm assuming that trades_resultset is an object that holds an iterator over page identifiers.
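One benefit of exposing individual pages this way is that the consumer can parallelize the loading. A minimal sketch, using a stubbed get_stock_trades_data and hypothetical numeric page ids (the real API presumably uses page tokens):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical single-page fetch; in a real client this would be one
# HTTP request for the page identified by `page`.
def get_stock_trades_data(symbol: str, page: int) -> list[dict]:
    return [{"symbol": symbol, "page": page}]

page_ids = [0, 1, 2, 3]  # stand-in for the iterator over pages

# Each page is independent, so fetches can run concurrently instead of
# waiting on a single-threaded un-pagination loop.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda p: get_stock_trades_data("NVDA", p), page_ids))
```

The same structure also lets the consumer serialize each page as it arrives and discard it, sidestepping the OOM risk of holding the full result set in memory.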

Anything else? (Additional Context)

Give the user control over how large result sets are fetched. Don't force them to wait on a single-threaded, potentially failure-prone process.


tnixon commented Apr 4, 2024

PS - I am willing to prepare a PR on this (as soon as I can carve out some time).
