[Lake] Parallelize fetching data across >>1 coins #932

Open
1 task
trentmc opened this issue Apr 22, 2024 · 3 comments
Labels
Type: Enhancement New feature or request

Comments

@trentmc
Member

trentmc commented Apr 22, 2024

Background / motivation

We currently fetch data from Binance one token at a time.

So if there are 10 coins (BTC, ETH, DOT, ...), fetching takes about 10x as long as for a single coin.

TODOs / DoD

  • Parallelize fetching data in lake/ohlcv_data_factory.py. It's likely a straightforward conversion of a for loop.
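
To make the "conversion of a for loop" concrete, here is a minimal sketch using Python's standard `concurrent.futures.ThreadPoolExecutor`. It is not the actual lake code: `fetch_ohlcv`, `fetch_all_serial`, and `fetch_all_parallel` are hypothetical names, and the fetch is simulated with a short sleep instead of a real exchange call. Threads are a reasonable fit under the assumption that each per-coin fetch is network-bound.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_ohlcv(coin: str) -> list:
    """Hypothetical stand-in for one coin's network-bound fetch."""
    time.sleep(0.1)  # simulate network latency (assumption, not real I/O)
    return [(coin, i) for i in range(3)]  # placeholder rows


def fetch_all_serial(coins: list) -> dict:
    """Baseline: the plain for loop, one coin after another."""
    results = {}
    for coin in coins:
        results[coin] = fetch_ohlcv(coin)
    return results


def fetch_all_parallel(coins: list, max_workers: int = 8) -> dict:
    """Same work, with the per-coin fetches running in a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as `coins`,
        # and re-raises any exception from a worker when iterated.
        fetched = list(executor.map(fetch_ohlcv, coins))
    return dict(zip(coins, fetched))


if __name__ == "__main__":
    coins = ["BTC", "ETH", "DOT"]
    print(fetch_all_parallel(coins).keys())
```

Since exchanges generally rate-limit API requests, `max_workers` may need to stay small in practice. The sketch also assumes each coin's fetch writes to its own output, so the workers share no mutable state.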

Related github issues

  • #804 "[Sim, make $] Make benchmarking parallel, via threading"

Note: we could also parallelize grabbing data within a feed. However, this isn't as easy, because the algorithm needs to check what data is already there. So consider this for later. It also may not be needed, because at some point we'll have a historical data repo / bundle.

@trentmc trentmc added the Type: Enhancement New feature or request label Apr 22, 2024
@trentmc
Member Author

trentmc commented Apr 22, 2024

cc @calina-c @idiom-bytes

@calina-c
Contributor

calina-c commented Apr 23, 2024

I don't recommend doing this now, since the structure of the lake is changing either way. It would only result in difficult merge conflicts, or in the work being lost entirely while fixing said conflicts. I agree it is a good thing, but we should wait until the lake/ETL part is done.

@trentmc
Member Author

trentmc commented Apr 23, 2024

> I don't recommend doing this now, since the structure of the lake is changing either way. It would only result in difficult merge conflicts, or in the work being lost entirely while fixing said conflicts. I agree it is a good thing, but we should wait until the lake/ETL part is done.

OK. Makes sense.

@trentmc trentmc changed the title [Lake] Parallelize fetching data across >>1 tokens [Lake] Parallelize fetching data across >>1 coins Apr 30, 2024