IN optimization and controlling task size during multipartition scan #139

Open · parekuti opened this issue Feb 22, 2017 · 1 comment

@parekuti (Contributor)
Currently there is a limit on how many partitions are supported during a multipartition scan, and raising that limit would degrade performance. Can we start thinking about how far we can raise it without degrading performance or causing other issues? Also, can we have a plan to add more tasks so that a multipartition scan gets more cores? For example (tiering sketched below):

* 0-200 (or a new limit) --> default plan
* Up to a new limit of 400 --> same plan, but somehow create more tasks (still far fewer than 5000)
* Over 400 --> full table scan, the default behavior
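
A minimal sketch of that tiering, purely to illustrate the proposal; all names here (ScanPlan, choosePlan, keysPerTask) and the exact limits are hypothetical, not FiloDB's actual API:

```scala
// Hypothetical tiered plan selection based on the number of partitions scanned.
sealed trait ScanPlan
case object SingleTaskScan extends ScanPlan                     // current default: one Spark task
final case class MultiTaskScan(numTasks: Int) extends ScanPlan  // split partition keys across tasks
case object FilteredFullTableScan extends ScanPlan              // fall back to a full table scan

object ScanPlanner {
  val DefaultLimit  = 200   // current multipartition limit (per this issue)
  val ExtendedLimit = 400   // hypothetical raised limit

  def choosePlan(numPartitions: Int, keysPerTask: Int = 50): ScanPlan =
    if (numPartitions <= DefaultLimit) SingleTaskScan
    else if (numPartitions <= ExtendedLimit)
      // More tasks than one, but far fewer than a full table scan would use
      MultiTaskScan(math.ceil(numPartitions.toDouble / keysPerTask).toInt)
    else FilteredFullTableScan
}
```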

@velvia (Member) commented Feb 24, 2017
Basically, multi-partition queries always run on one Spark partition. We want to enable bigger multi-partition queries that can spread across multiple Spark partitions without invoking filtered full table scans. This will require some intelligent logic.
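
One way to picture that logic, as a hedged sketch rather than FiloDB's actual code: group the requested partition keys into chunks and hand each chunk to its own Spark partition, so Spark schedules one task (and hence one core) per chunk. The reader function and keysPerSparkPartition parameter are placeholders:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object MultiPartitionSpread extends Serializable {
  // Spread a multi-partition query across several Spark partitions: one Spark
  // partition per group of keys, instead of the single task used today.
  // readPartition stands in for the real per-key read from the column store.
  def scan(sc: SparkContext,
           partKeys: Seq[String],
           readPartition: String => Iterator[String],
           keysPerSparkPartition: Int = 50): RDD[String] = {
    val groups = partKeys.grouped(keysPerSparkPartition).toSeq
    sc.parallelize(groups, numSlices = math.max(1, groups.size))
      .flatMap(group => group.iterator.flatMap(readPartition))
  }
}
```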

velvia pushed a commit that referenced this issue Oct 1, 2018
…queries using new shardKeyColumns DatasetOption (#139)

* Replace chunk_size DatasetOption with shardKeyColumns; new CLI option to set during dataset creation
* feat(coordinator): Compute shardKeyHash from query filters
* Fix a MatchError found during flushing/ingestion
* Add chunk-length histogram for ChunkSink writes
* feat(cli): Add --everyNSeconds option to repeatedly query for data
* Don't read or write options column for C* datasets table - not needed anymore
* Make sure no negative watermarks are written in all cases