This repository has been archived by the owner on May 27, 2020. It is now read-only.

Support paging in SCLI #327

Open
zengzh opened this issue Jun 13, 2017 · 14 comments

Comments

@zengzh

zengzh commented Jun 13, 2017

Hi @ealonsodb

Elasticsearch accepts "from" and "size" parameters so that users can retrieve a given number of results starting from a particular position: https://www.elastic.co/guide/en/elasticsearch/guide/current/pagination.html

Does SCLI have this feature? For example, can I issue a query as follows:

SELECT * FROM tweets WHERE expr(tweets_index, '{
query: {type: "match", field: "body", value: "FIFA"},
limit: {offset: "100", pagesize: "100"}
}');

This would retrieve the tweets about FIFA at 100 tweets per page, skipping the first 100 tweets.
If not, do the Stratio folks plan to support this? Thanks.

@ealonsodb
Contributor

Hi @zengzh:

As stated in the documentation, SCLI supports CQL paging.

In your use case, the match query acts as a 'boolean' relevance query (a row either matches or it does not), so it does not make sense to sort the results by relevance. The searching documentation may help you understand this.

Hope this helps

@zengzh
Author

zengzh commented Jun 13, 2017

Thanks @ealonsodb for the quick reply.

Sorry for the inappropriate example. Maybe a better one is the following:

SELECT * FROM tweets WHERE expr(tweets_index, '{
query: {type: "phrase", field: "body", value: "big data gives organizations"},
limit: {offset: "20", pagesize: "80"}
}');

According to the CQL paging documentation, with paging on, query results are displayed in 100-line chunks followed by the more prompt. This functionality is limited in two respects:

  1. The page size is fixed, so I cannot ask for a different one (pagesize: "80" in the example)
  2. There is no way to specify the number of results to skip (offset: "20" in the example)

Is there any way around these limitations?

@ealonsodb
Contributor

ealonsodb commented Jun 13, 2017

Execute PAGING 50 in cqlsh and see what happens!
The default page size of 100 is just a cqlsh.py variable that you can change.

The query you are executing is a relevance query, so results from the different Cassandra nodes must be sorted in the coordinator node.
What I mean is that, even with an offset, there is no way to know the starting point within each node's data subset, so it is compulsory to execute that first-page query and discard those results.

Paging functionality is covered by CQL paging, and you can very easily skip whatever results you want in the client.
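
For illustration only, skipping in the client could look roughly like the Python sketch below. It does not use any Cassandra driver: `fetch_pages` is a hypothetical stand-in for a paged result set, and `skip_and_take` is a made-up helper name. The offset 20 and count 80 are taken from the example query above.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fetch_pages(rows: List[str], page_size: int) -> Iterator[List[str]]:
    """Hypothetical stand-in for a paged result set: yields chunks of page_size rows."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]


def skip_and_take(pages: Iterable[List[str]], offset: int, count: int) -> List[str]:
    """Flatten the pages lazily, drop the first `offset` rows, keep the next `count`."""
    flat = (row for page in pages for row in page)
    return list(islice(flat, offset, offset + count))


# Fake data standing in for the matching tweets.
rows = [f"tweet-{i}" for i in range(250)]
window = skip_and_take(fetch_pages(rows, page_size=50), offset=20, count=80)
print(window[0], window[-1], len(window))
```

Because the pages are consumed lazily, only the pages needed to cover the skipped rows and the requested window are read; the skipped rows are still transferred to the client before being dropped.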

Hope this helps

@zengzh
Author

zengzh commented Jun 15, 2017

Thanks @ealonsodb
It surprises me that the official documentation does not mention that the page size can be customized.

Cassandra supports paging but does not encourage offset queries.

I understand that even with an offset, SCLI still needs to compute the first page and discard those results (keys). But this avoids retrieving the whole set of tuples from Cassandra only to discard them. From this point of view, computing and discarding results in SCLI, rather than computing and discarding tuples from Cassandra, is helpful, right?
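
A toy Python sketch of this argument, under loud assumptions: it is not SCLI code, and the names `node_top_hits` and `coordinator_window`, the scores, and the keys are all invented. Each node returns its best offset + size hits as (score, key) pairs, the coordinator merges them by score and discards the first `offset` keys, and full rows are fetched only for the remaining window.

```python
import heapq
from itertools import islice
from typing import List, Tuple

Hit = Tuple[float, str]  # (relevance score, row key)


def node_top_hits(hits: List[Hit], limit: int) -> List[Hit]:
    """Each node returns its best `limit` hits, highest score first."""
    return sorted(hits, key=lambda h: -h[0])[:limit]


def coordinator_window(per_node: List[List[Hit]], offset: int, size: int) -> List[str]:
    """Merge per-node lists by descending score, keep keys in [offset, offset + size)."""
    merged = heapq.merge(*per_node, key=lambda h: -h[0])
    return [key for _, key in islice(merged, offset, offset + size)]


offset, size = 2, 2
node_a = [(0.9, "k1"), (0.5, "k4"), (0.2, "k6")]
node_b = [(0.8, "k2"), (0.7, "k3"), (0.3, "k5")]

# No node knows where the global window starts, so each returns offset + size hits.
per_node = [node_top_hits(n, offset + size) for n in (node_a, node_b)]
window_keys = coordinator_window(per_node, offset, size)

# Only the keys inside the window need their full rows read.
rows_fetched_with_skip = len(window_keys)
rows_fetched_client_side = offset + size
print(window_keys, rows_fetched_with_skip, rows_fetched_client_side)
```

In this toy model the per-node work is the same either way; the saving is that only `size` full rows are materialized instead of `offset + size`.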

@ealonsodb ealonsodb reopened this Jul 3, 2017
@ealonsodb
Contributor

Hi @zengzh:
You are totally right. Thank you for changing our minds about this feature.
We have implemented it in #342.
Could you please take a look?

@zengzh
Author

zengzh commented Jul 6, 2017

Thanks @ealonsodb.
I see that you mentioned skip "is not compatible with paging or top-K queries". Can you explain why that is? Did you add any validation check? If so, what is it?

@ealonsodb
Contributor

Hi @zengzh:
The main problem with paging and top-K queries is that Cassandra resolves inconsistencies between the data of different nodes in the coordinator, after any 2i-related functions have run. If the 2i skips some rows, deterministic behaviour (seeing the same results in the same order across different executions of the same query) may be lost.

Hope this helps

@zengzh
Author

zengzh commented Jul 19, 2017

Hi @ealonsodb
Sorry, I do not fully understand. What are 2i-related functions? Can you give an example of paging or top-K queries that return non-deterministic results because of skip? If I specify the sorting field, will the results still be non-deterministic?
Thanks very much!

@ealonsodb
Contributor

Hi @zengzh:

  • 2i is the acronym for secondary index in Cassandra. It is the only contact point between Cassandra and our product, which is our implementation of the Index interface. We are locked into that Cassandra Index implementation.

When querying our product you can use query or filter.

  • When you use query, you are asking for the best-fitting rows that match your query.
  • If you use filter, you are just asking for any rows that match, not sorted.
  • Filter plus sort acts exactly the same as a query.

There is plenty of information on the internet if you search for "lucene query versus filter".

The main problem is that data consistency is enforced after the 2i-related sorting postProcess.
The second point is that this case is unusual and does not happen in a stable cluster.
What I mean here is that skip would work well if every node is up and the data is consistent between nodes, but it will start to fail if there are data inconsistencies.
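
A toy Python model of the failure mode, purely as an assumption-laden illustration: it does not reproduce Cassandra's read path, and the helpers `replica_skip` and `reconcile` and the row keys are invented. Replica B has not yet received row r2, so applying skip on each replica before reconciliation gives a window that depends on which replica answers.

```python
from typing import List


def replica_skip(keys_in_order: List[str], skip: int, size: int) -> List[str]:
    """Apply skip and size to an already sorted list of row keys."""
    return keys_in_order[skip:skip + size]


def reconcile(*replicas: List[str]) -> List[str]:
    """Toy reconciliation: the union of all keys, kept in sorted order."""
    return sorted(set().union(*replicas))


replica_a = ["r1", "r2", "r3", "r4", "r5"]
replica_b = ["r1", "r3", "r4", "r5"]  # r2 has not been replicated here yet

# Skip applied at the index level, before inconsistencies are resolved.
skip_first_a = replica_skip(replica_a, skip=2, size=2)
skip_first_b = replica_skip(replica_b, skip=2, size=2)

# Skip applied only after the data has been reconciled.
reconciled_first = replica_skip(reconcile(replica_a, replica_b), skip=2, size=2)

print(skip_first_a, skip_first_b, reconciled_first)
```

With consistent replicas both skip-first results would coincide with the reconciled one, which matches the remark that the problem does not show up in a stable cluster.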

Hope this helps

@zengzh
Author

zengzh commented Jul 20, 2017

Thanks @ealonsodb
So it is better to resolve inconsistencies before using skip.

@ealonsodb
Contributor

Hi @zengzh
Wow, I had never thought about it that way. Give me some time to test it thoroughly and maybe, with a big experimental warning attached, I will merge it.

Thank you for changing my mind.

@zengzh
Author

zengzh commented Aug 2, 2017

Thanks @ealonsodb
May I know when this skip feature will be merged into the release version?

@zengzh
Author

zengzh commented Aug 31, 2017

Hi @ealonsodb @adelapena ,

It has been a while since this feature was developed, but it remains unreleased. May I know the latest status and when it will be available?

Looking forward to your reply. Many thanks.

@feruud-sr

Hi @ealonsodb @adelapena, do you have plans to merge this feature soon? We are excited and impatient about this. Thanks a lot! 😬
