Support paging in SCLI #327

zengzh · 2017-06-13T02:19:38Z

Elasticsearch accepts “from” and “size” parameters so that users can retrieve a given number of results starting from a particular position. https://www.elastic.co/guide/en/elasticsearch/guide/current/pagination.html

Does SCLI have this feature? For example, can I issue a query as follows:

SELECT * FROM tweets WHERE expr(tweets_index, '{
query: {type: "match", field: "body", value: "FIFA"},
limit: {offset: "100", pagesize: "100"}
}');

This would retrieve the tweets about FIFA in pages of 100 tweets, skipping the first 100 tweets.
If not, do the Stratio folks have plans to support this? Thanks.


ealonsodb · 2017-06-13T08:34:29Z

Hi @zengzh:

As stated in the doc, SCLI supports CQL paging.

In your use case, the match query acts as a 'boolean' relevance query (a row either matches or it does not), so it does not make sense to sort the results by relevance. The searching documentation may help you understand this.

Hope this helps

zengzh · 2017-06-13T10:19:06Z

Thanks @ealonsodb for the quick reply.

Sorry for the inappropriate example. Maybe a better one is the following:

SELECT * FROM tweets WHERE expr(tweets_index, '{
query: {type: "phrase", field: "body", value: "big data gives organizations"},
limit: {offset: "20", pagesize: "80"}
}');

According to the CQL paging documentation, PAGING ON displays query results in 100-line chunks followed by the more prompt. This functionality is limited in two respects:

The page size is fixed, so I cannot request a custom one (pagesize: "80" in the example)
There is no way to specify the number of results to skip (offset: "20" in the example)

Is there any way around these limitations?

ealonsodb · 2017-06-13T10:57:44Z

Execute PAGING 50 in cqlsh and see what happens!
The page size of 100 is just a cqlsh.py variable that you can change.

The query you are executing is a relevance query, so the results from the different Cassandra nodes must be sorted in the coordinator node.
What I mean is that, even if you provide an offset, there is no way to know the starting point within each node's data subset, so it is compulsory to execute the first-page query anyway and discard those results.
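
To make the point above concrete, here is a minimal Python sketch of a coordinator-style merge. It is not SCLI or Cassandra code: the function name `coordinator_page`, the `(score, key)` row shape, and the two sample nodes are all assumptions made for illustration. It only shows why every node still has to return its own top `offset + page_size` rows before anything can be skipped.

```python
import heapq
from itertools import islice


def coordinator_page(node_results, offset, page_size):
    """Merge per-node results and return the globally ranked page.

    Hypothetical helper: each entry of node_results is a list of
    (score, key) tuples already sorted by descending score.
    """
    needed = offset + page_size
    # A node cannot know how many of the skipped rows live on other nodes,
    # so each one must contribute its own top (offset + page_size) rows.
    per_node = [rows[:needed] for rows in node_results]
    merged = heapq.merge(*per_node, key=lambda row: row[0], reverse=True)
    # Only after the global merge can the first `offset` rows be discarded.
    return list(islice(merged, offset, needed))


# Illustrative data, not taken from the thread.
node_a = [(0.9, "a1"), (0.5, "a2"), (0.1, "a3")]
node_b = [(0.8, "b1"), (0.7, "b2"), (0.2, "b3")]
page = coordinator_page([node_a, node_b], offset=2, page_size=2)
```

In this example both skipped rows happen to come from different nodes, which is exactly the information a single node does not have when it answers.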

Paging functionality is covered by CQL paging, and you can very easily skip whatever results you want in the client.
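
As a rough illustration of skipping in the client, the sketch below uses plain Python and a stand-in generator instead of a real driver, so `fake_paged_rows` and `skip_and_take` are hypothetical names. The assumption is that the client receives rows lazily, one page at a time, which is how driver-side CQL paging generally behaves; with a real driver you would iterate over its result object instead.

```python
from itertools import islice


def fake_paged_rows(total, fetch_size, page_log):
    """Stand-in for a paged result set: yields rows page by page.

    page_log records the start index of every page that was fetched.
    """
    for start in range(0, total, fetch_size):
        page_log.append(start)
        for i in range(start, min(start + fetch_size, total)):
            yield i


def skip_and_take(rows, offset, page_size):
    """Discard the first `offset` rows and return the next `page_size`."""
    return list(islice(rows, offset, offset + page_size))


fetched_pages = []
rows = fake_paged_rows(total=1000, fetch_size=50, page_log=fetched_pages)
# Mirrors offset "20" / pagesize "80" from the example query in this thread.
result = skip_and_take(rows, offset=20, page_size=80)
```

Because the rows are consumed lazily, only the pages that cover the skipped and requested rows are fetched; the skipped rows are still transferred to the client, though, which is the cost discussed in this thread.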

Hope this helps

zengzh · 2017-06-15T02:36:11Z

Thanks @ealonsodb
It surprises me that the official documentation does not mention that the page size can be customized.

Cassandra supports paging but does not encourage offset queries.

I understand that even with an offset in SCLI, it still needs to compute the first page and discard those results (keys). But this avoids retrieving the whole set of tuples from Cassandra only to discard them. From this point of view, computing and discarding results in SCLI, instead of fetching and discarding tuples from Cassandra, is helpful, right?

ealonsodb · 2017-07-05T08:40:01Z

Hi @zengzh:
You are totally right. Thank you for changing our minds about this feature.
We have coded it in #342.
Could you please take a look?

zengzh · 2017-07-06T08:26:22Z

Thanks @ealonsodb.
I see that you mentioned skip "is not compatible with paging or top-K queries". Can you explain why that is? Did you add any validation check? If so, what is it?

ealonsodb · 2017-07-17T06:03:34Z

Hi @zengzh:
The main problem with paging and top-K queries is that Cassandra resolves inconsistencies between the data of different nodes in the coordinator, after any 2i-related functions have run. If the 2i skips some rows, deterministic behaviour (seeing the same results in the same order across different executions of the same query) may be lost.

Hope this helps

zengzh · 2017-07-19T06:19:12Z

Hi @ealonsodb:
Sorry that I do not fully understand. What are 2i related functions? Can you give an example of paging or top-k queries that return non-deterministic results because of skip? If I specify the sorting field, will the results still be non-deterministic?
Thanks very much!

ealonsodb · 2017-07-19T12:45:31Z

Hi @zengzh:

2i is the abbreviation for secondary index in Cassandra. It is the only contact point between Cassandra and our product, which is an implementation of the Index interface. We are locked into that Cassandra Index implementation.

When querying our product you can use query or filter.

When you use query, you are asking for the best-fitting rows that match your query.
When you use filter, you are just asking for any rows that match, in no particular order.
Filter plus sort acts exactly the same as a query.

There is plenty of information on the internet if you search for: "lucene query versus filter".

The main problem is that data consistency is enforced after the 2i-related sorting postProcess step.
The second point is that this case is unusual and does not happen in a stable cluster.
What I mean is that skip works well if every node is up and the data is consistent between nodes, but it will start to fail if there are data inconsistencies.
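
A toy Python sketch of this failure mode follows. It is a deliberate simplification and not how Cassandra or SCLI is implemented: `index_skip`, the `(key, score)` rows, and the two replicas are hypothetical. The assumption is simply that the skip is applied inside the index on each replica, before the coordinator has reconciled the replicas' data.

```python
def index_skip(replica_rows, offset, page_size):
    """Rank one replica's rows by descending score, then skip and take.

    Hypothetical stand-in for a skip applied inside the secondary index.
    """
    ranked = sorted(replica_rows, key=lambda row: row[1], reverse=True)
    return [key for key, _score in ranked[offset:offset + page_size]]


# Two replicas of the same data; the second has not yet received row k2.
up_to_date = [("k1", 0.9), ("k2", 0.8), ("k3", 0.7), ("k4", 0.6)]
stale = [("k1", 0.9), ("k3", 0.7), ("k4", 0.6)]

page_from_a = index_skip(up_to_date, offset=1, page_size=2)
page_from_b = index_skip(stale, offset=1, page_size=2)
```

With consistent replicas both calls would return the same page. With the stale replica, the same query can return a different page depending on which replica answers, and reconciling afterwards cannot restore rows that were already skipped.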

Hope this helps

zengzh · 2017-07-20T01:08:09Z

Thanks @ealonsodb
So it is better to resolve inconsistencies before using skip.

ealonsodb · 2017-07-20T05:53:05Z

Hi @zengzh
Wow, I had never thought about it that way. Give me some time to test it thoroughly and maybe, with a big experimental warning on it, I will merge it.

Thank you for changing my mind.

zengzh · 2017-08-02T08:07:00Z

Thanks @ealonsodb
May I know when this skip feature will be merged into the release version?

zengzh · 2017-08-31T02:41:05Z

Hi @ealonsodb @adelapena,

This feature was developed a while ago but remains unreleased. May I know the latest status and when it will be available?

Looking forward to your reply. Many thanks.

feruud-sr · 2018-01-22T18:30:01Z

Hi @ealonsodb @adelapena, do you have plans to merge this feature soon? We are excited and impatient about this. Thanks a lot! 😬

ealonsodb added the waiting for feedback label Jun 13, 2017

ealonsodb closed this as completed Jun 15, 2017

ealonsodb reopened this Jul 3, 2017
