This repository has been archived by the owner on May 27, 2020. It is now read-only.

Sorting performance is down after upgrade to 3.11.1 version. #392

Open
PhiVanTran opened this issue Jun 22, 2018 · 0 comments

Hi all,
My team uses Apache Cassandra and cassandra-lucene-indexing in production.
Previously we ran Apache Cassandra 3.7.0 with cassandra-lucene-indexing 3.7.0. It worked well, with no performance problems.
We then upgraded Cassandra to 3.11.2 and cassandra-lucene-indexing to 3.11.1. As you state, these two versions are compatible.

The problem appeared after the upgrade: sorted queries on Cassandra are now roughly 10 times slower.

Environment:

- Nodes: 5
- CPU/RAM: 8 cores, 32 GB RAM
- Vnodes: 256 tokens
- Replication factor: 2
- Total rows indexed: 2M

CREATE TABLE event (
    pk1 bigint,
    pk2 text,
    pk3_clustering timestamp,
    col1 text,
    col2 text,
    col3 text,
    lucene text,
    PRIMARY KEY ((pk1, pk2), pk3_clustering)
) WITH CLUSTERING ORDER BY (pk3_clustering DESC)
    AND bloom_filter_fp_chance = 0.1
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'sstable_size_in_mb': '160', 'tombstone_compaction_interval': '86400', 'tombstone_threshold': '0.1'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 3110400
    AND gc_grace_seconds = 14400
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
CREATE CUSTOM INDEX event_index ON event (lucene) USING 'com.stratio.cassandra.lucene.Index' WITH OPTIONS = {'refresh_seconds': '60', 'indexing_threads': '0', 'schema': '{
    fields: {
     pk3_clustering: {type: "date", pattern: "yyyy/MM/dd HH:mm:ssZ"},
     pk2: {type: "string", case_sensitive: false},
     pk1: {type: "bigint"},
     col1: {type: "string", case_sensitive: false},
     col2: {type: "string", case_sensitive: false},
     col3: {type: "string", case_sensitive: false}
    }
}'};

Performance:

CQL query

  SELECT pk1 FROM event WHERE expr(event_index, '{"sort":{"fields":[{"type":"simple","field":"pk3_clustering","reverse":true}]}}') limit 100;

Cassandra version: 3.7.0, Cassandra-lucene-indexing: 3.7.0

Processing time: 300 ms

Cassandra version: 3.11.2, Cassandra-lucene-indexing: 3.11.1

Processing time: 3366 ms

Tracing

I see an abnormal trace log entry on the node that ran the CQL query:

 Sending REQUEST_RESPONSE message to /172.31.26.147 [MessagingService-Outgoing-/172.31.26.147-Small] | 2018-06-22 04:03:44.243000 | 172.31.26.153 |          39280 | 127.0.0.1
Lucene post-process 14813 collected rows to 100 rows [Native-Transport-Requests-1] | 2018-06-22 04:03:47.564000 | 172.31.26.147 |        3366212 | 127.0.0.1
Request complete | 2018-06-22 04:03:47.565067 | 172.31.26.147 |        3367067 | 127.0.0.1

Is this the root cause? I don't see this log entry on the old version (3.7.0).
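
To make the trace excerpt easier to read, here is a minimal Python sketch that parses the three rows quoted above and reports the wall-clock gap between consecutive events, plus how many rows Lucene collected versus returned. The helper names (`parse_trace`, `gaps`, `overfetch`) are made up for this illustration and are not part of Cassandra or the Lucene index plugin. It also assumes the clocks on 172.31.26.153 and 172.31.26.147 are roughly in sync, since the first two rows come from different nodes.

```python
import re
from datetime import datetime

# The three trace rows quoted in this issue, columns separated by "|":
# activity | timestamp | source | source_elapsed (microseconds) | client
TRACE = """\
Sending REQUEST_RESPONSE message to /172.31.26.147 [MessagingService-Outgoing-/172.31.26.147-Small] | 2018-06-22 04:03:44.243000 | 172.31.26.153 |          39280 | 127.0.0.1
Lucene post-process 14813 collected rows to 100 rows [Native-Transport-Requests-1] | 2018-06-22 04:03:47.564000 | 172.31.26.147 |        3366212 | 127.0.0.1
Request complete | 2018-06-22 04:03:47.565067 | 172.31.26.147 |        3367067 | 127.0.0.1
"""


def parse_trace(text):
    """Split each non-empty trace row into its five columns (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        activity, ts, source, elapsed, client = [p.strip() for p in line.split("|")]
        rows.append({
            "activity": activity,
            "timestamp": datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f"),
            "source": source,
            "source_elapsed_us": int(elapsed),
            "client": client,
        })
    return rows


def gaps(rows):
    """Return (activity, seconds since the previous row) for each row after the first."""
    return [
        (cur["activity"], (cur["timestamp"] - prev["timestamp"]).total_seconds())
        for prev, cur in zip(rows, rows[1:])
    ]


def overfetch(rows):
    """Extract (collected, returned) row counts from the Lucene post-process row."""
    for row in rows:
        m = re.search(r"post-process (\d+) collected rows to (\d+) rows", row["activity"])
        if m:
            return int(m.group(1)), int(m.group(2))
    return None


if __name__ == "__main__":
    rows = parse_trace(TRACE)
    for activity, seconds in gaps(rows):
        print(f"{seconds:.3f} s before: {activity}")
    collected, returned = overfetch(rows)
    print(f"collected {collected} rows to return {returned}")
```

On the rows quoted here, about 3.3 s passes between the last `REQUEST_RESPONSE` and the `Lucene post-process` entry, and 14813 rows are collected to return 100. That only shows where the time appears in this excerpt; it does not by itself prove the post-processing step is the cause.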
