This repository has been archived by the owner on May 27, 2020. It is now read-only.

Very slow process of compaction after index setup #390

karpa13a opened this issue May 10, 2018 · 5 comments

@karpa13a

karpa13a commented May 10, 2018

Good day.
C* is 3.11 with the matching plugin version; Ubuntu 16.04, latest Java 1.8.
One DC, 3 nodes, keyspace with rf=3, on EC2 instances with 2 CPUs and 4 GB of memory each.

The cluster works well: data is inserted in batches every 15 minutes, there are no problems with compactions or performance, and the data size is around 15M rows.
But I am facing strange behavior after creating a Lucene index. I created the index as follows:

CREATE CUSTOM INDEX gsm_index ON gsm ()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
   'refresh_seconds': '1',
   'schema': '{
      fields: {
         sid: {type: "string"},
         timestamp: {type: "date", pattern: "yyyy/MM/dd"},
         place: {type: "geo_point", latitude: "latitude", longitude: "longitude"}
      }
   }',
   'indexing_threads': '4'
};

The index was created and works well.
The next day I see a load average above 3 on each node, with a queue of 8 pending compactions.
I dropped the index and all compactions were done within 15 minutes.
I recreated the index and got the same result the next day.
The table is simple, as follows:

CREATE TABLE gsm (
   sid text,
   timestamp timestamp,
   latitude double,
   longitude double,
   /* other column definitions */
   PRIMARY KEY (sid, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
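
For reference, a typical search against the index looks roughly like this (just a sketch using the plugin's expr() syntax; the coordinates and distance are placeholder values):

-- geo search using the "place" mapper (placeholder coordinates and distance)
SELECT * FROM gsm
WHERE expr(gsm_index, '{
   filter: {
      type: "geo_distance",
      field: "place",
      latitude: 40.0,
      longitude: -3.0,
      max_distance: "10km"
   }
}');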

Do I need to upgrade the EC2 instances to something more powerful, or have I hit a bug?

@FourSeventy

What type of disks are you using? I alleviated similar compaction problems by switching to solid state drives.

@karpa13a

@FourSeventy Unfortunately, it's not an IO bottleneck; the tasks are CPU bound.

@karpa13a

Unfortunately, upgrading the nodes from t2.medium (2 CPUs) to t2.xlarge (4 CPUs) didn't help; it just eats 350% of CPU.

This makes Lucene indexes totally unusable for me.

Maybe I can do some kind of debugging?

By the way, is it normal that MemtableFlushWriter spams the log file roughly every 2 minutes when there are no reads/updates?

INFO  [MemtableFlushWriter:372] 2018-05-18 07:24:56,673 Index.scala:127 - Flushing Lucene index  /gsm_index/
INFO  [MemtableFlushWriter:373] 2018-05-18 07:26:00,154 Index.scala:127 - Flushing Lucene index /gsm_index/
INFO  [MemtableFlushWriter:374] 2018-05-18 07:27:57,105 Index.scala:127 - Flushing Lucene index /gsm_index/
INFO  [MemtableFlushWriter:375] 2018-05-18 07:29:52,975 Index.scala:127 - Flushing Lucene index /gsm_index/

@karpa13a

Okay, I recreated the index without the place: {type: "geo_point", latitude: "latitude", longitude: "longitude"} part, and now compactions don't get stuck.
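
Roughly, the reduced definition is the original index with the geo_point mapper removed:

CREATE CUSTOM INDEX gsm_index ON gsm ()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
   'refresh_seconds': '1',
   'schema': '{
      fields: {
         sid: {type: "string"},
         timestamp: {type: "date", pattern: "yyyy/MM/dd"}
      }
   }',
   'indexing_threads': '4'
};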

What was wrong with geo_point?
Currently the index is flushed once every 3 hours:
INFO [MemtableFlushWriter:508] 2018-05-20 12:00:02,154 Index.scala:127 - Flushing Lucene index ...
INFO [MemtableFlushWriter:515] 2018-05-20 15:00:02,968 Index.scala:127 - Flushing Lucene index ...

@nirmalsinghkps

So which Cassandra version and which plugin version should we use to avoid compatibility issues? Any suggestions?
