This repository has been archived by the owner on May 27, 2020. It is now read-only.

Indexing is taking too long for 2 GB of data. Can anything be done? #391

Open
nirmalsinghkps opened this issue May 26, 2018 · 1 comment


@nirmalsinghkps

I am trying to index a table with 2 GB of data and 3 columns. The indexing has been running in the background for more than 6 hours, and I still don't see an entry in system."IndexInfo". I am quite confused about what is happening in the background, and about whether this plugin is the right candidate for heavy tables with huge amounts of data.

1. How can I track the progress of index creation?
2. How frequently will the index be updated after its first build?
3. Is this plugin a good candidate for indexing a table with more than 250 GB of data?

@phambryan

phambryan commented May 31, 2018

  1. You can watch the progress by changing the trace statements to INFO level and recompiling the plugin.
    https://github.com/Stratio/cassandra-lucene-index/blob/branch-3.0.14/plugin/src/main/scala/com/stratio/cassandra/lucene/IndexWriter.scala

  2. That depends on your settings. By default, a refresh is triggered every 60 seconds to scan for updates.

  3. This is a partitioning question. Keep your partition size no larger than 10 GB (C* 3.11), and use strong CPUs and NVMe storage. 250 GB is a lot of data: if you're doing 128k columns, that's still 2M rows. So index only what you need, and use filters to narrow the data set.
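
For anyone wanting to check the numbers in point 3, here is a rough back-of-the-envelope sketch in Python. It assumes "128k" means roughly 128 KiB of data per row (my reading, not confirmed in the thread) and uses binary units; the function names are made up for illustration and are not part of the plugin or of Cassandra.

```python
# Rough sizing arithmetic for the figures quoted in the thread.
# Assumption: "128k" is read as ~128 KiB per row; the thread does not confirm this.
KIB = 1024
GIB = 1024 ** 3


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def estimated_rows(total_bytes: int, row_bytes: int) -> int:
    """How many rows of row_bytes fit in total_bytes (rounded up)."""
    return ceil_div(total_bytes, row_bytes)


def min_partitions(total_bytes: int, max_partition_bytes: int) -> int:
    """Fewest partitions needed to keep each one under the size cap."""
    return ceil_div(total_bytes, max_partition_bytes)


total = 250 * GIB
rows = estimated_rows(total, 128 * KIB)      # about 2M rows
partitions = min_partitions(total, 10 * GIB)  # 10 GB cap from the answer above

print(f"rows: {rows:,}")
print(f"minimum partitions at 10 GiB each: {partitions}")
```

Under those assumptions the table holds about two million rows and needs at least 25 partitions to stay under the 10 GB guideline, which is consistent with the "2M rows" estimate above.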
