
Using LeafReader only for first segment #171

Open
SOLR4189 opened this issue Jun 18, 2018 · 4 comments

Comments


SOLR4189 commented Jun 18, 2018

Hi,
I'm trying to use LUWAK as an UpdateProcessor in SOLR. I noticed that LUWAK matches only the first X documents (for example, from a DocumentBatch with 3000 docs it matched only 163). I debugged the LUWAK code and found that the problem is in the following code:

```java
private static class MultiDocumentBatch extends DocumentBatch {
    // ...
    private LeafReader build(IndexWriter writer) throws IOException {
        // ...
        writer.forceMerge(1);
        LeafReader reader = DirectoryReader.open(directory).leaves().get(0).reader();
```

I changed this code to:

```java
LeafReader reader = SlowCompositeReaderWrapper.wrap(DirectoryReader.open(directory));
```

and it works for all docs in the batch (but very slowly).

Has anyone else faced this problem? Do you perhaps have another solution?

@mjustice3
Contributor

I don't use MultiDocumentBatch, but did you try adding a writer.commit() after writer.forceMerge(1)?
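For context, the suggested change can be demonstrated in isolation. The sketch below (my own, not from Luwak's source; it uses Lucene 6.x-era APIs such as RAMDirectory, roughly the line Luwak 1.5 builds against, and assumes Lucene is on the classpath) builds an in-memory index in several segments, force-merges, commits, and then checks what a freshly opened DirectoryReader sees:

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Sketch: commit after forceMerge(1) so that a freshly opened
// DirectoryReader sees a single merged segment containing every document.
public class ForceMergeCommitDemo {

    /** Indexes 300 docs across several segments, merges, commits, and returns
     *  {numberOfLeaves, docsInFirstLeaf} as seen by a newly opened reader. */
    public static int[] buildAndCount() throws IOException {
        Directory directory = new RAMDirectory();  // in-memory, like Luwak's batch index
        try (IndexWriter writer = new IndexWriter(directory,
                new IndexWriterConfig(new StandardAnalyzer()))) {
            for (int i = 0; i < 300; i++) {
                Document doc = new Document();
                doc.add(new TextField("body", "document number " + i, Field.Store.NO));
                writer.addDocument(doc);
                if (i % 100 == 99) {
                    writer.commit();  // deliberately create several segments, as a large batch would
                }
            }
            writer.forceMerge(1);  // collapse the index to one segment
            writer.commit();       // the suggested fix: publish the merged segment

            try (DirectoryReader dirReader = DirectoryReader.open(directory)) {
                LeafReader leaf = dirReader.leaves().get(0).reader();
                return new int[] { dirReader.leaves().size(), leaf.maxDoc() };
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] r = buildAndCount();
        System.out.println("leaves=" + r[0] + " docsInFirstLeaf=" + r[1]);
    }
}
```

With the commit in place, the first (and only) leaf should contain all the documents; dropping the commit line is what can leave a reopened reader looking at stale or partial segments.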

@SOLR4189
Author

You are right. Now it works, so it is a bug in LUWAK 1.5, the version I use.

@romseygeek
Collaborator

This is interesting, because as I understand it, forceMerge should only return after the new segments are committed. I also can't reproduce this in a test - creating a batch of 10000 identical docs and then running a query over them returns all 10000 in the batched result. Do you think you could post a reproducible test case so that I can work out what's going on?

@SOLR4189
Author

I can't publish my code, but I'll try to explain how I use LUWAK. I wrapped LUWAK in a SOLR UpdateProcessor:

  • In processAdd (this function is called once for each document in the bulk), I convert each SolrInputDocument to a Lucene Document and add the result to a list of Lucene documents.

  • In finish (this function is called once at the end of the bulk), I build a LUWAK DocumentBatch from the list of Lucene documents, pass it to the match function of the LUWAK monitor, and write the results to a file in the format <docId,queryId>.
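The two steps above can be outlined roughly as follows (pseudocode only; the class and helper names are hypothetical, and the exact LUWAK 1.5 batch/match signatures may differ):

```
class LuwakUpdateProcessor extends UpdateRequestProcessor:
    docs = []                                      // Lucene documents accumulated per bulk

    processAdd(cmd):                               // called once per document
        doc = convertToLuceneDocument(cmd.solrDoc) // hypothetical conversion helper
        docs.add(doc)
        super.processAdd(cmd)                      // let the normal update chain continue

    finish():                                      // called once at the end of the bulk
        batch = buildDocumentBatch(docs)           // LUWAK DocumentBatch over the whole bulk
        matches = monitor.match(batch, matcherFactory)
        for each (docId, queryId) in matches:
            write "docId,queryId" to output file
        super.finish()
```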

So when does the bug happen? When I build the LUWAK DocumentBatch. I debugged this code and saw that DirectoryReader.open(directory).leaves().get(0).reader() got only 163 documents from the batch, and the other docs of the batch were in DirectoryReader.open(directory).leaves().get(1).reader(), i.e. writer.forceMerge(1) didn't merge the segments.

I don't know why you can't reproduce this in a test - maybe the issue is the size of the documents? I have 3000 docs in a batch, 5-15 KB each.

P.S. The solution from mjustice3 works for me.
