
Using LeafReader only for first segment #171

Open
SOLR4189 opened this issue Jun 18, 2018 · 4 comments

Comments


SOLR4189 commented Jun 18, 2018

Hi,
I'm trying to use LUWAK as an UpdateProcessor in SOLR. I noticed that LUWAK matches only the first X documents (for example, from a DocumentBatch with 3000 docs it matched only 163). I debugged the LUWAK code and found that the problem is in the following code:

```java
private static class MultiDocumentBatch extends DocumentBatch {
    // ...
    private LeafReader build(IndexWriter writer) throws IOException {
        // ...
        writer.forceMerge(1);
        LeafReader reader = DirectoryReader.open(directory).leaves().get(0).reader();
```

I changed this code to:

```java
LeafReader reader = SlowCompositeReaderWrapper.wrap(DirectoryReader.open(directory));
```

and it works for all docs in the batch (but very slowly).

Has anyone else faced this problem? Do you perhaps have another solution?

@mjustice3
Contributor

I don't use MultiDocumentBatch, but did you try adding a writer.commit() after writer.forceMerge(1)?
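For context, the suggested change can be demonstrated in isolation. The sketch below (my own, not from Luwak's source; it uses Lucene 6.x-era APIs such as RAMDirectory, roughly the line Luwak 1.5 builds against, and assumes Lucene is on the classpath) builds an in-memory index in several segments, force-merges, commits, and then checks what a freshly opened DirectoryReader sees:

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Sketch: commit after forceMerge(1) so that a freshly opened
// DirectoryReader sees a single merged segment containing every document.
public class ForceMergeCommitDemo {

    /** Indexes 300 docs across several segments, merges, commits, and returns
     *  {numberOfLeaves, docsInFirstLeaf} as seen by a newly opened reader. */
    public static int[] buildAndCount() throws IOException {
        Directory directory = new RAMDirectory();  // in-memory, like Luwak's batch index
        try (IndexWriter writer = new IndexWriter(directory,
                new IndexWriterConfig(new StandardAnalyzer()))) {
            for (int i = 0; i < 300; i++) {
                Document doc = new Document();
                doc.add(new TextField("body", "document number " + i, Field.Store.NO));
                writer.addDocument(doc);
                if (i % 100 == 99) {
                    writer.commit();  // deliberately create several segments, as a large batch would
                }
            }
            writer.forceMerge(1);  // collapse the index to one segment
            writer.commit();       // the suggested fix: publish the merged segment

            try (DirectoryReader dirReader = DirectoryReader.open(directory)) {
                LeafReader leaf = dirReader.leaves().get(0).reader();
                return new int[] { dirReader.leaves().size(), leaf.maxDoc() };
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] r = buildAndCount();
        System.out.println("leaves=" + r[0] + " docsInFirstLeaf=" + r[1]);
    }
}
```

With the commit in place, the first (and only) leaf should contain all the documents; dropping the commit line is what can leave a reopened reader looking at stale or partial segments.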

@SOLR4189
Author

You are right. Now it works, so it is a bug in LUWAK 1.5, the version I use.

@romseygeek
Collaborator

This is interesting, because as I understand it, forceMerge should only return after the new segments are committed. I also can't reproduce this in a test - creating a batch of 10000 identical docs and then running a query over them returns all 10000 in the batched result. Do you think you could post a reproducible test case so that I can work out what's going on?

@SOLR4189
Author

I can't publish my code, but I'll try to explain how I use LUWAK. I wrapped LUWAK in a SOLR UpdateProcessor:

  • In processAdd (this function is called once for each document in the bulk), I convert each SolrInputDocument to a Lucene Document and add the result to a list of Lucene documents.

  • In finish (this function is called once at the end of the bulk), I build a LUWAK DocumentBatch from the list of Lucene documents, pass it to the match function of the LUWAK monitor, and write the results to a file in the format <docId,queryId>.
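The two steps above can be outlined roughly as follows (pseudocode only; the class and helper names are hypothetical, and the exact LUWAK 1.5 batch/match signatures may differ):

```
class LuwakUpdateProcessor extends UpdateRequestProcessor:
    docs = []                                      // Lucene documents accumulated per bulk

    processAdd(cmd):                               // called once per document
        doc = convertToLuceneDocument(cmd.solrDoc) // hypothetical conversion helper
        docs.add(doc)
        super.processAdd(cmd)                      // let the normal update chain continue

    finish():                                      // called once at the end of the bulk
        batch = buildDocumentBatch(docs)           // LUWAK DocumentBatch over the whole bulk
        matches = monitor.match(batch, matcherFactory)
        for each (docId, queryId) in matches:
            write "docId,queryId" to output file
        super.finish()
```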

So when does the bug happen? When I build the LUWAK DocumentBatch. I debugged this code and saw that DirectoryReader.open(directory).leaves().get(0).reader() got only 163 documents from the batch, and the other docs of the batch were in DirectoryReader.open(directory).leaves().get(1).reader(), i.e. writer.forceMerge(1) didn't merge the segments.

I don't know why you can't reproduce this in a test - maybe the issue is the size of the documents? I have 3000 docs in a batch, 5-15 KB each.

P.S. The solution from mjustice3 works for me.
