Doc out of order issue from Lucene 8.10.1 #13338

SaiSatwik opened this issue May 2, 2024 · 7 comments

SaiSatwik commented May 2, 2024

Description

We are seeing "docs out of order" errors repeatedly on OpenSearch 1.2.3. The issue seems to come from Lucene, but we have no clue what is happening under the hood. We saw no significant spikes in the disk I/O logs around the time of the issue.

[2024-04-18T03:47:45,301][WARN ][o.o.i.e.Engine           ] [node_1] [index_1][0] failed engine [merge failed]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalArgumentException: cannot write negative vLong (got: -8)
        at org.opensearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2719) [opensearch-1.2.3.jar:1.2.3]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:792) [opensearch-1.2.3.jar:1.2.3]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50) [opensearch-1.2.3.jar:1.2.3]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
        at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: java.lang.IllegalArgumentException: cannot write negative vLong (got: -8)
        at org.apache.lucene.store.DataOutput.writeVLong(DataOutput.java:225) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
        at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$StatsWriter.add(BlockTreeTermsWriter.java:505) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
        at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlock(BlockTreeTermsWriter.java:767) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
        at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:628) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
        at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.finish(BlockTreeTermsWriter.java:976) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
        at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:321) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
        at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
        at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:244) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
        at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
        at org.opensearch.index.engine.OpenSearchConcurrentMergeScheduler.doMerge(OpenSearchConcurrentMergeScheduler.java:118) ~[opensearch-1.2.3.jar:1.2.3]
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
[2024-04-18T03:47:45,311][WARN ][o.o.i.c.IndicesClusterStateService] [node_1] [index_1][0] marking and sending shard failed due to [shard failure, reason [merge failed]]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalArgumentException: cannot write negative vLong (got: -8)
        at org.opensearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2719) ~[opensearch-1.2.3.jar:1.2.3]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:792) ~[opensearch-1.2.3.jar:1.2.3]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50) ~[opensearch-1.2.3.jar:1.2.3]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
        at java.lang.Thread.run(Unknown Source) [?:?]

        [2024-04-18T03:47:45,311][WARN ][o.o.i.c.IndicesClusterStateService] [node_1] [index_1][0] marking and sending shard failed due to [shard failure, reason [merge failed]]
        org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalArgumentException: cannot write negative vLong (got: -8)
                at org.opensearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2719) ~[opensearch-1.2.3.jar:1.2.3]
                at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:792) ~[opensearch-1.2.3.jar:1.2.3]
                at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50) ~[opensearch-1.2.3.jar:1.2.3]
                at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
                at java.lang.Thread.run(Unknown Source) [?:?]
        Caused by: java.lang.IllegalArgumentException: cannot write negative vLong (got: -8)
                at org.apache.lucene.store.DataOutput.writeVLong(DataOutput.java:225) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$StatsWriter.add(BlockTreeTermsWriter.java:505) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlock(BlockTreeTermsWriter.java:767) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:628) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.finish(BlockTreeTermsWriter.java:976) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:321) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:244) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.opensearch.index.engine.OpenSearchConcurrentMergeScheduler.doMerge(OpenSearchConcurrentMergeScheduler.java:118) ~[opensearch-1.2.3.jar:1.2.3]
                at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
        [2024-04-18T03:47:45,316][WARN ][o.o.c.r.a.AllocationService] [node_1] failing shard [failed shard, shard [index_1][0], node[IRwTrJMmS--VNRb1epvARw], [P], s[STARTED], a[id=Q7KGk5YTT8ydnBPQwVb86g], message [shard failure, reason [merge failed]], failure [MergeException[java.lang.IllegalArgumentException: cannot write negative vLong (got: -8)]; nested: IllegalArgumentException[cannot write negative vLong (got: -8)]; ], markAsStale [true]]
        org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalArgumentException: cannot write negative vLong (got: -8)
                at org.opensearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2719) ~[opensearch-1.2.3.jar:1.2.3]
                at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:792) ~[opensearch-1.2.3.jar:1.2.3]
                at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50) ~[opensearch-1.2.3.jar:1.2.3]
                at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
                at java.lang.Thread.run(Unknown Source) [?:?]
        Caused by: java.lang.IllegalArgumentException: cannot write negative vLong (got: -8)
                at org.apache.lucene.store.DataOutput.writeVLong(DataOutput.java:225) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$StatsWriter.add(BlockTreeTermsWriter.java:505) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlock(BlockTreeTermsWriter.java:767) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:628) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.finish(BlockTreeTermsWriter.java:976) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:321) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:244) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                at org.opensearch.index.engine.OpenSearchConcurrentMergeScheduler.doMerge(OpenSearchConcurrentMergeScheduler.java:118) ~[opensearch-1.2.3.jar:1.2.3]
                at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
        [2024-04-18T03:47:45,327][INFO ][o.o.c.r.a.AllocationService] [node_1] Cluster health status changed from [GREEN] to [RED] (reason: [shards failed [[index_1][0]]]).
        [2024-04-18T03:47:46,359][INFO ][o.o.c.r.a.AllocationService] [node_1] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[index_1][0]]]).
        [2024-04-19T12:38:04,037][WARN ][o.o.i.e.Engine           ] [node_1] [index_1][0] failed engine [merge failed]
        org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: docs out of order (26 <= 152 ) (resource=RateLimitedIndexOutput(FSIndexOutput(path="/nonconfig/search/nodes/0/indices/AIZJtOLxQ0SG0oeNd7dfgQ/0/index/_dhx_Lucene84_0.doc")))
                at org.opensearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2719) [opensearch-1.2.3.jar:1.2.3]
                at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:792) [opensearch-1.2.3.jar:1.2.3]
                at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50) [opensearch-1.2.3.jar:1.2.3]
                at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
                at java.lang.Thread.run(Unknown Source) [?:?]
                Caused by: org.apache.lucene.index.CorruptIndexException: docs out of order (26 <= 152 ) (resource=RateLimitedIndexOutput(FSIndexOutput(path="/nonconfig/search/nodes/0/indices/AIZJtOLxQ0SG0oeNd7dfgQ/0/index/_dhx_Lucene84_0.doc")))
                        at org.apache.lucene.codecs.lucene84.Lucene84PostingsWriter.startDoc(Lucene84PostingsWriter.java:231) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:146) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:907) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:244) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.opensearch.index.engine.OpenSearchConcurrentMergeScheduler.doMerge(OpenSearchConcurrentMergeScheduler.java:118) ~[opensearch-1.2.3.jar:1.2.3]
                        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                [2024-04-19T12:38:04,048][WARN ][o.o.i.c.IndicesClusterStateService] [node_1] [index_1][0] marking and sending shard failed due to [shard failure, reason [merge failed]]
                org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: docs out of order (26 <= 152 ) (resource=RateLimitedIndexOutput(FSIndexOutput(path="/nonconfig/search/nodes/0/indices/AIZJtOLxQ0SG0oeNd7dfgQ/0/index/_dhx_Lucene84_0.doc")))
                        at org.opensearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2719) ~[opensearch-1.2.3.jar:1.2.3]
                        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:792) ~[opensearch-1.2.3.jar:1.2.3]
                        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50) ~[opensearch-1.2.3.jar:1.2.3]
                        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
                        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
                        at java.lang.Thread.run(Unknown Source) [?:?]
                Caused by: org.apache.lucene.index.CorruptIndexException: docs out of order (26 <= 152 ) (resource=RateLimitedIndexOutput(FSIndexOutput(path="/nonconfig/search/nodes/0/indices/AIZJtOLxQ0SG0oeNd7dfgQ/0/index/_dhx_Lucene84_0.doc")))
                        at org.apache.lucene.codecs.lucene84.Lucene84PostingsWriter.startDoc(Lucene84PostingsWriter.java:231) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:146) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:907) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:244) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                        at org.opensearch.index.engine.OpenSearchConcurrentMergeScheduler.doMerge(OpenSearchConcurrentMergeScheduler.java:118) ~[opensearch-1.2.3.jar:1.2.3]
                        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
                [2024-04-19T12:38:04,049][WARN ][o.o.c.r.a.AllocationService] [node_1] failing shard [failed shard, shard [index_1][0], node[IRwTrJMmS--VNRb1epvARw], [P], s[STARTED], a[id=Q7KGk5YTT8ydnBPQwVb86g], message [shard failure, reason [merge failed]], failure [MergeException[org.apache.lucene.index.CorruptIndexException: docs out of order (26 <= 152 ) (resource=RateLimitedIndexOutput(FSIndexOutput(path="/nonconfig/search/nodes/0/indices/AIZJtOLxQ0SG0oeNd7dfgQ/0/index/_dhx_Lucene84_0.doc")))]; nested: CorruptIndexException[docs out of order (26 <= 152 ) (resource=RateLimitedIndexOutput(FSIndexOutput(path="/nonconfig/search/nodes/0/indices/AIZJtOLxQ0SG0oeNd7dfgQ/0/index/_dhx_Lucene84_0.doc")))]; ], markAsStale [true]]
                org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: docs out of order (26 <= 152 ) (resource=RateLimitedIndexOutput(FSIndexOutput(path="/nonconfig/search/nodes/0/indices/AIZJtOLxQ0SG0oeNd7dfgQ/0/index/_dhx_Lucene84_0.doc")))
                        at org.opensearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2719) ~[opensearch-1.2.3.jar:1.2.3]
                        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:792) ~[opensearch-1.2.3.jar:1.2.3]
                        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50) ~[opensearch-1.2.3.jar:1.2.3]
                        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
                        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
                        at java.lang.Thread.run(Unknown Source) [?:?]

Version and environment details

Java Version:

openjdk 11.0.22 2024-01-16 LTS
OpenJDK Runtime Environment (build 11.0.22+12-LTS)
OpenJDK 64-Bit Server VM (build 11.0.22+12-LTS, mixed mode)

lscpu output:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  12
  On-line CPU(s) list:   0-11
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  1
    Core(s) per socket:  1
    Socket(s):           12
    Stepping:            7
    BogoMIPS:            4190.15
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon ljmpq nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tsc_adjust bmi1 avx2 smep bmi2 invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xsaves arat pku ospke md_clear flush_l1d arch_capabilities
Virtualization features:
  Hypervisor vendor:     VMware
  Virtualization type:   full
Caches (sum of all):
  L1d:                   384 KiB (12 instances)
  L1i:                   384 KiB (12 instances)
  L2:                    12 MiB (12 instances)
  L3:                    429 MiB (12 instances)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-11
Vulnerabilities:
  Gather data sampling:  Unknown: Dependent on hypervisor status
  Itlb multihit:         KVM: Mitigation: VMX unsupported
  L1tf:                  Mitigation; PTE Inversion
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Retbleed:              Mitigation; Enhanced IBRS
  Spec rstack overflow:  Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization (limited, manual)
  Spectre v2:            Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
  Srbds:                 Not affected
  Tsx async abort:       Not affected

OpenSearch 1.2.3 uses lucene-core-8.10.1.

@SaiSatwik changed the title from "Doc out of order issue from lucene" to "Doc out of order issue from Lucene 8.10.1" on May 2, 2024
@mkhludnev
Member

Hi @SaiSatwik
Are you running some sort of test?

@SaiSatwik
Author

Hi @mkhludnev

No. We saw this issue in a VM running a single-node OpenSearch deployment. Because of it, the OpenSearch index went into the RED state, causing every query on this index to fail.

@mkhludnev
Member

Pardon. My guess was based on a change in the test framework that occurred later. Looking at the version you mention, I realize that guess was wrong. I have no idea. Do you have index sorting configured?

@SaiSatwik
Author

@mkhludnev, we do not have index sorting configured. Could you please help me understand how this issue could be related to the index sorting configuration?

@mkhludnev
Member

Honestly, I have no idea. How many docs do you have in this index? Could it exceed 2 billion?

@SaiSatwik
Author

No. It would be in the thousands at most (max 50k).
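
For context on the 2 billion question: Lucene document IDs are Java `int` values, so a document count beyond the signed 32-bit range could not be addressed. The sketch below only illustrates that range check. `DocIdRangeSketch` and `fitsInIntDocIds` are hypothetical names, not Lucene or OpenSearch APIs, and Lucene's own limit may be slightly lower than `Integer.MAX_VALUE`.

```java
// Hypothetical helper for illustration only; not part of Lucene or OpenSearch.
final class DocIdRangeSketch {

    /** Returns true if docCount documents can all be addressed by a signed 32-bit doc ID. */
    static boolean fitsInIntDocIds(long docCount) {
        return docCount >= 0 && docCount <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // The index in this report holds at most about 50k documents.
        System.out.println(fitsInIntDocIds(50_000L));        // true
        // A count above ~2.1 billion would not fit.
        System.out.println(fitsInIntDocIds(3_000_000_000L)); // false

        // Incrementing past the int range wraps to a negative value.
        int next = Integer.MAX_VALUE;
        next++;
        System.out.println(next < 0);                        // true
    }
}
```

With at most 50k documents, this index is far from that range, so an ID overflow of this kind is unlikely to explain the failure.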

@jpountz
Contributor

jpountz commented May 8, 2024

This log is scary: out-of-order doc IDs, doc freq greater than total term freq; there is a major issue here. Are you doing something exotic on this index? Can you replicate this bug consistently? If so, does it still replicate with a more modern JDK (to check whether this could be a JDK bug)? Or can you run with assertions enabled and see if it reports a problem earlier?
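
To make the two failure messages in the logs concrete, here is a minimal Java sketch of the invariants they appear to guard. It is a simplified illustration, not the Lucene source: the class and method names are hypothetical, and it throws `IllegalStateException` where Lucene reports a `CorruptIndexException`. It assumes term statistics are written as the delta `totalTermFreq - docFreq`, which is consistent with the `-8` in the first trace.

```java
// Simplified illustration only: NOT the Lucene source. All names are hypothetical.
final class PostingsInvariantSketch {

    /** Mimics a variable-length writer that rejects negative values. */
    static long checkedVLong(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("cannot write negative vLong (got: " + value + ")");
        }
        return value;
    }

    /** Assumption: stats are stored as a delta, which requires totalTermFreq >= docFreq. */
    static long statsDelta(int docFreq, long totalTermFreq) {
        return checkedVLong(totalTermFreq - docFreq);
    }

    /** Doc IDs within one term's postings must be strictly increasing. */
    static void checkDocOrder(int[] docIds) {
        int last = -1;
        for (int docId : docIds) {
            if (docId <= last) {
                throw new IllegalStateException("docs out of order (" + docId + " <= " + last + " )");
            }
            last = docId;
        }
    }

    public static void main(String[] args) {
        // Healthy term: 3 documents, 10 occurrences in total.
        System.out.println(statsDelta(3, 10));
        try {
            statsDelta(10, 2); // docFreq exceeds totalTermFreq by 8
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        try {
            checkDocOrder(new int[] {7, 152, 26}); // 26 arrives after 152
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Both checks fire when the merged segment is written, so by themselves they do not show where the inconsistent data was introduced.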
