Investigate opensearch regression, not treating spaces as && within fields #349

akotlar · 2023-11-13T04:18:28Z

Fix the OpenSearch regression so that
heterozygotes:(4805 && 1805) cadd > 20
and
heterozygotes:(4805 1805) cadd > 20 (no &&)
work the same again.
Previously, in Elasticsearch 5.6 (b10), these were equivalent.

akotlar · 2024-03-06T02:40:50Z

This is related: elastic/elasticsearch#29148

akotlar · 2024-03-07T02:42:06Z

Fixed in https://github.com/bystrogenomics/bystro-web/pull/384 by adding a pre-processor for query_string queries that wraps each separate term in parentheses, which causes Elasticsearch/OpenSearch to search those terms individually, just as before. See the linked PR for more details. We also now have a small test suite that checks the transformation; the first set of transforms we check is:

const testCases = [
    { input: "exonic pathogenic", expected: "(exonic) (pathogenic)" },
    { input: "(exonic pathogenic)", expected: "(exonic pathogenic)" },
    { input: 'refseq.name2:GAA', expected: '(refseq.name2:GAA)' },
    { input: 'refseq.name2:"GAA"', expected: '(refseq.name2:"GAA")' },
    { input: 'gene:"HELLO"', expected: '(gene:"HELLO")' },
    { input: '"Hello"', expected: '("Hello")' },
    { input: '+(chrom:chr17 pos:39580562)', expected: '+(chrom:chr17 pos:39580562)' },
    { input: 'exonic AND cadd:>20.2', expected: '(exonic) AND (cadd:>20.2)' },
    { input: '-(gene:BRCA1) OR +(gene:BRCA2)', expected: '-(gene:BRCA1) OR +(gene:BRCA2)' },
    { input: '*pathogenic*', expected: '(*pathogenic*)' },
    { input: 'BRCA1? AND BRCA2?', expected: '(BRCA1?) AND (BRCA2?)' }
];

As seen above, terms that are already wrapped in parentheses are not affected. This gives us the best of both worlds. By default, queries behave as before, and the user can freely type queries like exonic pathogenic cadd > 20. We also now support synonyms that are phrases of multiple space-separated terms: in that case we would wrap the phrase in parentheses, (some long disease name), or, if we want an exact match, in quotes, "some long disease name". I will add documentation for this.
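
For readers who want a feel for the transformation, here is a minimal sketch of a pre-processor that reproduces the test cases listed above. It is not the code from the linked PR: the names splitTopLevel and wrapTopLevelTerms and the operator list are hypothetical, and the approach (split on top-level whitespace, leave operators and already-parenthesized groups alone, wrap everything else) is only an assumption about how such a step could work. It does not handle escaped quotes or free-form comparisons with spaces such as cadd > 20.

```typescript
// Hypothetical sketch, not the implementation from the linked PR.
// Boolean operators that are passed through unwrapped (assumed list).
const OPERATORS: string[] = ["AND", "OR", "NOT", "&&", "||"];

// Split a query on whitespace, ignoring whitespace that sits inside
// double quotes or inside parentheses.
function splitTopLevel(query: string): string[] {
  const tokens: string[] = [];
  let current = "";
  let depth = 0;
  let inQuotes = false;

  for (let i = 0; i < query.length; i++) {
    const ch = query.charAt(i);

    if (ch === '"') {
      inQuotes = !inQuotes;
    } else if (!inQuotes && ch === "(") {
      depth++;
    } else if (!inQuotes && ch === ")") {
      depth = Math.max(0, depth - 1);
    }

    const isTopLevelSpace = /\s/.test(ch) && depth === 0 && !inQuotes;
    if (isTopLevelSpace) {
      if (current !== "") tokens.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  if (current !== "") tokens.push(current);
  return tokens;
}

// Wrap each top-level term in parentheses, leaving operators and
// groups that are already parenthesized (optionally prefixed by + or -).
function wrapTopLevelTerms(query: string): string {
  return splitTopLevel(query)
    .map((token) => {
      if (OPERATORS.indexOf(token) !== -1) return token;
      if (/^[+-]?\(.*\)$/.test(token)) return token;
      return `(${token})`;
    })
    .join(" ");
}
```

Under these assumptions, wrapTopLevelTerms("exonic AND cadd:>20.2") gives "(exonic) AND (cadd:>20.2)", while "+(chrom:chr17 pos:39580562)" is returned unchanged. How inputs outside the listed cases should behave is not specified here, so treat the sketch as illustrative only.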

Live on https://bystro-dev.emory.edu

akotlar mentioned this issue Nov 13, 2023

Annotation Sprint 3 Task List #302

Closed

7 tasks

akotlar changed the title ~~Fix opensearch regression making~~ Fix opensearch regression, not treating spaces as && within fields Nov 13, 2023

akotlar added this to the Sprint 4 milestone Nov 13, 2023

akotlar added the search label Nov 13, 2023

akotlar self-assigned this Nov 13, 2023

akotlar changed the title ~~Fix opensearch regression, not treating spaces as && within fields~~ Investigate opensearch regression, not treating spaces as && within fields Nov 13, 2023

akotlar mentioned this issue Nov 13, 2023

Sprint 4 - Alex Task List #348

Closed

11 tasks
