Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MOD-6540: Support EMPTY indexing for TEXT fields (#4622) #4637

Merged
merged 5 commits into from May 22, 2024

Conversation

raz-mon
Copy link
Collaborator

@raz-mon raz-mon commented May 12, 2024

Describe the changes in the pull request

Adds support for EMPTY TEXT indexing - please refer to #4622's comment for more information.

Also:

  • Removes support of EMPTY TAG indexing for empty JSON arrays and objects.
  • Cleans up a bunch of code (dead mainly).

Which issues this PR fixes

  1. MOD-6580.
  2. MOD-6540

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

Copy link

codecov bot commented May 12, 2024

Codecov Report

Attention: Patch coverage is 90.14085% with 14 lines in your changes are missing coverage. Please review.

Project coverage is 86.15%. Comparing base (eb6a4b9) to head (950e9b5).

Files Patch % Lines
src/spec.c 93.50% 5 Missing ⚠️
src/inverted_index.c 0.00% 3 Missing ⚠️
src/doc_table.c 50.00% 1 Missing ⚠️
src/document.c 80.00% 1 Missing ⚠️
src/json.c 66.66% 1 Missing ⚠️
src/numeric_index.c 50.00% 1 Missing ⚠️
src/query.c 91.66% 1 Missing ⚠️
src/query_optimizer.c 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4637      +/-   ##
==========================================
+ Coverage   86.07%   86.15%   +0.08%     
==========================================
  Files         190      190              
  Lines       34534    34508      -26     
==========================================
+ Hits        29724    29730       +6     
+ Misses       4810     4778      -32     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@raz-mon raz-mon marked this pull request as draft May 16, 2024 05:50
@raz-mon raz-mon marked this pull request as ready for review May 16, 2024 05:50
Copy link
Collaborator

@oshadmi oshadmi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update FT.CREATE in commands.json with ISEMPTY argument?
(should also be aligned with documentation PR redis/docs#199)

@fcostaoliveira
Copy link

fcostaoliveira commented May 16, 2024

Automated performance analysis summary

This comment was automatically generated given there is performance data available.

In summary:

  • Detected a total of 29 stable tests between versions.
  • Detected a total of 7 highly unstable benchmarks.
  • Detected a total of 2 improvements above the improvement water line.
  • Detected a total of 3 regressions bellow the regression water line 5.0.

You can check a comparison in detail via the grafana link

Comparison between master and razmon-feature_empty_text_indexing.

Time Period from 30 days ago. (environment used: oss-standalone)

Test Case Baseline master (median obs. +- std.dev) Comparison razmon-feature_empty_text_indexing (median obs. +- std.dev) % change (higher-better) Note
ftsb-10K-enwiki_abstract-hashes-fulltext-sortby 68 +- 1.0% (7 datapoints) 68.00 -0.1% No Change
ftsb-10K-enwiki_abstract-hashes-term-prefix 7216 +- 7.3% (7 datapoints) 7260.00 0.6% waterline=7.3%. No Change
ftsb-10K-enwiki_abstract-hashes-term-suffix 2028 +- 4.1% (7 datapoints) 2165.00 6.8% IMPROVEMENT
ftsb-10K-enwiki_abstract-hashes-term-suffix-withsuffixtrie 69641 +- 4.1% (7 datapoints) 71209.00 2.3% No Change
ftsb-10K-enwiki_abstract-hashes-term-wildcard 11933 +- 2.6% (7 datapoints) 12261.00 2.8% No Change
ftsb-10K-enwiki_pages-hashes-fulltext-mixed_simple-1word-query_write_1_to_read_20.yml 1265 +- 6.9% (7 datapoints) 1238.00 -2.2% waterline=6.9%. No Change
ftsb-10K-enwiki_pages-hashes-load 47738 +- 3.6% (7 datapoints) 47970.00 0.5% No Change
ftsb-1K-enwiki_abstract-hashes-term-contains 1718 +- 3.4% (7 datapoints) 1825.00 6.3% IMPROVEMENT
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query 307 +- 12.7% UNSTABLE (7 datapoints) 340.00 11.0% UNSTABLE (very high variance)
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query-non-sortable 41 +- 7.4% (7 datapoints) 38.00 -7.3% waterline=7.4%. potential REGRESSION
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query@100_ops_sec 100 +- 0.0% (7 datapoints) 100.00 0.0% No Change
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query 2666 +- 2.6% (7 datapoints) 2423.00 -9.1% REGRESSION
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query-non-sortable 963 +- 10.1% UNSTABLE (7 datapoints) 912.00 -5.4% UNSTABLE (very high variance)
ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query@100_ops_sec 100 +- 0.0% (7 datapoints) 100.00 -0.0% No Change
ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query 966 +- 12.3% UNSTABLE (7 datapoints) 933.00 -3.4% UNSTABLE (very high variance)
ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query@100_ops_sec 100 +- 0.0% (7 datapoints) 100.00 -0.0% No Change
ftsb-1M-enwiki_abstract-hashes-load 22305 +- 2.8% (7 datapoints) 21859.00 -2.0% No Change
ftsb-1M-nyc_taxis-ftadd-load 20740 +- 2.7% (7 datapoints) 20987.00 1.2% No Change
ftsb-1M-nyc_taxis-hashes-load 22186 +- 2.4% (7 datapoints) 21265.00 -4.2% potential REGRESSION
search-aggregate-post-filter-simple.yml 93421 +- 4.4% (7 datapoints) 81512.00 -12.7% REGRESSION
search-filtering-tag-numeric 199 +- 9.6% (7 datapoints) 210.00 5.7% waterline=9.6%. potential IMPROVEMENT
search-filtering-tag-numeric-filter-pipeline 18690 +- 2.4% (7 datapoints) 18342.00 -1.9% No Change
search-ftsb-10K-enwiki_abstract-hashes-term-withoutsuffix-trie 40324 +- 4.2% (7 datapoints) 37789.00 -6.3% REGRESSION
search-ftsb-10K-enwiki_abstract-hashes-term-withsuffix-trie 38815 +- 4.8% (7 datapoints) 39012.00 0.5% No Change
search-ftsb-1700K-docs-union-iterators-q3 6.2 +- 0.9% (7 datapoints) 6.20 0.8% No Change
search-ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query-non-sortable@50_ops_sec 50 +- 10.7% UNSTABLE (7 datapoints) 48.00 -3.3% UNSTABLE (very high variance)
search-ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query-non-sortable@100_ops_sec 100 +- 0.0% (7 datapoints) 100.00 0.0% No Change
search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-non-sortable 151 +- 3.9% (7 datapoints) 150.00 -0.5% No Change
search-ftsb-370K-docs-union-iterators-q4 6.7 +- 1.1% (7 datapoints) 6.70 -0.6% No Change
search-ftsb-5500K-docs-union-iterators-q2 0.93 +- 0.8% (2 datapoints) 0.92 -0.5% No Change
search-geo 186 +- 3.2% (7 datapoints) 187.00 0.2% No Change
search-high-cardinality-negation-term-comparison_union_all_other_terms 7.3 +- 7.5% (5 datapoints) 7.20 -1.4% waterline=7.5%. No Change
search-numeric 4510 +- 20.7% UNSTABLE (7 datapoints) 3074.00 -31.8% UNSTABLE (very high variance)
search-numeric-optimize 9059 +- 2.0% (7 datapoints) 8895.00 -1.8% No Change
search-numeric-sortby 3012 +- 33.1% UNSTABLE (7 datapoints) 2884.00 -4.2% UNSTABLE (very high variance)
search-numeric-sortby-desc 3037 +- 9.9% (7 datapoints) 2854.00 -6.1% waterline=9.9%. potential REGRESSION
search-numeric-sortby-desc-optimize 33 +- 28.2% UNSTABLE (7 datapoints) 49.00 48.3% UNSTABLE (very high variance)
search-numeric-sortby-optimize 19 +- 2.9% (6 datapoints) 20.00 4.6% potential IMPROVEMENT
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-fulltext-filter 474 +- 2.2% (7 datapoints) 484.00 2.0% No Change
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-numeric-filter 105 +- 3.9% (7 datapoints) 108.00 2.6% No Change
vecsim-arxiv-titles-384-angular-filters-m16-ef-128-tag-filter 58518 +- 4.6% (7 datapoints) 59854.00 2.3% No Change

@raz-mon raz-mon enabled auto-merge May 16, 2024 10:37
@raz-mon
Copy link
Collaborator Author

raz-mon commented May 16, 2024

@oshadmi Not sure, who uses the commands.json file?

Update: The file was updated in order to keep in sync with docs.

@raz-mon raz-mon disabled auto-merge May 16, 2024 13:00
]
cmd_assert(env, cmd, expected)

if dialect == 2:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to be removed after fixing MOD-6967 (FT.AGGREGATE fails if apply and sortby with dialect > 2)

@oshadmi
Copy link
Collaborator

oshadmi commented May 16, 2024

Reviewing benchmark: vecsim-arxiv-titles-384-angular-filters-m16-ef-128-fulltext-filter shows -8.6% but since it is unstable after consulting with @fcostaoliveira, we can proceed.

@raz-mon raz-mon enabled auto-merge May 17, 2024 15:28
@raz-mon raz-mon added this pull request to the merge queue May 18, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 18, 2024
@oshadmi oshadmi added this pull request to the merge queue May 19, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 19, 2024
@raz-mon raz-mon added this pull request to the merge queue May 22, 2024
github-merge-queue bot pushed a commit that referenced this pull request May 22, 2024
* MOD-6540: Support EMPTY indexing for TEXT fields (#4622)

* Resolving conflicts

* Updatet commands.json file
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 22, 2024
@raz-mon raz-mon added this pull request to the merge queue May 22, 2024
Merged via the queue into master with commit 10261d2 May 22, 2024
31 checks passed
@raz-mon raz-mon deleted the razmon-feature_empty_text_indexing branch May 22, 2024 07:26
Copy link

Backport failed for 2.10, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.10
git worktree add -d .worktree/backport-4637-to-2.10 origin/2.10
cd .worktree/backport-4637-to-2.10
git switch --create backport-4637-to-2.10
git cherry-pick -x 9a7150f6bba5b55f6193363cbbe4be1ee4e8dce4 c046422546dc5742abf1f6d63f57ac3a652858a8 950e9b563788a0b92386f6b31d92e293f34a0883

raz-mon added a commit that referenced this pull request May 22, 2024
* MOD-6540: Support EMPTY indexing for TEXT fields (#4622)

* Resolving conflicts

* Updatet commands.json file
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants