New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Search Index Refactoring #3556
base: master
Are you sure you want to change the base?
Search Index Refactoring #3556
Conversation
idx_cleanName() was called only from Doku_Indexer::addMetaKeys(), lookupKey(), getPages(), histogram()
This function is not called from elsewhere.
Note: idx_listIndexLengths() is used in inc/infoutils.php file
Note: idx_getIndex() is used in inc/infoutils.php file
This function is not called from elsewhere.
This function is not called from elsewhere.
Note: idx_listIndexLengths() is used in iinc/TaskRunner.php and inc/Remote/ApiCore.php file Also used in _test files.
class FulltextSearch class MetaSearch
because used in inc/infoutils.php file
updateTuple() accept int for the second argument
This reverts commit 115f491.
provides convert($query), recert_simple(), and termParser(). No needs to pass $Indexer in method's arguments.
pageLookup() does not use fulltext index, but metadata index
Warning: Parameter 1 to dokuwiki\Search\MetaSearch::callback_pageLookup() expected to be a reference, value given in /path/to/dokuwiki/inc/Extension/Event.php on line 135
PageIndex, PagewordIndex, MetadataIndex inherit the AbstractIndex class
all extending acstract classes should use a static pidCache array
Fixed inconsistent handling of falsy values on fperm setting
Third-party plugins may use this method. The [cloud plugin](https://github.com/dokufreaks/plugin-cloud) uses idx_indexLength().
* master: (111 commits) Update translation translation update don't crush tables too narrow. fixes #3250 translation update Thorough tests for EO, DE, PT and ES translation update Optimized pageRestoreConfirm function Tests for Portuguese and Spanish Changes according to revisions in moisesbr-dw#2 adjust callstack depth for deprecation message further better deprecation messages for self required plugin base files don't test on old PHP releases anymore increase minimum PHP version to 7.2 fixed tests for cleanID and romanization for Greeklish Improved the transliteration from greek to latin. extension cli: do not try to upgrade bundled plugins Public access to patterns in external link parser test the collator fallback always cleanup for collator tests wrap sorting functions into their own class ...
Exceptions are better to handle than errors. What I don't like is that we now have an unfortunate mix of return code and exception signalling for errors. Some methods still return false for errors while others now throw exceptions (always returning true otherwise).
getPID(), saveIndex(), saveIndexKey(), getPageWords() return always true, otherwise exceptions.
Just ignore $value argument if $key argument is array . Ignore enpty key of $key argument. Ensure to treat any null value of $key array as empty string.
Indexer, FulltextIndex, MetadataIndex uses common directory to store *.idx files, but this does not mean they should be singleton objects to avoid lock confrictions.
will reduce access to static $pidCache
frequently used in ajax call, singleton is not effective to reduce multiple instantiations.
singleton is not effective to reduce multiple instantiations, especially for MetadataSearch which is frequently used in ajax call.
this was already fixed by 5afd958 on 2021-02-05
@ssahara one general question: The INDEX_MARK_DELETED mechanism is new, isn't it? I am wondering what it's good for. Previously deletion simply set an empty line. That means that line positions were never reused, but I think that's fine. Having to look for deleted entries makes things more complicated than needed IMHO. Thoughts? |
Yes, the INDEX_MARK_DELETED mechanism is new, because I thought that the 'numeric pageId' should be reused to avoid having sparse |
This is work in progress based on #2943
Goals:
To me the Indexing/Search System consists of several building blocks
Concerns between these different levels should be clearly differentiated currently all these things are very much mixed and spread all over the place.
A series of smaller PRs against this branch should be made before this can be merged into master.