[Self] Ideas #2

ajitid · 2021-07-18T20:00:26Z

Hey, this issue is to collate all the ideas that I have in mind. If you have an idea other than the ones mentioned, please create a separate issue for it. If you have suggestions on improving any of the following ideas, drop down a comment.

More (and clearer) examples

This library can make a search on 22000 English words without breaking a sweat. However the demo that shows it is not evident enough. Furthermore, there are many improvements in examples done under experiments/* branches on which we haven't finalized yet. There is also an example of a basic selector but not a more involved one like:

(v) => `${v.firstName} ${v.lastName}`

// ...

// map result items like as selector and highlight matched chars

Add TypeScript based examples too.

Clear code separation and contributing guidelines

Currently this project is structured like generic webapp rather than a library. This is intentional, but might create confusion for some. We can split docs and lib under src/ to make it much clearer.
npx patch-package is required to run for tests to work correctly. However this isn't documented yet. A contributing doc needs to be created.
Adding an issue template would be useful too.

Multiple Selectors and Key Weights

This is something junegunn/fzf doesn't has and this library might too favor covering features other than this.

An idea is to use an array of slightly modified version of existing Fzf class which instead of returning and sorting data appends each item's result to already existing array, something like

list[i].matches.push(_match);
list[i].overallScore += _match.result.score;

and these modified Fzf instances share query's runes. The reason of opting for a list is to have each instance having a different selector.

So the proposed API would look like:

// lib/main.ts
interface Options<U> {
	// ... 
	weight: number // defaults to 1, 0 < weight <= 1
}

// app.ts
import { FzfGroup } from 'fzf'

const fzfGroup = new FzfGroup(list, [/* options for each FZF instance */])
const entries = fzfGroup.find("stuff")

The weight will be multiplied to the score.

Command palette

The motivation to port FZF to JS was to create a command palette, which I still didn't had a chance to work on.

Offloading to web worker/using WASM for performance

There is a WASM based finder and few others which can use web workers like this and this one.

I tried using FZF for JS on web workers (see this branch) and found degradation in performance (perceived performance, not measured performance). I suspect that it could be due to web workers being mainly designed to offload tasks from main thread and these workers might not get CPU and memory resources to a level that a main thread can. I need to investigate how the web worker based libs I mentioned in previous paragraph is able to achieve better perf.

Note to self: Merely converting to code to WASM might even degrade performance. See https://surma.dev/things/js-to-asc/index.html. Threading on WASM is certainly needed.

Relevant names and links:

I'm also preferring to create a version based on (async) generators over promises as an alternative to the current sync version.

Once web worker is done, maybe we could try WebGPU (or WebGL) https://www.youtube.com/watch?v=K2JzIUIHIhc?
See also: Intent to Ship WebGPU in Chromium, WebGPU Explainer

Create and expose `reindex` or keep FZF for JS work on static list?

Efficiently doing re-indexing could be incredibly difficult to do and there is no easy API to expose this to the user. Few libraries like the full text searching library Minisearch allows something along these lines. junegunn/fzf, due to its nature of usage, too doesn't work on dynamic list.

Introduce caching for query patterns and for results

(Related: defer positions creation and create it after limit is processed on the list)

Description to be filled.

Typo tolerance

This is something that was declined by original author (see junegunn/fzf#2272 (comment)).

My initial thought is while it could be implemented, it will be restricted to fuzzy-only, non-extended options. That means:

{
  extended: false,
  algo: 'v1' | 'v2'
}

I'll prefer Sublime-like, only 1 character typo tolerance. Here are few resources:

Fuzzysort - does everything but this
A blog post and a reply by jskinner (author of Sublime Text)

There can be multiple typo tolerance mechanisms — allow mistyping same letter two times (caat instead of cat), neighbor letters being transposed (cta instead of cat), a letter being mistyped (cet instead of cat), etc. Even one if like, can impose many rules around it. Though the Sublime one is much simpler:

Only neighboring transpose is allowed (abuot can match with about, submduleo won't match submodule)
First char transpose in query is not allowed (baout won't match about, obut won't match about)
Not more than one transpose is allowed (sbumoduel won't match submodule)

More resources:

https://stackoverflow.com/questions/487003/how-to-detect-a-typo-in-a-product-search-and-suggest-possible-corrections
https://www.reddit.com/r/ReverseEngineering/comments/5mn2g9/reverse_engineering_sublime_text/
https://elixirforum.com/t/seqfuzz-sublime-text-like-fuzzy-search-my-first-hex-package/32348
https://www.objc.io/blog/2020/08/18/fuzzy-search/
https://github.com/android-password-store/sublime-fuzzy
Google search - dice coefficient vs levenshtein
- dice (npm): string-similarity
- levenshtein (npm): fastest-levenshtein, leven

Stemming and stop words

These terms are taken from https://github.com/bvaughn/js-search

Stop words: User can use a selector to sanitize the items, i.e. remove words that aren't semantically meaningful.
Stemming: FzfGroup API discussed under Multiple Selectors and Key Weights can help here.

However both of these features have same priority as Multiple Selectors and Key Weights.

Old fuse.js interactive docs

People liked interactive doc on its homepage. Maybe add a Try link in docs providing access to a CodeSandbox with options bounded to React state for interactive experience?

Resources:

Other

junegunn/fzf CLI --help

The text was updated successfully, but these errors were encountered:

ajitid · 2021-07-21T19:26:59Z

This lists the items that are done ✅:

Right indexing in the presence of diacritics

Will appear in a release done after v0.5.1.

While we normalize^[1] query and strings in the list internally, we return indices assuming the user has already done the normalization before they highlight the matched characters.

We need to mention this in the docs. Whether the list items have diacritics or not, I think it is a good idea to add .normalize() anyway.

1: Normalization here does not mean removing diacritics (accents) from the string, it means to merge latin character with a diacritic

Normalization

normalize is now on by default. (#8)

Normalizing characters by default makes more sense. If the decision weighs in towards keeping normalize disabled we can thinking of bundling without normalize map and rather importing it instead when needed like day.js or date-fns do for locale. On importing, will patch and fill the empty normalize map and thus reducing bundle size in many cases.

Partially match words in any order

Covered as part of extended search. (#15)

As of version 0.3.1, if you have a list containing:-

krsty silver
krsty-silver
krsty/silver

and you query for "kr si", you get all items back. However if you search for "si kr" you get none. This behavior does not matches with the one we see in junegunn/fzf (Golang) which returns all items back. This issue is already addressed in feat/reverse-order-words-match branch. However the implementation isn't cross-verified with junegunn/fzf. We need to check:-

If there is an even more performant way to do it
Whether the score still matches with its Golang counterpart or not

nvim-telescope/telescope-fzf-native.nvim at first glance seems to show correct behaviour too which is unexpected for me as they only port main algorithm. Check if their code can be referred as well.

Update: This feature is actually a part of Extended Search which is not implemented yet.

Extended Search

It now can be used by passing extended: true in options. It is intentionally made opt-in, as for people who are unfamiliar with junegunn/fzf patterns for this search might initially find the results confusing. (#15)

fuse.js, telescope-fzf-native and junegunn/fzf have support for pattern matching. FZF for JS lacks this currently.

Tiebreaker

It is now available as an option (#17). To respect order of the items that are given in a list by the user, we don't apply any tiebreaker by default.

Currently this library expects the user to send a list in an order that is already sorted by their own descending scoring model, meaning if they want to match 'aa' with ['aa', 'aab'] and prefer to have aab appear first, they should send ['aab', 'aa'].

Here is a snippet from junegunn/fzf CLI's fzf --help:

--tiebreak=CRI[,..]   Comma-separated list of sort criteria to apply
                      when the scores are tied [length|begin|end|index]
                      (default: length)

Naming convention

We've decided to keep both terms as is.

Other name for `selector`

Candidates: accessor, access, getBy, getFn, getStr, use, on, queryOn, ~~pick~~

Reason for rejection:

pick - can be misinterpreted as user is limited to choosing only one value from item object, whereas selector can return a combination of strings v => v.firstName + " " + v.lastName

Other name for `entries`

Candidates: matches, ~~results~~

Reason for rejection:

results - result is already part of returned object from FZF algos

Partially match words in any order (again)

We've decided to not to include as it will complicate APIs. (#54)

While extended search is useful, there seems to be a good case of one not needing all the special patterns but only wanting the ability to fuzzy match by typing query words in any order. (Personally, this is what I would prefer over a basic match.)

This is useful, for example, when the user is searching by people name and types last name before first name. This is also useful if the names have apostrophe in them.

We would also like to put this information in the doc of when the user of the library should not opt for extended search and for disabling diacritics.

Sorter

Sorting now can be turned off (#31). Allowing a custom sorter will complicate its relationship with tiebreaker, thus this part of feature is rejected.

Either allow user to pass a custom sorter or disable the sorter entirely (using null). ~~This is somewhat related to Tiebreaker.~~

Update: Contrary to what I thought, Tiebreakers aren't plain sorter implementations in junegunn/fzf.

Requirement for various transformations on the results will always be there (sort, threshold, etc.). To solve this, instead of adding options to Fzf itself, a callbag like interface will be given (see example) or common utility functions will simply be exposed so that users can create a wrapper around find. Lunr too uses concept of a pipeline.

Option to disable `forward`

Added (#29)

When searching in file paths or URIs, users will usually prefer end of the path to match first.

Add tests for API

Both statement and branch coverage are above 90%.

The algorithm has tests carried over from junegunn/fzf but the exposed API doesn't.

caschbre · 2021-10-20T01:14:02Z

Regarding your web worker notes above... one thing to consider is that offloading the search to a worker can free up the main thread for the user. I actually do that when using fusejs because in some cases the user is searching across multiple properties on a large array of items. While the overall search performance might be slower, the UI freezing up on the user is more perceptible than search being slower.

ajitid · 2021-10-20T06:09:11Z

@caschbre to get a perceptible lag-free UI, an async finder would be introduced in the next version of FZF.

This was referenced Jul 28, 2021

NOT sort by score option #28

Closed

feat: expose an option to toggle sorting by score #31

Merged

ajitid added the enhancement New feature or request label Oct 14, 2021

ajitid mentioned this issue Oct 20, 2021

Search across multiple item properties #86

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Self] Ideas #2

[Self] Ideas #2

ajitid commented Jul 18, 2021 •

edited

ajitid commented Jul 21, 2021 •

edited

caschbre commented Oct 20, 2021

ajitid commented Oct 20, 2021 •

edited

[Self] Ideas #2

[Self] Ideas #2

Comments

ajitid commented Jul 18, 2021 • edited

More (and clearer) examples

Clear code separation and contributing guidelines

Multiple Selectors and Key Weights

Command palette

Offloading to web worker/using WASM for performance

Create and expose reindex or keep FZF for JS work on static list?

Introduce caching for query patterns and for results

Typo tolerance

Stemming and stop words

Old fuse.js interactive docs

Other

ajitid commented Jul 21, 2021 • edited

Right indexing in the presence of diacritics

Normalization

Partially match words in any order

Extended Search

Tiebreaker

Naming convention

Other name for selector

Other name for entries

Partially match words in any order (again)

Sorter

Option to disable forward

Add tests for API

caschbre commented Oct 20, 2021

ajitid commented Oct 20, 2021 • edited

ajitid commented Jul 18, 2021 •

edited

Create and expose `reindex` or keep FZF for JS work on static list?

ajitid commented Jul 21, 2021 •

edited

Other name for `selector`

Other name for `entries`

Option to disable `forward`

ajitid commented Oct 20, 2021 •

edited