Adding Function Score Query #2395
Comments
Out of curiosity, do you plan to pass a Python callable to this eventually? If so, I fear from personal experience that this might be prohibitively slow due to the overhead of acquiring the GIL and crossing the Python-Rust boundary.
Yes, that's my ultimate goal. Actually, I have tried compiling the source in tantivy-py as follows:

    @numba.njit
    def score_add_10(score: float) -> float:
        return score + 10

    function_score_query = Query.function_score_query(
        const_score_query, lambda _, score, __: score_add_10(score)
    )

and the numba trick works. I understand that pyO3 currently requires acquiring the GIL to call into Python, but I'm not sure whether that will still be the case in the future. If this PR passes, the performance of calling Python from Rust with JIT or other compiled code should be investigated further. Other than that, I think this feature should exist in its own right, while providing templates for other languages is just an extra benefit. Also, as Elasticsearch's documentation says (and the same applies to Lucene), the function score query should only be run after the majority of documents have been filtered out. I likewise expect users to apply the score mapping only after the retrieval stage, since iterating through all the documents is slow.
It certainly still does, and even though we are aware of nogil CPython builds, a lot of issues around them are still unresolved, so I wouldn't hold my breath. This is particularly problematic as
I did not add this to argue against the feature itself; I just wanted to share some unhappy experiences trying to inject behaviour as Python code into Rust code.
I think there could be some alternatives for integrating Python code. The first thing that comes to mind is that we could provide pre-built 'function factories' that perform function currying, so users just plug in their parameters and the function is executed in Rust. Say users want a For more complicated use cases, they might just create their own pyO3 distribution with an additional function signature that suits their case. Although this seems quite similar to implementing their own Query struct, it is still much less work, since they don't need to figure out the whole querying logic such as Weight and Scorer.
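To make the 'function factory' idea concrete, here is a minimal Python sketch. Everything in it is hypothetical and not part of tantivy-py: the names ScoreFunctionSpec, linear, clamp and evaluate are invented for illustration. The point is that the factory returns plain data rather than a Python callable, so a Rust implementation could in principle evaluate it without acquiring the GIL; evaluate only stands in for that Rust side.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ScoreFunctionSpec:
    """Plain-data description of a score function (hypothetical type)."""
    kind: str
    params: Dict[str, float]


def linear(weight: float = 1.0, offset: float = 0.0) -> ScoreFunctionSpec:
    """Factory for score -> weight * score + offset."""
    return ScoreFunctionSpec("linear", {"weight": weight, "offset": offset})


def clamp(low: float, high: float) -> ScoreFunctionSpec:
    """Factory for score -> score limited to the range [low, high]."""
    return ScoreFunctionSpec("clamp", {"low": low, "high": high})


def evaluate(spec: ScoreFunctionSpec, score: float) -> float:
    """Stand-in for the Rust side, which would match on `kind` natively."""
    p = spec.params
    if spec.kind == "linear":
        return p["weight"] * score + p["offset"]
    if spec.kind == "clamp":
        return min(max(score, p["low"]), p["high"])
    raise ValueError(f"unknown score function: {spec.kind}")


# The user only plugs in parameters; no Python code runs per document.
boosted = linear(weight=2.0, offset=10.0)
print(evaluate(boosted, 1.5))
```

Because the spec is just a kind and a parameter map, it would be cheap to convert once at query construction time, which is where this approach differs from passing a lambda.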
Is your feature request related to a problem? Please describe.
Recently I found that tantivy lacks some common search engine features, such as a flexible scoring mechanism for retrieving relevant docs. Currently users can tweak the score through TopDoc's tweak_score method, but that method evaluates the score at the last level, which makes customizable scoring based on different search branches (Query) difficult. For example, when using Disjunction Max with several queries, passing a flexible closure that defines the score would give fine-grained control over the scoring and offset of each query. The existing solution relies only on Boosting, which can nest several levels deep and become hard to read, and Boosting doesn't allow offsets either.
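The Disjunction Max case can be illustrated with a small Python model. This is only a sketch of the arithmetic, not tantivy code: dis_max is a hypothetical helper that assumes the usual dis-max rule (best branch score plus tie_breaker times the sum of the remaining ones), and each branch carries its own score function so that both a scale and an offset can be applied per query.

```python
from typing import Callable, Sequence, Tuple

ScoreFn = Callable[[float], float]


def dis_max(
    branches: Sequence[Tuple[float, ScoreFn]], tie_breaker: float = 0.0
) -> float:
    """Adjust each branch score with its own function, then combine:
    best adjusted score plus tie_breaker times the sum of the others."""
    adjusted = [fn(score) for score, fn in branches]
    best = max(adjusted)
    return best + tie_breaker * (sum(adjusted) - best)


# Raw scores for one document from two sub-queries (made-up numbers).
title_branch = (2.0, lambda s: s * 2.0 + 1.0)  # scale and offset
body_branch = (3.0, lambda s: s)               # left unchanged

print(dis_max([title_branch, body_branch], tie_breaker=0.5))
```

With boosting alone only the multiplication in the title branch would be expressible; the added offset is what a per-branch score function contributes.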
Describe the solution you'd like
By introducing the FunctionScoreQuery, users can define their own closure for the score modification algorithm at the query level. The score tweak happens before the final TopDoc's tweak_score method, thus giving users greater flexibility. Introducing the FunctionScoreQuery brings several benefits:
- score method on the scorer when their requirements are complicated, users can just define a function that is clean and neat. For simpler use cases, native query types like BoostQuery are preferred.
- Query struct due to different language implementations. For example, for tantivy-py, though pyO3 can turn Python objects into Rust structs, we have to first define all the utility classes before tantivy-py can consume them.
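As a rough model of where the proposed query-level function would sit relative to the final collector-level tweak, consider the Python sketch below. The helpers function_score and top_docs are hypothetical stand-ins, not the tantivy API: the first rewrites the scores produced by one inner query, the second optionally applies a final tweak and keeps the best hits.

```python
from typing import Callable, Dict, List, Optional, Tuple

DocId = int
ScoreFn = Callable[[DocId, float], float]


def function_score(inner: Dict[DocId, float], fn: ScoreFn) -> Dict[DocId, float]:
    """Query-level step: rewrite the score of each matching document."""
    return {doc: fn(doc, score) for doc, score in inner.items()}


def top_docs(
    scores: Dict[DocId, float], limit: int, tweak: Optional[ScoreFn] = None
) -> List[Tuple[DocId, float]]:
    """Collector-level step: optional final tweak, then keep the best `limit`."""
    if tweak is not None:
        scores = {doc: tweak(doc, s) for doc, s in scores.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:limit]


# Scores from an inner query (made-up numbers); document 1 gets an offset.
inner_scores = {1: 0.5, 2: 2.0, 3: 1.0}
rescored = function_score(inner_scores, lambda doc, s: s + 10.0 if doc == 1 else s)
print(top_docs(rescored, limit=2))
```

In this model the query-level function only sees the scores of the query it wraps, while the final tweak sees the already combined score, which is the distinction the proposal relies on.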