Unable to query fields indexed by nouveau #4997
Comments
Thank you for the detailed report, I will look into it. My first thought is that the query parser is transforming your "Administrator" into "administrator". Because it was indexed as "string" and not "text", it is held as "Administrator" in the index itself, and thus doesn't match. Assuming that's it, I agree that the query parser should not do this for string fields, and I will make a fix.
But I tried to query with "admin", "administrator" and "Administrator" without any luck. The same holds for other values like "User". So I suspect that analyzing the value for the index and analyzing the query do not give the same result. Email is a special case where the analyzer does not change the value, so it matches.
Yes, I mean that in the index it is "Administrator", but you are not able to query with the capital "A" because the query parser converts the query with the standard analyzer. You might write 'q=foo:Administrator', but the query parser turns it into a term query on "administrator".
For example, if you specified the "keyword" analyzer for the lastName field, the query parser won't lowercase it for you and it should then match.
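To make the mismatch concrete, here is a toy JavaScript model. It is not nouveau code: the two analyzer functions are simplified stand-ins for Lucene's standard and keyword analyzers, and every name in it is made up for illustration. A "string" field keeps "Administrator" as one verbatim term, so a query term produced by a lowercasing analyzer can never equal it:

```javascript
// Toy model (not nouveau code) of the behaviour described above:
// - a "string" field stores the value verbatim as a single term;
// - the query parser runs the query text through the field's analyzer.

// Roughly what a standard analyzer does: split into word tokens, lowercase.
function standardAnalyzer(text) {
  return (text.match(/[\p{L}\p{N}]+/gu) || []).map((t) => t.toLowerCase());
}

// The keyword analyzer emits the whole input unchanged as one token.
function keywordAnalyzer(text) {
  return [text];
}

// Terms held in the index for a "string" field: the raw value, untouched.
function indexStringField(value) {
  return new Set([value]);
}

// Does a term query built by analyzing `queryText` hit the indexed terms?
function matches(indexedTerms, queryText, analyzer) {
  const [term] = analyzer(queryText);
  return indexedTerms.has(term);
}

const lastname = indexStringField("Administrator");
console.log(matches(lastname, "Administrator", standardAnalyzer)); // false
console.log(matches(lastname, "administrator", standardAnalyzer)); // false
console.log(matches(lastname, "Administrator", keywordAnalyzer));  // true
```

With the keyword analyzer the query term is left exactly as typed, so it lines up with the verbatim term stored for the "string" field.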
Can confirm that's the case. I modified the doc to give the
I also modified the index to be of type If that's expected behavior, I can update the index creation doc.
The "text" type means the value is analyzed. The Lucene analyzers typically force it to lower case, among other effects, which explains the new success.
Yeah, I came to understand that from the (archived) couchdb-lucene project. @rnewson, would you still like to work on the lower-casing of string queries? (I can also try to look into the issue.)
I still intend to make an enhancement. Assuming I'm right in my first comment, you did nothing wrong, and I would like nouveau to do the right thing. We know that "string" fields are not analyzed, so we need to tell the query parser not to analyze the query string for "string" fields either (and nouveau knows the index definition, so it does know which fields are "string", "text", etc).
Hi again (I've been out on vacation the last few weeks). I've mocked up a few approaches to this locally and I don't like any of them: they all have either a non-trivial overhead or other odd side effects more surprising than what you've encountered. I think the right move is to clarify that if you index with type "string" and you intend to search on that field (as opposed to only sorting on it, for example), then you need to specify the "keyword" analyzer for that field in the index definition. If you do that, everything works out nicely. In your case I think you actually do want the "text" type for "lastName", so that you can search case-insensitively, but only you know for sure.
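As a sketch of that advice applied to the sw360users example, the design document could be built roughly like this. It assumes the user documents carry givenname, lastname and email properties (the original index function isn't shown in this thread) and reuses the _design/nouveau_user and users names from the curl commands in the report. givenname and lastname use "text" for case-insensitive search, while email stays a "string" field with the "keyword" analyzer. Only the index string is executed by CouchDB; the surrounding JavaScript just builds the JSON body.

```javascript
// Sketch only: field names and guards are assumptions, since the original
// index function from this issue is not shown.
const indexFn = `function (doc) {
  if (doc.givenname) {
    // "text" fields are analyzed, so searches are case-insensitive.
    index("text", "givenname", doc.givenname);
  }
  if (doc.lastname) {
    index("text", "lastname", doc.lastname);
  }
  if (doc.email) {
    // "string" fields are stored verbatim as a single term.
    index("string", "email", doc.email);
  }
}`;

const designDoc = {
  _id: "_design/nouveau_user",
  nouveau: {
    users: {
      default_analyzer: "standard",
      // A searchable "string" field gets the keyword analyzer, so the query
      // parser leaves the query text for it untouched (no lowercasing).
      field_analyzers: {
        email: "keyword",
      },
      index: indexFn,
    },
  },
};

// Print the JSON body to save as the design document.
console.log(JSON.stringify(designDoc, null, 2));
```

If lastname were kept as "string" instead, it would also need a "lastname": "keyword" entry in field_analyzers, following the advice above.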
Hey, thanks for the updates. The documentation in #5018 makes the type field clearer.
No problem!
@rnewson Even with the suggestions from #5018, querying fails in a different case. I have documents with the fields name and version. Version is stored as a string in CouchDB, and sample values are
$ curl --user "admin:admin" 'http://localhost:5984/_nouveau_analyze' -X POST -H 'Content-Type: application/json' -d '{"analyzer": "simple_asciifolding", "text": "4.2.0"}' | jq
{
  "tokens": []
}
Thus I created the index as follows:
{
  "_id": "_design/lucene",
  "_rev": "238-6e02d3801cc64311f5244cb242855e82",
  "nouveau": {
    "projects": {
      "default_analyzer": "keyword",
      "field_analyzers": {
        "version": "keyword"
      },
      "index": "function(doc) {\nif(doc.version !== undefined && doc.version != null && doc.version.length >0) {\n index('text', 'version', doc.version, {'store': true});\n }\n}"
    }
  }
}
Notice I added
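The empty token list is expected for a "simple"-family analyzer: as far as I know, Lucene's simple analyzer splits on anything that is not a letter, and digits and dots are not letters. A rough JavaScript approximation (the accent stripping only approximates ASCII folding, and the function names are made up) shows why "keyword" is the better fit for version strings:

```javascript
// Rough approximation, not Lucene code: a "simple"-style analyzer keeps only
// runs of letters and lowercases them; ASCII folding is approximated here by
// stripping combining accents after Unicode NFD normalization.
function simpleAsciifolding(text) {
  const folded = text.normalize("NFD").replace(/\p{M}/gu, "");
  return (folded.match(/\p{L}+/gu) || []).map((t) => t.toLowerCase());
}

// The keyword analyzer emits the whole input as a single, unchanged token.
function keyword(text) {
  return [text];
}

console.log(simpleAsciifolding("4.2.0"));       // [] (digits and dots are not letters)
console.log(simpleAsciifolding("Ångström v2")); // [ 'angstrom', 'v' ]
console.log(keyword("4.2.0"));                  // [ '4.2.0' ]
```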
You specify the Can you show the result of querying the view with
Here are the outputs as requested. I am getting the same results for GET and POST queries. Output of analyze:
Output of
Output of
Output with the doc match:
Thanks.
OK, the short answer is that the (nouveau-specific) query parser interprets "4.2.0" as a number and performs a numeric query, not a text/string query. I'm surprised by that, but obviously the same would be true for "4", etc. This is a very helpful thread, by the way; these are exactly the issues I want to confront before removing the "experimental" label from nouveau.
That's core Java behaviour.
For context, I am migrating the sw360 project, which currently uses couchdb-lucene, to nouveau.
That's helpful to know, thanks. I'm looking at changing the "magical" nature of numeric queries. I extended/altered the basic Lucene query syntax to auto-detect numbers, but it has always been a bit awkward (as you've rediscovered). So I'm looking at a syntax extension that lets you tell nouveau explicitly whether you intend to search for "2" as a string or as a number.
I posted a draft PR that addresses this, with some extensive prose on whether it's a good idea or not.
Would it make sense to use the field type from the index definition? We already have types
@GMishx I merged a fix for this, but note that I had to change how some things work (you can see the documentation diff in #5021). Essentially, you no longer need to put a type indicator at the end of the field name when sorting. What should now happen is that you can index a field as a number or a string and the right kind of query will be used. Please give it a try.
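To illustrate the difference between guessing from the query text and using the indexed field type, here is a hypothetical JavaScript sketch. It is not nouveau's implementation: fieldTypes, buildFieldQuery and guessFieldQuery are invented names, and the numeric type name "double" is an assumption. Note that parseFloat happily reads "4.2.0" as 4.2, the same kind of surprise seen earlier in this thread:

```javascript
// Hypothetical sketch, not nouveau's implementation: choose the query kind
// from the type each field was indexed with (assumed known from the index
// definition) instead of guessing from what the query text looks like.
const fieldTypes = { version: "string", size: "double" }; // assumed definition

function buildFieldQuery(field, text) {
  if (fieldTypes[field] === "double") {
    const value = Number(text);
    if (Number.isNaN(value)) {
      throw new Error(`"${text}" is not a number, but ${field} is numeric`);
    }
    return { kind: "numeric", field, value };
  }
  // "string" (or unknown) fields: keep the text verbatim as a single term.
  return { kind: "term", field, value: text };
}

// For contrast: a parser that guesses, treating anything number-like as numeric.
function guessFieldQuery(field, text) {
  const value = parseFloat(text); // parseFloat("4.2.0") === 4.2
  return Number.isNaN(value)
    ? { kind: "term", field, value: text }
    : { kind: "numeric", field, value };
}

for (const q of [
  guessFieldQuery("version", "4.2.0"), // numeric 4.2 (the surprise)
  buildFieldQuery("version", "4.2.0"), // term 4.2.0
  buildFieldQuery("size", "2"),        // numeric 2
]) {
  console.log(q.kind, q.value);
}
```

Driving the decision from the index definition means "4.2.0" on a string field always stays a term, and a non-numeric value on a numeric field fails loudly instead of silently matching nothing.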
I can confirm this now works as expected for the issue mentioned. Thanks for the quick fix, @rnewson! I can index and query the values "4.2.0", "1" and "2". I will test it further with other values as well and update here.
Thanks for the confirmation. I like this change, and your issue was the nudge I needed to make this improvement, so thank you.
Description
I have compiled the latest CouchDB with ./configure --enable-nouveau and it is running fine. I also started the nouveau server with the generated ./rel/couchdb/nouveau/bin/nouveau server command. Now, when I try to query information from the indexes, it does not work for any field other than email.
Steps to Reproduce
I have a sw360users database with the following fields:
Upon this DB, I created a ddoc for nouveau with the following document:
Here, I am indexing 3 fields: givenname, lastname and email. I tried various configurations, changing the positions of index() in the function and using different types of analyzers for creating the index. I see no errors in the nouveau logs or in the CouchDB logs after creating the ddoc. Thus, I relaxed :-)
Note: Responses are trimmed for brevity.
Now, when I query all records with q=*:*, I get 10 results since I have 10 users:
$ curl --user "admin:admin" 'http://localhost:5984/sw360users/_design/nouveau_user/_nouveau/users' -X POST -H 'Content-Type: application/json' -d '{"q": "*:*"}'
If I query the email field, I get the expected response:
$ curl --user "admin:admin" 'http://localhost:5984/sw360users/_design/nouveau_user/_nouveau/users' -X POST -H 'Content-Type: application/json' -d '{"q": "email:setup*"}'
But with the lastname field, I get nothing:
$ curl --user "admin:admin" 'http://localhost:5984/sw360users/_design/nouveau_user/_nouveau/users' -X POST -H 'Content-Type: application/json' -d '{"q": "lastname:Administrator"}'
I tried multiple times with lastname:admin*, lastname:administrator and lastname:Administrator, but failed to get any results, even with different analyzers. The behavior is the same for the other field, givenname. Querying only works for email, with various Lucene query syntaxes.
Expected Behaviour
I expected to be able to query the index on the other fields as well.
Your Environment
$ curl --user "admin:admin" 'http://localhost:5984'
Nouveau is also configured in default.ini, using the default ./rel/couchdb/etc/nouveau.yaml.
3.3.3-29db2df
curl 8.2.1 (x86_64-conda-linux-gnu) libcurl/8.2.1 OpenSSL/3.0.10 zlib/1.2.13 libssh2/1.10.0 nghttp2/1.52.0
Ubuntu 22.04.4 LTS
Additional Context
Using counts to aggregate index values works just as expected.
$ curl --user "admin:admin" 'http://localhost:5984/sw360users/_design/nouveau_user/_nouveau/users' -X POST -H 'Content-Type: application/json' -d '{"q": "*:*", "counts": ["lastname"]}'
$ curl --user "admin:admin" 'http://localhost:5984/sw360users/_design/nouveau_user/_nouveau_info/users'