Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Text fields are not optimised for operations that require per-document field data like aggregations and sorting #1349

Open
gzhukov opened this issue Oct 16, 2023 · 1 comment

Comments

@gzhukov
Copy link

gzhukov commented Oct 16, 2023

Hi, after upgrade querybook from 2.4.0 to 3.28.0 we faced with missing table list from our metastore. All databases were empty in UI, but I see that update_metastore job works fine in worker's logs.
I checked webserver logs and found such warning:
[2023-10-13 Fri 19:30:02] - /opt/querybook/querybook/server/lib/elasticsearch/search_utils.py - WARNING "Got ElasticSearch exception: RequestError(400, 'search_phase_execution_exception', 'Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [name] in order to load field data by uninverting the inverted index. Note that this can use significant memory.')

After some investigation I added query to log https://github.com/pinterest/querybook/blob/master/querybook/server/lib/elasticsearch/search_utils.py#L64
and found that querybook use sorting function for 'name' field in search_tables_v1 index
{'query': {'bool': {'must': [{'function_score': {'query': {'match_all': {}}, 'boost_mode': 'sum', 'script_score': {'script': {'source': "doc['importance_score'].value * 10 + (doc['golden'].value ? 50 : 0)"}}}}], 'filter': {'bool': {'must': [{'match': {'schema': 'abasharin'}}, {'match': {'metastore_id': 3}}]}}}}, 'size': 100, 'from': 0, '_source': ['id', 'schema', 'name'], 'sort': [{'name': {'order': 'asc'}}], 'highlight': {'pre_tags': ['<mark>'], 'post_tags': ['</mark>'], 'type': 'plain', 'fields': {'columns': {'fragment_size': 20, 'number_of_fragments': 5}, 'data_elements': {'fragment_size': 20, 'number_of_fragments': 5}, 'description': {'fragment_size': 60, 'number_of_fragments': 3}}}}"

I downloaded current index mapping:
curl https://<es-server>/search_tables_v1/_mapping | jq > update_indeces.json
After that I added "fielddata": true, to name field and uploaded new mapping back:
curl -XPUT https://<es-server>/search_tables_v1/_mapping -H "Content-Type: application/json" -d @update_indeces.json
In helps and solved this issue.
update_indeces.json

I tested it with opensearch:2.9.0 and elasticsearch:7.16.2
Also I tried to add fielddata: true to querybook/config/elasticsearch.yaml and recreate indexes but looks like config parser ignores this key.

@adamstruck
Copy link
Contributor

Seems like your indices are out-of-date. The name is set to a keyword here: https://github.com/pinterest/querybook/blob/master/querybook/config/elasticsearch.yaml#L207

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants