Improve error message in case of Solr error #161

Open
horde3d opened this issue Jun 12, 2023 · 0 comments

When indexing a file for which, e.g., Tika extracts overly long metadata entries, Solr answers with an HTTP 400 error, and the current exception handling for that error is not very helpful.

Printing the response text as well makes it much clearer where the problem comes from.
So I propose adding a print statement for the case status_code >= 400:

    # on a bad status code, print Solr's response body, then raise the exception
    if r.status_code >= 400:
        print('Solr {} error: {}'.format(r.status_code, r.text))
    r.raise_for_status()

See the previous error output:

Error while posting data to Solr: 400 Client Error: Bad Request for url: http://localhost:8983/solr/opensemanticsearch/update?commit=true
Error while exporting to index or database: /media/text_document.docx

vs. the new error output:

"msg":"Exception writing document id /media/text_document.docx to the index; possible analysis error: Document contains at least one immense term in field="Text_TextEntry_ss" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[107, 101, 121, 119, 111, 114, 100, 61, 88, 77, 76, 58, 99, 111, 109, 46, 97, 100, 111, 98, 101, 46, 120, 109, 112, 44, 32, 118, 97, 108]...', original message: bytes can be at most 32766 in length; got 38935. Perhaps the document has an indexed string field (solr.StrField) which is too large",
"code":400}}
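
If the full response body turns out to be too noisy, a possible variant is to print only the "msg" field when the body is JSON. The following is a minimal sketch, not the project's actual code: the helper names `describe_solr_error` and `raise_for_solr_status` are hypothetical, `r` is assumed to be a `requests` response as in the snippet above, and the `{"error": {"msg": ...}}` layout is an assumption based on the output shown here. Anything that does not match that layout falls back to the raw body.

```python
import json
import sys


def describe_solr_error(status_code, body):
    """Build a one-line description of a failed Solr response (hypothetical helper)."""
    try:
        # Assumed layout of Solr's JSON error body: {"error": {"msg": ..., "code": ...}}
        msg = json.loads(body)["error"]["msg"]
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not the assumed layout: fall back to the raw body
        msg = body.strip()
    return 'Solr {} error: {}'.format(status_code, msg)


def raise_for_solr_status(r):
    """Print Solr's explanation to stderr, then raise exactly as before."""
    if r.status_code >= 400:
        print(describe_solr_error(r.status_code, r.text), file=sys.stderr)
    r.raise_for_status()
```

Calling `raise_for_solr_status(r)` in place of the bare `r.raise_for_status()` keeps the existing exception flow unchanged; only the extra line on stderr is new.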
