PowerPoint file upload fails in RAG: could not detect encoding #2410

Closed
4 tasks done
flyfox666 opened this issue May 20, 2024 · 2 comments · Fixed by #2422

Comments

flyfox666 commented May 20, 2024

Bug Report

Description

Bug Summary:
Uploading a .pptx file on the Documents page fails.

Steps to Reproduce:
Upload a .pptx file (here, samplepptx2.pptx) on the Documents page. The upload fails.

Expected Behavior:
The file is read and parsed successfully.

Actual Behavior:
A failure message is displayed.

Environment

  • Open WebUI Version: v0.1.125

  • Ollama (if applicable): 0.1.38

  • Operating System: Windows 11 (WSL2, Docker Desktop)

  • Browser (if applicable): Chrome (latest)

Reproduction Details

Confirmation:

  • I have read and followed all the instructions provided in the README.md.
  • I am on the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.

Logs and Screenshots

Browser Console Logs:
None provided.

Docker Container Logs:
2024-05-20 13:45:15 INFO:apps.rag.main:file.content_type: application/vnd.openxmlformats-officedocument.presentationml.presentation
2024-05-20 13:45:17 ERROR:apps.rag.main:Could not detect encoding for /app/backend/data/uploads/samplepptx2.pptx
2024-05-20 13:45:17 Traceback (most recent call last):
2024-05-20 13:45:17   File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/text.py", line 43, in lazy_load
2024-05-20 13:45:17     text = f.read()
2024-05-20 13:45:17            ^^^^^^^^
2024-05-20 13:45:17   File "", line 322, in decode
2024-05-20 13:45:17 UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 18: invalid start byte
2024-05-20 13:45:17
2024-05-20 13:45:17 During handling of the above exception, another exception occurred:
2024-05-20 13:45:17
2024-05-20 13:45:17 Traceback (most recent call last):
2024-05-20 13:45:17   File "/app/backend/apps/rag/main.py", line 808, in store_doc
2024-05-20 13:45:17     data = loader.load()
2024-05-20 13:45:17            ^^^^^^^^^^^^^
2024-05-20 13:45:17   File "/usr/local/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 29, in load
2024-05-20 13:45:17     return list(self.lazy_load())
2024-05-20 13:45:17            ^^^^^^^^^^^^^^^^^^^^^^
2024-05-20 13:45:17   File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/text.py", line 46, in lazy_load
2024-05-20 13:45:17     detected_encodings = detect_file_encodings(self.file_path)
2024-05-20 13:45:17                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-05-20 13:45:17   File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/helpers.py", line 50, in detect_file_encodings
2024-05-20 13:45:17     raise RuntimeError(f"Could not detect encoding for {file_path}")
2024-05-20 13:45:17 RuntimeError: Could not detect encoding for /app/backend/data/uploads/samplepptx2.pptx
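The traceback shows the upload going through LangChain's text loader (`langchain_community/document_loaders/text.py`): the UTF-8 read fails, encoding auto-detection then fails too, and a `RuntimeError` is raised. A .pptx file is an Office Open XML package, which is a ZIP archive, so it is binary and cannot be decoded as text. The sketch below uses only the standard library; the helper names and the fake archive are hypothetical and are not Open WebUI code. It shows both sides: a strict UTF-8 decode of such a package fails, while a cheap ZIP check recognizes it before any text loader is tried.

```python
import io
import zipfile
from typing import Optional


def is_ooxml_package(data: bytes) -> bool:
    """Return True if `data` looks like an Office Open XML package (.pptx/.docx/.xlsx).

    OOXML files are ZIP archives whose root contains a `[Content_Types].xml` part.
    """
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        return False
    with zipfile.ZipFile(buf) as zf:
        return "[Content_Types].xml" in zf.namelist()


def try_decode_as_text(data: bytes) -> Optional[str]:
    """Mimic a plain-text loader: strict UTF-8 decode, None on failure."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def make_fake_pptx() -> bytes:
    """Build a tiny, hypothetical stand-in for a .pptx (not a valid presentation)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        # Stored (uncompressed) bytes appear verbatim in the archive; 0xff can
        # never occur in valid UTF-8, so a strict decode of the archive must fail.
        zf.writestr("ppt/media/image1.bin", b"\xff\xbc\x00")
    return buf.getvalue()


if __name__ == "__main__":
    data = make_fake_pptx()
    print("OOXML package:", is_ooxml_package(data))
    print("decodes as UTF-8:", try_decode_as_text(data) is not None)
```

Checking the container format first means a binary upload can be rejected or routed elsewhere with a clear message, rather than surfacing as an encoding-detection error.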

Screenshots (if applicable):
[screenshot not preserved]

Installation Method

Docker

jannikstdl (Contributor) commented:

Reproduced the error; I will make a fix.
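One way to address this class of bug is to route each upload to a loader by file extension or MIME type (the log shows `file.content_type` is available), and to fail clearly for unknown binary types instead of falling back to a plain-text loader. This is a hypothetical sketch, not the change made in #2422, and it only returns loader class names as strings. `TextLoader` and `UnstructuredPowerPointLoader` are existing `langchain_community` loaders, but the routing table and the `choose_loader` helper are assumptions for illustration.

```python
from pathlib import Path

# MIME type reported in the issue's log for the uploaded .pptx file.
PPTX_MIME = "application/vnd.openxmlformats-officedocument.presentationml.presentation"

# Hypothetical routing table: file extension -> langchain_community loader class name.
LOADERS_BY_EXTENSION = {
    ".txt": "TextLoader",
    ".md": "TextLoader",
    ".pptx": "UnstructuredPowerPointLoader",
}


def choose_loader(filename: str, content_type: str) -> str:
    """Pick a loader name for an upload, refusing to guess for unknown binaries."""
    extension = Path(filename).suffix.lower()
    if extension in LOADERS_BY_EXTENSION:
        return LOADERS_BY_EXTENSION[extension]
    # Fall back to the MIME type when the extension is missing or unknown.
    if content_type == PPTX_MIME:
        return "UnstructuredPowerPointLoader"
    if content_type.startswith("text/"):
        return "TextLoader"
    # Raise a clear error instead of handing binary data to a text loader.
    raise ValueError(f"Unsupported file type: {filename!r} ({content_type})")


if __name__ == "__main__":
    print(choose_loader("samplepptx2.pptx", PPTX_MIME))
    print(choose_loader("notes.txt", "text/plain"))
```

With this approach an unsupported upload produces an explicit "unsupported file type" error instead of the encoding-detection `RuntimeError` in the log.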

tjbck mentioned this issue May 20, 2024
tjbck linked a pull request May 20, 2024 that will close this issue
flyfox666 (Author) commented:

LGTM👍
