[Bug]: loading 5.7 GB data to llama-index #13588
Comments
Hey there, @yasir23! 🎉 I'm Dosu, your friendly neighborhood bot, here to lend a hand with bugs, answer your queries, and guide you on your journey to becoming a contributor. While we wait for a human maintainer to swing by, know that I'm here to help out. Let me dive into your issue and I'll get back to you with a full answer shortly!
I've never seen this before. It's pretty much impossible to debug without a way to reproduce it 😅 My guess is that you're either running out of memory or hitting an error while reading some of your PDFs (corrupted? password-protected?). Maybe try loading them one by one instead?
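A minimal sketch of the "load them one by one" suggestion, assuming the PDFs sit directly under `./dataset`. `load_one_by_one` and `llama_index_loader` are hypothetical helper names. The only llama-index call used is `SimpleDirectoryReader(input_files=[...]).load_data()`. A file that raises an exception is recorded instead of aborting the whole run.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple


def load_one_by_one(
    paths: Iterable[Path], load_fn: Callable[[Path], List]
) -> Tuple[List, Dict[str, str]]:
    """Load each file separately and collect failures instead of stopping."""
    documents: List = []
    failures: Dict[str, str] = {}
    for path in paths:
        try:
            documents.extend(load_fn(path))
        except Exception as exc:  # e.g. a corrupted or password-protected PDF (assumed causes)
            failures[str(path)] = repr(exc)
    return documents, failures


def llama_index_loader(path: Path) -> List:
    # Hypothetical wrapper; imported lazily so load_one_by_one works without llama-index installed.
    from llama_index.core import SimpleDirectoryReader

    return SimpleDirectoryReader(input_files=[str(path)]).load_data()
```

Usage: `docs, failures = load_one_by_one(sorted(Path("./dataset").glob("*.pdf")), llama_index_loader)`, then inspect `failures`. Messages that the PDF parser only logs, rather than raises, will not show up in `failures`.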
To tackle the issue of "Ignoring wrong pointing object" warnings when loading a 5.7 GB dataset of PDF files, consider the following steps:
By following these steps, you should be able to identify and resolve the issue with the "Ignoring wrong pointing object" warnings during the dataset loading process.
Bug Description
I loaded a 5.7 GB dataset with the loader, but it gives this output:
Ignoring wrong pointing object 59 0 (offset 0)
Ignoring wrong pointing object 60 0 (offset 0)
Ignoring wrong pointing object 62 0 (offset 0)
Ignoring wrong pointing object 63 0 (offset 0)
The whole dataset consists of PDF files.
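The "Ignoring wrong pointing object" lines don't say which PDF they came from. One way to attribute them is to capture log records while each file loads. This sketch assumes the messages are emitted through Python's `logging` module by the PDF parsing library underneath the reader. pypdf is assumed here, with logger name `"pypdf"`; check which library your installation actually uses. `capture_logs` and `warnings_per_file` are hypothetical helpers.

```python
import logging
from contextlib import contextmanager
from typing import Callable, Dict, Iterable, Iterator, List


class _ListHandler(logging.Handler):
    """Logging handler that stores each record's message in a list."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


@contextmanager
def capture_logs(logger_name: str) -> Iterator[List[str]]:
    """Collect WARNING-and-above messages from `logger_name` and its child loggers."""
    handler = _ListHandler()
    logger = logging.getLogger(logger_name)
    logger.addHandler(handler)
    try:
        yield handler.messages
    finally:
        logger.removeHandler(handler)


def warnings_per_file(
    paths: Iterable[str], load_fn: Callable[[str], object], logger_name: str = "pypdf"
) -> Dict[str, List[str]]:
    """Map each path to the warnings logged while loading it (only paths with warnings)."""
    report: Dict[str, List[str]] = {}
    for path in paths:
        with capture_logs(logger_name) as messages:
            load_fn(path)
        if messages:
            report[path] = list(messages)
    return report
```

Pass any function that loads a single file, for example `lambda p: SimpleDirectoryReader(input_files=[p]).load_data()`, together with the PDF paths as strings. Files missing from the report produced no captured warnings.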
Version
latest
Steps to Reproduce
from llama_index.core import SimpleDirectoryReader
reader = SimpleDirectoryReader(input_dir="./dataset")
documents = reader.load_data()
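`load_data()` returns every document from all 5.7 GB at once, which fits the out-of-memory guess above. Recent llama-index versions also expose `SimpleDirectoryReader.iter_data()`, which yields the documents for one file at a time. Check that your installed version has it. The sketch below consumes such a per-file stream and hands documents downstream in fixed-size batches. `process_in_batches` and the batch handler are hypothetical, and `batch_size=256` is an arbitrary assumption.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def process_in_batches(
    per_file_docs: Iterable[List[T]],
    handle_batch: Callable[[List[T]], None],
    batch_size: int = 256,
) -> int:
    """Feed documents to `handle_batch` in groups of at most `batch_size`.

    Besides the current file's documents, only one pending batch is held here at a
    time. Returns the total number of documents processed.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[T] = []
    total = 0
    for docs in per_file_docs:
        for doc in docs:
            batch.append(doc)
            if len(batch) == batch_size:
                handle_batch(batch)
                total += len(batch)
                batch = []  # start a fresh list so the handler may keep the old one
    if batch:  # flush the final, possibly smaller, batch
        handle_batch(batch)
        total += len(batch)
    return total
```

Usage: `process_in_batches(SimpleDirectoryReader(input_dir="./dataset").iter_data(), handle_batch=my_handler)`, where `my_handler` is whatever you do with the documents next (hypothetical here). Memory only stays bounded if that handler does not accumulate everything itself.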
Relevant Logs/Tracebacks
No response