PDFImageReader not working with PDFKnowledgeBase #309

Closed
sridharaiyer opened this issue May 17, 2024 · 7 comments
@sridharaiyer

Running the following code gives this error:

from phi.assistant import Assistant
from phi.document.reader.pdf import PDFImageReader
from phi.knowledge.pdf import PDFKnowledgeBase
from phi.vectordb.lancedb.lancedb import LanceDb

# type: ignore
db_url = "/tmp/lancedb"  # Optional

# Create a knowledge base with the PDFs from the data/pdfs directory
knowledge_base = PDFKnowledgeBase(
    path="data/pdfs",
    vector_db=LanceDb(uri=db_url),
    reader=PDFImageReader(chunk=True),
)
# Load the knowledge base
knowledge_base.load(recreate=False)

# Create an assistant with the knowledge base
assistant = Assistant(
    knowledge_base=knowledge_base,
    add_references_to_prompt=True,
)

# Ask the assistant about the knowledge base
assistant.print_response("Summarize this document.", markdown=True)

Error:

INFO     Creating table: phi                                                    
Traceback (most recent call last):
  File "/Users/siyer/PycharmProjects/report-call-summarizer/localdb-lancedb-knowledgebase.py", line 10, in <module>
    knowledge_base = PDFKnowledgeBase(
                     ^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/report-call-summarizer/lib/python3.11/site-packages/pydantic/main.py", line 164, in __init__
    __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 1 validation error for PDFKnowledgeBase
reader
  Input should be a valid dictionary or instance of PDFReader [type=model_type, input_value=PDFImageReader(chunk=True...\n\r', '\t', ' ', '  ']), input_type=PDFImageReader]
    For further information visit https://errors.pydantic.dev/2.5/v/model_type

Process finished with exit code 1

jacobweiss2305 self-assigned this May 17, 2024

jalotra commented May 17, 2024

This looks like a one-line change in pdf.py: the reader field should be of type Reader instead of PDFReader.

Current:

reader: PDFReader = PDFReader()

Proposed:

class PDFKnowledgeBase(AssistantKnowledge):
    path: Union[str, Path]
    reader: Reader = PDFReader()
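
To illustrate why widening the annotation should help, here is a minimal sketch using stand-in classes. The class names mirror the ones in this thread, but their bodies are assumptions and not phidata's actual definitions; NarrowKnowledgeBase, WideKnowledgeBase and try_build are hypothetical names. It assumes pydantic v2 (the traceback above points at 2.5) and that PDFImageReader derives from Reader rather than from PDFReader, which is what the error message suggests.

```python
from pydantic import BaseModel, ValidationError


# Stand-in classes (hypothetical bodies, not phidata's real ones).
class Reader(BaseModel):
    chunk: bool = True


class PDFReader(Reader):
    pass


class PDFImageReader(Reader):  # assumed sibling of PDFReader, not a subclass
    pass


class NarrowKnowledgeBase(BaseModel):
    # Field typed as the concrete PDFReader, as in the failing version.
    reader: PDFReader = PDFReader()


class WideKnowledgeBase(BaseModel):
    # Field typed as the base Reader, as proposed above.
    reader: Reader = PDFReader()


def try_build(model_cls):
    """Return pydantic error type codes for a PDFImageReader input, or [] on success."""
    try:
        model_cls(reader=PDFImageReader(chunk=True))
    except ValidationError as exc:
        return [err["type"] for err in exc.errors()]
    return []


narrow_errors = try_build(NarrowKnowledgeBase)  # expected to fail validation
wide_errors = try_build(WideKnowledgeBase)      # expected to validate
print("narrow:", narrow_errors)
print("wide:", wide_errors)
```

With the narrow annotation, pydantic rejects the PDFImageReader instance because it is not an instance of PDFReader; with the base-class annotation, any Reader subclass is accepted and the default stays a PDFReader.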

@datumradix

cool

@sridharaiyer
Author

Hi Team. Do we have an update on when we can get a new release with this change?

@ysolanky
Contributor

ysolanky commented May 20, 2024

@sridharaiyer The PR will be out shortly, and we will most likely release a new version by EOD.

@ysolanky
Contributor

The PR is out, @sridharaiyer. You are welcome to test it.

@sridharaiyer
Author

> The PR is out @sridharaiyer. You are welcome to test

Tested. Works fine for my use case, thanks a lot!

@jacobweiss2305
Contributor

Merged
