'charmap' codec can't decode #154

Open
Cinux90 opened this issue Dec 8, 2021 · 1 comment

Cinux90 commented Dec 8, 2021

Hi all,

I have just started using organize and want to set up my rules before running it on production files.
So I started with a simple config file:

rules:
  - folders: ~/tmp_doc-test
    subfolders: false
    filters:
      - extension: pdf 
      - filecontent: "Entgeltbescheinigung"
    actions:
      - echo: "Found PDF!"
      - copy: "~/tmp_doc-test/sortiert/Lohnzettel/"

Then I ran it as usual:

organize run

For some files I get the following error:

  File BWG.pdf:
    - (FileContent) ERROR! 'charmap' codec can't decode byte 0x9d in position 9796: character maps to <undefined>

I tried to resolve this issue, but I have no idea what causes it.
At first I thought it was because the file's charset is reported as binary:

file -i BWG.pdf
BWG.pdf: application/pdf; charset=binary

But I also have other PDF files whose charset is reported as binary.

So I'm completely out of ideas.
Does anyone have an idea?

@tfeldmann
Owner

organize uses textract under the hood. So you might check the output of:

textract file.pdf

You can also try installing another parser which is supported by textract:

pip install pdftotext
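
As background on the error message itself: Python reports a 'charmap' error when bytes are decoded with a single-byte code page such as cp1252, in which byte 0x9d has no character assigned. The sketch below reproduces the same kind of message under the assumption (not confirmed in this thread) that the extracted text is UTF-8 and is being decoded with cp1252. The helper `try_decode` and the sample string are hypothetical and are not part of organize or textract.

```python
def try_decode(data: bytes, encoding: str) -> str:
    """Decode bytes, returning the error text instead of raising."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"ERROR! {exc}"


# Hypothetical sample: the closing curly quote (U+201D) is encoded in
# UTF-8 as the bytes E2 80 9D, so the byte 0x9d appears in the data.
sample = "“Entgeltbescheinigung”".encode("utf-8")

# cp1252 has no character for 0x9d, so a strict decode fails with the
# same "'charmap' codec can't decode byte 0x9d" message.
cp1252_result = try_decode(sample, "cp1252")

# Decoding with the encoding the bytes were actually written in works.
utf8_result = try_decode(sample, "utf-8")

# errors="replace" substitutes U+FFFD for undecodable bytes instead of failing.
lenient_result = sample.decode("cp1252", errors="replace")

print(cp1252_result)
print(utf8_result)
print(lenient_result)
```

If this assumption holds, it would also explain why only some files fail: only PDFs whose extracted text happens to contain a byte that the code page leaves undefined would trigger the error.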
