'charmap' codec can't decode #154

Cinux90 · 2021-12-08T16:48:44Z

Hi all,

As I'm starting to use organize, I want to set up my rules before running it on my production files.
So I started with a simple config file:

rules:
  - folders: ~/tmp_doc-test
    subfolders: false
    filters:
      - extension: pdf 
      - filecontent: "Entgeltbescheinigung"
    actions:
      - echo: "Found PDF!"
      - copy: "~/tmp_doc-test/sortiert/Lohnzettel/"
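For readers new to organize, the rule above reads as "a file matches only if both filters pass". Below is a hypothetical Python sketch of that matching logic, not organize's actual code. It assumes the filters are combined with AND, that `extension` compares case-insensitively, and that `filecontent` is a regular-expression search over text already extracted from the PDF. That extraction step is presumably where the error below comes from. The function name `matches_rule` is made up for illustration.

```python
import re
from pathlib import Path


# Hypothetical re-implementation of the rule's two filters, for illustration only.
def matches_rule(path: Path, extracted_text: str,
                 pattern: str = "Entgeltbescheinigung") -> bool:
    # "extension: pdf": compare the suffix without its dot, ignoring case (assumed)
    has_pdf_extension = path.suffix.lower().lstrip(".") == "pdf"
    # "filecontent": regex search over text extracted beforehand (assumed)
    contains_pattern = re.search(pattern, extracted_text) is not None
    # Both filters must pass before the echo/copy actions run (assumed AND)
    return has_pdf_extension and contains_pattern


print(matches_rule(Path("Lohn_2021.pdf"), "Entgeltbescheinigung Dezember"))
print(matches_rule(Path("notes.txt"), "Entgeltbescheinigung"))
```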

Then I ran it as usual:

organize run

For some files I get the following error:

  File BWG.pdf:
    - (FileContent) ERROR! 'charmap' codec can't decode byte 0x9d in position 9796: character maps to <undefined>
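The wording of this message comes from Python's single-byte "charmap" codecs. Here is a minimal sketch that reproduces it. It assumes something the thread does not confirm: the text extracted from the PDF is UTF-8, but it gets decoded with the Windows default code page cp1252. Byte 0x9d has no mapping in cp1252, so a UTF-8 character such as a right double quotation mark (bytes e2 80 9d) triggers the same error. The helper `try_decode` and the sample string are hypothetical.

```python
def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or an organize-style error line if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"ERROR! {exc}"


# Assumption: the extracted text is UTF-8 and contains a typographic quote.
utf8_bytes = "Entgeltbescheinigung \u201dBWG\u201d".encode("utf-8")

# Decoding with cp1252 (a "charmap" codec) fails on the 0x9d byte of the quote.
print(try_decode(utf8_bytes, "cp1252"))
# Decoding with the matching encoding succeeds.
print(try_decode(utf8_bytes, "utf-8"))
```

This also fits the observation that only some files fail: only PDFs whose text contains such characters would hit an undefined byte.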

I tried to resolve this issue, but I have no idea what causes it.
At first I thought it was because the file's charset is binary:
file -i BWG.pdf
BWG.pdf: application/pdf; charset=binary

But I also have other PDF files with charset binary that don't cause this error.

So I'm completely out of ideas.
Does any of you have an idea?


tfeldmann · 2022-01-28T09:30:09Z

organize uses textract under the hood, so you might check the output of:

textract file.pdf
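To run the same check from Python without depending on the platform's default encoding, you can capture the command's raw bytes and decode them explicitly. The following is a sketch under an assumption: the tool writes UTF-8 to stdout. The helper name `run_and_decode` is hypothetical. `errors="replace"` is a deliberate choice: undecodable bytes show up as U+FFFD instead of aborting, so you still see the rest of the text.

```python
import subprocess
import sys
from typing import Sequence


def run_and_decode(cmd: Sequence[str]) -> str:
    # Capture raw bytes (no text=True), so Python does not decode with the
    # locale encoding, which is cp1252 on many Windows setups.
    result = subprocess.run(list(cmd), capture_output=True, check=True)
    # Decode explicitly as UTF-8 (assumed output encoding of the tool).
    return result.stdout.decode("utf-8", errors="replace")


# Example usage (requires the textract CLI and the PDF to exist):
# print(run_and_decode(["textract", "BWG.pdf"]))

# Self-contained demo: a child process writes the UTF-8 bytes of a quote mark.
demo_cmd = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(bytes([0xe2, 0x80, 0x9d]))"]
print(run_and_decode(demo_cmd))
```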

You can also try installing another parser which is supported by textract:

pip install pdftotext
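If the cp1252 assumption is right, the problem lies in the Python environment rather than in the PDF. That would also explain why `file -i` gives no clue: it reports charset=binary for PDFs in general. A small diagnostic sketch prints the encoding Python falls back to when code doesn't specify one. On Windows this is often cp1252. Python 3.7+ offers a UTF-8 mode, enabled by setting the environment variable `PYTHONUTF8=1` or running with `-X utf8`, that changes this default to UTF-8. It may be worth trying before `organize run`, though nothing here confirms that it fixes this particular error. The helper name `default_text_encoding` is hypothetical.

```python
import locale
import sys


def default_text_encoding() -> str:
    """Encoding Python falls back to for open() and subprocess text mode."""
    return locale.getpreferredencoding(False)


print("preferred encoding:", default_text_encoding())
# 1 when UTF-8 mode is active (PYTHONUTF8=1 or -X utf8), else 0.
print("UTF-8 mode:", bool(sys.flags.utf8_mode))
```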

tfeldmann added the awaiting feedback label Feb 10, 2022
