UnicodeEncodeError: 'charmap' codec can't encode characters in position 74-78: character maps to <undefined> #347

Open
geroldmeisinger opened this issue Sep 5, 2023 · 2 comments

geroldmeisinger commented Sep 5, 2023

Using laion2b-en-aesthetics65.parquet, entry #3, which has this caption:

"San Pedro: One Of Mother Nature's Most Powerful Psychedelics | Ayahuasca アヤワスカ | Scoop.it"

Error:

Traceback (most recent call last):
  File "%USERPROFILE%\miniconda3\envs\controlnet\Lib\site-packages\img2dataset\downloader.py", line 328, in download_shard
    sample_writer.write(
  File "%USERPROFILE%\miniconda3\envs\controlnet\Lib\site-packages\img2dataset\writer.py", line 280, in write
    f.write(str(caption))
  File "%USERPROFILE%\miniconda3\envs\controlnet\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 74-78: character maps to <undefined>
Sample 3 failed to download: 'charmap' codec can't encode characters in position 74-78: character maps to <undefined>

Result:
jpg was downloaded
empty: 000000003.txt
missing: 000000003.json

Using Windows 10 with Miniconda

similar issue: #219
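
A minimal sketch of what the traceback points at, assuming the caption text is exactly as quoted above: cp1252 (the Windows default that shows up in the traceback) has no mapping for the katakana characters, while UTF-8 does. The helper name `can_encode` is made up for illustration and is not part of img2dataset. This would also fit the result listed above: the .txt file is presumably created when it is opened and the write then fails, so the .json step is never reached.

```python
# Caption copied from the report above; the exact whitespace is an assumption.
caption = (
    "San Pedro: One Of Mother Nature's Most Powerful Psychedelics"
    " | Ayahuasca アヤワスカ | Scoop.it"
)


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp1252 is the codec named in the traceback; utf-8 covers all of Unicode.
print("cp1252:", can_encode(caption, "cp1252"))
print("utf-8: ", can_encode(caption, "utf-8"))
```

On Windows, opening a text file without an `encoding` argument typically falls back to the locale code page, which is why the same call can succeed on Linux or macOS and fail here.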

Owner

rom1504 commented Sep 5, 2023 via email

@sraimund

In my case it worked after specifying the encoding in writer.py:

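def write(self, img_str, key, caption, meta):
    ...
    # Caption file: force UTF-8 instead of the platform default (cp1252 here)
    with self.fs.open(caption_filename, "w", encoding="utf-8") as f:
        ...
    # Metadata file: same change
    with self.fs.open(meta_filename, "w", encoding="utf-8") as f:
        ...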
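
For anyone who wants to check the idea outside img2dataset, here is a small self-contained sketch. It uses Python's built-in `open` as a stand-in for `self.fs.open`, and `write_caption` / `read_caption` are hypothetical helpers, not functions from writer.py. It only shows that passing `encoding="utf-8"` lets the caption from this issue round-trip through a text file.

```python
import os
import tempfile

# Caption from the issue report; the exact whitespace is an assumption.
caption = (
    "San Pedro: One Of Mother Nature's Most Powerful Psychedelics"
    " | Ayahuasca アヤワスカ | Scoop.it"
)


def write_caption(path: str, text: str) -> None:
    """Write text with an explicit encoding (hypothetical helper)."""
    # Without encoding=..., Windows would typically pick the locale code page.
    with open(path, "w", encoding="utf-8") as f:
        f.write(str(text))


def read_caption(path: str) -> str:
    """Read the file back using the same explicit encoding."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "000000003.txt")
    write_caption(path, caption)
    roundtrip = read_caption(path)

print(roundtrip == caption)
```

Reading the files back later needs the same `encoding="utf-8"`, otherwise the katakana may come out garbled on Windows.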
