Encoding error on windows #270

Open
imgaojun opened this issue Dec 14, 2019 · 0 comments
Labels
enhancement (New feature or request), topic: data (Issue about data loader modules)

Comments

@imgaojun
Contributor

I cannot load a UTF-8 file when building my vocabulary or loading my dataset, because gbk is used as the default encoding on Windows. I added a new option that allows manually setting the encoding in PairedTextData: #269

$ python main.py 
Traceback (most recent call last):
  File "main.py", line 62, in <module>
    main()
  File "main.py", line 28, in main
    hparams=config_data.train, device=device)
  File "C:\Users\gaojun4ever\Miniconda3\lib\site-packages\texar\torch\data\data\paired_text_data.py", line 140, in __init__
    eos_token=src_hparams.eos_token)
  File "C:\Users\gaojun4ever\Miniconda3\lib\site-packages\texar\torch\data\vocabulary.py", line 103, in __init__
    = self.load(self._filename)
  File "C:\Users\gaojun4ever\Miniconda3\lib\site-packages\texar\torch\data\vocabulary.py", line 119, in load
    vocab = list(line.strip() for line in vocab_file)
  File "C:\Users\gaojun4ever\Miniconda3\lib\site-packages\texar\torch\data\vocabulary.py", line 119, in <genexpr>
    vocab = list(line.strip() for line in vocab_file)
UnicodeDecodeError: 'gbk' codec can't decode byte 0x8c in position 2: illegal multibyte sequence
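
For context, the traceback is consistent with a file being opened without an explicit `encoding`, in which case Python falls back to the locale's preferred encoding (gbk on a Chinese-locale Windows install). Below is a minimal sketch of the general workaround, assuming a one-token-per-line vocabulary file. The function name `load_vocab` and its `encoding` parameter are hypothetical, used only for illustration; this is not the texar API or the change made in #269.

```python
from typing import List


def load_vocab(path: str, encoding: str = "utf-8") -> List[str]:
    """Read a vocabulary file with one token per line.

    Hypothetical helper, not part of texar. Passing `encoding` explicitly
    avoids relying on the platform's locale default (e.g. gbk on Windows).
    """
    with open(path, "r", encoding=encoding) as vocab_file:
        # Strip surrounding whitespace, including the trailing newline.
        return [line.strip() for line in vocab_file]
```

With an explicit encoding, the same file should decode identically on Windows, Linux, and macOS regardless of the system locale.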

@imgaojun changed the title from "Add a new option to allow manually setting encoding" to "Encoding error on windows" on Dec 14, 2019
@gpengzhi added the enhancement and topic: data labels on Dec 16, 2019