npyio.loadtxt is bytes-casting text file input, even with str dtype specified. #2715

Closed
Panoplos opened this issue Nov 8, 2012 · 11 comments

Comments

@Panoplos

Panoplos commented Nov 8, 2012

Environment

  • Python: Version 3.3 (python.org release) on OS X Mountain Lion
  • numpy: Cloned from git master

Calling numpy.loadtxt on a file that contains strings, as follows:

import numpy as np
datestxt = np.loadtxt("NYSE_dates.txt", dtype=str)
print(datestxt)

Where NYSE_dates.txt is simply a list of dates (could be anything really):

7/5/1962
7/6/1962
7/9/1962
...
12/29/2020
12/30/2020
12/31/2020

Output is:

["b'7/5/1962'" "b'7/6/1962'" "b'7/9/1962'" ..., "b'12/29/2020'"
 "b'12/30/2020'" "b'12/31/2020'"]

As you can see, every string has been cast to bytes and then stringified through conv; you get the same result from str(str('12/31/2020').encode('latin1')), per conv and compat.asbytes.

After looking at the code, it appears that strings are cast to bytes with asbytes(...) almost everywhere, for example in split_line(...), so other routines in the module are probably affected as well.
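
The double conversion described above can be reproduced without NumPy. The following is a minimal sketch, not the loadtxt code path itself, and the variable names are illustrative. It shows that in Python 3, calling str() on a bytes object returns its repr, which is where the stray b'...' wrapper comes from.

```python
# Illustrative only: mimics the encode-then-str sequence described in the report.
text = "12/31/2020"

as_bytes = text.encode("latin1")     # bytes object, as asbytes() would produce
stringified = str(as_bytes)          # repr of the bytes, wrapper included
decoded = as_bytes.decode("latin1")  # the conversion that was actually wanted

print(stringified)  # → b'12/31/2020'
print(decoded)      # → 12/31/2020
```

Decoding the bytes instead of passing them to str() gives back the original text, which is the behaviour the report expects from dtype=str.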

@vejnar

vejnar commented Apr 18, 2013

I also have this issue. This is very annoying; basically you can't use loadtxt in Python 3.

Temporary solution: I removed all asbytes() calls in the loadtxt method.

@charris
Member

charris commented Apr 19, 2013

Yeah, I remember thinking something was fishy in there when I looked through the code.

@jonathanrocher

For the record, I am running into the same issue with datetime64 inputs, leading to a parsing error of the form: Error parsing datetime string "b'2013-01-02'". To work around this, I had to create a converter for that column:

def decoder(input_bytes):
    # The field arrives as bytes here; decode it to str before it is parsed.
    return input_bytes.decode("ascii")

This would be fine in production code but is highly non-pretty for training material...
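
For illustration, the converter workaround might be wired into loadtxt roughly as follows. This is a sketch under assumptions: the in-memory file, the column index 0, and the datetime64[D] dtype are made up for the example. The isinstance check is also an addition, since the converter may receive bytes or str depending on the NumPy version and the encoding argument.

```python
import io

import numpy as np


def decoder(value):
    # Assumption: the field may arrive as bytes or as str, so handle both.
    if isinstance(value, bytes):
        return value.decode("ascii")
    return value


# Hypothetical single-column input standing in for a real data file.
f = io.StringIO("2013-01-02\n2013-01-03\n")

# Apply the converter to column 0 so the datetime parser sees plain text.
dates = np.loadtxt(f, dtype="datetime64[D]", converters={0: decoder})
print(dates)
```

The converters argument maps a column index to a callable, so only the affected column needs the extra decoding step.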

@juliantaylor juliantaylor added this to the 1.10 blockers milestone Jul 30, 2014
@charris charris modified the milestones: 1.11 blockers, 1.10 blockers Jun 21, 2015
@charris
Member

charris commented Jun 21, 2015

Pushing off to 1.11.

@danizen

danizen commented Dec 11, 2015

Workaround: run iconv on the file first.

@charris
Member

charris commented Jan 21, 2016

Pushing off to 1.12.

@charris charris modified the milestones: 1.12.0 release, 1.11.0 blockers Jan 21, 2016
@rgommers rgommers modified the milestone: 1.12.0 release Feb 15, 2017
@paalge

paalge commented Mar 3, 2017

I see that this is being pushed forward, but I find that it is a bug that should be addressed, and a fix seems to be easy to implement.

@Queuecumber

Pretty shocking that this hasn't been fixed for 5 years

@mdickinson
Contributor

mdickinson commented Dec 13, 2017

It looks as though this is working as desired in NumPy 1.13.3 (though I'm not sure which PR fixed it). Can this issue be closed?

>>> import io
>>> import numpy as np
>>> f = io.StringIO("7/5/1962\n7/6/1962\n")
>>> np.loadtxt(f, dtype=str)
array(['7/5/1962', '7/6/1962'],
      dtype='<U8')
>>> np.__version__
'1.13.3'

@mdickinson
Contributor

Looks like this was fixed in #8349, in response to #8033.

@mattip
Member

mattip commented Sep 4, 2018

Closing. Please reopen if needed.

@mattip mattip closed this as completed Sep 4, 2018