Cannot read arrays stored with "string" objects. #6

Open
oysteijo opened this issue Jan 6, 2020 · 4 comments
Labels
documentation Low priority Won't fix until someone really needs it.

Comments

@oysteijo
Owner

oysteijo commented Jan 6, 2020

A = np.random.rand(32,16).astype(np.float32)
B = np.random.rand(16, 8).astype(np.float32)
C = np.random.rand(8, 4).astype(np.float32)
activations = np.array(["sigmoid", "tanh", "softmax"])
np.savez("nn.npz", A,B,C,activations)

The above generated .npz can be read perfectly with example.c, however:

np.savez("nn_order.npz", activations, A,B,C)

cannot be read properly. It seems to read only the first array and then stop.
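To see which member of the archive is the problematic one, the dtypes inside the .npz can be listed from Python. This is only a diagnostic sketch, not part of c_npy; the scratch path and the `summary` variable are made up for illustration.

```python
import os
import tempfile

import numpy as np

A = np.random.rand(32, 16).astype(np.float32)
B = np.random.rand(16, 8).astype(np.float32)
C = np.random.rand(8, 4).astype(np.float32)
activations = np.array(["sigmoid", "tanh", "softmax"])

# Hypothetical scratch location, used only to keep the file out of the cwd.
path = os.path.join(tempfile.mkdtemp(), "nn_order.npz")
np.savez(path, activations, A, B, C)

# Positional arguments to np.savez are stored as arr_0, arr_1, ...
with np.load(path) as npz:
    summary = [
        (name, npz[name].dtype.kind, npz[name].dtype.itemsize, npz[name].shape)
        for name in npz.files
    ]

# kind 'U' marks a unicode string array, kind 'f' a floating point array.
for row in summary:
    print(row)
```

Here the first member, arr_0, is the string array, which matches the observation that reading breaks at the very first array.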

@oysteijo
Owner Author

oysteijo commented Jan 7, 2020

It just doesn't work with type <U. It seems that each Unicode character is stored as four bytes. Hmmm....

@oysteijo
Owner Author

oysteijo commented Jan 7, 2020

>>> activations = np.array(["sigmoid", "tanh", "softmax"])
>>> np.save("act.npy", activations)
>>> exit()
oystein@lt-955213:~/simd_neuralnet/c_npy/example$ xxd act.npy
00000000: 934e 554d 5059 0100 4600 7b27 6465 7363  .NUMPY..F.{'desc
00000010: 7227 3a20 273c 5537 272c 2027 666f 7274  r': '<U7', 'fort
00000020: 7261 6e5f 6f72 6465 7227 3a20 4661 6c73  ran_order': Fals
00000030: 652c 2027 7368 6170 6527 3a20 2833 2c29  e, 'shape': (3,)
00000040: 2c20 7d20 2020 2020 2020 2020 2020 200a  , }            .
00000050: 7300 0000 6900 0000 6700 0000 6d00 0000  s...i...g...m...
00000060: 6f00 0000 6900 0000 6400 0000 7400 0000  o...i...d...t...
00000070: 6100 0000 6e00 0000 6800 0000 0000 0000  a...n...h.......
00000080: 0000 0000 0000 0000 7300 0000 6f00 0000  ........s...o...
00000090: 6600 0000 7400 0000 6d00 0000 6100 0000  f...t...m...a...
000000a0: 7800 0000                                x...
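The layout in the dump can be decoded by hand: a 10-byte preamble, a 70-byte (0x46) header dictionary, then 3 items of 7 characters at 4 bytes each, little-endian and NUL padded. The following is a minimal sketch using only the Python standard library. The helper names `build_sample` and `read_npy_unicode` are made up for illustration, and it assumes a version 1.0 file with a little-endian `<U` dtype; it is not how c_npy or NumPy implement this.

```python
import ast
import struct


def build_sample():
    """Rebuild the bytes of act.npy exactly as shown in the hexdump."""
    dict_text = "{'descr': '<U7', 'fortran_order': False, 'shape': (3,), }"
    header = (dict_text.ljust(69) + "\n").encode("latin1")  # 70 bytes, as in the dump
    preamble = b"\x93NUMPY" + bytes([1, 0]) + struct.pack("<H", len(header))
    words = ["sigmoid", "tanh", "softmax"]
    # Each item is padded with NUL characters to 7 code points, 4 bytes per code point.
    payload = b"".join(w.ljust(7, "\x00").encode("utf-32-le") for w in words)
    return preamble + header + payload


def read_npy_unicode(buf):
    """Decode a version 1.0 .npy buffer holding a little-endian '<U' array."""
    if buf[:6] != b"\x93NUMPY":
        raise ValueError("not an .npy file")
    if buf[6] != 1:
        raise ValueError("only format version 1.x is handled in this sketch")
    (header_len,) = struct.unpack_from("<H", buf, 8)
    header = ast.literal_eval(buf[10:10 + header_len].decode("latin1").strip())
    descr = header["descr"]
    if not descr.startswith("<U"):
        raise ValueError("expected a little-endian unicode dtype, got %r" % descr)
    nchars = int(descr[2:])
    itemsize = 4 * nchars  # four bytes per character
    count = 1
    for dim in header["shape"]:
        count *= dim
    data = buf[10 + header_len:]
    items = []
    for i in range(count):
        chunk = data[i * itemsize:(i + 1) * itemsize]
        # Strip the NUL padding that fills short strings up to nchars.
        items.append(chunk.decode("utf-32-le").rstrip("\x00"))
    return items


sample = build_sample()
print(len(sample))               # total file size in bytes
print(read_npy_unicode(sample))  # the three activation names
```

A C reader would need the same two steps: compute the item size as four times the number in the descr string, then narrow each 32-bit code point to a char when the text is known to be ASCII.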

@oysteijo
Owner Author

oysteijo commented Jan 8, 2020

If I store a file with plain ascii, it will be readable.

>>> activations = np.array(["sigmoid", "tanh", "softmax"]).astype('S')

The file will then be stored like this:

00000000: 934e 554d 5059 0100 4600 7b27 6465 7363  .NUMPY..F.{'desc
00000010: 7227 3a20 277c 5337 272c 2027 666f 7274  r': '|S7', 'fort
00000020: 7261 6e5f 6f72 6465 7227 3a20 4661 6c73  ran_order': Fals
00000030: 652c 2027 7368 6170 6527 3a20 2833 2c29  e, 'shape': (3,)
00000040: 2c20 7d20 2020 2020 2020 2020 2020 200a  , }            .
00000050: 7369 676d 6f69 6474 616e 6800 0000 736f  sigmoidtanh...so
00000060: 6674 6d61 78                             ftmax

If you then take it back in Python, you can get the Python str type (instead of bytes), by simply doing:

>>> act = np.load("act.npy")
>>> act
array([b'sigmoid', b'tanh', b'softmax'],
      dtype='|S7')
>>> [ o.decode("ascii") for o in act]
['sigmoid', 'tanh', 'softmax']

OK. So this must be considered a workaround rather than a solution to the problem. The NumPy documentation does not recommend using 'S' (or 'a', which gives the same result), so this workaround is a bit weak.
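Applied to the .npz case from the top of this issue, the workaround could look roughly like the sketch below. It assumes all names are pure ASCII; the scratch path and the `names` variable are made up for illustration.

```python
import os
import tempfile

import numpy as np

A = np.random.rand(32, 16).astype(np.float32)
B = np.random.rand(16, 8).astype(np.float32)
C = np.random.rand(8, 4).astype(np.float32)

# Cast the unicode array to fixed-width bytes before saving, one byte per character.
activations = np.array(["sigmoid", "tanh", "softmax"]).astype("S")

# Hypothetical scratch location, used only to keep the file out of the cwd.
path = os.path.join(tempfile.mkdtemp(), "nn_order.npz")
np.savez(path, activations, A, B, C)

# Back in Python, decode the bytes objects to get ordinary str values again.
with np.load(path) as npz:
    stored_kind = npz["arr_0"].dtype.kind
    stored_itemsize = npz["arr_0"].dtype.itemsize
    names = [item.decode("ascii") for item in npz["arr_0"]]

print(stored_kind, stored_itemsize, names)
```

The string array can then come first in the archive, since it is stored as plain bytes like in the |S7 dump above.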

I just think we have to admit that c_npy is not general, as it only supports numerical types and bytes, and not all Python types.

@oysteijo oysteijo added the bug label Jan 8, 2020
@oysteijo oysteijo changed the title Cannot read numeric arrays stored after "string"-array. Cannot read arrays stored with "string" objects. Jan 8, 2020
@oysteijo oysteijo added documentation Low priority Won't fix until someone really needs it. and removed bug labels Feb 15, 2020
@oysteijo
Owner Author

Changing the labels to "documentation" and "Low priority", since this is a known missing feature rather than a bug.
