VLenUTF8().encode(buffer) fails if buffer is read-only #514

Open
ivirshup opened this issue Mar 12, 2024 · 8 comments · May be fixed by #515

@ivirshup

Minimal, reproducible code sample, a copy-pastable example if possible

import numpy as np
from numcodecs import VLenUTF8

codec = VLenUTF8()

a = np.array(list("abc"), dtype=object)
a.flags.writeable = False

codec.encode(a)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[39], line 9
      6 a = np.array(list("abc"), dtype=object)
      7 a.flags.writeable = False
----> 9 codec.encode(a)

File numcodecs/vlen.pyx:87, in numcodecs.vlen.VLenUTF8.encode()

File <stringsource>:663, in View.MemoryView.memoryview_cwrapper()

File <stringsource>:353, in View.MemoryView.memoryview.__cinit__()

ValueError: buffer source array is read-only
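
The traceback shows the error being raised while the Cython memoryview is constructed (`memoryview.__cinit__`), so it happens before any encoding work starts. As a rough stdlib analogy only (this is not numcodecs code), Python's built-in `memoryview` makes the same distinction between read-only and writable buffers: reading from a read-only buffer is fine, and only a write is rejected.

```python
# bytes exports a read-only buffer; bytearray exports a writable one.
ro = memoryview(b"abc")
rw = memoryview(bytearray(b"abc"))

# Reading from the read-only view works as usual.
first = ro[0]

# Writing to the read-only view is rejected with a TypeError.
error = None
try:
    ro[0] = 0
except TypeError as exc:
    error = str(exc)

# Writing to the writable view succeeds.
rw[0] = ord("z")
```

The difference in the issue is that a Cython typed memoryview declared without `const` asks for a writable buffer up front, so a read-only array is refused even when the code only ever reads from it.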

Problem description

Short description: this shouldn't error, as the codec shouldn't care whether it can write to the buffer it's passed.

Long description:

  • Pandas 3.0 will set copy on write by default
  • AnnData saves pandas dataframes to zarr stores
  • Encoding string columns now fails, because numcodecs raises an error when handed a read-only array
  • This is made more urgent by dask-dataframe setting copy-on-write=True on import in the latest release

I can't think of a reason that .encode would need to modify the buffer, so it shouldn't care that it's read-only.
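
To illustrate why encoding should not need write access, here is a minimal pure-Python sketch of a length-prefixed UTF-8 encoder. The function name `encode_vlen_utf8` is hypothetical, and the byte layout (a 4-byte little-endian item count, then each item as a 4-byte little-endian length followed by its UTF-8 bytes) is an assumption made for illustration, not something stated in this thread. The point is only that every operation reads from the input and writes to a freshly allocated output.

```python
import struct


def encode_vlen_utf8(values):
    """Encode strings as length-prefixed UTF-8 (hypothetical sketch).

    Only reads from `values`; all writes go to a new output buffer.
    """
    encoded = [v.encode("utf-8") for v in values]
    # Header: number of items as a 4-byte little-endian unsigned int.
    out = bytearray(struct.pack("<I", len(encoded)))
    for item in encoded:
        # Each item: 4-byte little-endian byte length, then the bytes.
        out += struct.pack("<I", len(item))
        out += item
    return bytes(out)


# A tuple cannot be modified, yet encoding it works without trouble.
frozen = ("a", "b", "c")
payload = encode_vlen_utf8(frozen)
```

Because the input here is an immutable tuple, the sketch could not modify it even by accident, which is the behaviour the issue asks for with read-only arrays.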

Version and installation information

Please provide the following:

  • Value of numcodecs.__version__ '0.12.1'
  • Version of Python interpreter Python 3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:43:09) [GCC 12.3.0]
  • Operating system (Linux/Windows/Mac) Linux-5.15.0-100-generic-x86_64-with-glibc2.35
  • How NumCodecs was installed pip into conda

Also, if you think it might be relevant, please provide the output from pip list or
conda list depending on which was used to install NumCodecs.

conda list
# packages in environment at /mnt/workspace/mambaforge/envs/scanpy-dev:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
anndata                   0.10.5.post1             pypi_0    pypi
annoy                     1.17.3                   pypi_0    pypi
array-api-compat          1.4.1                    pypi_0    pypi
asciitree                 0.3.3                    pypi_0    pypi
asttokens                 2.4.1                    pypi_0    pypi
atk-1.0                   2.38.0               hd4edc92_1    conda-forge
attrs                     23.2.0                   pypi_0    pypi
bokeh                     3.3.4                    pypi_0    pypi
bzip2                     1.0.8                hd590300_5    conda-forge
ca-certificates           2023.11.17           hbcca054_0    conda-forge
cairo                     1.18.0               h3faef2a_0    conda-forge
click                     8.1.7                    pypi_0    pypi
cloudpickle               3.0.0                    pypi_0    pypi
comm                      0.2.1                    pypi_0    pypi
contourpy                 1.2.0                    pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
cython                    3.0.8                    pypi_0    pypi
dask                      2024.3.0                 pypi_0    pypi
dask-expr                 1.0                      pypi_0    pypi
dask-glm                  0.3.2                    pypi_0    pypi
dask-ml                   2023.3.24                pypi_0    pypi
debugpy                   1.8.0                    pypi_0    pypi
decorator                 5.1.1                    pypi_0    pypi
deprecated                1.2.14                   pypi_0    pypi
distributed               2024.1.1                 pypi_0    pypi
execnet                   2.0.2                    pypi_0    pypi
executing                 2.0.1                    pypi_0    pypi
expat                     2.5.0                hcb278e6_1    conda-forge
fasteners                 0.19                     pypi_0    pypi
fbpca                     1.0                      pypi_0    pypi
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 h77eed37_1    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.47.2                   pypi_0    pypi
freetype                  2.12.1               h267a509_2    conda-forge
fribidi                   1.0.10               h36c2ea0_0    conda-forge
fsspec                    2023.12.2                pypi_0    pypi
future                    0.18.3                   pypi_0    pypi
gdk-pixbuf                2.42.10              h829c605_4    conda-forge
geosketch                 1.2                      pypi_0    pypi
gettext                   0.21.1               h27087fc_0    conda-forge
giflib                    5.2.1                h0b41bf4_3    conda-forge
gprof2dot                 2022.7.29                pypi_0    pypi
graphite2                 1.3.13            h58526e2_1001    conda-forge
graphtools                1.5.3                    pypi_0    pypi
graphviz                  9.0.0                h78e8752_1    conda-forge
gtk2                      2.24.33              h7f000aa_3    conda-forge
gts                       0.7.6                h977cf35_4    conda-forge
h5py                      3.10.0                   pypi_0    pypi
harfbuzz                  8.3.0                h3d44ed6_0    conda-forge
harmonypy                 0.0.9                    pypi_0    pypi
icu                       73.2                 h59595ed_0    conda-forge
igraph                    0.11.3                   pypi_0    pypi
imageio                   2.33.1                   pypi_0    pypi
importlib-metadata        7.0.1                    pypi_0    pypi
iniconfig                 2.0.0                    pypi_0    pypi
intervaltree              3.1.0                    pypi_0    pypi
ipykernel                 6.29.0                   pypi_0    pypi
ipython                   8.20.0                   pypi_0    pypi
jedi                      0.19.1                   pypi_0    pypi
jinja2                    3.1.3                    pypi_0    pypi
joblib                    1.3.2                    pypi_0    pypi
jupyter-client            8.6.0                    pypi_0    pypi
jupyter-core              5.7.1                    pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
lazy-loader               0.3                      pypi_0    pypi
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
legacy-api-wrap           1.4                      pypi_0    pypi
leidenalg                 0.10.2                   pypi_0    pypi
lerc                      4.0.0                h27087fc_0    conda-forge
libdeflate                1.19                 hd590300_0    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h807b86a_4    conda-forge
libgd                     2.3.3                h119a65a_9    conda-forge
libglib                   2.78.3               h783c2da_0    conda-forge
libgomp                   13.2.0               h807b86a_4    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
librsvg                   2.56.3               he3f83f7_1    conda-forge
libsqlite                 3.44.2               h2797004_0    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_4    conda-forge
libtiff                   4.6.0                ha9c0a0a_2    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libwebp                   1.3.2                h658648e_1    conda-forge
libwebp-base              1.3.2                hd590300_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.12.4               h232c23b_1    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
llvmlite                  0.41.1                   pypi_0    pypi
locket                    1.0.0                    pypi_0    pypi
magic-impute              3.0.0                    pypi_0    pypi
markdown-it-py            3.0.0                    pypi_0    pypi
markupsafe                2.1.4                    pypi_0    pypi
matplotlib                3.8.2                    pypi_0    pypi
matplotlib-inline         0.1.6                    pypi_0    pypi
matplotx                  0.3.10                   pypi_0    pypi
mdurl                     0.1.2                    pypi_0    pypi
memory-profiler           0.61.0                   pypi_0    pypi
msgpack                   1.0.7                    pypi_0    pypi
multipledispatch          1.0.0                    pypi_0    pypi
natsort                   8.4.0                    pypi_0    pypi
ncurses                   6.4                  h59595ed_2    conda-forge
nest-asyncio              1.6.0                    pypi_0    pypi
networkx                  3.2.1                    pypi_0    pypi
numba                     0.58.1                   pypi_0    pypi
numcodecs                 0.12.1                   pypi_0    pypi
numpy                     1.26.3                   pypi_0    pypi
openssl                   3.2.0                hd590300_1    conda-forge
packaging                 23.2                     pypi_0    pypi
pandas                    2.2.0                    pypi_0    pypi
pango                     1.50.14              ha41ecd1_2    conda-forge
parso                     0.8.3                    pypi_0    pypi
partd                     1.4.1                    pypi_0    pypi
patsy                     0.5.6                    pypi_0    pypi
pbr                       6.0.0                    pypi_0    pypi
pcre2                     10.42                hcad00b1_0    conda-forge
perfplot                  0.10.2                   pypi_0    pypi
pexpect                   4.9.0                    pypi_0    pypi
pillow                    10.2.0                   pypi_0    pypi
pip                       23.3.2             pyhd8ed1ab_0    conda-forge
pixman                    0.43.2               h59595ed_0    conda-forge
platformdirs              4.1.0                    pypi_0    pypi
pluggy                    1.4.0                    pypi_0    pypi
profimp                   0.1.0                    pypi_0    pypi
prompt-toolkit            3.0.43                   pypi_0    pypi
psutil                    5.9.8                    pypi_0    pypi
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
ptyprocess                0.7.0                    pypi_0    pypi
pure-eval                 0.2.2                    pypi_0    pypi
pyarrow                   15.0.1                   pypi_0    pypi
pygments                  2.17.2                   pypi_0    pypi
pygsp                     0.5.1                    pypi_0    pypi
pynndescent               0.5.11                   pypi_0    pypi
pyparsing                 3.1.1                    pypi_0    pypi
pytest                    7.4.4                    pypi_0    pypi
pytest-mock               3.12.0                   pypi_0    pypi
pytest-nunit              1.0.4                    pypi_0    pypi
pytest-profiling          1.7.0                    pypi_0    pypi
pytest-xdist              3.5.0                    pypi_0    pypi
python                    3.11.7          hab00c5b_1_cpython    conda-forge
python-dateutil           2.8.2                    pypi_0    pypi
python-graphviz           0.20.1                   pypi_0    pypi
pytz                      2023.4                   pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
pyzmq                     25.1.2                   pypi_0    pypi
readline                  8.2                  h8228510_1    conda-forge
rich                      13.7.1                   pypi_0    pypi
scanorama                 1.7.4                    pypi_0    pypi
scanpy                    1.10.0.dev197+g96e19540          pypi_0    pypi
scikit-image              0.22.0                   pypi_0    pypi
scikit-learn              1.4.0                    pypi_0    pypi
scikit-misc               0.3.1                    pypi_0    pypi
scipy                     1.12.0                   pypi_0    pypi
scprep                    1.1.0                    pypi_0    pypi
scrublet                  0.2.3                    pypi_0    pypi
seaborn                   0.13.2                   pypi_0    pypi
session-info              1.0.0                    pypi_0    pypi
setuptools                69.0.3             pyhd8ed1ab_0    conda-forge
six                       1.16.0                   pypi_0    pypi
sortedcontainers          2.4.0                    pypi_0    pypi
sparse                    0.15.1                   pypi_0    pypi
stack-data                0.6.3                    pypi_0    pypi
statsmodels               0.14.1                   pypi_0    pypi
stdlib-list               0.10.0                   pypi_0    pypi
tasklogger                1.2.0                    pypi_0    pypi
tblib                     3.0.0                    pypi_0    pypi
texttable                 1.7.0                    pypi_0    pypi
threadpoolctl             3.2.0                    pypi_0    pypi
tifffile                  2023.12.9                pypi_0    pypi
tk                        8.6.13          noxft_h4845f30_101    conda-forge
toolz                     0.12.1                   pypi_0    pypi
tornado                   6.4                      pypi_0    pypi
tqdm                      4.66.1                   pypi_0    pypi
traitlets                 5.14.1                   pypi_0    pypi
tzdata                    2023.4                   pypi_0    pypi
umap-learn                0.5.5                    pypi_0    pypi
urllib3                   2.1.0                    pypi_0    pypi
wcwidth                   0.2.13                   pypi_0    pypi
wheel                     0.42.0             pyhd8ed1ab_0    conda-forge
wrapt                     1.16.0                   pypi_0    pypi
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.1.1                hd590300_0    conda-forge
xorg-libsm                1.2.4                h7391055_0    conda-forge
xorg-libx11               1.8.7                h8ee46fc_0    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxrender           0.9.11               hd590300_0    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xyzservices               2023.10.1                pypi_0    pypi
xz                        5.2.6                h166bdaf_0    conda-forge
zarr                      2.17.1                   pypi_0    pypi
zict                      3.0.0                    pypi_0    pypi
zipp                      3.17.0                   pypi_0    pypi
zlib                      1.2.13               hd590300_5    conda-forge
zstd                      1.5.5                hfc55251_0    conda-forge
@martindurant
Member

martindurant commented Mar 12, 2024

Does object[:] input_values allow for static (meaning we promise not to change the values, as opposed to changing the value of the pointer)? In true C-land, we cannot truly guarantee that code will not write to any buffer passed.

@ivirshup
Author

Do you mean like:

      Error compiling Cython file:
      ------------------------------------------------------------
      ...
          @cython.wraparound(False)
          @cython.boundscheck(False)
          def encode(self, buf):
              cdef:
                  Py_ssize_t i, l, n_items, data_length, total_length
                  const object[:] values
                  ^
      ------------------------------------------------------------
      
      numcodecs/vlen.pyx:351:12: Const/volatile base type cannot be a Python object

Apparently not.

I would have thought that this is handleable, since pandas is presumably passing these arrays into Cython code.

@martindurant
Member

Pandas has recently started wrapping the low-level arrays into immutable ones, which is maybe why you are seeing this now. I assume they internally access the low-level writable buffer somewhere. I think this is part of their move towards arrow, since arrow buffers are supposed to be immutable (which makes sense when there are offsets/indexes around, rather than just values).

@ivirshup
Author

Pandas has recently started wrapping the low-level arrays into immutable ones

It looks like if you access the .array backing a Series you can get a mutable interface to the memory via the public API. Unclear if I should rely on that though.

@martindurant
Member

If you're not doing any ._data or similar, I don't see why not. It would fail for some extension array that doesn't offer that API, but some extension arrays wouldn't be appropriate input anyway.

Or we could require the caller to always provide a raw, writable numpy-like.

ivirshup linked a pull request Mar 12, 2024 that will close this issue
@ivirshup
Author

My concern is that pandas may not intentionally be giving me a writable view, and may change this behaviour in the future.

I was pointed at:

For how pandas deals with this case. AFAICT, it's basically changing the typing from a memoryview to an ndarray.

@ivirshup
Author

I've opened a PR which should handle this on the numcodecs side. Does the approach look fine to you @martindurant?

@martindurant
Member

Yes, I suppose it's fine. We should maybe document this somewhere, since having to make a copy of the data, even temporarily, may surprise some people.
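
For callers who need a workaround in the meantime, a copy-only-when-needed helper is one option. This is a caller-side sketch, not the code from the linked PR; `as_writable` is a hypothetical name. It uses only `numpy.asarray`, the `flags.writeable` attribute, and `ndarray.copy`.

```python
import numpy as np


def as_writable(arr):
    """Return `arr` if it is writable, else a writable copy (hypothetical helper)."""
    arr = np.asarray(arr)
    if arr.flags.writeable:
        return arr  # no copy needed
    # For object arrays this copies the references, not the strings themselves.
    return arr.copy()


a = np.array(list("abc"), dtype=object)
a.flags.writeable = False

b = as_writable(a)  # writable copy; `a` itself is left untouched
```

The copy is temporary and is skipped entirely for arrays that are already writable, so the extra memory cost applies only to read-only inputs.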
