Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid data and segfault when reading past the size of file with fromfile on Ubuntu 16.04 #12300

Closed
amuresan opened this issue Oct 31, 2018 · 4 comments

Comments

@amuresan
Copy link

amuresan commented Oct 31, 2018

fromfile invalid data and sometimes segfault if reading past the end of a file i.e. it does not check if reading will go past the file end. This issue leads to a segfault on Ubuntu 16.04, but seems to not segfault on OSX.

Reproducing code example:

import numpy as np

def test_read_from_file():
    # create an empty file named `empty.bin`
    filename = 'empty.bin'
    open(filename, 'a').close()

    # read large chunk of data, past the end of the file
    dtype = [('data', '<f4', 500,)]
    count = 100000000

    with open(filename, 'rb') as fh:
        data = np.fromfile(fh, dtype, count)

    print(data.shape)

Error message:

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Numpy/Python version information:

platform linux -- Python 3.6.6, pytest-3.8.2, py-1.6.0, pluggy-0.7.1

@seberg
Copy link
Member

seberg commented Oct 31, 2018

Just to note, reproducable on 1.15.3. I guess we know the size, so this should just raise an error, or read the whole file. If this works silently on some systems, maybe we should put a release note just in case (I still would say we can just fix it).

EDIT: I would tend to error, just thought whole file might be an option because of indexing, but indexing is a bit special in this regard.

@amuresan
Copy link
Author

amuresan commented Nov 1, 2018

Agreed, raising an error sounds like a good idea. There still is something to be said about partial reads, which could be handled in two ways:

  1. read as many data records as possible till either the end of file or the count in fromfile is reached, but then we need a mechanism for explicitly returning the actual number of records that were read (implicitly this should be visible in the shape of the resulting array). An error can still be raised because it is not the normal usage scenario.
  2. don't allow partial reads i.e. raise if size to be read does not fit in the rest of the file.

I don't know which scenarios fits better with the numpy philosophy, but the first option sounds more useful.

@seberg
Copy link
Member

seberg commented Nov 1, 2018

I think an error is most reasonable. What I am not sure about right now is if fromfile supports file like objects that do not have a known size, or what currently happens in the case of non-empty sep kwarg.

@amuresan the code for fromfile is in C, but if you have a bit of time, we are always very happy about pull requests, and it seems like a reasonable difficulty to dabble a bit into the C (Python) API.

@simongibbons
Copy link
Contributor

I believe the problem here is actually that on ubuntu you are getting a MemoryError that is being handled incorrectly and causing the segfault.

A PR with a fix is here: #12354

simongibbons added a commit to simongibbons/numpy that referenced this issue Nov 10, 2018
There is a problem with the way in which we handle
errors which occur in the call to `PyArray_FromFile`
in `np.fromfile`.

The problem here is twofold.

 1. The return value isn't checked, therefore if we reach
    the fail block, we will attempt a DECREF on a NULL and
    go down in flames.

 2. The cleanup code on the filepointers (most notabily the
    call to `npy_PyFile_DupClose2`) assumes that there is no
    error set to work.

This PR addresses these issues
 1. By adding a NULL check to the fail block to ensure we don't
    attempt a DECREF on a NULL.

 2. By saving the error state before attempting the cleanup code
    on the file descriptor, and then restoring it after.

Fixes: numpy#12300
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants