Invalid data and segfault when reading past the size of file with fromfile on Ubuntu 16.04 #12300

amuresan · 2018-10-31T10:04:41Z

`fromfile` returns invalid data, and sometimes segfaults, when reading past the end of a file, i.e. it does not check whether the read will go past the end of the file. This leads to a segfault on Ubuntu 16.04, but does not seem to segfault on OSX.

Reproducing code example:

import numpy as np

def test_read_from_file():
    # create an empty file named `empty.bin`
    filename = 'empty.bin'
    open(filename, 'a').close()

    # read large chunk of data, past the end of the file
    dtype = [('data', '<f4', 500,)]
    count = 100000000

    with open(filename, 'rb') as fh:
        data = np.fromfile(fh, dtype, count)

    print(data.shape)

Error message:

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Numpy/Python version information:

platform linux -- Python 3.6.6, pytest-3.8.2, py-1.6.0, pluggy-0.7.1

seberg · 2018-10-31T22:10:27Z

Just to note, reproducible on 1.15.3. I guess we know the size, so this should just raise an error, or read the whole file. If this works silently on some systems, maybe we should put a release note just in case (I still would say we can just fix it).

EDIT: I would tend to error, just thought whole file might be an option because of indexing, but indexing is a bit special in this regard.

amuresan · 2018-11-01T09:54:35Z

Agreed, raising an error sounds like a good idea. There still is something to be said about partial reads, which could be handled in two ways:

1. Read as many data records as possible until either the end of the file or the count in fromfile is reached, but then we need a mechanism for explicitly returning the actual number of records that were read (implicitly this should be visible in the shape of the resulting array). An error can still be raised because it is not the normal usage scenario.
2. Don't allow partial reads, i.e. raise if the size to be read does not fit in the rest of the file.

I don't know which scenario fits better with the numpy philosophy, but the first option sounds more useful.
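The two options above can be sketched in user code as a wrapper around `np.fromfile`. This is only an illustration under stated assumptions: `fromfile_checked` and its `allow_partial` flag are hypothetical names, not part of NumPy, and the sketch only covers binary reads from a real, seekable file opened in `'rb'` mode (no `sep` argument, no file-like objects of unknown size).

```python
import os

import numpy as np


def fromfile_checked(fh, dtype, count, allow_partial=False):
    """Read `count` records from an open binary file, checking the size first.

    Hypothetical helper, not part of NumPy.
    """
    dt = np.dtype(dtype)
    if count < 0:
        raise ValueError("count must be non-negative in this sketch")

    # Bytes between the current position and the end of the file.
    remaining = os.fstat(fh.fileno()).st_size - fh.tell()
    available = max(remaining, 0) // dt.itemsize

    if count > available:
        if not allow_partial:
            # Option 2: refuse reads that do not fit in the rest of the file.
            raise ValueError(
                f"requested {count} records but only {available} are available"
            )
        # Option 1: read what is there; the result's shape reports the count.
        count = available

    if count == 0:
        return np.empty(0, dtype=dt)
    return np.fromfile(fh, dtype=dt, count=count)
```

With the empty file from the reproducing example, the default call would raise `ValueError` before any allocation is attempted, while `allow_partial=True` would return an array of shape `(0,)`.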

seberg · 2018-11-01T10:08:56Z

I think an error is most reasonable. What I am not sure about right now is whether fromfile supports file-like objects that do not have a known size, or what currently happens in the case of a non-empty sep kwarg.

@amuresan the code for fromfile is in C, but if you have a bit of time, we are always very happy about pull requests, and it seems like a reasonable difficulty to dabble a bit into the C (Python) API.

simongibbons · 2018-11-08T22:00:05Z

I believe the problem here is actually that on Ubuntu you are getting a MemoryError that is being handled incorrectly, which causes the segfault.
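A quick back-of-the-envelope check shows why a MemoryError is plausible here. It uses the `dtype` and `count` from the reproducing example and only computes the size of the requested buffer; whether the allocation actually fails depends on the machine and its overcommit settings, which is an assumption on my part.

```python
import numpy as np

# Same dtype and count as in the reproducing example.
dtype = np.dtype([('data', '<f4', 500)])
count = 100_000_000

bytes_per_record = dtype.itemsize        # 500 float32 values, 4 bytes each
total_bytes = bytes_per_record * count   # size of the array fromfile must allocate

print(bytes_per_record)                  # 2000
print(total_bytes / 1e9, "GB")           # 200.0 GB
```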

A PR with a fix is here: #12354

There is a problem with the way we handle errors that occur in the call to `PyArray_FromFile` in `np.fromfile`. The problem is twofold:

1. The return value isn't checked, so if we reach the fail block we attempt a DECREF on a NULL and go down in flames.
2. The cleanup code on the file pointers (most notably the call to `npy_PyFile_DupClose2`) assumes that no error is set in order to work.

This PR addresses these issues:

1. By adding a NULL check to the fail block to ensure we don't attempt a DECREF on a NULL.
2. By saving the error state before attempting the cleanup code on the file descriptor, and then restoring it afterwards.

Fixes: numpy#12300

seberg added 00 - Bug component: numpy._core labels Oct 31, 2018

simongibbons mentioned this issue Nov 8, 2018

BUG: Fix segfault when an error occurs in np.fromfile #12354

Merged

eric-wieser closed this as completed in #12354 Nov 10, 2018
