Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specifying dim_contiguous causes seg faults in PatchObliqueForestClassifier #215

Open
2 of 10 tasks
j1c opened this issue Feb 7, 2024 · 2 comments
Open
2 of 10 tasks
Labels
bug Something isn't working

Comments

@j1c
Copy link
Member

j1c commented Feb 7, 2024

Checklist

  • I have verified that the issue exists against the main branch.
  • I have read the relevant section in the contribution guide on reporting bugs.
  • I have checked the issues list for similar or identical bug reports.
  • I have checked the pull requests list for existing proposed fixes.
  • I have checked the CHANGELOG and the commit log to find out if the bug was already fixed in the main branch.
  • I have included in the "Description" section below a traceback from any exceptions related to this bug.
  • I have included in the "Related issues or possible duplicates" section beloew all related issues and possible duplicate issues (If there are none, check this box anyway).
  • I have included in the "Environment" section below the name of the operating system and Python version that I was using when I discovered this bug.
  • I have included in the "Environment" section below the output of pip freeze.
  • I have included in the "Steps to reproduce" section below a minimally reproducible example.

Description

When there are non-contiguous dimensions, there are segmentation faults when fitting. See below for example.

Setting dim_contiguous=(True, True) runs fine, but dim_contiguous=(False, True) leads to seg faults.

Python traceback:

Related issues or possible duplicates

  • None

Environment

OS: Ubuntu

Python version: 3.9.16

scikit-tree version: 0.6.1

Output of pip freeze:

Steps to reproduce

Example source:

import numpy as np
from sktree import PatchObliqueRandomForestClassifier

n, a, b = 100, 10, 10

x = np.random.normal(size=(n, a, b))
y = np.random.binomial(1, 0.5, size=(n))

rf = PatchObliqueRandomForestClassifier(
    n_estimators=100,
    min_patch_dims=[1, 1],
    max_patch_dims=[4, 4],
    dim_contiguous=(False, True),
    data_dims=(a, b),
    n_jobs=-1,
)

rf.fit(x.reshape(100, -1), y)

@j1c j1c added the bug Something isn't working label Feb 7, 2024
@adam2392
Copy link
Collaborator

adam2392 commented Feb 7, 2024

It seems this test works when n_estimators is low, but not when I ramp to 50 or maybe 100. This makes me think there is a sampling edge case error where we go out of the bounds of the image:

def test_contiguous_and_discontiguous_patch_forest():
    """Test regression reported in https://github.com/neurodata/scikit-tree/issues/215."""
    n, a, b = 100, 10, 10
    x = np.random.normal(size=(n, a, b))
    y = np.random.binomial(1, 0.5, size=(n))

    est = PatchObliqueRandomForestClassifier(
        n_estimators=20,
        min_patch_dims=[1, 1],
        max_patch_dims=[4, 4],
        dim_contiguous=(False, True),
        data_dims=(a, b),
        n_jobs=-1
    )

    est.fit(x.reshape(100, -1), y)

@adam2392
Copy link
Collaborator

adam2392 commented Feb 7, 2024

I think the issue is isolated to when dim_contiguous=False you can test this with a single tree iterated over 100-1000 different random seeds.
When dim_contiguous=True , seems there is consistently no error.

It must occur in accessing an index that is not valid in one of two areas:

  • if self._discontiguous:
    # fill with values 0, 1, ..., dimension - 1
    for i in range(0, self.data_dims[0]):
    self._index_data_buffer[i] = i
    # then shuffle indices using Fisher-Yates
    for i in range(num_rows):
    j = rand_int(0, num_rows - i, random_state)
    self._index_data_buffer[i], self._index_data_buffer[j] = \
    self._index_data_buffer[j], self._index_data_buffer[i]
    # now select the first `patch_dims[0]` indices
    for i in range(num_rows):
    self._index_patch_buffer[i] = self._index_data_buffer[i]
  • if self._discontiguous is True:
    for dim_idx in range(self.ndim):
    if self.dim_contiguous[dim_idx] is True:
    continue
    # determine the "row" we are currently on
    # other_dims_offset = 1
    # for idx in range(dim_idx + 1, self.ndim):
    # other_dims_offset *= self.data_dims[idx]
    # row_index = self.unraveled_patch_point[dim_idx] % other_dims_offset
    # determine the "row" we are currently on
    other_dims_offset = 1
    for idx in range(dim_idx + 1, self.ndim):
    if not self.dim_contiguous[idx]:
    other_dims_offset *= self.data_dims[idx]
    row_index = 0
    for idx in range(dim_idx + 1, self.ndim):
    if not self.dim_contiguous[idx]:
    row_index += (
    (self.unraveled_patch_point[idx] // other_dims_offset) %
    self.patch_dims_buff[idx]
    ) * other_dims_offset
    other_dims_offset //= self.data_dims[idx]
    # assign random row index now
    self.unraveled_patch_point[dim_idx] = self._index_patch_buffer[row_index]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants