Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added ordinal variable type to independent variables #45

Merged
merged 19 commits into from Oct 25, 2016

Conversation

xulaus
Copy link
Collaborator

@xulaus xulaus commented Oct 17, 2016

No description provided.

@xulaus xulaus force-pushed the feature/ordinal_variables branch 3 times, most recently from 2ed6136 to f967b4c Compare October 17, 2016 16:46
'order to them')
var.add_argument('--ordinal-variables', type=str, nargs='*',
help='The names of independent variables to use that '
'have an intrinsic order but a finite amount of states')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooo very nice descriptions 👍

data = pd.DataFrame(raw_data_list)
data = data.rename(columns=data.loc[0]).iloc[1:]
else:
print('Uknown file type')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*Unknown

if minmax1[1] == minmax2[0]
]
if self._nan in self._arr:
self._possible_groups += list(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be a list wrapped around it?

self.alpha_merge = alpha_merge
self.max_depth = max_depth
self.min_parent_node_size = min_parent_node_size
self.min_child_node_size = min_child_node_size
self.split_titles = split_titles or []
self.vectorised_array = []
for ind in range(0, ndarr.shape[1]):
self.vectorised_array.append(NominalColumn(ndarr[:, ind]))
variable_types = variable_types or ['nominal'] * ndarr.shape[1]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default to nominal?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For backward compatibility, but this should be removed in 3.0.0

sub_data_columns = [('combinations', object), ('p', float), ('chi', float)]
sub_data = np.array([(None, 0, 1)]*size, dtype=sub_data_columns, order='F')
for j, comb in groupings:
choice = None
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comma separated, you no like?

""" Test fixture class for deep copy method """
def setUp(self):
""" Setup for copy tests"""
# Use string so numpy array dtype is object and may store references
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what string?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy paste error :X

@@ -0,0 +1,161 @@
"""
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtype Object tests?
What happens when strings are passed in?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strings aren't supported as they have no strict ordering. Alphabetical is too weak an assumption to make.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah but what about numbers stored as objects?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do some jiggery pokery with casting to force them to ints. You are right in that I need tests for that though

@@ -68,7 +68,7 @@ def test_best_split_with_combination():
assert list_ordered_equal(ndarr, orig_ndarr), 'Calling chaid should have no side affects for original numpy arrays'
assert list_ordered_equal(arr, orig_arr), 'Calling chaid should have no side affects for original numpy arrays'
assert split.column_id == 0, 'Identifies correct column to split on'
assert list_unordered_equal(split.split_map, [[1], [2, 3]]), 'Correctly identifies catagories'
assert list_unordered_equal(split.split_map, [[1], [2], [3]]), 'Correctly identifies catagories'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been passing falsely on master for a while. The stricter list comparators picked it up. I assumed the numbers output by master for the last however many versions is correct and that the spec needed updating.

Previously the comparators were only checking the lists up until the smaller list ran out of elements. so when comparing [[1], [2, 3]] and [[1], [2], [3]] it was actually comparing [[1], [2, 3]] and [[1], [2]]. As it is recursive, this went to [[1], [2]] and [[1], [2]]

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right 👍

@@ -55,7 +55,8 @@ class Tree(object):
the threshold value for the maximum number of levels after the root
node in the tree (default 2)
min_parent_node_size : float
the threshold value of the number of respondents that the node must
the threshold value of the number of respondents tha
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newb

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me no program good

@xulaus xulaus force-pushed the feature/ordinal_variables branch 2 times, most recently from 5a74053 to ce0f3f6 Compare October 18, 2016 15:19
@Rambatino
Copy link
Owner

Rambatino commented Oct 22, 2016

Hmm it appears we have a numpy sorting issue in python3: ar.sort()

        if optional_indices:
            perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
            aux = ar[perm]
        else:
>           ar.sort()
E           TypeError: unorderable types: NoneType() > float()

/usr/local/lib/python3.5/site-packages/numpy/lib/arraysetops.py:198: TypeError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /usr/local/lib/python3.5/site-packages/numpy/lib/arraysetops.py(198)unique()
-> ar.sort()
(Pdb) ar.sort()
*** TypeError: unorderable types: NoneType() > float()
(Pdb) ar
array([1.0, 2.0, 3.0, 3.0, 3.0, 3.0, 4.0, 5.0, 10.0, None], dtype=object)

And here's a discussion on numpy:

numpy/numpy#641

@xulaus xulaus merged commit 2291318 into master Oct 25, 2016
@xulaus xulaus deleted the feature/ordinal_variables branch October 25, 2016 13:11
@Rambatino
Copy link
Owner

@xulaus argh dealing with this again in python 3. So weird when I went on the numpy issues, found this issue and then my CHAID was linked in it! numpy/numpy#641

@Rambatino Rambatino changed the title Ordinal variables Added ordinal variable type to independent variables May 7, 2017
else:
for x in np.unique(self._arr):
self._groupings[x] = list(groupings[x])
self._nan = np.array([np.nan]).astype(int)[0]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xulaus can you remember what this line is doing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly. I think we were using nan as a sentinel value that can combine with any item in the ordinal but we wanted the arrays to all be of the same dtype so we did this force cast.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants