Added ordinal variable type to independent variables #45
2ed6136 to f967b4c (Compare)
                        'order to them')
    var.add_argument('--ordinal-variables', type=str, nargs='*',
                     help='The names of independent variables to use that '
                          'have an intrinsic order but a finite amount of states')
ooo very nice descriptions 👍
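For readers unfamiliar with `nargs='*'`: it collects zero or more values after the flag into a list. A minimal sketch of the flag added in this diff, using a hypothetical standalone parser (the parser object and sample variable names here are illustrative, not the project's actual CLI):

```python
import argparse

# Hypothetical parser mirroring the --ordinal-variables flag from the diff.
parser = argparse.ArgumentParser()
parser.add_argument('--ordinal-variables', type=str, nargs='*', default=[],
                    help='Independent variables with an intrinsic order '
                         'but a finite amount of states')

# nargs='*' gathers all following tokens into a list
args = parser.parse_args(['--ordinal-variables', 'age_band', 'income_band'])
print(args.ordinal_variables)
```

Note argparse converts the dashes in the flag name to underscores for the attribute name.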
    data = pd.DataFrame(raw_data_list)
    data = data.rename(columns=data.loc[0]).iloc[1:]
else:
    print('Uknown file type')
*Unknown
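The `rename(columns=...).iloc[1:]` idiom in the diff promotes the first row of a header-less DataFrame to the column labels, then drops that row. A minimal sketch with made-up data (the sample values are illustrative only):

```python
import pandas as pd

# A raw list-of-lists where row 0 holds the column names
raw_data_list = [['name', 'score'], ['alice', 1], ['bob', 2]]
data = pd.DataFrame(raw_data_list)

# Use row 0 as a column mapper (old index -> new label), then drop it
data = data.rename(columns=data.loc[0]).iloc[1:]
print(list(data.columns))
```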
    if minmax1[1] == minmax2[0]
]
if self._nan in self._arr:
    self._possible_groups += list(
Why does this need to be a list wrapped around it?
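One plausible answer to the question above, sketched in isolation: `+=` on a list does accept a bare iterator, but an iterator is single-use, so materialising it with `list()` is needed if the result must be inspected, re-iterated, or length-checked later. (The names below are illustrative, not the project's actual attributes.)

```python
from itertools import combinations

possible_groups = [(1, 2), (2, 3)]
gen = combinations([4, 5, 6], 2)   # a lazy, single-use iterator

# list() materialises the pairs into concrete tuples that can be
# stored, counted, and iterated more than once
possible_groups += list(gen)
print(possible_groups)
```

A bare `possible_groups += gen` would also extend the list, but wrapping in `list()` makes the intent explicit when the iterable is built inline.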
self.alpha_merge = alpha_merge
self.max_depth = max_depth
self.min_parent_node_size = min_parent_node_size
self.min_child_node_size = min_child_node_size
self.split_titles = split_titles or []
self.vectorised_array = []
for ind in range(0, ndarr.shape[1]):
    self.vectorised_array.append(NominalColumn(ndarr[:, ind]))
variable_types = variable_types or ['nominal'] * ndarr.shape[1]
default to nominal?
For backward compatibility, but this should be removed in 3.0.0
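The backward-compatible default under discussion is the `x or fallback` idiom: if the caller passes nothing (or an empty list), every column is treated as nominal. A minimal sketch, assuming a 3x2 input array:

```python
import numpy as np

ndarr = np.array([[1, 2], [3, 4], [5, 6]])

# Caller supplied no types: fall back to one 'nominal' per column
variable_types = None
variable_types = variable_types or ['nominal'] * ndarr.shape[1]
print(variable_types)
```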
sub_data_columns = [('combinations', object), ('p', float), ('chi', float)]
sub_data = np.array([(None, 0, 1)]*size, dtype=sub_data_columns, order='F')
for j, comb in groupings:
    choice = None
comma separated, you no like?
""" Test fixture class for deep copy method """
def setUp(self):
    """ Setup for copy tests"""
    # Use string so numpy array dtype is object and may store references
what string?
Copy paste error :X
@@ -0,0 +1,161 @@
"""
dtype Object tests?
What happens when strings are passed in?
Strings aren't supported as they have no strict ordering. Alphabetical is too weak an assumption to make.
yeah but what about numbers stored as objects?
I do some jiggery pokery with casting to force them to ints. You are right in that I need tests for that though.
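The casting being referred to, sketched in isolation: numbers that arrive with `dtype=object` can be force-cast with `astype(int)` so comparisons run on a concrete integer dtype instead of arbitrary Python objects. (This sketch shows only the NumPy mechanism, not the project's exact code path.)

```python
import numpy as np

# Numbers stored as Python objects: no concrete numeric dtype yet
arr = np.array([1, 2, 3], dtype=object)

# Force-cast to a real integer dtype
as_ints = arr.astype(int)
print(as_ints.dtype.kind)   # 'i': a genuine integer dtype, not object
```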
@@ -68,7 +68,7 @@ def test_best_split_with_combination():
    assert list_ordered_equal(ndarr, orig_ndarr), 'Calling chaid should have no side affects for original numpy arrays'
    assert list_ordered_equal(arr, orig_arr), 'Calling chaid should have no side affects for original numpy arrays'
    assert split.column_id == 0, 'Identifies correct column to split on'
    assert list_unordered_equal(split.split_map, [[1], [2, 3]]), 'Correctly identifies catagories'
    assert list_unordered_equal(split.split_map, [[1], [2], [3]]), 'Correctly identifies catagories'
what's this?
This has been passing falsely on master for a while. The stricter list comparators picked it up. I assumed the numbers output by master for the last however many versions is correct and that the spec needed updating.
Previously the comparators were only checking the lists up until the smaller list ran out of elements. So when comparing [[1], [2, 3]] and [[1], [2], [3]], it was actually comparing [[1], [2, 3]] and [[1], [2]]. As it is recursive, this went to comparing [[1], [2]] and [[1], [2]].
right 👍
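The truncation bug described above, reproduced in miniature: a recursive comparator built on `zip()` silently stops at the shorter list, so lists of unequal length can compare "equal". (`naive_equal` is an illustrative stand-in, not the project's actual helper.)

```python
def naive_equal(a, b):
    """Buggy comparator: zip() stops at the shorter sequence,
    so trailing elements of the longer list are never checked."""
    if isinstance(a, list) and isinstance(b, list):
        return all(naive_equal(x, y) for x, y in zip(a, b))
    return a == b

# False positive: [2, 3] vs [2] truncates to [2] vs [2]
print(naive_equal([[1], [2, 3]], [[1], [2], [3]]))
```

The fix is to also compare lengths at every level before zipping.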
@@ -55,7 +55,8 @@ class Tree(object):
    the threshold value for the maximum number of levels after the root
    node in the tree (default 2)
    min_parent_node_size : float
    the threshold value of the number of respondents that the node must
    the threshold value of the number of respondents tha
newb
Me no program good
5a74053 to ce0f3f6 (Compare)
2fe82be to c03df76 (Compare)
Hmm it appears we have a numpy sorting issue in python3:

    if optional_indices:
        perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
        aux = ar[perm]
    else:
>       ar.sort()
E       TypeError: unorderable types: NoneType() > float()

/usr/local/lib/python3.5/site-packages/numpy/lib/arraysetops.py:198: TypeError
>>>>>>>>>> entering PDB >>>>>>>>>>
> /usr/local/lib/python3.5/site-packages/numpy/lib/arraysetops.py(198)unique()
-> ar.sort()
(Pdb) ar.sort()
*** TypeError: unorderable types: NoneType() > float()
(Pdb) ar
array([1.0, 2.0, 3.0, 3.0, 3.0, 3.0, 4.0, 5.0, 10.0, None], dtype=object)

And here's a discussion on numpy:
@xulaus argh dealing with this again in python 3. So weird when I went on the numpy issues, found this issue and then my CHAID was linked in it! numpy/numpy#641
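The traceback above can be reproduced in a few lines: Python 3 removed cross-type ordering, so sorting an object array that contains `None` next to floats raises `TypeError`, and `np.unique` sorts internally.

```python
import numpy as np

# Object array mixing floats with None, as in the PDB session above
arr = np.array([1.0, 2.0, None], dtype=object)

try:
    np.unique(arr)     # sorts internally -> None > float comparison
    failed = False
except TypeError:      # Python 3: unorderable types
    failed = True
print(failed)
```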
else:
    for x in np.unique(self._arr):
        self._groupings[x] = list(groupings[x])
self._nan = np.array([np.nan]).astype(int)[0]
@xulaus can you remember what this line is doing?
Not exactly. I think we were using nan as a sentinel value that can combine with any item in the ordinal but we wanted the arrays to all be of the same dtype so we did this force cast.
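What that line actually evaluates to, sketched in isolation: casting NaN through `astype(int)` produces an integer sentinel that shares a dtype with the rest of the ordinal array. The exact integer is undefined behaviour and platform/NumPy-version dependent (historically the minimum int64 value), so the code should only ever compare it for identity, never rely on its value.

```python
import numpy as np

# Force-cast NaN to the integer dtype to get a same-dtype sentinel.
# NumPy may emit a RuntimeWarning; the resulting value is
# platform-dependent, which is why this reads like a magic trick.
nan_sentinel = np.array([np.nan]).astype(int)[0]
print(isinstance(nan_sentinel, np.integer))
```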