Added ordinal variable type to independent variables #45
2ed6136 to f967b4c (Compare)
                        'order to them')
    var.add_argument('--ordinal-variables', type=str, nargs='*',
                     help='The names of independent variables to use that '
                          'have an intrinsic order but a finite amount of states')
ooo very nice descriptions 👍
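For readers unfamiliar with `nargs='*'`: it collects zero or more values after the flag into a list. A minimal sketch of the flag added in this diff, using a hypothetical standalone parser (the parser object and sample variable names here are illustrative, not the project's actual CLI):

```python
import argparse

# Hypothetical parser mirroring the --ordinal-variables flag from the diff.
parser = argparse.ArgumentParser()
parser.add_argument('--ordinal-variables', type=str, nargs='*', default=[],
                    help='Independent variables with an intrinsic order '
                         'but a finite amount of states')

# nargs='*' gathers all following tokens into a list
args = parser.parse_args(['--ordinal-variables', 'age_band', 'income_band'])
print(args.ordinal_variables)
```

Note argparse converts the dashes in the flag name to underscores for the attribute name.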
    data = pd.DataFrame(raw_data_list)
    data = data.rename(columns=data.loc[0]).iloc[1:]
else:
    print('Uknown file type')
*Unknown
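The `rename(columns=...).iloc[1:]` idiom in the diff promotes the first row of a header-less DataFrame to the column labels, then drops that row. A minimal sketch with made-up data (the sample values are illustrative only):

```python
import pandas as pd

# A raw list-of-lists where row 0 holds the column names
raw_data_list = [['name', 'score'], ['alice', 1], ['bob', 2]]
data = pd.DataFrame(raw_data_list)

# Use row 0 as a column mapper (old index -> new label), then drop it
data = data.rename(columns=data.loc[0]).iloc[1:]
print(list(data.columns))
```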
    if minmax1[1] == minmax2[0]
]
if self._nan in self._arr:
    self._possible_groups += list(
Why does this need to be a list wrapped around it?
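One plausible answer to the question above, sketched in isolation: `+=` on a list does accept a bare iterator, but an iterator is single-use, so materialising it with `list()` is needed if the result must be inspected, re-iterated, or length-checked later. (The names below are illustrative, not the project's actual attributes.)

```python
from itertools import combinations

possible_groups = [(1, 2), (2, 3)]
gen = combinations([4, 5, 6], 2)   # a lazy, single-use iterator

# list() materialises the pairs into concrete tuples that can be
# stored, counted, and iterated more than once
possible_groups += list(gen)
print(possible_groups)
```

A bare `possible_groups += gen` would also extend the list, but wrapping in `list()` makes the intent explicit when the iterable is built inline.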
self.alpha_merge = alpha_merge
self.max_depth = max_depth
self.min_parent_node_size = min_parent_node_size
self.min_child_node_size = min_child_node_size
self.split_titles = split_titles or []
self.vectorised_array = []
for ind in range(0, ndarr.shape[1]):
    self.vectorised_array.append(NominalColumn(ndarr[:, ind]))
variable_types = variable_types or ['nominal'] * ndarr.shape[1]
default to nominal?
For backward compatibility, but this should be removed in 3.0.0
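The backward-compatible default under discussion is the `x or fallback` idiom: if the caller passes nothing (or an empty list), every column is treated as nominal. A minimal sketch, assuming a 3x2 input array:

```python
import numpy as np

ndarr = np.array([[1, 2], [3, 4], [5, 6]])

# Caller supplied no types: fall back to one 'nominal' per column
variable_types = None
variable_types = variable_types or ['nominal'] * ndarr.shape[1]
print(variable_types)
```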
sub_data_columns = [('combinations', object), ('p', float), ('chi', float)]
sub_data = np.array([(None, 0, 1)]*size, dtype=sub_data_columns, order='F')
for j, comb in groupings:
    choice = None
comma separated, you no like?
""" Test fixture class for deep copy method """
def setUp(self):
    """ Setup for copy tests"""
    # Use string so numpy array dtype is object and may store references
what string?
Copy paste error :X
@@ -0,0 +1,161 @@
"""
dtype Object tests?
What happens when strings are passed in?
Strings aren't supported as they have no strict ordering. Alphabetical is too weak an assumption to make.
yeah but what about numbers stored as objects?
I do some jiggery pokery with casting to force them to ints. You are right in that I need tests for that though.
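The casting being referred to, sketched in isolation: numbers that arrive with `dtype=object` can be force-cast with `astype(int)` so comparisons run on a concrete integer dtype instead of arbitrary Python objects. (This sketch shows only the NumPy mechanism, not the project's exact code path.)

```python
import numpy as np

# Numbers stored as Python objects: no concrete numeric dtype yet
arr = np.array([1, 2, 3], dtype=object)

# Force-cast to a real integer dtype
as_ints = arr.astype(int)
print(as_ints.dtype.kind)   # 'i': a genuine integer dtype, not object
```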
@@ -68,7 +68,7 @@ def test_best_split_with_combination():
    assert list_ordered_equal(ndarr, orig_ndarr), 'Calling chaid should have no side affects for original numpy arrays'
    assert list_ordered_equal(arr, orig_arr), 'Calling chaid should have no side affects for original numpy arrays'
    assert split.column_id == 0, 'Identifies correct column to split on'
    assert list_unordered_equal(split.split_map, [[1], [2, 3]]), 'Correctly identifies catagories'
    assert list_unordered_equal(split.split_map, [[1], [2], [3]]), 'Correctly identifies catagories'
what's this?
This has been passing falsely on master for a while. The stricter list comparators picked it up. I assumed the numbers output by master for the last however many versions is correct and that the spec needed updating.
Previously the comparators were only checking the lists up until the smaller list ran out of elements. So when comparing [[1], [2, 3]] and [[1], [2], [3]], it was actually comparing [[1], [2, 3]] and [[1], [2]]. As it is recursive, this went to comparing [[1], [2]] and [[1], [2]].
right 👍
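The truncation bug described above, reproduced in miniature: a recursive comparator built on `zip()` silently stops at the shorter list, so lists of unequal length can compare "equal". (`naive_equal` is an illustrative stand-in, not the project's actual helper.)

```python
def naive_equal(a, b):
    """Buggy comparator: zip() stops at the shorter sequence,
    so trailing elements of the longer list are never checked."""
    if isinstance(a, list) and isinstance(b, list):
        return all(naive_equal(x, y) for x, y in zip(a, b))
    return a == b

# False positive: [2, 3] vs [2] truncates to [2] vs [2]
print(naive_equal([[1], [2, 3]], [[1], [2], [3]]))
```

The fix is to also compare lengths at every level before zipping.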
@@ -55,7 +55,8 @@ class Tree(object):
    the threshold value for the maximum number of levels after the root
    node in the tree (default 2)
    min_parent_node_size : float
    the threshold value of the number of respondents that the node must
    the threshold value of the number of respondents tha
newb
Me no program good
5a74053 to ce0f3f6 (Compare)
2fe82be to c03df76 (Compare)
Hmm it appears we have a numpy sorting issue in python3:

    if optional_indices:
        perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
        aux = ar[perm]
    else:
>       ar.sort()
E       TypeError: unorderable types: NoneType() > float()

/usr/local/lib/python3.5/site-packages/numpy/lib/arraysetops.py:198: TypeError
>>>>>>>>>> entering PDB >>>>>>>>>>
> /usr/local/lib/python3.5/site-packages/numpy/lib/arraysetops.py(198)unique()
-> ar.sort()
(Pdb) ar.sort()
*** TypeError: unorderable types: NoneType() > float()
(Pdb) ar
array([1.0, 2.0, 3.0, 3.0, 3.0, 3.0, 4.0, 5.0, 10.0, None], dtype=object)

And here's a discussion on numpy:
@xulaus argh dealing with this again in python 3. So weird when I went on the numpy issues, found this issue and then my CHAID was linked in it! numpy/numpy#641
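The traceback above can be reproduced in a few lines: Python 3 removed cross-type ordering, so sorting an object array that contains `None` next to floats raises `TypeError`, and `np.unique` sorts internally.

```python
import numpy as np

# Object array mixing floats with None, as in the PDB session above
arr = np.array([1.0, 2.0, None], dtype=object)

try:
    np.unique(arr)     # sorts internally -> None > float comparison
    failed = False
except TypeError:      # Python 3: unorderable types
    failed = True
print(failed)
```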
else:
    for x in np.unique(self._arr):
        self._groupings[x] = list(groupings[x])
self._nan = np.array([np.nan]).astype(int)[0]
@xulaus can you remember what this line is doing?
Not exactly. I think we were using nan as a sentinel value that can combine with any item in the ordinal but we wanted the arrays to all be of the same dtype so we did this force cast.
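What that line actually evaluates to, sketched in isolation: casting NaN through `astype(int)` produces an integer sentinel that shares a dtype with the rest of the ordinal array. The exact integer is undefined behaviour and platform/NumPy-version dependent (historically the minimum int64 value), so the code should only ever compare it for identity, never rely on its value.

```python
import numpy as np

# Force-cast NaN to the integer dtype to get a same-dtype sentinel.
# NumPy may emit a RuntimeWarning; the resulting value is
# platform-dependent, which is why this reads like a magic trick.
nan_sentinel = np.array([np.nan]).astype(int)[0]
print(isinstance(nan_sentinel, np.integer))
```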