Seaborn Plots #62

jeff-hernandez · 2019-08-29T22:27:32Z

This PR uses Seaborn for plotting Label Times. These are plot examples for categorical and continuous Label Times. Closes #60. Closes #16. Closes #56.

codecov · 2019-08-29T22:43:49Z

Codecov Report

Merging #62 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff          @@
##           master    #62   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          16     17    +1     
  Lines         532    629   +97     
=====================================
+ Hits          532    629   +97

Impacted Files	Coverage Δ
composeml/label_plots.py	`100% <100%> (ø)`
composeml/tests/utils.py	`100% <100%> (ø)`	⬆️
composeml/label_maker.py	`100% <100%> (ø)`	⬆️
composeml/tests/test_label_times.py	`100% <100%> (ø)`	⬆️
composeml/tests/test_label_maker.py	`100% <100%> (ø)`	⬆️
composeml/tests/test_label_plots.py	`100% <100%> (ø)`	⬆️
...seml/tests/test_label_transforms/test_threshold.py	`100% <100%> (ø)`	⬆️
composeml/conftest.py	`100% <100%> (ø)`	⬆️
composeml/label_times.py	`100% <100%> (ø)`	⬆️

kmax12 · 2019-08-30T00:08:09Z

Does count by time make sense for continuous labels? I feel like it only makes sense if you bucket them into discrete labels or take some sort of rolling average over time.

jeff-hernandez · 2019-08-30T13:36:40Z

I agree. When count_by_time is called with continuous labels, I'm not sure whether to:

- make the labels discrete automatically before plotting,
- return None, or
- raise an error.

Also, do we want to plot continuous labels against cutoff times?

kmax12 · 2019-08-30T14:56:24Z

Maybe just use the overall total number of labels by time. Don't try to break it up at all.

jeff-hernandez · 2019-08-30T15:07:31Z

I see. So, we'd just count events per cutoff time without grouping by label.
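A minimal pandas sketch of this idea, mirroring the groupby('cutoff_time') / count() / cumsum() steps in the label_times.py diff further down. The column name `my_label`, the helper name `count_by_time`, and the sample data are hypothetical:

```python
import pandas as pd

# Hypothetical label times: continuous labels, some cutoff times repeated.
label_times = pd.DataFrame({
    "cutoff_time": pd.to_datetime(
        ["2019-01-01", "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-03"]
    ),
    "my_label": [1.5, 2.0, 0.3, 4.2, 3.3],
})


def count_by_time(df: pd.DataFrame, name: str) -> pd.Series:
    """Cumulative number of labels per cutoff time, ignoring label values."""
    counts = df.groupby("cutoff_time")[name].count()  # non-null labels at each cutoff time
    return counts.cumsum()  # running total over time


print(count_by_time(label_times, "my_label").tolist())  # [2, 3, 5]
```

Because the label values are never used as a grouping key, this works the same way for continuous and categorical labels.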

jeff-hernandez · 2019-08-30T15:43:00Z

Updated the count-by-time plot here. You may need a hard refresh in the browser to clear cached files.

kmax12 · 2019-09-03T21:43:59Z

composeml/label_times.py

-        labels = labels.groupby(self.name)
-        distribution = labels['count'].count()
-        return distribution
+    def _is_categorical(self):


I think a user should be able to provide a label_type parameter to override this. Label types could be "categorical" or "continuous". We can also pass this parameter through when using the label maker. For example, bin would know to pass categorical through.

In terms of inferring categorical, I'd also check for any non-numeric dtypes like objects, strings, or the pandas categorical dtype.

Additionally, we only want to calculate this once when the label times are initialized, not every time _is_categorical is accessed.
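A minimal sketch of dtype-only inference along these lines. The helper `infer_label_type` is hypothetical, takes the label column as a pandas Series, and is not composeml's actual implementation:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def infer_label_type(labels: pd.Series) -> str:
    """Hypothetical helper: infer "categorical" or "continuous" from the dtype alone."""
    if isinstance(labels.dtype, pd.CategoricalDtype):
        return "categorical"  # explicit pandas categorical dtype
    if not is_numeric_dtype(labels.dtype):
        return "categorical"  # object, string, and other non-numeric dtypes
    return "continuous"  # ints and floats are treated as continuous


print(infer_label_type(pd.Series([0.5, 1.2])))                     # continuous
print(infer_label_type(pd.Series(["low", "high"])))                # categorical
print(infer_label_type(pd.Series(["a", "b"], dtype="category")))  # categorical
```

Integer labels come out as "continuous" under this rule, which is one reason an explicit label_type override is useful.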

What do you think about overriding it by using type hints?

>>> def labeling_function(df) -> 'categorical':
...     ...
>>> labeling_function.__annotations__
{'return': 'categorical'}

Also, we could cast label times to the categorical dtype when the labels are categorical. Related to #60.
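Casting a categorical label column is a single astype call in pandas. A sketch with hypothetical data and a hypothetical column name `my_label`:

```python
import pandas as pd

# Hypothetical label times whose labels are categorical.
label_times = pd.DataFrame({
    "cutoff_time": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"]),
    "my_label": ["low", "high", "low"],
})

# Cast only the label column; cutoff times keep their datetime dtype.
label_times["my_label"] = label_times["my_label"].astype("category")

print(label_times["my_label"].dtype)                 # category
print(list(label_times["my_label"].cat.categories))  # ['high', 'low']
```

Each distinct label is stored once as a category, and the `.cat` accessor exposes the set of categories.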

I don't think using type hints for annotations is very intuitive. It might be better to make it a parameter of the label maker.

Okay, I will add a parameter for label_type.

kmax12

The new plotting stuff looks good. Just some comments on how we do the label type inference.

composeml/label_maker.py

composeml/label_times.py

kmax12 · 2019-09-09T00:41:53Z

composeml/label_times.py

@@ -1,9 +1,10 @@
 import pandas as pd

+from composeml.label_plots import LabelPlots
+

 class LabelTimes(pd.DataFrame):


The label times class should store its label type.

Added a label_type attribute to label times.

kmax12 · 2019-09-09T00:42:45Z

composeml/label_times.py

+            value = self.groupby('cutoff_time')
+            value = value[self.name].count()
+            value = value.cumsum()
+            return value

    def describe(self):


The describe method should state the label type.

Should label type be under settings? I guess it is a parameter of search.

kmax12 · 2019-09-09T00:58:30Z

composeml/label_maker.py

@@ -327,14 +329,23 @@ def search(self,

        labels = LabelTimes(data=labels, name=name, target_entity=self.target_entity)


We should pass the label type to LabelTimes so it can track it as well. Perhaps even move the logic to handle and check it into the init.

Moved the check to the label times init.

kmax12 · 2019-09-09T22:08:39Z

composeml/label_maker.py

-
-        else:
-            labels = labels.infer_type()
+        if labels.label_type == 'discrete':


Can we do this casting before we init the label times, like above line 330?

Above line 330, labels are records (a list of dictionaries). Should I pass the records to a pandas DataFrame to make them categorical before initializing label times?

I see. It's fine to leave it here, then.

kmax12 · 2019-09-09T22:10:09Z

composeml/label_times.py

+            error = 'label type must be "continuous" or "discrete"'
+            assert label_type in ['continuous', 'discrete'], error
+
+        if label_type is None and name in self.columns:


Why would name not be in self.columns?

I came across behavior where pandas would initialize label times without passing a value for name. I refactored the logic to infer the type inside is_discrete only if label type is None. link
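One likely source of that behavior is pandas' subclassing machinery: operations such as slicing and copying build new objects through the subclass's `_constructor`, passing only the data, so custom constructor arguments like `name` arrive as None. Listing the attribute in `_metadata` lets pandas copy it onto the derived object afterwards. A sketch of that documented pattern, using a hypothetical `LabelTimesSketch` class rather than composeml's LabelTimes:

```python
import pandas as pd


class LabelTimesSketch(pd.DataFrame):
    """Hypothetical DataFrame subclass that keeps a `name` attribute."""

    # Attributes listed here are copied to derived objects via __finalize__.
    _metadata = ["name"]

    def __init__(self, data=None, name=None, **kwargs):
        super().__init__(data=data, **kwargs)
        # When pandas rebuilds the object internally, name is None at this point.
        self.name = name

    @property
    def _constructor(self):
        # pandas calls this to create new instances (slices, copies, ...).
        return LabelTimesSketch


lt = LabelTimesSketch({"my_label": [1.0, 2.0, 3.0]}, name="my_label")
print(lt.name)           # my_label
print(lt.iloc[:2].name)  # my_label, restored after construction by __finalize__
```

Since name is only filled in after __init__ returns, any logic in __init__ that reads self[self.name] still has to handle name being None.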

kmax12 · 2019-09-09T22:11:23Z

composeml/label_times.py

+        if is_discrete:
+            return True
+
+        labels = self[self.name].iloc[:100]


For now, let's only look at the dtype to infer the type. We can always add this functionality later.

Updated the function.

kmax12 · 2019-09-09T22:15:52Z

composeml/label_times.py

@@ -320,16 +355,9 @@ def infer_type(self):
        """Infer label type.

        Returns:
-            LabelTimes : Label Times as inferred type.
+            str : Inferred label type. Can be "continuous" or "discrete".


Intuitively, I'd expect the logic in is_discrete to be here in infer_type, and then I'd expect is_discrete to just check if label_type == "discrete".

Then everywhere we currently check if self.label_type == 'discrete', we'd just replace it with is_discrete.

Updated the logic.
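A sketch of the structure described here, using a hypothetical `LabelTypeSketch` class that wraps a label Series instead of subclassing DataFrame: infer_type holds the dtype logic, the type is resolved once in __init__, and is_discrete is a plain attribute check. It raises ValueError where the diff uses an assert, which is a swapped-in choice:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


class LabelTypeSketch:
    """Hypothetical holder showing infer-once label typing."""

    VALID_TYPES = ("continuous", "discrete")

    def __init__(self, labels: pd.Series, label_type=None):
        self.labels = labels
        if label_type is None:
            label_type = self.infer_type()  # inferred once, at init
        elif label_type not in self.VALID_TYPES:
            raise ValueError('label type must be "continuous" or "discrete"')
        self.label_type = label_type

    def infer_type(self) -> str:
        """Infer the label type from the dtype only."""
        dtype = self.labels.dtype
        if isinstance(dtype, pd.CategoricalDtype) or not is_numeric_dtype(dtype):
            return "discrete"
        return "continuous"

    @property
    def is_discrete(self) -> bool:
        # Cheap attribute check; no re-inference on each access.
        return self.label_type == "discrete"


print(LabelTypeSketch(pd.Series([0.1, 0.7])).is_discrete)                       # False
print(LabelTypeSketch(pd.Series(["a", "b"])).is_discrete)                       # True
print(LabelTypeSketch(pd.Series([0, 1, 1]), label_type="discrete").is_discrete) # True
```

The explicit label_type covers cases the dtype cannot decide, such as integer bin labels.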

kmax12

LGTM

Jeff Hernandez added 12 commits August 27, 2019 18:05

update reqs

2c39774

add class

1d3a4f8

add label plots

e0e31e7

update plotting

c812d00

update getting started

76d61fe

add docstring

93e8996

update getting started

f5148df

auto format date / update notebooks

b3de01e

update fort label times

ab0caba

add tests for label plots

3e6fcd2

update notebook

9ebecf7

sort imports

f59cbc6

jeff-hernandez requested a review from kmax12 August 29, 2019 23:00

update notebook

06e952c

jeff-hernandez closed this Aug 30, 2019

kmax12 reopened this Aug 30, 2019

Jeff Hernandez added 4 commits September 3, 2019 13:41

update count by time

fd6f934

add alias for distribution

9e37e47

update test

8907710

update settings for minimum data

ab9a1df

kmax12 suggested changes Sep 3, 2019


infer label type

107ed6a

jeff-hernandez requested a review from kmax12 September 4, 2019 20:03

update infer type

a2f1bbc

Jeff Hernandez added 5 commits September 4, 2019 16:36

update infer type

aa4434e

update threshold

3f997a7

update threshold

19ad10e

sort imports

0a58ff3

Merge branch 'master' into seaborn-plots

e7e45c6

kmax12 suggested changes Sep 9, 2019


update

4669167

jeff-hernandez requested a review from kmax12 September 9, 2019 21:11

kmax12 suggested changes Sep 9, 2019


Jeff Hernandez added 2 commits September 10, 2019 10:31

remove logic

b4e1a56

update

e3c1beb

jeff-hernandez requested a review from kmax12 September 10, 2019 16:20

Jeff Hernandez added 4 commits September 10, 2019 12:25

minor update

dfe7f80

update settings

2fe86ea

update settings

8371b75

fix settings

f182cf8

kmax12 approved these changes Sep 10, 2019


jeff-hernandez merged commit 50e9f90 into master Sep 10, 2019

jeff-hernandez mentioned this pull request Sep 10, 2019

v0.1.5 #68

Merged

jeff-hernandez deleted the seaborn-plots branch September 13, 2019 18:55

