Merge branch 'release-1.0.1'

bbengfort committed Oct 6, 2019
2 parents d9e1218 + 753e4ba commit dd795b4
Showing 179 changed files with 1,685 additions and 14,009 deletions.
4 changes: 2 additions & 2 deletions .travis.yml
@@ -42,12 +42,12 @@ install:
conda install coveralls;
fi

script:
- python -m nltk.downloader popular
- make test

after_success: coveralls

notifications:
slack:
secure: mWKVHmEc22FJSp6Rrnd1j4QYCgZY4NJSrA8kZ5wj2/lf1iHI/CfWGTf7+Qihqe+rt0FOU0+UA9SzvSHRD1bV76q/zINayQ0EyJAfQzvIWIRGGnnMSO/79WoEYF56wwjpc5pLUTh6QV5qqfy+8nNGQ1/uJ0h6FtsUaSa/g61a5ZJEVBIjIpH8PgMxM64dRgJCmAdQuXkBP5Uf3yHlCtYk+Jr+gyXU2oqwMZ1VWgZkEo1Tqo7W9WY8dkOaAkzXDT61OqtcyyTuVSYbmK4i3c84681NBpb7wT6BfiCCAd3tn5AIKCkJVJ0ga0XeF6MdDpnicpku4FaN+fQjwkPiU47o/aFp8RNp27JQ9AhvH7wMuu5O8HDhszjRkfGOlUbuPOTavc22o4j0ShsrLiTQRJRhQQzJoquPuPj5wHqCCN+ice7IVUHj3ZC2jpJKDEYUNnr1fATtOwocimc6PhJM/IoeHgEEHpi37b+AxnhgOFoBlgsq2f4nsRD9JsLHqIpJCHgMjKxc6p3FtcFcXZDlDXQIcCzSRiPhG207dahspA3aPLj4Z+tOLJwh7/PSEfp02kcgPMM/MLYTWcaBv14aYi69kvQoZTfqVY8tIohg3ygda5siOCTTgqGriJYzkmdY5/Dp51kabhl+cEVIxPyY0miqyl3hZjqkqCnnOtg06qqxLLM=
secure: YvJ/aF5Ev2wgqoSc+QG4LA8XCovdfW7w7FiOMiRA6zrLjywEC12KzVDBTotIRFJVncCmh/WuyTCJUYfYA1Q0MrySpAF8cDr4fdGnO3skopU9Nx7pVuXOrHQ2LcVTEE0sGAeYH+hGrT+7TsbGR9iwki5xkkT0g1QEgJqvLhph6Y6gQMAtPceXU7wnIJf9Fn4IdTrDbeAawxhYsuVLTptGSS9UHYsV0P3lwPg1FItduE1UzNhyicBXzj/8f56/xBxNeYEGwFMhE1oad3lm9BRLzpqGwsIHWR5JLIYcX+y1YceFvB+vz4Xsf6H+XaCCb7uzBfC2BAc9+gr0zjUbiLcTyA1LyuR9kOlFCUx/nSGkJyhXcMb+NbA0vK9JY7ss2kempoxCDCkzpjFNasqGJMyPagI3na8YRu1RTTmBJUip9U+oN80Kr4lSMzbLDCDA2LTQBeL3zSSW51foiQPIDowK/CYQSMo/0IVp2x9ronWhDBbszHkXoWCv6/AMzjGhASDDg4AJD40zLo/pcEevcJdTraO915Sp8PtltbLnuuklJSi1xci5O6ja/ldyC7lKPm77z9nlx805349dLTkNpD27xXpALWPUJBNNrVpD3H6SvYB3b2IVgVjENdHZGLcCjlbwgdZ30zPik4Sj/w+8GoGxh5l/V6wHUhwOMm7ZKr7lcXk=
10 changes: 5 additions & 5 deletions CONTRIBUTING.md
@@ -165,11 +165,11 @@ These two basic types of visualizers map well to the two basic estimator objects

The scikit-learn API is object-oriented, and estimators are initialized with parameters by instantiating their class. Hyperparameters can also be set using the `set_params()` method and retrieved with the corresponding `get_params()` method. All scikit-learn estimators have a `fit(X, y=None)` method that accepts a two-dimensional data array, `X`, and optionally a vector `y` of target values. The `fit()` method trains the estimator, making it ready to transform data or make predictions. Transformers have an associated `transform(X)` method that returns a new dataset, `Xprime`, and models have a `predict(X)` method that returns a vector of predictions, `yhat`. Models may also have a `score(X, y)` method that evaluates the performance of the model.
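
To make the contract concrete, here is a minimal runnable sketch of that lifecycle (the data values are illustrative):

```python
from sklearn.linear_model import LinearRegression

X = [[1.0], [2.0], [3.0]]  # two-dimensional data array
y = [2.0, 4.0, 6.0]        # vector of target values

model = LinearRegression()            # instantiate the estimator class
model.set_params(fit_intercept=True)  # set a hyperparameter
model.fit(X, y)                       # train the estimator
yhat = model.predict(X)               # vector of predictions
r2 = model.score(X, y)                # evaluate model performance
```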

Visualizers interact with scikit-learn objects by intersecting with them at the methods defined above. Specifically, visualizers perform actions related to `fit()`, `transform()`, `predict()`, and `score()` then call a `draw()` method which initializes the underlying figure associated with the visualizer. The user calls the visualizer's `poof()` method, which in turn calls a `finalize()` method on the visualizer to draw legends, titles, etc. and then `poof()` renders the figure. The Visualizer API is therefore:
Visualizers interact with scikit-learn objects by intersecting with them at the methods defined above. Specifically, visualizers perform actions related to `fit()`, `transform()`, `predict()`, and `score()` then call a `draw()` method which initializes the underlying figure associated with the visualizer. The user calls the visualizer's `show()` method, which in turn calls a `finalize()` method on the visualizer to draw legends, titles, etc. and then `show()` renders the figure. The Visualizer API is therefore:

- `draw()`: add visual elements to the underlying axes object
- `finalize()`: prepare the figure for rendering, adding final touches such as legends, titles, axis labels, etc.
- `poof()`: render the figure for the user.
- `show()`: render the figure for the user.

Creating a visualizer means defining a class that extends `Visualizer` or one of its subclasses, then implementing several of the methods described above. A barebones implementation is as follows:
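
The barebones code itself is not shown in this diff; the following is an illustrative sketch consistent with the description above, not the actual block from the repository:

```python
from yellowbrick.base import Visualizer

class MyVisualizer(Visualizer):

    def fit(self, X, y=None):
        self.draw(X)
        return self

    def draw(self, X):
        # add visual elements to the underlying axes object
        self.ax.plot(X)

    def finalize(self):
        # final touches such as the title
        self.set_title("My Visualizer")
```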

@@ -201,7 +201,7 @@ This simple visualizer simply draws a line graph for some input dataset X, inter
```python
visualizer = MyVisualizer()
visualizer.fit(X)
visualizer.poof()
visualizer.show()
```

Score visualizers work on the same principle but accept an additional required `model` argument. Score visualizers wrap the model (which can be either instantiated or uninstantiated) and then pass all attributes and methods through to the underlying model, drawing where necessary.
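
One way to picture that pass-through behavior (an illustrative sketch, not Yellowbrick's actual implementation):

```python
class ModelWrapper:
    """Forward any attribute not defined on the wrapper to the model."""

    def __init__(self, estimator):
        self.estimator = estimator

    def __getattr__(self, attr):
        # Invoked only when normal attribute lookup fails
        return getattr(self.estimator, attr)
```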
@@ -231,7 +231,7 @@ class MyVisualizerTests(VisualTestCase):
try:
visualizer = MyVisualizer()
visualizer.fit(X)
visualizer.poof()
visualizer.show()
except Exception as e:
pytest.fail("my visualizer didn't work")
```
@@ -287,7 +287,7 @@ class MyVisualizer(Visualizer):
>>> model = MyVisualizer()
>>> model.fit(X)
>>> model.poof()
>>> model.show()
Notes
-----
4 changes: 2 additions & 2 deletions README.md
@@ -54,7 +54,7 @@ visualizer = Rank2D(
)
visualizer.fit(X, y) # Fit the data to the visualizer
visualizer.transform(X) # Transform the data
visualizer.poof() # Show the data
visualizer.show() # Finalize and render the figure
```

### Model Visualization
@@ -69,7 +69,7 @@ model = LinearSVC()
model.fit(X,y)
visualizer = ROCAUC(model)
visualizer.score(X,y)
visualizer.poof()
visualizer.show()
```

For additional information on getting started with Yellowbrick, view the quickstart guide in the [documentation](https://www.scikit-yb.org/en/latest/) and check out our [examples notebook](https://github.com/DistrictDataLabs/yellowbrick/blob/develop/examples/examples.ipynb).
12 changes: 6 additions & 6 deletions docs/README.md
@@ -2,7 +2,7 @@

*Welcome to the Yellowbrick docs!*

If you're looking for information about how to use Yellowbrick, for our contributor's guide, for examples and teaching resources, for answers to frequently asked questions, and more, please visit the latest version of our documentation at [www.scikit-yb.org](https://www.scikit-yb.org/).

## Building the Docs

@@ -16,9 +16,9 @@ You will then be able to build the documentation from inside the `docs` director

## reStructuredText

Yellowbrick uses [Sphinx](http://www.sphinx-doc.org/en/master/index.html) to build our documentation. The advantages of using Sphinx are many; we can more directly link to the documentation and source code of other projects like Matplotlib and scikit-learn using [intersphinx](http://www.sphinx-doc.org/en/master/usage/extensions/intersphinx.html). In addition, docstrings used to describe Yellowbrick visualizers can be automatically included when the documentation is built via [autodoc](http://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#sphinx.ext.autodoc).

To take advantage of these features, our documentation must be written in reStructuredText (or "rst"). reStructuredText is similar to markdown, but not identical, and does take some getting used to. For instance, styling for things like codeblocks, external hyperlinks, internal cross references, notes, and fixed-width text are all unique in rst.

If you would like to contribute to our documentation and do not have prior experience with rst, we recommend you make use of these resources:

@@ -28,7 +28,7 @@ If you would like to contribute to our documentation and do not have prior exper

## Adding New Visualizers to the Docs

If you are adding a new visualizer to the docs, there are quite a few examples in the documentation on which you can base your files of similar types.

The primary format for the API section is as follows:

@@ -48,7 +48,7 @@ A brief introduction to my visualizer and how it is useful in the machine learni
visualizer = MyVisualizer(LinearRegression())
visualizer.fit(X, y)
g = visualizer.poof()
g = visualizer.show()
Discussion about my visualizer and some interpretation of the above plot.
@@ -62,7 +62,7 @@ API Reference
:show-inheritance:
```

This is a pretty good structure for a documentation page: a brief introduction followed by a code example with a visualization included using [the plot directive](https://matplotlib.org/devel/plot_directive.html). This will render the `MyVisualizer` image in the document along with links for the complete source code, the png, and the pdf versions of the image. It will also have the "alt-text" (for screen-readers) and will display the source because of the `:include-source:` option; if `:include-source:` is omitted, the source will not be included.

The primary section is wrapped up with a discussion about how to interpret the visualizer and use it in practice. Finally, the `API Reference` section will use `automodule` to include the documentation from your docstring.

12 changes: 6 additions & 6 deletions docs/api/classifier/class_prediction_error.rst
@@ -40,7 +40,7 @@ The class prediction error chart provides a way to quickly understand how good y
visualizer.score(X_test, y_test)

# Draw visualization
visualizer.poof()
visualizer.show()

In the above example, while the ``RandomForestClassifier`` appears to be fairly good at correctly predicting apples based on the features of the fruit, it often incorrectly labels pears as kiwis and mistakes kiwis for bananas.

@@ -56,13 +56,13 @@ By contrast, in the following example, the ``RandomForestClassifier`` does a gre
from yellowbrick.datasets import load_credit

X, y = load_credit()

classes = ['account in default', 'current with bills']

# Perform 80/20 training/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20,
random_state=42)

# Instantiate the classification model and visualizer
visualizer = ClassPredictionError(
RandomForestClassifier(n_estimators=10), classes=classes
@@ -75,10 +75,10 @@ By contrast, in the following example, the ``RandomForestClassifier`` does a gre
visualizer.score(X_test, y_test)

# Draw visualization
visualizer.poof()
visualizer.show()

API Reference
-------------

2 changes: 1 addition & 1 deletion docs/api/classifier/classification_report.rst
@@ -33,7 +33,7 @@ The classification report visualizer displays the precision, recall, F1, and sup

visualizer.fit(X_train, y_train) # Fit the visualizer and the model
visualizer.score(X_test, y_test) # Evaluate the model on the test data
visualizer.poof() # Draw/show/poof the data
visualizer.show() # Finalize and show the figure


The classification report shows a representation of the main classification metrics on a per-class basis. This gives a deeper intuition of the classifier behavior than global accuracy, which can mask functional weaknesses in one class of a multiclass problem. Visual classification reports are used to compare classification models in order to select models that are "redder", i.e. that have stronger classification metrics or are more balanced.
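
For reference, the same per-class metrics can be computed directly with scikit-learn; in this sketch ``y_pred`` stands for the fitted model's predictions on the test set:

.. code-block:: python

    from sklearn.metrics import precision_recall_fscore_support

    # One value of each metric per class, plus the per-class support counts
    precision, recall, f1, support = precision_recall_fscore_support(y_test, y_pred)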
8 changes: 4 additions & 4 deletions docs/api/classifier/confusion_matrix.rst
@@ -47,23 +47,23 @@ scikit-learn documentation on `confusion matrices <http://scikit-learn.org/stabl
cm.score(X_test, y_test)

# How did we do?
cm.poof()
cm.show()


Plotting with Class Names
-------------------------

Class names can be added to a ``ConfusionMatrix`` plot using the ``label_encoder`` argument. The ``label_encoder`` can be a `sklearn.preprocessing.LabelEncoder <http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html>`_ (or anything with an ``inverse_transform`` method that performs the mapping), or a ``dict`` with the encoding-to-string mapping as in the example below:

.. plot::
:context: close-figs
:alt: ConfusionMatrix plot with class names

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split as tts
from sklearn.linear_model import LogisticRegression
from yellowbrick.classifier import ConfusionMatrix

iris = load_iris()
X = iris.data
y = iris.target
@@ -81,7 +81,7 @@ Class names can be added to a ``ConfusionMatrix`` plot using the ``label_encoder
iris_cm.fit(X_train, y_train)
iris_cm.score(X_test, y_test)

iris_cm.poof()
iris_cm.show()
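
For context, the instantiation that passes ``label_encoder`` is not shown in this hunk; it plausibly looks like the following sketch:

.. code-block:: python

    iris_cm = ConfusionMatrix(
        model, classes=classes,
        label_encoder={0: 'setosa', 1: 'versicolor', 2: 'virginica'}
    )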


API Reference
12 changes: 6 additions & 6 deletions docs/api/classifier/prcurve.rst
@@ -28,11 +28,11 @@ Binary Classification

X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2, shuffle=True)

# Create the visualizer, fit, score, and poof it
# Create the visualizer, fit, score, and show it
viz = PrecisionRecallCurve(RidgeClassifier())
viz.fit(X_train, y_train)
viz.score(X_test, y_test)
viz.poof()
viz.show()


The base case for precision-recall curves is the binary classification case, and this case is also the most visually interpretable. In the figure above we can see the precision plotted on the y-axis against the recall on the x-axis. The larger the filled-in area, the stronger the classifier is. The red line annotates the *average precision*, a summary of the entire plot computed as the weighted average of precision achieved at each threshold such that the weight is the difference in recall from the previous threshold.
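
For reference, the annotated average precision corresponds to the weighted sum described above:

.. math::

    \text{AP} = \sum_n (R_n - R_{n-1}) \, P_n

where :math:`P_n` and :math:`R_n` are the precision and recall at the :math:`n`-th threshold.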
@@ -59,11 +59,11 @@ To support multi-label classification, the estimator is wrapped in a `OneVsRestC

X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2, shuffle=True)

# Create the visualizer, fit, score, and poof it
# Create the visualizer, fit, score, and show it
viz = PrecisionRecallCurve(RandomForestClassifier(n_estimators=10))
viz.fit(X_train, y_train)
viz.score(X_test, y_test)
viz.poof()
viz.show()


A more complex Precision-Recall curve can be computed, however, displaying each curve individually, along with F1-score ISO curves (e.g. that show the relationship between precision and recall for various F1 scores).
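
For reference, each ISO curve traces the precision/recall pairs that yield one constant value of the F1 score:

.. math::

    F_1 = \frac{2PR}{P + R}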
@@ -86,14 +86,14 @@ A more complex Precision-Recall curve can be computed, however, displaying the e

X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2, shuffle=True)

# Create the visualizer, fit, score, and poof it
# Create the visualizer, fit, score, and show it
viz = PrecisionRecallCurve(
MultinomialNB(), per_class=True, iso_f1_curves=True,
fill_area=False, micro=False, classes=encoder.classes_
)
viz.fit(X_train, y_train)
viz.score(X_test, y_test)
viz.poof()
viz.show()


.. seealso:: `Scikit-Learn: Model Selection with Precision Recall Curves <http://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html>`_
10 changes: 5 additions & 5 deletions docs/api/classifier/rocauc.rst
@@ -31,14 +31,14 @@ This leads to another metric, area under the curve (AUC), which is a computation

visualizer.fit(X_train, y_train) # Fit the training data to the visualizer
visualizer.score(X_test, y_test) # Evaluate the model on the test data
visualizer.poof() # Draw/show/poof the data
visualizer.show() # Finalize and show the figure
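
For reference, the ROC curve plots the true positive rate against the false positive rate as the decision threshold varies:

.. math::

    \text{TPR} = \frac{TP}{TP + FN} \qquad \text{FPR} = \frac{FP}{FP + TN}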


.. warning::
Versions of Yellowbrick <= v0.8 had a `bug <https://github.com/DistrictDataLabs/yellowbrick/blob/develop/examples/rebeccabilbro/rocauc_bug_research.ipynb>`_
that triggered an ``IndexError`` when attempting binary classification using
a Scikit-learn-style estimator with only a ``decision_function``. This has been
fixed as of v0.9, where the ``micro``, ``macro``, and ``per-class`` parameters of
``ROCAUC`` are set to ``False`` for such classifiers.


@@ -75,7 +75,7 @@ ROC curves are typically used in binary classification, and in fact the Scikit-L

visualizer.fit(X_train, y_train) # Fit the training data to the visualizer
visualizer.score(X_test, y_test) # Evaluate the model on the test data
visualizer.poof() # Draw/show/poof the data
visualizer.show() # Finalize and render the figure

.. warning::
The target ``y`` must be numeric for this figure to work, or update to the latest version of sklearn.
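
A minimal sketch of one workaround for string-valued targets (not from the original docs):

.. code-block:: python

    from sklearn.preprocessing import LabelEncoder

    # Encode string class labels as integers before splitting and fitting
    y = LabelEncoder().fit_transform(y)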
4 changes: 2 additions & 2 deletions docs/api/classifier/threshold.rst
@@ -24,7 +24,7 @@ A visualization of precision, recall, f1 score, and queue rate with respect to t
visualizer = DiscriminationThreshold(model)

visualizer.fit(X, y) # Fit the data to the visualizer
visualizer.poof() # Draw/show/poof the data
visualizer.show() # Finalize and render the figure

One common use of binary classification algorithms is to use the score or probability they produce to determine cases that require special treatment. For example, a fraud prevention application might use a classification algorithm to determine if a transaction is likely fraudulent and needs to be investigated in detail. In the figure above, we present an example where a binary classifier determines if an email is "spam" (the positive case) or "not spam" (the negative case). Emails that are detected as spam are moved to a hidden folder and eventually deleted.

@@ -40,7 +40,7 @@ Generally speaking, the threshold is balanced between cases and set to 0.5 or 50

- **Queue Rate**: The "queue" is the spam folder or the inbox of the fraud investigation desk. This metric describes the percentage of instances that must be reviewed. If review has a high cost (e.g. fraud prevention) then this must be minimized with respect to business requirements; if it doesn't (e.g. spam filter), this could be optimized to ensure the inbox stays clean.

In the figure above we see the visualizer tuned to look for the optimal F1 score, which is annotated as a threshold of 0.43. The model is run multiple times over multiple train/test splits in order to account for the variability of the model with respect to the metrics (shown as the fill area around the median curve).
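
As a sketch of what tuning for the optimal F1 score involves (illustrative only, not Yellowbrick's internal code; assumes a fitted binary classifier and 0/1 labels):

.. code-block:: python

    import numpy as np
    from sklearn.metrics import f1_score

    # Positive-class probabilities on held-out data
    probs = model.predict_proba(X_test)[:, 1]

    # Scan candidate thresholds and keep the one maximizing F1
    thresholds = np.linspace(0.01, 0.99, 99)
    scores = [f1_score(y_test, probs >= t) for t in thresholds]
    best_threshold = thresholds[int(np.argmax(scores))]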


API Reference
