
Commit

Merge branch 'development'
Weixuan Fu committed Oct 26, 2020
2 parents af16ad0 + c766c1b commit 08e9db8
Showing 34 changed files with 1,339 additions and 188 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -54,7 +54,7 @@ Click on the corresponding links to find more information on TPOT usage in the d

### Classification

Below is a minimal working example with the the optical recognition of handwritten digits dataset.
Below is a minimal working example with the optical recognition of handwritten digits dataset.

```python
from tpot import TPOTClassifier
40 changes: 15 additions & 25 deletions docs/api/index.html
@@ -147,7 +147,6 @@ <h2 id="classification">Classification</h2>
<strong>disable_update_check</strong>=False,
<strong>log_file</strong>=None
</em>)</pre>

<div align="right"><a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/base.py">source</a></div>

<p>Automated machine learning for supervised classification tasks.</p>
@@ -352,10 +351,13 @@ <h2 id="classification">Classification</h2>
The update checker will tell you when a new version of TPOT has been released.
</blockquote>

<strong>log_file</strong>: io.TextIOWrapper or io.StringIO, optional (defaul: sys.stdout)
<strong>log_file</strong>: file-like class (io.TextIOWrapper or io.StringIO) or string, optional (default: None)
<br /><br />
<blockquote>
Save progress content to a file.
If a string is given, it is treated as the path and file name of the desired output file,
and TPOT will create that file and write the log into it.
If it is None, TPOT will write the log to sys.stdout.
</blockquote>
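<br />
For illustration, a minimal sketch of passing either a file path or an in-memory buffer, per the behavior described above (the file name and the StringIO buffer are examples only, not part of the API description):
<pre><code class="language-Python">import io
from tpot import TPOTClassifier

# Write the optimization log to a file on disk; TPOT creates the file.
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2,
                      log_file='tpot_progress.log')

# Or capture the log in memory with a file-like object.
log_buffer = io.StringIO()
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2,
                      log_file=log_buffer)
</code></pre>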

</td>
@@ -389,7 +391,7 @@ <h2 id="classification">Classification</h2>
</table>

<p><strong>Example</strong></p>
<pre><code class="Python">from tpot import TPOTClassifier
<pre><code class="language-Python">from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

@@ -402,7 +404,6 @@ <h2 id="classification">Classification</h2>
print(tpot.score(X_test, y_test))
tpot.export('tpot_digits_pipeline.py')
</code></pre>

<p><strong>Functions</strong></p>
<table width="100%">
<tr>
@@ -432,9 +433,8 @@ <h2 id="classification">Classification</h2>
</table>

<p><a name="tpotclassifier-fit"></a></p>
<pre><code class="Python">fit(features, classes, sample_weight=None, groups=None)
<pre><code class="language-Python">fit(features, classes, sample_weight=None, groups=None)
</code></pre>

<div style="padding-left:5%" width="100%">
Run the TPOT optimization process on the given training data.
<br /><br />
@@ -486,9 +486,8 @@ <h2 id="classification">Classification</h2>
</div>
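<p>As a brief sketch of the optional arguments in the signature above, the example below passes per-sample weights to <code>fit</code>; the weighting scheme itself is only illustrative:</p>
<pre><code class="language-Python">import numpy as np
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25)

# Illustrative weights: give the second half of the training set twice the influence.
sample_weight = np.ones(len(y_train))
sample_weight[len(y_train) // 2:] = 2.0

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train, sample_weight=sample_weight)
print(tpot.score(X_test, y_test))
</code></pre>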

<p><a name="tpotclassifier-predict"></a></p>
<pre><code class="Python">predict(features)
<pre><code class="language-Python">predict(features)
</code></pre>

<div style="padding-left:5%" width="100%">
Use the optimized pipeline to predict the classes for a feature set.
<br /><br />
@@ -515,9 +514,8 @@ <h2 id="classification">Classification</h2>
</div>

<p><a name="tpotclassifier-predict-proba"></a></p>
<pre><code class="Python">predict_proba(features)
<pre><code class="language-Python">predict_proba(features)
</code></pre>

<div style="padding-left:5%" width="100%">
Use the optimized pipeline to estimate the class probabilities for a feature set.
<br /><br />
@@ -546,9 +544,8 @@ <h2 id="classification">Classification</h2>
</div>
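<p>A short sketch of calling <code>predict_proba</code> on a fitted TPOT instance; note the assumption that the final estimator of the optimized pipeline supports probability estimates:</p>
<pre><code class="language-Python">from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25)

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)

# One row per sample, one column per class.
probabilities = tpot.predict_proba(X_test)
print(probabilities[:5])
</code></pre>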

<p><a name="tpotclassifier-score"></a></p>
<pre><code class="Python">score(testing_features, testing_classes)
<pre><code class="language-Python">score(testing_features, testing_classes)
</code></pre>

<div style="padding-left:5%" width="100%">
Returns the optimized pipeline's score on the given testing data using the user-specified scoring function.
<br /><br />
@@ -582,9 +579,8 @@ <h2 id="classification">Classification</h2>
</div>

<p><a name="tpotclassifier-export"></a></p>
<pre><code class="Python">export(output_file_name, data_file_path)
<pre><code class="language-Python">export(output_file_name, data_file_path)
</code></pre>

<div style="padding-left:5%" width="100%">
Export the optimized pipeline as Python code.
<br /><br />
@@ -631,7 +627,6 @@ <h2 id="regression">Regression</h2>
<strong>early_stop</strong>=None,
<strong>verbosity</strong>=0,
<strong>disable_update_check</strong>=False</em>)</pre>

<div align="right"><a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/base.py">source</a></div>

<p>Automated machine learning for supervised regression tasks.</p>
@@ -868,7 +863,7 @@ <h2 id="regression">Regression</h2>
</table>

<p><strong>Example</strong></p>
<pre><code class="Python">from tpot import TPOTRegressor
<pre><code class="language-Python">from tpot import TPOTRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

@@ -881,7 +876,6 @@ <h2 id="regression">Regression</h2>
print(tpot.score(X_test, y_test))
tpot.export('tpot_boston_pipeline.py')
</code></pre>

<p><strong>Functions</strong></p>
<table width="100%">
<tr>
@@ -906,9 +900,8 @@ <h2 id="regression">Regression</h2>
</table>

<p><a name="tpotregressor-fit"></a></p>
<pre><code class="Python">fit(features, target, sample_weight=None, groups=None)
<pre><code class="language-Python">fit(features, target, sample_weight=None, groups=None)
</code></pre>

<div style="padding-left:5%" width="100%">
Run the TPOT optimization process on the given training data.
<br /><br />
@@ -960,9 +953,8 @@ <h2 id="regression">Regression</h2>
</div>

<p><a name="tpotregressor-predict"></a></p>
<pre><code class="Python">predict(features)
<pre><code class="language-Python">predict(features)
</code></pre>

<div style="padding-left:5%" width="100%">
Use the optimized pipeline to predict the target values for a feature set.
<br /><br />
@@ -989,9 +981,8 @@ <h2 id="regression">Regression</h2>
</div>

<p><a name="tpotregressor-score"></a></p>
<pre><code class="Python">score(testing_features, testing_target)
<pre><code class="language-Python">score(testing_features, testing_target)
</code></pre>

<div style="padding-left:5%" width="100%">
Returns the optimized pipeline's score on the given testing data using the user-specified scoring function.
<br /><br />
@@ -1025,9 +1016,8 @@ <h2 id="regression">Regression</h2>
</div>
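<p>A minimal sketch of interpreting the returned value, assuming the default regression scoring function (negative mean squared error, where values closer to zero are better):</p>
<pre><code class="language-Python">from tpot import TPOTRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

housing = load_boston()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target,
                                                    train_size=0.75, test_size=0.25)

tpot = TPOTRegressor(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)

# With the assumed default scorer this prints a negative MSE.
print(tpot.score(X_test, y_test))
</code></pre>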

<p><a name="tpotregressor-export"></a></p>
<pre><code class="Python">export(output_file_name)
<pre><code class="language-Python">export(output_file_name)
</code></pre>

<div style="padding-left:5%" width="100%">
Export the optimized pipeline as Python code.
<br /><br />
9 changes: 3 additions & 6 deletions docs/citing/index.html
@@ -128,7 +128,7 @@ <h1 id="citing-tpot">Citing TPOT</h1>
<p>If you use TPOT in a scientific publication, please consider citing at least one of the following papers:</p>
<p>Trang T. Le, Weixuan Fu and Jason H. Moore (2020). <a href="https://academic.oup.com/bioinformatics/article/36/1/250/5511404">Scaling tree-based automated machine learning to biomedical big data with a feature set selector</a>. <em>Bioinformatics</em>, 36(1): 250-256.</p>
<p>BibTeX entry:</p>
<pre><code class="bibtex">@article{le2020scaling,
<pre><code class="language-bibtex">@article{le2020scaling,
title={Scaling tree-based automated machine learning to biomedical big data with a feature set selector},
author={Le, Trang T and Fu, Weixuan and Moore, Jason H},
journal={Bioinformatics},
@@ -139,10 +139,9 @@ <h1 id="citing-tpot">Citing TPOT</h1>
publisher={Oxford University Press}
}
</code></pre>

<p>Randal S. Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, La Creis Kidd, and Jason H. Moore (2016). <a href="http://link.springer.com/chapter/10.1007/978-3-319-31204-0_9">Automating biomedical data science through tree-based pipeline optimization</a>. <em>Applications of Evolutionary Computation</em>, pages 123-137.</p>
<p>BibTeX entry:</p>
<pre><code class="bibtex">@inbook{Olson2016EvoBio,
<pre><code class="language-bibtex">@inbook{Olson2016EvoBio,
author={Olson, Randal S. and Urbanowicz, Ryan J. and Andrews, Peter C. and Lavender, Nicole A. and Kidd, La Creis and Moore, Jason H.},
editor={Squillero, Giovanni and Burelli, Paolo},
chapter={Automating Biomedical Data Science Through Tree-Based Pipeline Optimization},
@@ -155,11 +154,10 @@ <h1 id="citing-tpot">Citing TPOT</h1>
url={http://dx.doi.org/10.1007/978-3-319-31204-0_9}
}
</code></pre>

<p>Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science</p>
<p>Randal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, and Jason H. Moore (2016). <a href="http://dl.acm.org/citation.cfm?id=2908918">Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science</a>. <em>Proceedings of GECCO 2016</em>, pages 485-492.</p>
<p>BibTeX entry:</p>
<pre><code class="bibtex">@inproceedings{OlsonGECCO2016,
<pre><code class="language-bibtex">@inproceedings{OlsonGECCO2016,
author = {Olson, Randal S. and Bartley, Nathan and Urbanowicz, Ryan J. and Moore, Jason H.},
title = {Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science},
booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference 2016},
@@ -176,7 +174,6 @@ <h1 id="citing-tpot">Citing TPOT</h1>
address = {New York, NY, USA},
}
</code></pre>

<p>Alternatively, you can cite the repository directly with the following DOI:</p>
<p><a href="https://zenodo.org/badge/latestdoi/20747/rhiever/tpot">DOI</a></p>

35 changes: 21 additions & 14 deletions docs/examples/index.html
@@ -194,14 +194,28 @@ <h1 id="overview">Overview</h1>
<td align="center"><a href="https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope">link</a></td>
<td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/MAGIC%20Gamma%20Telescope/MAGIC%20Gamma%20Telescope.ipynb">link</a></td>
</tr>
<tr>
<td>cuML Classification Example</td>
<td>random classification problem</td>
<td>classification</td>
<td align="center"><a href="https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html">link</a></td>
<td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/cuML_Classification_Example.ipynb">link</a></td>
</tr>
<tr>
<td>cuML Regression Example</td>
<td>random regression problem</td>
<td>regression</td>
<td align="center"><a href="https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html">link</a></td>
<td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/cuML_Regression_Example.ipynb">link</a></td>
</tr>
</tbody>
</table>
<p><strong>Notes:</strong>
- For details on how the <code>fit()</code>, <code>score()</code> and <code>export()</code> methods work, refer to the <a href="/using/">usage documentation</a>.
- Upon re-running the experiments, your resulting pipelines <em>may</em> differ (to some extent) from the ones demonstrated here.</p>
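<p>The two cuML examples above run TPOT with GPU-accelerated estimators from RAPIDS cuML. The snippet below is only a rough sketch of that style of usage; the <code>'TPOT cuML'</code> configuration string, the <code>n_jobs=1</code> setting, and the need for a RAPIDS-capable GPU are assumptions taken from the linked notebooks rather than details documented on this page.</p>
<pre><code class="language-Python">from tpot import TPOTClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Random classification problem, as in the cuML classification notebook.
X, y = make_classification(n_samples=10000, n_features=40, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75,
                                                    test_size=0.25, random_state=42)

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2,
                      config_dict='TPOT cuML', n_jobs=1)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
</code></pre>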
<h2 id="iris-flower-classification">Iris flower classification</h2>
<p>The following code illustrates how TPOT can be employed for performing a simple <em>classification task</em> over the Iris dataset.</p>
<pre><code class="Python">from tpot import TPOTClassifier
<pre><code class="language-Python">from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np
@@ -215,9 +229,8 @@ <h2 id="iris-flower-classification">Iris flower classification</h2>
print(tpot.score(X_test, y_test))
tpot.export('tpot_iris_pipeline.py')
</code></pre>

<p>Running this code should discover a pipeline (exported as <code>tpot_iris_pipeline.py</code>) that achieves about 97% test accuracy:</p>
<pre><code class="Python">import numpy as np
<pre><code class="language-Python">import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
@@ -242,10 +255,9 @@ <h2 id="iris-flower-classification">Iris flower classification</h2>
exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
</code></pre>

<h2 id="digits-dataset">Digits dataset</h2>
<p>Below is a minimal working example with the optical recognition of handwritten digits dataset, which is an <em>image classification problem</em>.</p>
<pre><code class="Python">from tpot import TPOTClassifier
<pre><code class="language-Python">from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

@@ -258,9 +270,8 @@ <h2 id="digits-dataset">Digits dataset</h2>
print(tpot.score(X_test, y_test))
tpot.export('tpot_digits_pipeline.py')
</code></pre>

<p>Running this code should discover a pipeline (exported as <code>tpot_digits_pipeline.py</code>) that achieves about 98% test accuracy:</p>
<pre><code class="Python">import numpy as np
<pre><code class="language-Python">import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
@@ -288,10 +299,9 @@ <h2 id="digits-dataset">Digits dataset</h2>
exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
</code></pre>

<h2 id="boston-housing-prices-modeling">Boston housing prices modeling</h2>
<p>The following code illustrates how TPOT can be employed for performing a <em>regression task</em> over the Boston housing prices dataset.</p>
<pre><code class="Python">from tpot import TPOTRegressor
<pre><code class="language-Python">from tpot import TPOTRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

@@ -304,9 +314,8 @@ <h2 id="boston-housing-prices-modeling">Boston housing prices modeling</h2>
print(tpot.score(X_test, y_test))
tpot.export('tpot_boston_pipeline.py')
</code></pre>

<p>Running this code should discover a pipeline (exported as <code>tpot_boston_pipeline.py</code>) that achieves a mean squared error (MSE) of about 10 or better on the test set:</p>
<pre><code class="Python">import numpy as np
<pre><code class="language-Python">import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split
@@ -331,7 +340,6 @@ <h2 id="boston-housing-prices-modeling">Boston housing prices modeling</h2>
exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
</code></pre>

<h2 id="titanic-survival-analysis">Titanic survival analysis</h2>
<p>To see TPOT applied to the Titanic Kaggle dataset, see the Jupyter notebook <a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/Titanic_Kaggle.ipynb">here</a>. This example shows how to take a messy dataset and preprocess it such that it can be used in scikit-learn and TPOT.</p>
<h2 id="portuguese-bank-marketing">Portuguese Bank Marketing</h2>
@@ -340,7 +348,7 @@ <h2 id="magic-gamma-telescope">MAGIC Gamma Telescope</h2>
<p>The corresponding Jupyter notebook, containing the associated data preprocessing and analysis, can be found <a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/MAGIC%20Gamma%20Telescope/MAGIC%20Gamma%20Telescope.ipynb">here</a>.</p>
<h2 id="neural-network-classifier-using-tpot-nn">Neural network classifier using TPOT-NN</h2>
<p>Loading the <a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/config/classifier_nn.py">TPOT-NN configuration dictionary</a> includes PyTorch estimators for classification. Users can also create their own NN configuration dictionary that includes <code>tpot.builtins.PytorchLRClassifier</code> and/or <code>tpot.builtins.PytorchMLPClassifier</code>, or they can specify them using a template string, as shown in the following example:</p>
<pre><code class="Python">from tpot import TPOTClassifier
<pre><code class="language-Python">from tpot import TPOTClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

@@ -353,7 +361,6 @@ <h2 id="neural-network-classifier-using-tpot-nn">Neural network classifier using
print(clf.score(X_test, y_test))
clf.export('tpot_nn_demo_pipeline.py')
</code></pre>

<p>This example is somewhat trivial, but it should result in nearly 100% classification accuracy.</p>
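<p>Alternatively, instead of a template string, a custom configuration dictionary can reference the PyTorch estimators directly, as mentioned above. The sketch below only illustrates the idea; the hyperparameter names and value ranges are assumptions, not the exact contents of the TPOT-NN configuration.</p>
<pre><code class="language-Python">from tpot import TPOTClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=100, centers=2, n_features=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, test_size=0.25)

# Assumed hyperparameter grid for the PyTorch logistic-regression classifier.
nn_config = {
    'tpot.builtins.PytorchLRClassifier': {
        'learning_rate': [1e-3, 1e-2, 1e-1],
        'batch_size': [16, 32],
        'num_epochs': [10, 15],
    }
}

clf = TPOTClassifier(config_dict=nn_config, generations=5, population_size=20, verbosity=2)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
</code></pre>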

</div>
2 changes: 1 addition & 1 deletion docs/index.html
@@ -204,5 +204,5 @@

<!--
MkDocs version : 1.1.2
Build Date UTC : 2020-07-21 20:34:39.398221+00:00
Build Date UTC : 2020-10-26 14:32:58.841000+00:00
-->
