
Commit

Merge branch 'development'
Weixuan Fu committed Oct 26, 2020
2 parents af16ad0 + c766c1b commit 08e9db8
Showing 34 changed files with 1,339 additions and 188 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -54,7 +54,7 @@ Click on the corresponding links to find more information on TPOT usage in the d

### Classification

Below is a minimal working example with the the optical recognition of handwritten digits dataset.
Below is a minimal working example with the optical recognition of handwritten digits dataset.

```python
from tpot import TPOTClassifier
40 changes: 15 additions & 25 deletions docs/api/index.html
@@ -147,7 +147,6 @@ <h2 id="classification">Classification</h2>
<strong>disable_update_check</strong>=False,
<strong>log_file</strong>=None
</em>)</pre>

<div align="right"><a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/base.py">source</a></div>

<p>Automated machine learning for supervised classification tasks.</p>
@@ -352,10 +351,13 @@ <h2 id="classification">Classification</h2>
The update checker will tell you when a new version of TPOT has been released.
</blockquote>

<strong>log_file</strong>: io.TextIOWrapper or io.StringIO, optional (defaul: sys.stdout)
<strong>log_file</strong>: file-like class (io.TextIOWrapper or io.StringIO) or string, optional (default: None)
<br /><br />
<blockquote>
Save progress content to a file.
If a string is given, it is treated as the path and file name of the desired output file,
and TPOT will create that file and write the log into it.
If it is None, TPOT will write the log to sys.stdout.
</blockquote>
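<br />
For illustration, a minimal sketch of passing either a file path or an in-memory buffer, per the behavior described above (the file name and the StringIO buffer are examples only, not part of the API description):
<pre><code class="language-Python">import io
from tpot import TPOTClassifier

# Write the optimization log to a file on disk; TPOT creates the file.
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2,
                      log_file='tpot_progress.log')

# Or capture the log in memory with a file-like object.
log_buffer = io.StringIO()
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2,
                      log_file=log_buffer)
</code></pre>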

</td>
@@ -389,7 +391,7 @@ <h2 id="classification">Classification</h2>
</table>

<p><strong>Example</strong></p>
<pre><code class="Python">from tpot import TPOTClassifier
<pre><code class="language-Python">from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

@@ -402,7 +404,6 @@ <h2 id="classification">Classification</h2>
print(tpot.score(X_test, y_test))
tpot.export('tpot_digits_pipeline.py')
</code></pre>

<p><strong>Functions</strong></p>
<table width="100%">
<tr>
@@ -432,9 +433,8 @@ <h2 id="classification">Classification</h2>
</table>

<p><a name="tpotclassifier-fit"></a></p>
<pre><code class="Python">fit(features, classes, sample_weight=None, groups=None)
<pre><code class="language-Python">fit(features, classes, sample_weight=None, groups=None)
</code></pre>

<div style="padding-left:5%" width="100%">
Run the TPOT optimization process on the given training data.
<br /><br />
@@ -486,9 +486,8 @@ <h2 id="classification">Classification</h2>
</div>
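<p>As a brief sketch of the optional arguments in the signature above, the example below passes per-sample weights to <code>fit</code>; the weighting scheme itself is only illustrative:</p>
<pre><code class="language-Python">import numpy as np
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25)

# Illustrative weights: give the second half of the training set twice the influence.
sample_weight = np.ones(len(y_train))
sample_weight[len(y_train) // 2:] = 2.0

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train, sample_weight=sample_weight)
print(tpot.score(X_test, y_test))
</code></pre>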

<p><a name="tpotclassifier-predict"></a></p>
<pre><code class="Python">predict(features)
<pre><code class="language-Python">predict(features)
</code></pre>

<div style="padding-left:5%" width="100%">
Use the optimized pipeline to predict the classes for a feature set.
<br /><br />
@@ -515,9 +514,8 @@ <h2 id="classification">Classification</h2>
</div>

<p><a name="tpotclassifier-predict-proba"></a></p>
<pre><code class="Python">predict_proba(features)
<pre><code class="language-Python">predict_proba(features)
</code></pre>

<div style="padding-left:5%" width="100%">
Use the optimized pipeline to estimate the class probabilities for a feature set.
<br /><br />
@@ -546,9 +544,8 @@ <h2 id="classification">Classification</h2>
</div>
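<p>A short sketch of calling <code>predict_proba</code> on a fitted TPOT instance; note the assumption that the final estimator of the optimized pipeline supports probability estimates:</p>
<pre><code class="language-Python">from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25)

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)

# One row per sample, one column per class.
probabilities = tpot.predict_proba(X_test)
print(probabilities[:5])
</code></pre>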

<p><a name="tpotclassifier-score"></a></p>
<pre><code class="Python">score(testing_features, testing_classes)
<pre><code class="language-Python">score(testing_features, testing_classes)
</code></pre>

<div style="padding-left:5%" width="100%">
Returns the optimized pipeline's score on the given testing data using the user-specified scoring function.
<br /><br />
@@ -582,9 +579,8 @@ <h2 id="classification">Classification</h2>
</div>

<p><a name="tpotclassifier-export"></a></p>
<pre><code class="Python">export(output_file_name, data_file_path)
<pre><code class="language-Python">export(output_file_name, data_file_path)
</code></pre>

<div style="padding-left:5%" width="100%">
Export the optimized pipeline as Python code.
<br /><br />
@@ -631,7 +627,6 @@ <h2 id="regression">Regression</h2>
<strong>early_stop</strong>=None,
<strong>verbosity</strong>=0,
<strong>disable_update_check</strong>=False</em>)</pre>

<div align="right"><a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/base.py">source</a></div>

<p>Automated machine learning for supervised regression tasks.</p>
@@ -868,7 +863,7 @@ <h2 id="regression">Regression</h2>
</table>

<p><strong>Example</strong></p>
<pre><code class="Python">from tpot import TPOTRegressor
<pre><code class="language-Python">from tpot import TPOTRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

@@ -881,7 +876,6 @@ <h2 id="regression">Regression</h2>
print(tpot.score(X_test, y_test))
tpot.export('tpot_boston_pipeline.py')
</code></pre>

<p><strong>Functions</strong></p>
<table width="100%">
<tr>
@@ -906,9 +900,8 @@ <h2 id="regression">Regression</h2>
</table>

<p><a name="tpotregressor-fit"></a></p>
<pre><code class="Python">fit(features, target, sample_weight=None, groups=None)
<pre><code class="language-Python">fit(features, target, sample_weight=None, groups=None)
</code></pre>

<div style="padding-left:5%" width="100%">
Run the TPOT optimization process on the given training data.
<br /><br />
@@ -960,9 +953,8 @@ <h2 id="regression">Regression</h2>
</div>

<p><a name="tpotregressor-predict"></a></p>
<pre><code class="Python">predict(features)
<pre><code class="language-Python">predict(features)
</code></pre>

<div style="padding-left:5%" width="100%">
Use the optimized pipeline to predict the target values for a feature set.
<br /><br />
@@ -989,9 +981,8 @@ <h2 id="regression">Regression</h2>
</div>

<p><a name="tpotregressor-score"></a></p>
<pre><code class="Python">score(testing_features, testing_target)
<pre><code class="language-Python">score(testing_features, testing_target)
</code></pre>

<div style="padding-left:5%" width="100%">
Returns the optimized pipeline's score on the given testing data using the user-specified scoring function.
<br /><br />
@@ -1025,9 +1016,8 @@ <h2 id="regression">Regression</h2>
</div>
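<p>A minimal sketch of interpreting the returned value, assuming the default regression scoring function (negative mean squared error, where values closer to zero are better):</p>
<pre><code class="language-Python">from tpot import TPOTRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

housing = load_boston()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target,
                                                    train_size=0.75, test_size=0.25)

tpot = TPOTRegressor(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)

# With the assumed default scorer this prints a negative MSE.
print(tpot.score(X_test, y_test))
</code></pre>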

<p><a name="tpotregressor-export"></a></p>
<pre><code class="Python">export(output_file_name)
<pre><code class="language-Python">export(output_file_name)
</code></pre>

<div style="padding-left:5%" width="100%">
Export the optimized pipeline as Python code.
<br /><br />
9 changes: 3 additions & 6 deletions docs/citing/index.html
@@ -128,7 +128,7 @@ <h1 id="citing-tpot">Citing TPOT</h1>
<p>If you use TPOT in a scientific publication, please consider citing at least one of the following papers:</p>
<p>Trang T. Le, Weixuan Fu and Jason H. Moore (2020). <a href="https://academic.oup.com/bioinformatics/article/36/1/250/5511404">Scaling tree-based automated machine learning to biomedical big data with a feature set selector</a>. <em>Bioinformatics</em>, 36(1): 250-256.</p>
<p>BibTeX entry:</p>
<pre><code class="bibtex">@article{le2020scaling,
<pre><code class="language-bibtex">@article{le2020scaling,
title={Scaling tree-based automated machine learning to biomedical big data with a feature set selector},
author={Le, Trang T and Fu, Weixuan and Moore, Jason H},
journal={Bioinformatics},
@@ -139,10 +139,9 @@ <h1 id="citing-tpot">Citing TPOT</h1>
publisher={Oxford University Press}
}
</code></pre>

<p>Randal S. Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, La Creis Kidd, and Jason H. Moore (2016). <a href="http://link.springer.com/chapter/10.1007/978-3-319-31204-0_9">Automating biomedical data science through tree-based pipeline optimization</a>. <em>Applications of Evolutionary Computation</em>, pages 123-137.</p>
<p>BibTeX entry:</p>
<pre><code class="bibtex">@inbook{Olson2016EvoBio,
<pre><code class="language-bibtex">@inbook{Olson2016EvoBio,
author={Olson, Randal S. and Urbanowicz, Ryan J. and Andrews, Peter C. and Lavender, Nicole A. and Kidd, La Creis and Moore, Jason H.},
editor={Squillero, Giovanni and Burelli, Paolo},
chapter={Automating Biomedical Data Science Through Tree-Based Pipeline Optimization},
@@ -155,11 +154,10 @@ <h1 id="citing-tpot">Citing TPOT</h1>
url={http://dx.doi.org/10.1007/978-3-319-31204-0_9}
}
</code></pre>

<p>Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science</p>
<p>Randal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, and Jason H. Moore (2016). <a href="http://dl.acm.org/citation.cfm?id=2908918">Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science</a>. <em>Proceedings of GECCO 2016</em>, pages 485-492.</p>
<p>BibTeX entry:</p>
<pre><code class="bibtex">@inproceedings{OlsonGECCO2016,
<pre><code class="language-bibtex">@inproceedings{OlsonGECCO2016,
author = {Olson, Randal S. and Bartley, Nathan and Urbanowicz, Ryan J. and Moore, Jason H.},
title = {Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science},
booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference 2016},
@@ -176,7 +174,6 @@ <h1 id="citing-tpot">Citing TPOT</h1>
address = {New York, NY, USA},
}
</code></pre>

<p>Alternatively, you can cite the repository directly with the following DOI:</p>
<p><a href="https://zenodo.org/badge/latestdoi/20747/rhiever/tpot">DOI</a></p>

35 changes: 21 additions & 14 deletions docs/examples/index.html
@@ -194,14 +194,28 @@ <h1 id="overview">Overview</h1>
<td align="center"><a href="https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope">link</a></td>
<td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/MAGIC%20Gamma%20Telescope/MAGIC%20Gamma%20Telescope.ipynb">link</a></td>
</tr>
<tr>
<td>cuML Classification Example</td>
<td>random classification problem</td>
<td>classification</td>
<td align="center"><a href="https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html">link</a></td>
<td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/cuML_Classification_Example.ipynb">link</a></td>
</tr>
<tr>
<td>cuML Regression Example</td>
<td>random regression problem</td>
<td>regression</td>
<td align="center"><a href="https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html">link</a></td>
<td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/cuML_Regression_Example.ipynb">link</a></td>
</tr>
</tbody>
</table>
<p><strong>Notes:</strong>
- For details on how the <code>fit()</code>, <code>score()</code> and <code>export()</code> methods work, refer to the <a href="/using/">usage documentation</a>.
- Upon re-running the experiments, your resulting pipelines <em>may</em> differ (to some extent) from the ones demonstrated here.</p>
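<p>The two cuML examples above run TPOT with GPU-accelerated estimators from RAPIDS cuML. The snippet below is only a rough sketch of that style of usage; the <code>'TPOT cuML'</code> configuration string, the <code>n_jobs=1</code> setting, and the need for a RAPIDS-capable GPU are assumptions taken from the linked notebooks rather than details documented on this page.</p>
<pre><code class="language-Python">from tpot import TPOTClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Random classification problem, as in the cuML classification notebook.
X, y = make_classification(n_samples=10000, n_features=40, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75,
                                                    test_size=0.25, random_state=42)

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2,
                      config_dict='TPOT cuML', n_jobs=1)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
</code></pre>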
<h2 id="iris-flower-classification">Iris flower classification</h2>
<p>The following code illustrates how TPOT can be employed for performing a simple <em>classification task</em> over the Iris dataset.</p>
<pre><code class="Python">from tpot import TPOTClassifier
<pre><code class="language-Python">from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np
@@ -215,9 +229,8 @@ <h2 id="iris-flower-classification">Iris flower classification</h2>
print(tpot.score(X_test, y_test))
tpot.export('tpot_iris_pipeline.py')
</code></pre>

<p>Running this code should discover a pipeline (exported as <code>tpot_iris_pipeline.py</code>) that achieves about 97% test accuracy:</p>
<pre><code class="Python">import numpy as np
<pre><code class="language-Python">import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
@@ -242,10 +255,9 @@ <h2 id="iris-flower-classification">Iris flower classification</h2>
exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
</code></pre>

<h2 id="digits-dataset">Digits dataset</h2>
<p>Below is a minimal working example with the optical recognition of handwritten digits dataset, which is an <em>image classification problem</em>.</p>
<pre><code class="Python">from tpot import TPOTClassifier
<pre><code class="language-Python">from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

@@ -258,9 +270,8 @@ <h2 id="digits-dataset">Digits dataset</h2>
print(tpot.score(X_test, y_test))
tpot.export('tpot_digits_pipeline.py')
</code></pre>

<p>Running this code should discover a pipeline (exported as <code>tpot_digits_pipeline.py</code>) that achieves about 98% test accuracy:</p>
<pre><code class="Python">import numpy as np
<pre><code class="language-Python">import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
@@ -288,10 +299,9 @@ <h2 id="digits-dataset">Digits dataset</h2>
exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
</code></pre>

<h2 id="boston-housing-prices-modeling">Boston housing prices modeling</h2>
<p>The following code illustrates how TPOT can be employed for performing a <em>regression task</em> over the Boston housing prices dataset.</p>
<pre><code class="Python">from tpot import TPOTRegressor
<pre><code class="language-Python">from tpot import TPOTRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

@@ -304,9 +314,8 @@ <h2 id="boston-housing-prices-modeling">Boston housing prices modeling</h2>
print(tpot.score(X_test, y_test))
tpot.export('tpot_boston_pipeline.py')
</code></pre>

<p>Running this code should discover a pipeline (exported as <code>tpot_boston_pipeline.py</code>) that achieves a mean squared error (MSE) of about 10 or better on the test set:</p>
<pre><code class="Python">import numpy as np
<pre><code class="language-Python">import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split
@@ -331,7 +340,6 @@ <h2 id="boston-housing-prices-modeling">Boston housing prices modeling</h2>
exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
</code></pre>

<h2 id="titanic-survival-analysis">Titanic survival analysis</h2>
<p>To see TPOT applied to the Titanic Kaggle dataset, see the Jupyter notebook <a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/Titanic_Kaggle.ipynb">here</a>. This example shows how to take a messy dataset and preprocess it such that it can be used in scikit-learn and TPOT.</p>
<h2 id="portuguese-bank-marketing">Portuguese Bank Marketing</h2>
@@ -340,7 +348,7 @@ <h2 id="magic-gamma-telescope">MAGIC Gamma Telescope</h2>
<p>The corresponding Jupyter notebook, containing the associated data preprocessing and analysis, can be found <a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/MAGIC%20Gamma%20Telescope/MAGIC%20Gamma%20Telescope.ipynb">here</a>.</p>
<h2 id="neural-network-classifier-using-tpot-nn">Neural network classifier using TPOT-NN</h2>
<p>Loading the <a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/config/classifier_nn.py">TPOT-NN configuration dictionary</a> includes PyTorch estimators for classification. Users can also create their own NN configuration dictionary that includes <code>tpot.builtins.PytorchLRClassifier</code> and/or <code>tpot.builtins.PytorchMLPClassifier</code>, or they can specify them using a template string, as shown in the following example:</p>
<pre><code class="Python">from tpot import TPOTClassifier
<pre><code class="language-Python">from tpot import TPOTClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

@@ -353,7 +361,6 @@ <h2 id="neural-network-classifier-using-tpot-nn">Neural network classifier using
print(clf.score(X_test, y_test))
clf.export('tpot_nn_demo_pipeline.py')
</code></pre>

<p>This example is somewhat trivial, but it should result in nearly 100% classification accuracy.</p>
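<p>Alternatively, instead of a template string, a custom configuration dictionary can reference the PyTorch estimators directly, as mentioned above. The sketch below only illustrates the idea; the hyperparameter names and value ranges are assumptions, not the exact contents of the TPOT-NN configuration.</p>
<pre><code class="language-Python">from tpot import TPOTClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=100, centers=2, n_features=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, test_size=0.25)

# Assumed hyperparameter grid for the PyTorch logistic-regression classifier.
nn_config = {
    'tpot.builtins.PytorchLRClassifier': {
        'learning_rate': [1e-3, 1e-2, 1e-1],
        'batch_size': [16, 32],
        'num_epochs': [10, 15],
    }
}

clf = TPOTClassifier(config_dict=nn_config, generations=5, population_size=20, verbosity=2)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
</code></pre>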

</div>
2 changes: 1 addition & 1 deletion docs/index.html
@@ -204,5 +204,5 @@

<!--
MkDocs version : 1.1.2
Build Date UTC : 2020-07-21 20:34:39.398221+00:00
Build Date UTC : 2020-10-26 14:32:58.841000+00:00
-->
