Releases: ContextLab/hypertools

v0.8.0 (February 2022)

12 Feb 03:29
564c1d4

updates to .geo file format

Hypertools now saves DataGeometry objects using the pickle file format internally, rather than HDF5. With improvements made to the built-in pickle module since Hypertools's initial release, this now generally results in smaller files that save and load more quickly. It also allows us to no longer depend on deepdish, which has compatibility issues with various pandas objects, doesn't offer pre-built wheels for more recent Python versions, and is largely no longer maintained.

If you need to load .geo files from the old format, hypertools.load now accepts a keyword-only argument, legacy. Install deepdish if necessary, and pass legacy=True to load older DataGeometry objects. You can then .save() them to convert them to the new format.
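The conversion described above might look like the following minimal sketch. The helper name convert_legacy_geo and both file paths are hypothetical; only hypertools.load(..., legacy=True) and .save() come from these notes, and deepdish is assumed to be installed for the legacy read.

```python
def convert_legacy_geo(old_path, new_path):
    """Load an old-format (HDF5) .geo file and re-save it in the new format.

    Hypothetical helper for illustration only. Assumes hypertools>=0.8.0
    and, for the legacy read, deepdish.
    """
    # Imported inside the function so that defining the helper does not
    # itself require hypertools to be installed.
    import hypertools as hyp

    # `legacy` is keyword-only, so it has to be passed by name.
    geo = hyp.load(old_path, legacy=True)

    # Saving from v0.8.0 writes the new pickle-based format.
    geo.save(new_path)
    return new_path
```

For example, convert_legacy_geo("old_analysis.geo", "old_analysis_converted.geo") would leave the original file untouched and write a converted copy next to it.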

improvements to example datasets

All example data files have been upgraded to the new file format. Additionally, the three pre-trained scikit-learn Pipelines Hypertools provides (wiki_model, nips_model, and sotus_model) have been recreated from scratch using a newer scikit-learn version, better text preprocessing, and updated CountVectorizer and LatentDirichletAllocation parameters that result in overall better models.

The example DataGeometry objects associated with these three models (wiki, nips, and sotus) have been updated accordingly, and additionally now use IncrementalPCA as their default reducers, resulting in faster, deterministic transform outputs.

To use the new models and datasets, upgrade Hypertools to v0.8.0 (pip install -U hypertools) and remove the local cache of old versions ([[ -d ~/hypertools_data ]] && rm ~/hypertools_data/*). Older versions of Hypertools will continue to use the old example data.
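If you would rather clear the cache from Python than from a shell, a rough equivalent of the one-liner above is sketched below. The function name clear_hypertools_cache is hypothetical and not part of Hypertools; the default directory is the ~/hypertools_data path mentioned above.

```python
from pathlib import Path


def clear_hypertools_cache(cache_dir=None):
    """Delete the files stored directly inside the example-data cache.

    Mirrors the shell one-liner: does nothing if the directory is missing,
    and removes only regular, non-hidden files (no subdirectories).
    Returns the names of the files that were removed.
    """
    if cache_dir is None:
        cache_dir = Path.home() / "hypertools_data"
    cache_dir = Path(cache_dir)

    if not cache_dir.is_dir():
        return []

    removed = []
    for path in sorted(cache_dir.iterdir()):
        # `rm dir/*` skips dotfiles and refuses directories; do the same.
        if path.is_file() and not path.name.startswith("."):
            path.unlink()
            removed.append(path.name)
    return removed
```

Calling clear_hypertools_cache() with no arguments targets the default cache location; passing a path makes it easy to try the function on a scratch directory first.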

Other improvements

  • Hypertools is now compatible with Python 3.9! This release is also compatible in principle with Python 3.10, but numba does not yet support Python 3.10, so certain dependencies will fail to install.
  • Hypertools now works with newer scikit-learn versions! The updates above to the example datasets make Hypertools fully compatible with recent scikit-learn releases (>=0.24). This should make Hypertools easier to use in Colaboratory notebooks and more flexible in general. If you need to use an older scikit-learn version, install an earlier Hypertools release instead (pip install "hypertools<0.8.0").
  • Hypertools now works with newer Matplotlib versions! Recent updates to matplotlib's plotting backends were causing Hypertools's plotting interface to fail on import. We've fixed these bugs and maintained backwards compatibility with older (deprecated) interactive plotting backends as well.

Other assorted changes

  • failures when loading example datasets and .geo files now raise HypertoolsIOError with clearer error messages
  • specifying a compression when saving a DataGeometry object raises a FutureWarning
  • CI tests now run with Python 3.6 through 3.9, use mamba for faster environment setup, and generate more verbose output
  • dependencies and code required for Python 2/3 compatibility have been removed
  • various code causing RuntimeWarnings has been fixed

v0.7.0 (June 2021)

15 Jun 19:22
e7b7446

Control over matplotlib backend & various bug fixes

New features:

  • Create animated plots in an environment with a non-interactive matplotlib plotting backend set, without disrupting the global plotting backend
  • Create non-animated, interactive plots for easy inspection of data using the new interactive keyword argument
  • Set the plotting backend for a single plot using the new mpl_backend keyword argument, and easily switch between backends within a single Python interpreter session, IPython kernel, and even Jupyter notebook cell
  • Use the new hypertools.set_interactive_backend function to change the backend for all future plots, or use it as a context manager to temporarily switch to a different backend. You can also use this to create multiple animated/interactive plots simultaneously.
  • Use Hypertools's backend adjustments to control the behavior of other plotting libraries
  • Set the $HYPERTOOLS_BACKEND environment variable to permanently set your preferred plotting backend for non-IPython environments

NB: Currently supported backends include TkInter, GTK, wxPython, Qt4, Qt5, Cocoa (aka MacOSX; MacOS only), notebook/nbAgg (Jupyter notebooks only), and ipympl/widget (Jupyter notebooks only). 3D and interactive plots may not render properly in Colab notebooks due to security restrictions imposed by the Colaboratory platform.
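The backend controls listed above could be combined roughly as follows. This is a sketch, not a reference: the function name plot_with_backends is hypothetical, and "TkAgg" is assumed to be an accepted name for the TkInter backend (it is matplotlib's own name for it). The interactive and mpl_backend keyword arguments and hypertools.set_interactive_backend are the ones named in these notes.

```python
# Assumption: "TkAgg" (matplotlib's name for the TkInter backend) is an
# accepted backend name; substitute whichever supported backend you use.
BACKEND = "TkAgg"


def plot_with_backends(data):
    """Show three ways of choosing a plotting backend (illustrative only)."""
    # Imported lazily so defining this function does not require hypertools.
    import hypertools as hyp

    # 1. Use a specific backend for this one plot only.
    hyp.plot(data, mpl_backend=BACKEND)

    # 2. Create a non-animated but interactive plot for inspecting the data.
    hyp.plot(data, interactive=True)

    # 3. Temporarily switch backends; the previous backend is restored
    #    when the `with` block exits.
    with hyp.set_interactive_backend(BACKEND):
        hyp.plot(data, animate=True)
```

Calling hyp.set_interactive_backend(BACKEND) outside a with statement would instead change the backend for all future plots, as described in the list above.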

Bug fixes

  • importing hypertools in a notebook no longer creates phantom Python processes, issues warnings when TkInter isn't installed, fails if matplotlib.pyplot was imported first, or silently changes the plotting backend (fixes #242)
  • creating 3D plots with hypertools no longer alters the global matplotlib.rcParams object (fixes #243)
  • hypertools can now be imported for non-plotting-related uses in environments without a compatible GUI without throwing an error
  • IPython's TAB-completion no longer triggers a full import of hypertools or improperly sets the plotting backend based on the subprocess's environment
  • require scikit-learn<0.24 (full spec: scikit-learn>=0.19.1,!=0.22,<0.24) to avoid bug when loading pre-trained DataGeometry objects due to renamed sklearn module

v0.6.3 (October 2020)

02 Oct 21:38
9ac3dc1

dependency-related updates

  • allow scikit-learn>0.22. scikit-learn==0.22.0 contains a bug that affects the CountVectorizer vocabulary. This has been fixed in 0.23.0.
  • require umap-learn>=0.4.6. We previously avoided a bug in umap-learn<=0.4.5 by installing a pre-release version from GitHub. This has now been fixed in umap-learn==0.4.6.
  • Beginning with seaborn==0.11.0, "dark" color palettes are returned in reverse order from how they were previously. This difference in behavior will be reflected in hypertools, but we've changed the default cmap in hypertools._shared.helpers.vals2colors to a non-dark palette for consistent default behavior.
  • Added tests for Python 3.8

v0.6.2 (December 2019)

18 Dec 23:00
eca7cff

minor patch that enables dependencies not hosted on PyPI to install properly

  • setup.py's setup command is now a custom class that inherits from setuptools.command.install.install, runs the regular installation process, then pip-installs UMAP from its GitHub URL at a pre-release commit hash. This is completely equivalent to manually running pip install git+<URL>, but takes the burden of having to do so off of end-users.
  • removed URL from requirements.txt, added a comment in its place
  • added MANIFEST.IN file to include requirements.txt
  • updated minimum Python version listed on PyPI page to 3.5 to reflect that Python 3.4 support was dropped in v0.5.1 (August 2018)

This version is tagged as 0.6.2 to keep the versioning here and on PyPI consistent. The fix intended to be 0.6.1 was unsuccessful on TestPyPI, and PyPI does not allow removing and reuploading an existing version.

v0.6.0 (December 2019)

18 Dec 19:14
4808f9e

Updates to hypertools.reduce

  • fixed bug when passing a dictionary of parameters to the reduce argument that would result in those parameters being overwritten
  • added some basic support for passing custom embedding models
  • added a warning when resolving conflicts between hypertools arguments and model-specific arguments

Other changes

  • dropped support for Python 2.7
  • fixed bug in Travis tests
  • replaced deprecated pandas.DataFrame method in hypertools.tools.df2mat
  • require installing UMAP from the GitHub repository because a needed bug fix has not been released yet
  • updated setup.py to comply with PEP 508 guidelines for installing external dependencies
  • added unit test for hypertools.reduce bug fix
  • removed some unused imports and commented-out code
  • removed outdated pages from readthedocs
  • readthedocs build is now Python 3-based
  • build folder is ignored by default when installing from GitHub repository in editable mode

v0.5.1 (August 2018)

02 Aug 02:01
1bd82af
  • added flake8 to travis tests
  • refactored some of procrustes function code
  • removed support for python 3.4
  • removed hdbscan from dependencies (still can be used if installed manually)

Code cleanup (thanks @dwillmer!):

  • Changed string comparisons from if x is 'str' to if x == 'str'; the former is an identity comparison, not equality. It happens to be true for some strings because of string interning, but == should always be used for normal comparisons.
  • Removed unused arguments from _draw function - return_data and others weren't used in the function body.
  • Removed unreachable code in normalize function (branch criteria could never be True).
  • Separated out the multiply-nested function calls in DataGeometry class for clarity.
  • Changed comparisons of the form if type(x) is list to if isinstance(x, list); the former doesn't return True for subclasses, so isinstance should always be used.
  • Set unused loop variables to _.
  • Removed unused imports.
  • Ensured all imports are at the top of the file (except lazy / circular ones)
  • Ensured 2 blank lines above functions/classes (PEP8); the code looks a bit weird without this.
  • Fixed typo repect -> respect, was copy-pasted in multiple docstrings.
  • Removed redundant pass before error raise
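The two comparison fixes in the list above can be illustrated with plain Python, independent of Hypertools. The names a, b, and LabelList are made up for this example.

```python
import sys

# Equality vs. identity for strings.
a = "hyper"
b = "".join(["hy", "per"])  # same text, but built at runtime

# == compares contents and is the right tool for normal comparisons.
print(a == b)  # True

# `a is b` would ask whether both names point at the very same object.
# Whether that holds depends on interpreter-level string interning, so it
# must not be relied on. Explicitly interned strings do share one object:
print(sys.intern(a) is sys.intern(b))  # True


# Exact type check vs. isinstance for subclasses.
class LabelList(list):
    """A list subclass, e.g. a list that could carry extra metadata."""


labels = LabelList(["a", "b"])

print(type(labels) is list)      # False: the exact type is LabelList
print(isinstance(labels, list))  # True: subclasses are accepted
```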

v0.5.0 (April 2018)

18 Apr 20:29

Enhancements:

Plotting and transforming text data

  • hyp.plot now supports plotting text data. Simply pass a string, list of strings or list of lists of strings and the text will be transformed using a semantic model and plotted. By default, the text is transformed using a topic model (LDA) fit to a selection of Wikipedia pages.
  • A new vectorizer argument in hyp.plot to specify a text vectorizer. Currently supports CountVectorizer, TfidfVectorizer, or class instances (fit or unfit) of these models.
  • A new semantic argument in hyp.plot that specifies the semantic model to use to transform text. Currently supports LatentDirichletAllocation, NMF, or class instances (fit or unfit) of these models.
  • A new corpus argument in hyp.plot that allows the user to specify text to fit a semantic model. Can be 'wiki', 'nips', 'sotus' or a custom list of text.
  • Enhanced hyp.format_data function that takes data in various forms (numpy array, dataframe, str, or list of str, or mixed list) and returns them in a standard format (a list of numpy arrays). This function can be used to transform text data using a semantic model.
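As a rough sketch of how the text-related arguments above fit together (the function name plot_and_vectorize is hypothetical, and the particular vectorizer, semantic, and corpus values are just one combination of the options listed):

```python
def plot_and_vectorize(docs):
    """Plot text with the defaults, then with explicit models (illustrative).

    `docs` is a list of strings or a list of lists of strings.
    """
    # Imported lazily so defining this function does not require hypertools.
    import hypertools as hyp

    # Defaults: the text is transformed with the default semantic model
    # (LDA fit to the wikipedia corpus) and then plotted.
    hyp.plot(docs)

    # Explicit choices for the vectorizer, semantic model, and fitting corpus.
    hyp.plot(docs, vectorizer="TfidfVectorizer", semantic="NMF", corpus="nips")

    # Transform the text without plotting; returns a list of numpy arrays.
    return hyp.format_data(
        docs,
        vectorizer="CountVectorizer",
        semantic="LatentDirichletAllocation",
        corpus="wiki",
    )
```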

New algorithms

  • A new clustering algorithm HDBSCAN (thanks @lmcinnes!) e.g. hyp.plot(data, cluster='HDBSCAN')
  • A new dimensionality reduction algorithm UMAP (thanks @lmcinnes!) e.g. hyp.plot(data, reduce='UMAP')

New parameters

  • A new size param to resize figure e.g. hyp.plot(data, size=[10,8])
  • A new ax param to add figure to existing axis e.g. hyp.plot(data, ax=ax)

New text examples

  • A new dataset of NIPS papers e.g. hyp.load('nips') (from kaggle)
  • A new dataset of selected wikipedia pages e.g. hyp.load('wiki')
  • A new dataset of State of the Union text from 1989-2017. Can be loaded as hyp.load('sotus') (from kaggle)

API changes

  • In hyp.plot, changed the group arg to hue (group will still be supported but will be deprecated in a coming release).
  • Removed deprecated describe_pca function. Please use more general function, describe.

Bugs fixed

  • When using chemtrails in hyp.plot, the entire timeseries would appear for the first few seconds of an animation and then disappear.
  • The legend colors did not align with the data when using the fmt or color args.
  • When grouping with group/hue arg, labels were not reshuffled.
  • Fixed bug in describe function where correlations between data and reduced data would asymptote below 1.

NOTE: If you have been using the development version of 0.5.0, please clear your data cache (/Users/yourusername/hypertools_data).

v0.4.2 (December 2017)

11 Dec 20:35
  • fixed bug in plot function where software would crash if reduce was specified as dict
  • added tutorials to readthedocs

v0.4.1 (November 2017)

19 Nov 17:23
640745b
  • exposed format_data (hypertools.tools.format_data), which formats a numpy array, pandas DataFrame, or mixed list into a list of numpy arrays
  • added tests for format_data
  • added documentation to format_data

v0.4.0 (October 2017)

12 Oct 21:37

Enhancements -

  • A new class: DataGeometry with methods for plotting, transforming new data and saving
  • Support for loading *.geo objects
  • A new function: analyze to perform combinations of transformations
  • A new function: describe for characterizing the loss of information due to dimensionality reduction algs
  • In-memory caching of time-intensive reduce, align and describe operations
  • New syntax for reduce function: model and model_params are now passed as a dictionary using the reduce arg
  • New clustering models added to the cluster function: MiniBatchKMeans, AgglomerativeClustering, Birch, FeatureAgglomeration, and SpectralClustering
  • Moved major functions (normalize, align, reduce, cluster, load) to main level (i.e. hyp.load instead of hyp.tools.load, but the latter will still work)
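A hedged sketch of how several of these pieces might be used together. The function name explore_geometry and the default file name are hypothetical, and the 'model'/'params' dictionary keys and the show=False call returning a DataGeometry object reflect our understanding of the API rather than anything spelled out in this list.

```python
def explore_geometry(data, new_data, geo_path="my_analysis.geo"):
    """Exercise the reduce dict syntax, analyze, and DataGeometry (illustrative)."""
    # Imported lazily so defining this function does not require hypertools.
    import hypertools as hyp

    # New reduce syntax: the model and its parameters travel in one dict.
    # Assumption: the dict uses the keys 'model' and 'params'.
    reduced = hyp.reduce(
        data, reduce={"model": "PCA", "params": {"whiten": True}}, ndims=3
    )

    # analyze chains several transformations in a single call.
    analyzed = hyp.analyze(data, normalize="across", reduce="PCA", ndims=3)

    # Assumption: hyp.plot returns a DataGeometry object; show=False
    # suppresses rendering of the figure.
    geo = hyp.plot(data, show=False)

    # Apply the same fitted transformations to new data, then save and reload.
    transformed = geo.transform(new_data)
    geo.save(geo_path)
    reloaded = hyp.load(geo_path)

    return {
        "reduced": reduced,
        "analyzed": analyzed,
        "transformed": transformed,
        "reloaded": reloaded,
    }
```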

Deprecations -

  • A deprecation warning is thrown for the following align arguments: normalize, ndims, and method
  • A deprecation warning is thrown for the following reduce arguments: model, model_params, align, and normalize
  • A deprecation warning is thrown for the following cluster arguments: ndims
  • A deprecation warning is thrown for the describe_pca function (replaced by describe)

Bugs -

  • fixed #148 bug in hyp.plot where figure would be rendered despite setting show=False (thanks @chaseWilliams !)
  • fixed a bug where n_clusters would not override group, even though a warning message said it would
  • fixed a bug where hyp.plot would quit if any kwargs were not the same length as the number of arrays in the list of input data.

Minor -

  • added brainiak toolbox citation and github link to align.py docstring
  • added additional details and fixed typos in align.py docstring
  • Upgraded seaborn requirement to 0.8.1
  • updated all examples/docs with new syntax changes
  • added new tests for new features