Stochastic Neighbor and Crowd Kernel (SNaCK) embedding

Quick and dirty visualization of large-scale datasets via concept embeddings

This code (and the companion paper) showcase our work on “SNaCK,” a low-dimensional concept embedding algorithm that combines human expertise with automatic machine similarity kernels. Both parts are complimentary: human insight can capture relationships that are not apparent from the object’s visual similarity and the machine can help relieve the human from having to exhaustively specify many constraints.

As input, our SNaCK algorithm takes two sources:

Several "relative similarity comparisons." Each constraint has the form (a,b,c), meaning that in the lower-dimension embedding Y, Y[a] should be closer to Y[b] than it is to Y[c]. Experts can generate many of these constraints using crowdsourcing.
Feature vector representations of each point. For instance, such features could come from HOG, SIFT, a deep-learned CNN, word embeddings, or some other representation.

SNaCK then generates an embedding that satisfies both classes of constraints.

Usage

See http://nbviewer.ipython.org/github/cornelltech/snack/blob/master/Examples.ipynb for documentation on SNaCK's parameters and example usage.

Launch a notebook with SNaCK and Food-10k

Click the above button to launch an IPython Notebook with SNaCK. This environment has a copy of SNaCK and comes pre-loaded with a copy of the Food-10k dataset for you to explore.

This service is made possible by the folks at binder

Installation

The following platforms are supported:

Python 2.7 on Linux, x64
- Binary packages available on Conda
- Source packages available from Pip
Python 2.7 on OSX
- Binary packages for Yosemite available on Conda
- Source packages available from Pip (Homebrew-GCC required)

Linux and Mac OS X: Install from Conda (Preferred)

Please use Conda. Your life will be easier.

Just run:

$ conda install -c https://conda.anaconda.org/gcr snack

If you insist on compiling from source, read on:

Linux: Install from source with Pip

Just run:

$ pip install snack

You need to install Python 2.7, Numpy, and Cython. You also need a working compiler, CBLAS, and the Python development headers, which are installable from your distribution's package manager.

To install SNaCK and its dependencies on a clean Ubuntu Trusty x64 system, run:

# sudo aptitude install \
  build-essential       \
  python-dev            \
  libblas3              \
  libblas-dev           \
  python-virtualenv
$ virtualenv venv; source venv/bin/activate
$ pip install numpy
$ pip install cython
$ pip install snack

OS X: Install from source with Pip and Homebrew

SNaCK uses OpenMP. This makes compilation tricky on Mac OS X.

If you are on Mac OS X, you must install the real "not-clang" version of gcc because it has OpenMP support. At the time of writing, clang does not support OpenMP, and Apple has unhelpfully symlinked clang to /usr/bin/gcc. This will not work.

Using Apple-provided GCC is NOT supported. If gcc-5 --version contains the string clang anywhere in its output, you do not have the correct version of gcc.

Using Apple-provided Python is NOT supported.

The recommended installation method on OS X is with Homebrew. The following has been tested on a clean Yosemite installation.

$ brew install gcc
$ brew install python
$ virtualenv venv; source venv/bin/activate
$ pip install numpy
$ pip install cython
$ pip install snack

You may need to edit setup.py and change GCC_VERSION to point to the correct version if you are not using /usr/local/bin/gcc-5.

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
conda-build-files		conda-build-files
lib-bhtsne		lib-bhtsne
snack		snack
.gitignore		.gitignore
Examples.ipynb		Examples.ipynb
LICENSE.txt		LICENSE.txt
MANIFEST.in		MANIFEST.in
PACKAGING.org		PACKAGING.org
README.md		README.md
Vagrantfile		Vagrantfile
gradient_check.py		gradient_check.py
setup.py		setup.py
snack-logo.jpg		snack-logo.jpg
test.py		test.py

License

cornelltech/snack

Folders and files

Latest commit

History

Repository files navigation

Stochastic Neighbor and Crowd Kernel (SNaCK) embedding

Usage

Launch a notebook with SNaCK and Food-10k

Installation

Linux and Mac OS X: Install from Conda (Preferred)

Linux: Install from source with Pip

OS X: Install from source with Pip and Homebrew

About

Resources

License

Stars

Watchers

Forks

Languages