Proposal: Hierarchical Dimensionality Reduction module #729

Open
dylanrstewart opened this issue Jul 28, 2022 · 0 comments
Labels
enhancement New feature or request

Author of Proposal: Dylan Stewart

Reason or Problem

A common issue with multi-dimensional raster image processing (at the extremes, hyperspectral imagery with hundreds of features) is significant redundancy within the feature space. Some datasets have tens or hundreds of bands when only a handful might be necessary for downstream use (e.g., classification, segmentation, clustering).

Proposal

This module takes high-dimensional data and either a desired number of output channels or a threshold, compares the distributions of the features within the data, and returns a reduced set of features grouped so that the remaining channels are as dissimilar as possible.

Design:

  1. Given a dataset containing $N$ pixels and $F$ features, produce a pairwise-distance matrix
    $$C \in \mathbb{R}^{F \times F},$$
    where entry $C_{ij}$ is the distance between features $i$ and $j$. $C$ can be computed using various metrics (e.g., Jensen-Shannon divergence, a symmetric Kullback-Leibler divergence, Mahalanobis distance (see "Add Mahalanobis Distance Metric", #114), Euclidean distance) evaluated over the distribution of pixels within the dataset.
  2. Then, select the most similar pair of features (or spectra) by finding the minimum of $C$ (for a distance/divergence measure) or the maximum (for a similarity measure, e.g., mutual information or cosine similarity), and merge them using a specified aggregation (e.g., mean, median, max, min).
  3. Update $C$ to reflect the merge and repeat step 2 until the stopping criterion is met (the desired number of output channels is reached or the threshold is crossed). Return the dataset with reduced dimensionality.
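
The three steps above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the proposed module's API: the names `js_divergence`, `pairwise_distance`, and `reduce_features` are hypothetical, feature distributions are approximated by histograms over shared bin edges, Jensen-Shannon divergence is picked as the metric, and $C$ is recomputed from scratch after every merge instead of being updated in place.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute 0 by convention
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def pairwise_distance(data, bins=32):
    """Step 1: F x F matrix of divergences between per-feature histograms.

    data has shape (N, F): N pixels by F features.
    """
    n_features = data.shape[1]
    # Shared bin edges so histograms of different features are comparable.
    edges = np.linspace(data.min(), data.max(), bins + 1)
    hists = np.stack(
        [np.histogram(data[:, j], bins=edges)[0] for j in range(n_features)]
    ).astype(float)
    hists /= hists.sum(axis=1, keepdims=True)
    dist = np.zeros((n_features, n_features))
    for i in range(n_features):
        for j in range(i + 1, n_features):
            dist[i, j] = dist[j, i] = js_divergence(hists[i], hists[j])
    return dist


def reduce_features(data, n_out, bins=32, aggregate=np.mean):
    """Steps 2 and 3: merge the closest pair until n_out features remain.

    Returns the reduced (N, n_out) array and, for each output feature,
    the list of original feature indices that were merged into it.
    """
    if not 1 <= n_out <= data.shape[1]:
        raise ValueError("n_out must be between 1 and the number of features")
    features = [data[:, j].astype(float) for j in range(data.shape[1])]
    groups = [[j] for j in range(data.shape[1])]
    while len(features) > n_out:
        # Simplification: recompute the whole matrix instead of updating it.
        dist = pairwise_distance(np.column_stack(features), bins)
        np.fill_diagonal(dist, np.inf)  # ignore self-distances
        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        i, j = sorted((int(i), int(j)))
        # Simplification: the aggregate is not weighted by group size.
        features[i] = aggregate(np.stack([features[i], features[j]]), axis=0)
        groups[i] = groups[i] + groups[j]
        del features[j], groups[j]
    return np.column_stack(features), groups
```

For example, `reduce_features(cube.reshape(-1, cube.shape[-1]), n_out=10)` would reduce a hypothetical `(rows, cols, bands)` array `cube` to ten merged bands. A threshold-based stopping rule could replace the `n_out` check by ending the loop once the smallest entry of the matrix exceeds the threshold.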

Usage: reduce the dimensionality of an input by finding correlated features within it and removing the redundancy among them.

Value: provide support to high-dimensional raster processing applications (e.g., data fusion, hyperspectral, multispectral)

Additional Notes or Context

Some distance metrics already available to build from:
