Added Graph Encoder Embedding #986

Closed
wants to merge 24 commits into from
2 changes: 1 addition & 1 deletion .github/workflows/build.yml
@@ -89,7 +89,7 @@ jobs:
strategy:
matrix:
os: [ubuntu-latest, windows-latest, macos-latest]
python_version: ["3.7", "3.8", "3.9"]
python_version: ["3.8", "3.9", "3.10"]
fail-fast: false
steps:
- uses: actions/checkout@v2
5 changes: 2 additions & 3 deletions CONTRIBUTING.md
@@ -71,7 +71,7 @@ branch using a virtual environment. Steps:
right of the page. This creates a copy of the code under your GitHub user account. For more details on how to
fork a repository see [this guide](https://help.github.com/articles/fork-a-repo/).

2. Clone your fork of the `graspologic` repo from your GitHub account to your local disk:
2. Clone your fork of the `graspologic` repo from your GitHub account to your local disk. Do this by typing the following into a command prompt (or the equivalent on your operating system):

```bash
git clone git@github.com:YourGithubAccount/graspologic.git
@@ -87,8 +87,7 @@ branch using a virtual environment. Steps:
Always use a `feature` branch. Pull requests directly to either `dev` or `main` will be rejected
until you create a feature branch based on `dev`.

4. From the project root, create a [virtual environment](https://docs.python.org/3/library/venv.html) and install all development dependencies. Examples using various terminals are provided below. These examples use Python 3.8 but you may use any Python version supported by graspologic. These commands should install `graspologic` in editable mode, as well as
all of its dependencies and several tools you need for developing `graspologic`.
4. From the project root, create a [virtual environment](https://docs.python.org/3/library/venv.html) and install all development dependencies. Examples using various terminals are provided below. These examples use Python 3.8, but you may use any Python version supported by graspologic. If the Python 3.8 command does not work, try the same command using "python" instead of the version-specific name. These commands should install `graspologic` in editable mode, as well as all of its dependencies and several tools you need for developing `graspologic`. These commands assume that your Python installation can create virtual environments (that is, the `venv` module is available).

**Bash**
```bash
10 changes: 8 additions & 2 deletions README.md
@@ -1,9 +1,9 @@
<!-- omit in toc -->
# graspologic
[![Paper shield](https://img.shields.io/badge/JMLR-Paper-red)](http://www.jmlr.org/papers/volume20/19-490/19-490.pdf)
[![PyPI version](https://img.shields.io/pypi/v/graspologic.svg)](https://pypi.org/project/graspologic/)
[![Downloads shield](https://pepy.tech/badge/graspologic)](https://pepy.tech/project/graspologic)
![graspologic CI](https://github.com/microsoft/graspologic/workflows/graspologic%20CI/badge.svg)
[![DOI](https://zenodo.org/badge/147768493.svg)](https://zenodo.org/badge/latestdoi/147768493)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## `graspologic` is a package for graph statistical algorithms.
@@ -15,6 +15,7 @@
- [Contributing](#contributing)
- [License](#license)
- [Issues](#issues)
- [Citing `graspologic`](#citing-graspologic)

# Overview
A graph, or network, provides a mathematically intuitive representation of data with some sort of relationship between items. For example, a social network can be represented as a graph by considering all participants in the social network as nodes, with connections representing whether each pair of individuals in the network are friends with one another. Naively, one might apply traditional statistical techniques to a graph, but doing so neglects the spatial arrangement of nodes within the network and fails to use all of the information present in the graph. In this package, we provide utilities and algorithms designed for the processing and analysis of graphs with specialized graph statistical algorithms.
@@ -25,31 +26,36 @@ The official documentation with usage is at https://microsoft.github.io/graspolo
Please visit the [tutorial section](https://microsoft.github.io/graspologic/latest/tutorials/index.html) in the official website for more in depth usage.

# System Requirements
<!-- omit in toc -->
## Hardware requirements
The `graspologic` package requires only a standard computer with enough RAM to support the in-memory operations.

<!-- omit in toc -->
## Software requirements
<!-- omit in toc -->
### OS Requirements
`graspologic` is tested on the following OSes:
- Linux x64
- macOS x64
- Windows 10 x64

And across the following **x86_64** versions of Python:
- 3.7
- 3.8
- 3.9
- 3.10

If you try to use `graspologic` for a different platform than the ones listed and notice any unexpected behavior,
please feel free to [raise an issue](https://github.com/microsoft/graspologic/issues/new). It's better for ourselves and our users
if we have concrete examples of things not working!

# Installation Guide
<!-- omit in toc -->
## Install from pip
```
pip install graspologic
```

<!-- omit in toc -->
## Install from Github
```
git clone https://github.com/microsoft/graspologic
9 changes: 9 additions & 0 deletions docs/reference/release.rst
@@ -3,6 +3,15 @@
Release Log
===========

graspologic 2.0.1
-----------------
- Fixed bug with a matplotlib version incompatibility
  `#996 <https://github.com/microsoft/graspologic/pull/996>`_
- Fixed graph matching with similarity matrix of unequal dimensions
  `#1002 <https://github.com/microsoft/graspologic/pull/1002>`_
- Fixed bug with missing typing-extensions dependency
  `#999 <https://github.com/microsoft/graspologic/pull/999>`_

graspologic 2.0.0
-----------------
- Refactored graph matching code and added many new features
2 changes: 1 addition & 1 deletion docs/tutorials/embedding/CovariateAssistedEmbed.ipynb
@@ -139,7 +139,7 @@
"def plot_latents(latent_positions, *, title, labels, ax=None):\n",
" if ax is None:\n",
" ax = plt.gca()\n",
" plot = sns.scatterplot(latent_positions[:, 0], latent_positions[:, 1], hue=labels, \n",
" plot = sns.scatterplot(x=latent_positions[:, 0], y=latent_positions[:, 1], hue=labels, \n",
" linewidth=0, s=10, ax=ax, palette=\"Set1\")\n",
" plot.set_title(title, wrap=True);\n",
" ax.axes.xaxis.set_visible(False)\n",
11 changes: 7 additions & 4 deletions graspologic/embed/__init__.py
@@ -4,6 +4,7 @@
from .ase import AdjacencySpectralEmbed
from .base import BaseSpectralEmbed
from .case import CovariateAssistedEmbed
from .gee import GraphEncoderEmbed
from .lse import LaplacianSpectralEmbed
from .mase import MultipleASE
from .mds import ClassicalMDS
@@ -13,14 +14,16 @@
from .svd import select_dimension, select_svd

__all__ = [
"ClassicalMDS",
"OmnibusEmbed",
"AdjacencySpectralEmbed",
"BaseSpectralEmbed",
"ClassicalMDS",
"CovariateAssistedEmbed",
"GraphEncoderEmbed",
"LaplacianSpectralEmbed",
"mug2vec",
"MultipleASE",
"node2vec_embed",
"OmnibusEmbed",
"select_dimension",
"select_svd",
"BaseSpectralEmbed",
"CovariateAssistedEmbed",
]
165 changes: 165 additions & 0 deletions graspologic/embed/gee.py
@@ -0,0 +1,165 @@
# Copyright (c) Microsoft Corporation and contributors.
# Licensed under the MIT License.

import numpy as np
from numba import njit
from sklearn.base import BaseEstimator

from graspologic.types import AdjacencyMatrix, Tuple
from graspologic.utils import is_almost_symmetric


@njit
def _project_edges_numba(
    sources: np.ndarray, targets: np.ndarray, weights: np.ndarray, W: np.ndarray
) -> np.ndarray:
    n = W.shape[0]
    k = W.shape[1]
    Z = np.zeros((n, k))
    # TODO redo with broadcasting/einsum?
    for source, target, weight in zip(sources, targets, weights):
        Z[source] += W[target] * weight
        # a self-loop (source == target) is a single diagonal entry of the
        # adjacency matrix, so it must contribute only once
        if source != target:
            Z[target] += W[source] * weight
    return Z
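The TODO in `_project_edges_numba` asks whether the loop can be vectorized. A minimal sketch, assuming a NumPy-only replacement is acceptable and that self-loops should count once so the result matches the dense product `A @ W` (the name `project_edges_vectorized` is hypothetical, not part of graspologic), uses `np.add.at`, which accumulates correctly when the same row index appears several times:

```python
import numpy as np


def project_edges_vectorized(
    sources: np.ndarray, targets: np.ndarray, weights: np.ndarray, W: np.ndarray
) -> np.ndarray:
    """Hypothetical vectorized counterpart of the numba edge loop.

    Each edge (s, t, w) adds w * W[t] to row s and, for off-diagonal
    edges only, w * W[s] to row t.
    """
    Z = np.zeros((W.shape[0], W.shape[1]))
    # np.add.at accumulates repeated indices, unlike Z[idx] += values
    np.add.at(Z, sources, W[targets] * weights[:, None])
    off_diag = sources != targets
    np.add.at(
        Z,
        targets[off_diag],
        W[sources[off_diag]] * weights[off_diag][:, None],
    )
    return Z


# For a symmetric adjacency matrix, iterating over the upper triangle
# (diagonal included) this way reproduces the dense product A @ W.
rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(6, 6)).astype(float)
A = np.triu(A) + np.triu(A, 1).T  # symmetric, may contain self-loops
W = rng.random((6, 3))
src, tgt = np.nonzero(A)
mask = src <= tgt
src, tgt = src[mask], tgt[mask]
Z = project_edges_vectorized(src, tgt, A[src, tgt], W)
print(np.allclose(Z, A @ W))  # True
```

Whether `np.add.at` beats the compiled numba loop on large graphs is not established here; the sketch is mainly a readable reference for checking the loop's output.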


def _get_edges(adjacency: AdjacencyMatrix) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    sources, targets = np.nonzero(adjacency)

    # handle the undirected case
    # if undirected, we only need to iterate over the upper triangle of adjacency
    if is_almost_symmetric(adjacency):
        mask = sources <= targets  # includes the diagonal
        sources = sources[mask]
        targets = targets[mask]

    # cast to float so that the Laplacian scaling in _scale_weights also
    # works for integer-valued adjacency matrices
    weights = np.asarray(adjacency[sources, targets], dtype=np.float64)

    return sources, targets, weights
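To make the undirected branch of `_get_edges` concrete, here is a small sketch assuming a dense NumPy array that is exactly symmetric (the helper `upper_triangle_edges` is hypothetical, and symmetry is assumed up front instead of being tested with `is_almost_symmetric`). Keeping only entries with `sources <= targets` stores each undirected edge once plus any self-loops, and the full matrix can be rebuilt from that half:

```python
import numpy as np


def upper_triangle_edges(adjacency: np.ndarray):
    """Hypothetical helper mirroring the symmetric branch of _get_edges.

    Assumes ``adjacency`` is a dense NumPy array that is exactly symmetric.
    """
    sources, targets = np.nonzero(adjacency)
    # keep each undirected edge once, plus any self-loops on the diagonal
    mask = sources <= targets
    sources, targets = sources[mask], targets[mask]
    weights = adjacency[sources, targets].astype(float)
    return sources, targets, weights


A = np.array(
    [
        [1, 2, 0],
        [2, 0, 5],
        [0, 5, 0],
    ]
)
s, t, w = upper_triangle_edges(A)
print(list(zip(s.tolist(), t.tolist(), w.tolist())))
# [(0, 0, 1.0), (0, 1, 2.0), (1, 2, 5.0)]

# Mirroring the stored half recovers the full symmetric matrix.
B = np.zeros(A.shape)
B[s, t] = w
B[t, s] = w
print(np.array_equal(B, A))  # True
```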


def _scale_weights(
    adjacency: AdjacencyMatrix,
    sources: np.ndarray,
    targets: np.ndarray,
    weights: np.ndarray,
) -> np.ndarray:
    # compute float degrees so the in-place regularization below also works
    # for integer-valued adjacency matrices
    degrees_out = np.sum(adjacency, axis=1, dtype=np.float64)
    degrees_in = np.sum(adjacency, axis=0, dtype=np.float64)

    # regularized Laplacian: add the mean degree to every node's degree
    degrees_out += degrees_out.mean()
    degrees_in += degrees_in.mean()

    # for a nonnegative adjacency matrix with at least one edge, every
    # regularized degree is positive, so the divisions below are safe
    degrees_out_root = 1 / np.sqrt(degrees_out)
    degrees_in_root = 1 / np.sqrt(degrees_in)

    weights = weights * degrees_out_root[sources] * degrees_in_root[targets]
    return weights
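Per edge, `_scale_weights` divides each weight by the square roots of the source node's regularized out-degree and the target node's regularized in-degree. A small NumPy check, assuming a dense adjacency matrix (variable names are illustrative), that this edge-wise scaling agrees with the matrix form D_out^{-1/2} A D_in^{-1/2}, where each degree has been increased by the mean degree:

```python
import numpy as np

A = np.array(
    [
        [0.0, 1.0, 1.0],
        [1.0, 0.0, 0.0],
        [1.0, 0.0, 0.0],
    ]
)

# Degrees regularized by adding the mean degree, as in _scale_weights
d_out = A.sum(axis=1)
d_in = A.sum(axis=0)
d_out_reg = d_out + d_out.mean()
d_in_reg = d_in + d_in.mean()

# Matrix form: D_out^{-1/2} A D_in^{-1/2}
L = np.diag(1 / np.sqrt(d_out_reg)) @ A @ np.diag(1 / np.sqrt(d_in_reg))

# Edge-list form: scale each nonzero weight by its endpoints' degrees
src, tgt = np.nonzero(A)
scaled = A[src, tgt] / np.sqrt(d_out_reg[src] * d_in_reg[tgt])

L_from_edges = np.zeros_like(A)
L_from_edges[src, tgt] = scaled
print(np.allclose(L, L_from_edges))  # True
```

Here node 0 has degree 2 and the mean degree is 4/3, so the edge (0, 1) is scaled to 1 / sqrt((10/3)(7/3)) = 3 / sqrt(70). Projecting the scaled edge weights therefore amounts to multiplying this regularized, normalized adjacency matrix by W.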


def _initialize_projection(features: np.ndarray) -> np.ndarray:
    features_colsum = np.sum(features, axis=0)
    W = features / features_colsum[None, :]
    return W
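For one-hot community labels, the column normalization in `_initialize_projection` gives W[i, k] = 1/n_k when node i belongs to class k (and 0 otherwise), where n_k is the size of class k. A small sketch with a made-up toy graph shows what that implies for the dense product Z = A W that the edge loop is intended to compute: Z[i, k] is the number of node i's neighbors in class k divided by n_k.

```python
import numpy as np

labels = np.array([0, 0, 1, 1, 1])
features = np.eye(2)[labels]  # one-hot encoding, shape (n, K)

# Same normalization as _initialize_projection: divide each column by its sum,
# so column k has weight 1 / n_k on the members of class k
W = features / features.sum(axis=0)[None, :]

# Toy undirected graph without self-loops (illustrative data)
A = np.array(
    [
        [0, 1, 1, 0, 0],
        [1, 0, 0, 1, 0],
        [1, 0, 0, 1, 1],
        [0, 1, 1, 0, 0],
        [0, 0, 1, 0, 0],
    ],
    dtype=float,
)
Z = A @ W
# Node 0 has one neighbor in class 0 (size 2) and one in class 1 (size 3)
print(Z[0])  # approximately [0.5, 0.333]
```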


class GraphEncoderEmbed(BaseEstimator):
    """
    Implements the Graph Encoder Embedding of [1]_, which transforms an input
    network and a matrix of node features into a low-dimensional embedding space.

    Parameters
    ----------
    laplacian : bool, optional
        Whether to scale each edge weight by the inverse square roots of its
        source node's out-degree and its target node's in-degree (each
        regularized by adding the mean degree) before embedding, by default False

    References
    ----------
    .. [1] C. Shen, Q. Wang, and C. Priebe, "One-Hot Graph Encoder Embedding,"
        arXiv:2109.13098 (2021).
    """

    def __init__(self, laplacian: bool = False) -> None:
        self.laplacian = laplacian
        super().__init__()

    def fit(
        self, adjacency: AdjacencyMatrix, features: np.ndarray
    ) -> "GraphEncoderEmbed":
        """Fit the embedding model to the input data.

        Parameters
        ----------
        adjacency : AdjacencyMatrix
            n x n adjacency matrix of the graph
        features : np.ndarray
            n x k matrix of node features. These may be one-hot encoded community
            labels or other node features.

        Returns
        -------
        GraphEncoderEmbed
            The fitted embedding model
        """

        sources, targets, weights = _get_edges(adjacency)

        if self.laplacian:
            weights = _scale_weights(adjacency, sources, targets, weights)

        W = _initialize_projection(features)

        Z = _project_edges_numba(sources, targets, weights, W)

        self.embedding_ = Z
        self.projection_ = W

        return self

    def fit_transform(
        self, adjacency: AdjacencyMatrix, features: np.ndarray
    ) -> np.ndarray:
        """Fit the model to the input data and return the embedding.

        Parameters
        ----------
        adjacency : AdjacencyMatrix
            n x n adjacency matrix of the graph
        features : np.ndarray
            n x k matrix of node features

        Returns
        -------
        np.ndarray
            The n x k embedding of the input graph
        """
        self.fit(adjacency, features)
        return self.embedding_

    def transform(self, adjacency: AdjacencyMatrix) -> np.ndarray:
        """Transform the input adjacency matrix into the embedding space.

        Parameters
        ----------
        adjacency : AdjacencyMatrix
            n x n adjacency matrix of the graph. Because the projection learned
            in ``fit`` has one row per node, this graph must contain the same n
            nodes, in the same order, as the graph passed to ``fit``.

        Returns
        -------
        np.ndarray
            The n x k embedding of the input graph
        """
        sources, targets, weights = _get_edges(adjacency)

        if self.laplacian:
            weights = _scale_weights(adjacency, sources, targets, weights)

        Z = _project_edges_numba(sources, targets, weights, self.projection_)

        return Z
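As a compact end-to-end illustration of what `fit_transform` is meant to compute, here is a dense NumPy sketch. The function `gee_dense` is hypothetical and not part of graspologic; it assumes a dense adjacency matrix, replaces the numba edge loop with a matrix product, and mirrors the optional regularized-Laplacian scaling. On two cliques joined by a single edge, each node's largest embedding coordinate falls in its own community:

```python
import numpy as np


def gee_dense(A: np.ndarray, features: np.ndarray, laplacian: bool = False) -> np.ndarray:
    """Hypothetical dense sketch of graph encoder embedding: Z = A W."""
    A = np.asarray(A, dtype=float)
    if laplacian:
        # scale A[i, j] by 1 / sqrt(d_out[i] * d_in[j]), degrees regularized by their mean
        d_out = A.sum(axis=1)
        d_in = A.sum(axis=0)
        d_out = d_out + d_out.mean()
        d_in = d_in + d_in.mean()
        A = A / np.sqrt(np.outer(d_out, d_in))
    # column-normalized features, as in _initialize_projection
    W = features / features.sum(axis=0)[None, :]
    return A @ W


# Two cliques (sizes 4 and 6) joined by one edge between node 0 and node 4
labels = np.array([0] * 4 + [1] * 6)
A = np.zeros((10, 10))
A[:4, :4] = 1
A[4:, 4:] = 1
np.fill_diagonal(A, 0)
A[0, 4] = A[4, 0] = 1
features = np.eye(2)[labels]

Z = gee_dense(A, features)
print(Z.argmax(axis=1))  # [0 0 0 0 1 1 1 1 1 1]
```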
11 changes: 7 additions & 4 deletions graspologic/match/solver.py
@@ -169,6 +169,9 @@ def __init__(
_compare_dimensions(B, AB, "row", "column", "B", "AB")
_compare_dimensions(A, BA, "row", "column", "A", "BA")
_compare_dimensions(B, BA, "row", "row", "B", "BA")
if S is not None:
_compare_dimensions(A, [S], "row", "row", "A", "S")
_compare_dimensions(B, [S], "column", "column", "B", "S")

# padding for unequally sized inputs
if self.n_A != self.n_B:
@@ -189,9 +192,11 @@ def __init__(
# check for similarity term
if S is None:
S = csr_array((self.n, self.n))
elif self.padded:
S = _adj_pad(S, n_padded=self.n, method="naive")

_compare_dimensions(A, [S], "row", "row", "A", "S")
_compare_dimensions(B, [S], "row", "column", "B", "S")
_compare_dimensions(B, [S], "column", "column", "B", "S")

self.A = A
self.B = B
@@ -643,9 +648,7 @@ def _multilayer_adj_pad(
return new_matrices


def _adj_pad(
matrix: AdjacencyMatrix, n_padded: Int, method: PaddingType
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
def _adj_pad(matrix: AdjacencyMatrix, n_padded: Int, method: PaddingType) -> np.ndarray:
if isinstance(matrix, (csr_matrix, csr_array)) and (method == "adopted"):
msg = (
"Using adopted padding method with a sparse adjacency representation; this "
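The new checks in the solver require the similarity matrix S to have one row per node of A and one column per node of B before any padding happens. A minimal sketch of that contract, assuming zero padding when the two graphs differ in size (the helper `pad_similarity` is hypothetical and does not reproduce `_adj_pad`, whose body is only partly shown here):

```python
import numpy as np


def pad_similarity(S: np.ndarray, n_A: int, n_B: int) -> np.ndarray:
    """Hypothetical helper: validate an n_A x n_B similarity matrix and
    zero-pad it to n x n with n = max(n_A, n_B)."""
    if S.shape != (n_A, n_B):
        raise ValueError(f"S must have shape {(n_A, n_B)}, got {S.shape}")
    n = max(n_A, n_B)
    # zero rows/columns for the padded dummy nodes carry no similarity
    return np.pad(S, [(0, n - n_A), (0, n - n_B)])


S = np.arange(6, dtype=float).reshape(2, 3)  # 2 nodes in A, 3 nodes in B
S_padded = pad_similarity(S, n_A=2, n_B=3)
print(S_padded.shape)  # (3, 3)
```

Checking rows against A and columns against B separately, rather than assuming S is square, is what lets unequally sized graphs carry a similarity term.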
1 change: 0 additions & 1 deletion runtime.txt

This file was deleted.

15 changes: 8 additions & 7 deletions setup.cfg
@@ -1,6 +1,6 @@
[metadata]
name = graspologic
version = 2.0.0
version = 3.0.0

description = A set of python modules for graph statistics
long_description = file: README.md
@@ -17,28 +17,29 @@ classifiers =
Topic :: Scientific/Engineering :: Mathematics
License :: OSI Approved :: MIT License
Programming Language :: Python :: 3
Programming Language :: Python :: 3.7
Programming Language :: Python :: 3.8
Programming Language :: Python :: 3.9
Programming Language :: Python :: 3.10

[options]
packages = find:
include_package_data = True
python_requires = >=3.7, <3.10
python_requires = >=3.8, <3.11
install_requires =
anytree>=2.8.0
beartype>=0.10.0
gensim>=4.0.0
gensim>=4.0.0,!=4.2.0 # bug with 4.2.0 on some platforms, issue #998
graspologic-native>=1.1.1
hyppo>=0.3.2 # bug with lower versions and scipy>=1.8
joblib>=0.17.0 # Older versions of joblib cause issue #806. Transitive dependency of hyppo.
matplotlib>=3.0.0,!=3.3.*
networkx>=2.1
matplotlib>=3.0.0,!=3.3.*,!=3.6.1
networkx>=2.1,<3.0
numpy>=1.8.1
POT>=0.7.0
seaborn>= 0.11.0
scikit-learn>=0.22.0
scipy>=1.4.0
typing-extensions>=4.4.0
umap-learn>=0.4.6

[options.packages.find]
@@ -49,7 +50,7 @@ exclude =
dev =
black
ipykernel>=5.1.0
ipython>=7.4.0
ipython>=7.4.0,!=8.7.0 # https://github.com/spatialaudio/nbsphinx/issues/687#issuecomment-1339271312
isort>=5.9.3
mypy>=0.910
nbsphinx>=0.8.7