Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix for SBM estimator with some size-one block and loops=False #1056

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
13 changes: 13 additions & 0 deletions docs/reference/release.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,19 @@
Release Log
===========

graspologic 3.3.0
-----------------
- Added features and bugfixes to ``heatmap``
`#750 <https://github.com/microsoft/graspologic/pull/750>`
- Fixed type specification bugs related to Numpy 1.25 release
`#1047 <https://github.com/microsoft/graspologic/pull/1047>`
- Added option for more efficient graph matching matrix operations
`#1046 <https://github.com/microsoft/graspologic/pull/1046>`
- Added an axis argument to ``screeplot``
`#1048 <https://github.com/microsoft/graspologic/pull/1048>`
- Fixed compatibility issues related to matplotlib 3.8 release
`#1049 <https://github.com/microsoft/graspologic/pull/1049>`

graspologic 3.2.0
-----------------
- Added Python 3.11 support
Expand Down
105 changes: 79 additions & 26 deletions docs/tutorials/plotting/heatmaps.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -16,27 +16,29 @@
"outputs": [],
"source": [
"import graspologic\n",
"\n",
"import numpy as np\n",
"%matplotlib inline"
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting graphs using heatmap\n",
"\n",
"### Simulate graphs using weighted stochastic block models\n",
"The 2-block model is defined as below:\n",
"## Plotting Simple Graphs using heatmap"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A 2-block Stochastic Block Model is defined as below:\n",
"\n",
"\\begin{align*}\n",
"P = \\begin{bmatrix}0.8 & 0.2 \\\\\n",
"0.2 & 0.8 \n",
"\\end{bmatrix}\n",
"\\end{align*}\n",
"\n",
"We generate two weight SBMs where the weights are distributed from a Poisson(3) and Normal(5, 1)."
"In simple cases, the model is unweighted. Below, we plot an unweighted SBM."
]
},
{
Expand All @@ -46,25 +48,28 @@
"outputs": [],
"source": [
"from graspologic.simulations import sbm\n",
"from graspologic.plot import heatmap\n",
"\n",
"n_communities = [50, 50]\n",
"p = [[0.8, 0.2], \n",
" [0.2, 0.8]]\n",
"\n",
"wt = np.random.poisson\n",
"wtargs = dict(lam=3)\n",
"A_poisson= sbm(n_communities, p, wt=wt, wtargs=wtargs)\n",
"\n",
"wt = np.random.normal\n",
"wtargs = dict(loc=5, scale=1)\n",
"A_normal = sbm(n_communities, p, wt=wt, wtargs=wtargs)"
"A, labels = sbm(n_communities, p, return_labels=True)\n",
"heatmap(A, title=\"Basic Heatmap function\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting with Hierarchy Labels"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot the simulated weighted SBMs"
"If we have labels, we can use them to show communities on a Heatmap."
]
},
{
Expand All @@ -73,10 +78,54 @@
"metadata": {},
"outputs": [],
"source": [
"from graspologic.plot import heatmap\n",
"\n",
"title = 'Weighted Stochastic Block Model with Poisson(3)'\n",
"heatmap(A, inner_hier_labels=labels)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can plot outer hierarchy labels in addition to inner hierarchy labels."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"outer_labels = [\"Outer Labels\"] * 100\n",
"heatmap(A, inner_hier_labels=labels,\n",
" outer_hier_labels=outer_labels)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Weighted SBMs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also use heatmap when our graph is weighted. Here, we generate two weighted SBMs where the weights are distributed from a Poisson(3) and Normal(5, 1)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Draw weights from a Poisson(3) distribution\n",
"wt = np.random.poisson\n",
"wtargs = dict(lam=3)\n",
"A_poisson= sbm(n_communities, p, wt=wt, wtargs=wtargs)\n",
"\n",
"# Plot\n",
"title = 'Weighted Stochastic Block Model with \\n weights drawn from a Poisson(3) distribution'\n",
"fig= heatmap(A_poisson, title=title)"
]
},
Expand All @@ -86,18 +135,23 @@
"metadata": {},
"outputs": [],
"source": [
"title = 'Weighted Stochastic Block Model with Normal(5, 1)'\n",
"# Draw weights from a Normal(5, 1) distribution\n",
"wt = np.random.normal\n",
"wtargs = dict(loc=5, scale=1)\n",
"A_normal = sbm(n_communities, p, wt=wt, wtargs=wtargs)\n",
"\n",
"fig= heatmap(A_normal, title=title)"
"# Plot\n",
"title = 'Weighted Stochastic Block Model with \\n weights drawn from a Normal(5, 1) distribution'\n",
"fig = heatmap(A_normal, title=title)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### You can also change color maps\n",
"### Colormaps\n",
"\n",
"See [here](https://matplotlib.org/tutorials/colors/colormaps.html) for a list of colormaps"
"You can change colormaps. See [here](https://matplotlib.org/tutorials/colors/colormaps.html) for a list of colormaps."
]
},
{
Expand All @@ -107,8 +161,7 @@
"outputs": [],
"source": [
"title = 'Weighted Stochastic Block Model with Poisson(3)'\n",
"\n",
"fig= heatmap(A_poisson, title=title, transform=None, cmap=\"binary\", center=None)"
"fig = heatmap(A_poisson, title=title, transform=None, cmap=\"binary\", center=None)"
]
},
{
Expand Down Expand Up @@ -203,7 +256,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
"version": "3.8.5"
}
},
"nbformat": 4,
Expand Down
4 changes: 2 additions & 2 deletions graspologic/embed/svd.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,11 +7,11 @@
import scipy
import scipy.sparse as sp
import sklearn
from scipy.sparse import csr_array
from scipy.stats import norm
from typing_extensions import Literal

from graspologic.types import List, Tuple
from graspologic.utils import is_almost_symmetric

SvdAlgorithmType = Literal["full", "truncated", "randomized", "eigsh"]

Expand Down Expand Up @@ -107,7 +107,7 @@ def select_dimension(
pp.918-930.
"""
# Handle input data
if not isinstance(X, np.ndarray) and not sp.isspmatrix_csr(X):
if not isinstance(X, (np.ndarray, csr_array)):
msg = "X must be a numpy array or scipy.sparse.csr_array, not {}.".format(
type(X)
)
Expand Down
4 changes: 2 additions & 2 deletions graspologic/layouts/render.py
Original file line number Diff line number Diff line change
Expand Up @@ -127,9 +127,9 @@ def _draw_graph(
for source, target in graph.edges():
edge_color_list.append(node_colors[source])

ax.set_xbound(x_domain)
ax.set_xbound(*x_domain)
ax.set_xlim(x_domain)
ax.set_ybound(y_domain)
ax.set_ybound(*y_domain)
ax.set_ylim(y_domain)

nx.draw_networkx_edges(
Expand Down
68 changes: 52 additions & 16 deletions graspologic/match/solver.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,6 @@
from beartype import beartype
from ot import sinkhorn
from scipy.optimize import linear_sum_assignment
from scipy.sparse import csr_array
from sklearn.utils import check_scalar

from graspologic.types import List, RngType, Tuple
Expand Down Expand Up @@ -82,6 +81,7 @@ def __init__(
transport_regularizer: Scalar = 100,
transport_tol: Scalar = 5e-2,
transport_max_iter: Int = 1000,
fast: bool = True,
):
# TODO check if init is doubly stochastic
self.init = init
Expand Down Expand Up @@ -113,6 +113,8 @@ def __init__(
)
self.transport_max_iter = transport_max_iter

self.fast = fast

if maximize:
self.obj_func_scalar = -1
else:
Expand Down Expand Up @@ -406,6 +408,7 @@ def compute_step_size(self, P: np.ndarray, Q: np.ndarray) -> float:
self.BA_ns,
self.BA_sn,
self.S_nn,
fast=self.fast,
)
if a * self.obj_func_scalar > 0 and 0 <= -b / (2 * a) <= 1:
alpha = -b / (2 * a)
Expand Down Expand Up @@ -538,6 +541,14 @@ def _compute_gradient(
return grad


def _fast_trace(X: np.ndarray, Y: np.ndarray) -> float:
return (X * Y.T).sum()


def _fast_traceT(X: np.ndarray, Y: np.ndarray) -> float:
return (X * Y).sum()


def _compute_coefficients(
P: np.ndarray,
Q: np.ndarray,
Expand All @@ -554,25 +565,50 @@ def _compute_coefficients(
BA_ns: MultilayerAdjacency,
BA_sn: MultilayerAdjacency,
S: AdjacencyMatrix,
fast: bool,
) -> Tuple[float, float]:
R = P - Q
# TODO make these "smart" traces like in the scipy code, couldn't hurt
# TODO can also refactor to not repeat multiplications like the old code but I was
# finding it harder to follow that way.

n_layers = len(A)
a_cross = 0
b_cross = 0
a_intra = 0
b_intra = 0
a_cross = 0.0
b_cross = 0.0
a_intra = 0.0
b_intra = 0.0

for i in range(n_layers):
a_cross += np.trace(AB[i].T @ R @ BA[i] @ R)
b_cross += np.trace(AB[i].T @ R @ BA[i] @ Q) + np.trace(AB[i].T @ Q @ BA[i] @ R)
b_cross += np.trace(AB_ns[i].T @ R @ BA_ns[i]) + np.trace(
AB_sn[i].T @ BA_sn[i] @ R
)
a_intra += np.trace(A[i] @ R @ B[i].T @ R.T)
b_intra += np.trace(A[i] @ Q @ B[i].T @ R.T) + np.trace(A[i] @ R @ B[i].T @ Q.T)
b_intra += np.trace(A_ns[i].T @ R @ B_ns[i]) + np.trace(A_sn[i] @ R @ B_sn[i].T)
if fast:
# could maybe be even faster if we do `opt_einsum` or something
ABiTR = AB[i].T @ R
BAiR = BA[i] @ R
AiR = A[i] @ R
RBi = R @ B[i]

a_cross += _fast_trace(ABiTR, BAiR)
b_cross += _fast_trace(ABiTR, BA[i] @ Q)
b_cross += _fast_trace(AB[i].T @ Q, BAiR)
b_cross += _fast_trace(AB_ns[i].T @ R, BA_ns[i])
b_cross += _fast_trace(AB_sn[i].T @ BA_sn[i], R)

a_intra += _fast_traceT(AiR, RBi)
b_intra += _fast_traceT(A[i] @ Q, RBi)
b_intra += _fast_traceT(AiR, Q @ B[i])
b_intra += _fast_trace(A_ns[i].T @ R, B_ns[i])
b_intra += _fast_traceT(A_sn[i] @ R, B_sn[i])
else:
a_cross += np.trace(AB[i].T @ R @ BA[i] @ R)
b_cross += np.trace(AB[i].T @ R @ BA[i] @ Q) + np.trace(
AB[i].T @ Q @ BA[i] @ R
)
b_cross += np.trace(AB_ns[i].T @ R @ BA_ns[i]) + np.trace(
AB_sn[i].T @ BA_sn[i] @ R
)
a_intra += np.trace(A[i] @ R @ B[i].T @ R.T)
b_intra += np.trace(A[i] @ Q @ B[i].T @ R.T) + np.trace(
A[i] @ R @ B[i].T @ Q.T
)
b_intra += np.trace(A_ns[i].T @ R @ B_ns[i]) + np.trace(
A_sn[i] @ R @ B_sn[i].T
)

a = a_cross + a_intra
b = b_cross + b_intra
Expand Down
9 changes: 8 additions & 1 deletion graspologic/match/wrappers.py
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,7 @@ def graph_match(
transport_regularizer: Scalar = 100,
transport_tol: Scalar = 5e-2,
transport_max_iter: Int = 1000,
fast: bool = True,
) -> MatchResult:
"""
Attempts to solve the Graph Matching Problem or the Quadratic Assignment Problem
Expand Down Expand Up @@ -192,6 +193,12 @@ def graph_match(
Setting this value higher may provide more precise solutions at the cost of
longer computation time.

fast: bool, default=True
Whether to use numerical shortcuts to speed up the computation. Typically will
be faster for most applications, although requires storing intermediate
computations in memory which may be undesirable for very large inputs or when
memory is a bottleneck.

Returns
-------
res: MatchResult
Expand Down Expand Up @@ -281,7 +288,6 @@ def graph_match(
partial_match=partial_match,
init=init,
init_perturbation=init_perturbation,
verbose=solver_verbose,
shuffle_input=shuffle_input,
padding=padding,
maximize=maximize,
Expand All @@ -291,6 +297,7 @@ def graph_match(
transport_regularizer=transport_regularizer,
transport_tol=transport_tol,
transport_max_iter=transport_max_iter,
fast=fast,
)

def run_single_graph_matching(seed: RngType) -> MatchResult:
Expand Down
2 changes: 2 additions & 0 deletions graspologic/models/sbm_estimators.py
Original file line number Diff line number Diff line change
Expand Up @@ -516,6 +516,8 @@ def _calculate_block_p(
if return_counts:
p = np.count_nonzero(block)
else:
if block.size == 0:
continue
p = _calculate_p(block)
block_p[from_block, to_block] = p
return block_p
Expand Down