Release 3.3.0 (#1051)
* Added Python 3.11 (#1039)

* add python 3.11

* fix numba dep warning

* add release note entry (#1040)

* Updated heatmap (#750)

* test pytest ini

* add binary_heatmap, updates to tutorial notebook

* test plotting

* re-add pytest.ini

* Update plot.py

Fix a few bugs

* add new tests

* Update setup.cfg

* remove binary_heatmap

* mpl version req back to normal

* remove binary_heatmap import

* black

* formatting fix

---------

Co-authored-by: Benjamin Pedigo <benjamindpedigo@gmail.com>

* Fixed bugs in type specifications from Numpy 1.25 release (#1047)

* attempted fix for matrix type erroring

* fix spec of csr_array

* add a random seed

* remove isspmatrix

* fix a type check

* black

* Added option for more efficient graph matching matrix operations (#1046)

* more efficient graph matching matrix operations

* fix formatting

* remove an unused import (unrelated)

* expose fast kwarg to user, docs

* run black formatter

* add typehint

* fix mypy

---------

Co-authored-by: bkj <ben@jataware.com>
Co-authored-by: bdpedigo <benjamindpedigo@gmail.com>

* Added an `ax` argument for `screeplot`

* Add ax argument to screeplot (now returns None)

* add a return value for screeplot()

* Refactored for best practices

* Fixed Matplotlib 3.8 compatibility issues (#1049)

* specify angle as kwarg

* try removing some maybe unnecessary code?

* fight with mypy

* black

* fix sorts

* try switch to explain

* fix intersphinx

* other instances of wrong tutorial intersphinx

* 3.3.0 release prep (#1050)

* add release notes

* bump version

---------

Co-authored-by: Alex Loftus <alexloftus2004@gmail.com>
Co-authored-by: Ben Johnson <bkj.322@gmail.com>
Co-authored-by: bkj <ben@jataware.com>
Co-authored-by: Prajwal Agrawal <61898798+kidkoder432@users.noreply.github.com>
5 people committed Sep 29, 2023
1 parent 7d96aa2 commit 7e02114
Showing 12 changed files with 213 additions and 85 deletions.
13 changes: 13 additions & 0 deletions docs/reference/release.rst
@@ -3,6 +3,19 @@
Release Log
===========

graspologic 3.3.0
-----------------
- Added features and bugfixes to ``heatmap``
`#750 <https://github.com/microsoft/graspologic/pull/750>`
- Fixed type specification bugs related to Numpy 1.25 release
`#1047 <https://github.com/microsoft/graspologic/pull/1047>`
- Added option for more efficient graph matching matrix operations
`#1046 <https://github.com/microsoft/graspologic/pull/1046>`
- Added an axis argument to ``screeplot``
`#1048 <https://github.com/microsoft/graspologic/pull/1048>`
- Fixed compatibility issues related to matplotlib 3.8 release
`#1049 <https://github.com/microsoft/graspologic/pull/1049>`

graspologic 3.2.0
-----------------
- Added Python 3.11 support
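Note (not part of the commit itself): the `ax` argument added to `screeplot` in #1048 appears in the release notes above, but its diff is not shown on this page. A minimal, hypothetical usage sketch, assuming the new keyword accepts an existing Matplotlib Axes and that the function returns the Axes it drew on, as the commit messages above suggest:

```python
import matplotlib.pyplot as plt
import numpy as np
from graspologic.plot import screeplot

X = np.random.rand(100, 100)  # any matrix whose singular values we want to inspect
fig, ax = plt.subplots(figsize=(8, 4))
ax = screeplot(X, ax=ax)  # assumption: draws onto the supplied Axes and returns it (#1048)
plt.show()
```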
105 changes: 79 additions & 26 deletions docs/tutorials/plotting/heatmaps.ipynb
@@ -16,27 +16,29 @@
"outputs": [],
"source": [
"import graspologic\n",
"\n",
"import numpy as np\n",
"%matplotlib inline"
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting graphs using heatmap\n",
"\n",
"### Simulate graphs using weighted stochastic block models\n",
"The 2-block model is defined as below:\n",
"## Plotting Simple Graphs using heatmap"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A 2-block Stochastic Block Model is defined as below:\n",
"\n",
"\\begin{align*}\n",
"P = \\begin{bmatrix}0.8 & 0.2 \\\\\n",
"0.2 & 0.8 \n",
"\\end{bmatrix}\n",
"\\end{align*}\n",
"\n",
"We generate two weight SBMs where the weights are distributed from a Poisson(3) and Normal(5, 1)."
"In simple cases, the model is unweighted. Below, we plot an unweighted SBM."
]
},
{
@@ -46,25 +48,28 @@
"outputs": [],
"source": [
"from graspologic.simulations import sbm\n",
"from graspologic.plot import heatmap\n",
"\n",
"n_communities = [50, 50]\n",
"p = [[0.8, 0.2], \n",
" [0.2, 0.8]]\n",
"\n",
"wt = np.random.poisson\n",
"wtargs = dict(lam=3)\n",
"A_poisson= sbm(n_communities, p, wt=wt, wtargs=wtargs)\n",
"\n",
"wt = np.random.normal\n",
"wtargs = dict(loc=5, scale=1)\n",
"A_normal = sbm(n_communities, p, wt=wt, wtargs=wtargs)"
"A, labels = sbm(n_communities, p, return_labels=True)\n",
"heatmap(A, title=\"Basic Heatmap function\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting with Hierarchy Labels"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot the simulated weighted SBMs"
"If we have labels, we can use them to show communities on a Heatmap."
]
},
{
@@ -73,10 +78,54 @@
"metadata": {},
"outputs": [],
"source": [
"from graspologic.plot import heatmap\n",
"\n",
"title = 'Weighted Stochastic Block Model with Poisson(3)'\n",
"heatmap(A, inner_hier_labels=labels)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can plot outer hierarchy labels in addition to inner hierarchy labels."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"outer_labels = [\"Outer Labels\"] * 100\n",
"heatmap(A, inner_hier_labels=labels,\n",
" outer_hier_labels=outer_labels)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Weighted SBMs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also use heatmap when our graph is weighted. Here, we generate two weighted SBMs where the weights are distributed from a Poisson(3) and Normal(5, 1)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Draw weights from a Poisson(3) distribution\n",
"wt = np.random.poisson\n",
"wtargs = dict(lam=3)\n",
"A_poisson= sbm(n_communities, p, wt=wt, wtargs=wtargs)\n",
"\n",
"# Plot\n",
"title = 'Weighted Stochastic Block Model with \\n weights drawn from a Poisson(3) distribution'\n",
"fig= heatmap(A_poisson, title=title)"
]
},
@@ -86,18 +135,23 @@
"metadata": {},
"outputs": [],
"source": [
"title = 'Weighted Stochastic Block Model with Normal(5, 1)'\n",
"# Draw weights from a Normal(5, 1) distribution\n",
"wt = np.random.normal\n",
"wtargs = dict(loc=5, scale=1)\n",
"A_normal = sbm(n_communities, p, wt=wt, wtargs=wtargs)\n",
"\n",
"fig= heatmap(A_normal, title=title)"
"# Plot\n",
"title = 'Weighted Stochastic Block Model with \\n weights drawn from a Normal(5, 1) distribution'\n",
"fig = heatmap(A_normal, title=title)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### You can also change color maps\n",
"### Colormaps\n",
"\n",
"See [here](https://matplotlib.org/tutorials/colors/colormaps.html) for a list of colormaps"
"You can change colormaps. See [here](https://matplotlib.org/tutorials/colors/colormaps.html) for a list of colormaps."
]
},
{
@@ -107,8 +161,7 @@
"outputs": [],
"source": [
"title = 'Weighted Stochastic Block Model with Poisson(3)'\n",
"\n",
"fig= heatmap(A_poisson, title=title, transform=None, cmap=\"binary\", center=None)"
"fig = heatmap(A_poisson, title=title, transform=None, cmap=\"binary\", center=None)"
]
},
{
@@ -203,7 +256,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
"version": "3.8.5"
}
},
"nbformat": 4,
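Condensed from the updated notebook cells above, the heatmap tutorial now amounts to roughly the following plain-Python sketch (a summary of the cells shown in the diff, not additional code from the PR):

```python
import numpy as np
from graspologic.plot import heatmap
from graspologic.simulations import sbm

# 2-block SBM: within-block probability 0.8, between-block probability 0.2
n_communities = [50, 50]
p = [[0.8, 0.2],
     [0.2, 0.8]]

# Unweighted SBM plotted with inner and outer hierarchy labels
A, labels = sbm(n_communities, p, return_labels=True)
outer_labels = ["Outer Labels"] * 100
heatmap(A, inner_hier_labels=labels, outer_hier_labels=outer_labels,
        title="SBM with hierarchy labels")

# Weighted SBM with Poisson(3) edge weights, plotted with the "binary" colormap
A_poisson = sbm(n_communities, p, wt=np.random.poisson, wtargs=dict(lam=3))
heatmap(A_poisson, title="Weighted SBM, Poisson(3) weights",
        transform=None, cmap="binary", center=None)
```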
4 changes: 2 additions & 2 deletions graspologic/embed/svd.py
@@ -7,11 +7,11 @@
import scipy
import scipy.sparse as sp
import sklearn
from scipy.sparse import csr_array
from scipy.stats import norm
from typing_extensions import Literal

from graspologic.types import List, Tuple
from graspologic.utils import is_almost_symmetric

SvdAlgorithmType = Literal["full", "truncated", "randomized", "eigsh"]

@@ -107,7 +107,7 @@ def select_dimension(
pp.918-930.
"""
# Handle input data
if not isinstance(X, np.ndarray) and not sp.isspmatrix_csr(X):
if not isinstance(X, (np.ndarray, csr_array)):
msg = "X must be a numpy array or scipy.sparse.csr_array, not {}.".format(
type(X)
)
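The change above swaps `sp.isspmatrix_csr` for an `isinstance` check because SciPy's `isspmatrix_*` helpers only recognize the legacy `*_matrix` classes, not the newer sparse-array classes such as `csr_array`. A small sketch illustrating the difference (assuming a SciPy version that provides both classes):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse import csr_array, csr_matrix

X_arr = csr_array(np.eye(3))
X_mat = csr_matrix(np.eye(3))

print(sp.isspmatrix_csr(X_mat))                    # True: legacy sparse matrix
print(sp.isspmatrix_csr(X_arr))                    # False: csr_array is not an spmatrix
print(isinstance(X_arr, (np.ndarray, csr_array)))  # True: the check now used in select_dimension
```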
4 changes: 2 additions & 2 deletions graspologic/layouts/render.py
@@ -127,9 +127,9 @@ def _draw_graph(
for source, target in graph.edges():
edge_color_list.append(node_colors[source])

ax.set_xbound(x_domain)
ax.set_xbound(*x_domain)
ax.set_xlim(x_domain)
ax.set_ybound(y_domain)
ax.set_ybound(*y_domain)
ax.set_ylim(y_domain)

nx.draw_networkx_edges(
68 changes: 52 additions & 16 deletions graspologic/match/solver.py
@@ -10,7 +10,6 @@
from beartype import beartype
from ot import sinkhorn
from scipy.optimize import linear_sum_assignment
from scipy.sparse import csr_array
from sklearn.utils import check_scalar

from graspologic.types import List, RngType, Tuple
@@ -82,6 +81,7 @@ def __init__(
transport_regularizer: Scalar = 100,
transport_tol: Scalar = 5e-2,
transport_max_iter: Int = 1000,
fast: bool = True,
):
# TODO check if init is doubly stochastic
self.init = init
@@ -113,6 +113,8 @@ def __init__(
)
self.transport_max_iter = transport_max_iter

self.fast = fast

if maximize:
self.obj_func_scalar = -1
else:
@@ -406,6 +408,7 @@ def compute_step_size(self, P: np.ndarray, Q: np.ndarray) -> float:
self.BA_ns,
self.BA_sn,
self.S_nn,
fast=self.fast,
)
if a * self.obj_func_scalar > 0 and 0 <= -b / (2 * a) <= 1:
alpha = -b / (2 * a)
@@ -538,6 +541,14 @@ def _compute_gradient(
return grad


def _fast_trace(X: np.ndarray, Y: np.ndarray) -> float:
return (X * Y.T).sum()


def _fast_traceT(X: np.ndarray, Y: np.ndarray) -> float:
return (X * Y).sum()


def _compute_coefficients(
P: np.ndarray,
Q: np.ndarray,
@@ -554,25 +565,50 @@ def _compute_coefficients(
BA_ns: MultilayerAdjacency,
BA_sn: MultilayerAdjacency,
S: AdjacencyMatrix,
fast: bool,
) -> Tuple[float, float]:
R = P - Q
# TODO make these "smart" traces like in the scipy code, couldn't hurt
# TODO can also refactor to not repeat multiplications like the old code but I was
# finding it harder to follow that way.

n_layers = len(A)
a_cross = 0
b_cross = 0
a_intra = 0
b_intra = 0
a_cross = 0.0
b_cross = 0.0
a_intra = 0.0
b_intra = 0.0

for i in range(n_layers):
a_cross += np.trace(AB[i].T @ R @ BA[i] @ R)
b_cross += np.trace(AB[i].T @ R @ BA[i] @ Q) + np.trace(AB[i].T @ Q @ BA[i] @ R)
b_cross += np.trace(AB_ns[i].T @ R @ BA_ns[i]) + np.trace(
AB_sn[i].T @ BA_sn[i] @ R
)
a_intra += np.trace(A[i] @ R @ B[i].T @ R.T)
b_intra += np.trace(A[i] @ Q @ B[i].T @ R.T) + np.trace(A[i] @ R @ B[i].T @ Q.T)
b_intra += np.trace(A_ns[i].T @ R @ B_ns[i]) + np.trace(A_sn[i] @ R @ B_sn[i].T)
if fast:
# could maybe be even faster if we do `opt_einsum` or something
ABiTR = AB[i].T @ R
BAiR = BA[i] @ R
AiR = A[i] @ R
RBi = R @ B[i]

a_cross += _fast_trace(ABiTR, BAiR)
b_cross += _fast_trace(ABiTR, BA[i] @ Q)
b_cross += _fast_trace(AB[i].T @ Q, BAiR)
b_cross += _fast_trace(AB_ns[i].T @ R, BA_ns[i])
b_cross += _fast_trace(AB_sn[i].T @ BA_sn[i], R)

a_intra += _fast_traceT(AiR, RBi)
b_intra += _fast_traceT(A[i] @ Q, RBi)
b_intra += _fast_traceT(AiR, Q @ B[i])
b_intra += _fast_trace(A_ns[i].T @ R, B_ns[i])
b_intra += _fast_traceT(A_sn[i] @ R, B_sn[i])
else:
a_cross += np.trace(AB[i].T @ R @ BA[i] @ R)
b_cross += np.trace(AB[i].T @ R @ BA[i] @ Q) + np.trace(
AB[i].T @ Q @ BA[i] @ R
)
b_cross += np.trace(AB_ns[i].T @ R @ BA_ns[i]) + np.trace(
AB_sn[i].T @ BA_sn[i] @ R
)
a_intra += np.trace(A[i] @ R @ B[i].T @ R.T)
b_intra += np.trace(A[i] @ Q @ B[i].T @ R.T) + np.trace(
A[i] @ R @ B[i].T @ Q.T
)
b_intra += np.trace(A_ns[i].T @ R @ B_ns[i]) + np.trace(
A_sn[i] @ R @ B_sn[i].T
)

a = a_cross + a_intra
b = b_cross + b_intra
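The `_fast_trace` helpers above rely on the identities trace(XY) = sum(X ∘ Yᵀ) and trace(XYᵀ) = sum(X ∘ Y), which read off the diagonal of the product from an elementwise multiplication instead of forming the full O(n³) matrix product. A quick numerical check of those identities (a verification sketch, not code from the PR):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 200))
Y = rng.standard_normal((200, 200))

# trace(X @ Y) only needs the diagonal of the product, so it can be computed
# from an elementwise product followed by a sum.
assert np.isclose(np.trace(X @ Y), (X * Y.T).sum())   # what _fast_trace computes
assert np.isclose(np.trace(X @ Y.T), (X * Y).sum())   # what _fast_traceT computes
```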
9 changes: 8 additions & 1 deletion graspologic/match/wrappers.py
@@ -69,6 +69,7 @@ def graph_match(
transport_regularizer: Scalar = 100,
transport_tol: Scalar = 5e-2,
transport_max_iter: Int = 1000,
fast: bool = True,
) -> MatchResult:
"""
Attempts to solve the Graph Matching Problem or the Quadratic Assignment Problem
@@ -192,6 +193,12 @@
Setting this value higher may provide more precise solutions at the cost of
longer computation time.
fast: bool, default=True
Whether to use numerical shortcuts to speed up the computation. Typically will
be faster for most applications, although requires storing intermediate
computations in memory which may be undesirable for very large inputs or when
memory is a bottleneck.
Returns
-------
res: MatchResult
@@ -281,7 +288,6 @@ def graph_match(
partial_match=partial_match,
init=init,
init_perturbation=init_perturbation,
verbose=solver_verbose,
shuffle_input=shuffle_input,
padding=padding,
maximize=maximize,
@@ -291,6 +297,7 @@
transport_regularizer=transport_regularizer,
transport_tol=transport_tol,
transport_max_iter=transport_max_iter,
fast=fast,
)

def run_single_graph_matching(seed: RngType) -> MatchResult:
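A usage sketch for the new `fast` flag exposed by `graph_match` (only `fast` comes from this PR; `er_np` and the `rng` keyword are assumed from graspologic's existing documented API):

```python
import numpy as np
from graspologic.match import graph_match
from graspologic.simulations import er_np

rng = np.random.default_rng(0)
A = er_np(50, 0.3)            # a random undirected graph
perm = rng.permutation(50)
B = A[perm][:, perm]          # the same graph with its node labels shuffled

# fast=True (the default) uses the elementwise-trace shortcuts in solver.py;
# fast=False falls back to the original np.trace-based computation.
result = graph_match(A, B, fast=True, rng=0)
baseline = graph_match(A, B, fast=False, rng=0)
```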
