Commit 2677970

update
1 parent b3db1a0 commit 2677970

14 files changed, with 40 additions and 35 deletions

DESCRIPTION

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 Package: pald
 Title: Partitioned Local Depth for Community Structure in Data
-Version: 0.0.2
+Version: 0.0.3
 Authors@R:
 c(person("Katherine", "Moore", email = "kmoore@amherst.edu", role = c("aut"),
 comment = c(ORCID = "0000-0001-6943-2416")),
@@ -20,7 +20,7 @@ Description: Implementation of the Partitioned Local Depth (PaLD)
 License: MIT + file LICENSE
 Encoding: UTF-8
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.1.2
+RoxygenNote: 7.2.3
 Imports:
 igraph,
 graphics,

NEWS.md

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+# pald 0.0.3
+
+* Change output in `community_clusters` to be a data frame with two columns: `point` and `community`
+
 # pald 0.0.2

 * Allow non-symmetric matrices to be input

R/pald_functions.R

Lines changed: 2 additions & 2 deletions
@@ -462,7 +462,7 @@ plot_community_graphs <- function(c,
 #'
 #' @return A data frame with two columns:
 #' * `point`: The points from cohesion matrix `c`
-#' * `cluster`: The (community) cluster labels
+#' * `community`: The community cluster labels
 #'
 #' @examples
 #' D <- dist(exdata2)
@@ -475,7 +475,7 @@ community_clusters <- function(c) {
 cl <- igraph::clusters(c_graphs$G_strong)$membership
 data.frame(
 point = names(cl),
-cluster = cl
+community = cl
 )
 }
 #' Partitioned Local Depth (PaLD)
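
Code that read the old `cluster` column will need to switch to `community`. The following is a minimal sketch of one way to handle both versions, assuming a data frame shaped like the documented return value. The data frames are built by hand with made-up labels rather than by calling `community_clusters()`, and `get_community_labels()` is a hypothetical helper that is not part of pald.

```r
# Hand-built stand-in for the documented return value of community_clusters()
# as of pald 0.0.3: two columns, `point` and `community`. Values are made up.
new_style <- data.frame(
  point = c("a", "b", "c", "d"),
  community = c(1, 1, 2, 2)
)

# The same result as it would have been named under pald 0.0.2.
old_style <- data.frame(
  point = c("a", "b", "c", "d"),
  cluster = c(1, 1, 2, 2)
)

# Hypothetical helper (not part of pald): return the labels as a named vector
# regardless of which column name is present, preferring `community`.
get_community_labels <- function(df) {
  col <- intersect(c("community", "cluster"), names(df))
  if (length(col) == 0) {
    stop("expected a `community` or `cluster` column")
  }
  stats::setNames(df[[col[1]]], df$point)
}

labels_new <- get_community_labels(new_style)
labels_old <- get_community_labels(old_style)

# Both layouts yield the same named label vector.
stopifnot(identical(labels_new, labels_old))
```

A shim like this is only useful while both package versions are in circulation; once 0.0.3 is the minimum, indexing `community` directly is simpler.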

README.Rmd

Lines changed: 1 addition & 0 deletions
@@ -110,6 +110,7 @@ Each time the function `pald()` is called, the matrix of cohesion values is re-c


 ## Cohesion Matrix
+
Cohesion reflects relationship strength from the perspective of relative position, see [@bmm22]. To begin PaLD analysis, we must first compute the matrix of cohesion values from the input distance matrix or `dist` object. Note that cohesion is not symmetric. The values, $C[x, w]$, in the cohesion matrix are interpretable probabilities which capture the strength of the alignment of $w$ to $x$. The sum of the cohesion matrix is always equal to $n/2$ (where $n$ is the number of data points).
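
The $n/2$ property can be checked numerically. The following is a minimal base-R sketch, assuming the local-focus definition of cohesion from the cited paper, a symmetric distance matrix with at least two points, and ties split evenly. `cohesion_sketch()` is a hypothetical name; this is not the package's implementation.

```r
# Hypothetical helper, not part of pald: a plain-loop sketch of cohesion.
# Assumes `d` is a symmetric distance matrix or `dist` object with n >= 2.
cohesion_sketch <- function(d) {
  d <- as.matrix(d)
  n <- nrow(d)
  C <- matrix(0, n, n, dimnames = dimnames(d))
  for (x in seq_len(n - 1)) {
    for (y in (x + 1):n) {
      # Local focus of the pair: points within d(x, y) of x or of y.
      focus <- which(d[, x] <= d[x, y] | d[, y] <= d[x, y])
      share <- 1 / length(focus)
      for (w in focus) {
        if (d[w, x] < d[w, y]) {
          C[x, w] <- C[x, w] + share       # w is closer to x
        } else if (d[w, y] < d[w, x]) {
          C[y, w] <- C[y, w] + share       # w is closer to y
        } else {
          C[x, w] <- C[x, w] + share / 2   # tie: split evenly (assumption)
          C[y, w] <- C[y, w] + share / 2
        }
      }
    }
  }
  C / (n - 1)
}

set.seed(1)
pts <- matrix(rnorm(20), ncol = 2)  # 10 made-up points in the plane
C <- cohesion_sketch(dist(pts))

# Each pair (x, y) distributes a total weight of 1, so the matrix sums to n / 2.
stopifnot(isTRUE(all.equal(sum(C), nrow(pts) / 2)))
```

Each unordered pair contributes exactly one unit of weight, split among the points of its local focus. With $n(n-1)/2$ pairs and a normalization of $1/(n-1)$, the total is $n/2$.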

```{r cohesion}

README.md

Lines changed: 29 additions & 29 deletions
@@ -55,15 +55,15 @@ nor optimization criteria are employed.
 The only information extracted from the distance matrix are
 within-triplet dissimilarity comparisons. As a result, outputs are
 unaffected by monotone transformations of the collection of distances
-(e.g., log<sub>2</sub>). Further, one may transform any measure of
-similarity, *s*(*x*,*y*), to a measure of dissimilarity, *d*(*x*,*y*),
-via any order-reversing monotone transformation, for instance by taking
-*d*(*x*,*y*) = 1/(1+*s*(*x*,*y*)). This provides the user some
-flexibility in the choice of dissimilarity (e.g., triangle inequality is
-not required) and care should be taken at this stage.
+(e.g., $\log_2$). Further, one may transform any measure of similarity,
+$s(x, y)$, to a measure of dissimilarity, $d(x,y)$, via any
+order-reversing monotone transformation, for instance by taking
+$d(x, y) = 1/(1 + s(x, y))$. This provides the user some flexibility in
+the choice of dissimilarity (e.g., triangle inequality is not required)
+and care should be taken at this stage.

 The function `dist()` from the `stats` package converts an input data
-frame (with *n* rows) into an *n* × *n* distance matrix. In Euclidean
+frame (with $n$ rows) into an $n \times n$ distance matrix. In Euclidean
 examples here, we will use the default Euclidean distance.

 ## A Small Example
@@ -85,6 +85,11 @@ par(mfrow = c(1, 2), pty = "s")

 D <- dist(exdata1)
 pald_results <- pald(D, emph_strong = 1, vertex.label.cex = 3)
+```
+
+<img src="man/figures/README-pald-1.png" width="100%" />
+
+``` r

 ###

@@ -104,7 +109,7 @@ text(exdata1 + .23,
 cex = .8)
 ```

-<img src="man/figures/README-pald-1.png" width="100%" />
+<img src="man/figures/README-pald-2.png" width="100%" />

 The wrapper function `pald()` returns a list containing: the cohesion
 matrix, local depths, (community) clusters, the threshold for
@@ -126,10 +131,10 @@ Cohesion reflects relationship strength from the perspective of relative
 position, see (Berenhaut, Moore, and Melvin 2022). To begin PaLD
 analysis, we must first compute the matrix of cohesion values from the
 input distance matrix or `dist` object. Note that cohesion is not
-symmetric. The values, *C*\[*x*,*w*\], in the cohesion matrix are
+symmetric. The values, $C[x, w]$, in the cohesion matrix are
 interpretable probabilities which capture the strength of the alignment
-of *w* to *x*. The sum of the cohesion matrix is always equal to *n*/2
-(where *n* is the number of data points).
+of $w$ to $x$. The sum of the cohesion matrix is always equal to $n/2$
+(where $n$ is the number of data points).

 ``` r
 D <- dist(exdata1)
@@ -184,10 +189,9 @@ strong_threshold(C)
 ```

 Pairs of points for which mutual cohesion (i.e.,
-min {*C*<sub>*x*, *w*</sub>, *C*<sub>*w*, *x*</sub>}) is greater than
-the above threshold are considered to be \`\`strongly cohesive.” The
-thresholded and symmetrized cohesion matrix can be obtained using the
-function ‘cohesion_strong.’
+$\min\{C_{x, w}, C_{w, x}$}) is greater than the above threshold are
+considered to be \`\`strongly cohesive.” The thresholded and symmetrized
+cohesion matrix can be obtained using the function ‘cohesion_strong.’

 ``` r
 round(cohesion_strong(C), 4)
@@ -209,11 +213,10 @@ round(cohesion_strong(C), 4)
 The overall structure of the data can be observed via the networks
 obtained from cohesion (referred to here as “community graphs”). The
 community graph is a symmetric, weighted graph which is obtained from
-symmetrizing the cohesion matrix (using
-min {*C*<sub>*x*, *w*</sub>, *C*<sub>*w*, *x*</sub>}) and removing
-self-loops. The “community cluster graph” is the subgraph consisting of
-only the edges for which mutual cohesion greater than the above
-threshold.
+symmetrizing the cohesion matrix (using $\min\{C_{x, w}, C_{w, x}\}$)
+and removing self-loops. The “community cluster graph” is the subgraph
+consisting of only the edges for which mutual cohesion greater than the
+above threshold.

 The connected components of the community cluster graph, `G_strong`, are
 referred to the (community) clusters of the data. Note that no
@@ -374,7 +377,7 @@ plot_community_graphs(
 <img src="man/figures/README-lang-1.png" width="100%" />

 One could alternatively use the wrapper function:
-`pald(cognate_dist, emph_strong = 3, edge_width_factor = 30, vertex.label = lang_lab_subset, vertex.label.cex = .65, vertex.size = 3)`.
+$\texttt{pald(cognate_dist, emph_strong = 3, edge_width_factor = 30, vertex.label = lang_lab_subset, vertex.label.cex = .65, vertex.size = 3)}$.
 It will return a list containing: the cohesion matrix, local depths,
 (community) clusters, the threshold for identifying strong ties, the
 thresholded and symmetrized cohesion matrix, the community graph whose
@@ -390,7 +393,7 @@ cohesion) and can be found directly from the cohesion matrix.
 library(igraph)
 G_strong_lang <- community_graphs(C_lang)$G_strong
 neighbors(G_strong_lang, "French")
-#> + 8/87 vertices, named, from c8a0516:
+#> + 8/87 vertices, named, from 8cc26e0:
 #> [1] Italian Ladin Provencal Walloon
 #> [5] French_Creole_C French_Creole_D Spanish Catalan

@@ -409,7 +412,7 @@ density, see discussion in (Berenhaut, Moore, and Melvin 2022). Note
 that PaLD was able to detect the eight natural groups within the data
 without the use of any additional inputs (e.g., number of clusters) nor
 optimization criteria. Despite providing the “correct” number of
-clusters (i.e., *k* = 8) both *k*-means and hierarchical clustering did
+clusters (i.e., $k = 8$) both *k*-means and hierarchical clustering did
 not give the desired result.

 ``` r
@@ -432,11 +435,6 @@ plot_community_graphs(
 edge_width_factor = 2,
 vertex.size = 5
 )
-```
-
-<img src="man/figures/README-vary-d-1.png" width="100%" />
-
-``` r
 ### The cluster vector is provided by `pald' and also may be computed via:
 library(igraph)
 cluster_graph <- community_graphs(C3)$G_strong
@@ -447,8 +445,10 @@ table(clusters(cluster_graph)$membership)
 #> 40 40 60 20 20 20 20 20
 ```

+<img src="man/figures/README-vary-d-1.png" width="100%" />
+
 Here are the results for the data obtained from *k*-means and
-hierarchical clustering when *k* = 8.
+hierarchical clustering when $k = 8$.

 ``` r
 par(mfrow = c(1, 2), pty = "s")

man/community_clusters.Rd

Lines changed: 1 addition & 1 deletion
Generated file; diff not rendered.

man/figures/README-comm-1.png

1.35 KB

man/figures/README-fig-2-1.png

-35 Bytes

man/figures/README-k-mean-1.png

74 Bytes

man/figures/README-lang-1.png

-758 Bytes
