@@ -55,15 +55,15 @@ nor optimization criteria are employed.
55
55
The only information extracted from the distance matrix is
56
56
within-triplet dissimilarity comparisons. As a result, outputs are
57
57
unaffected by monotone transformations of the collection of distances
58
- (e.g., log<sub>2</sub>). Further, one may transform any measure of
59
- similarity, *s*(*x*, *y*), to a measure of dissimilarity, *d*(*x*, *y*),
60
- via any order-reversing monotone transformation, for instance by taking
61
- *d*(*x*, *y*) = 1/(1 + *s*(*x*, *y*)). This provides the user some
62
- flexibility in the choice of dissimilarity (e.g., triangle inequality is
63
- not required) and care should be taken at this stage.
58
+ (e.g., $\log_2$). Further, one may transform any measure of similarity,
59
+ $s(x, y)$, to a measure of dissimilarity, $d(x, y)$, via any
60
+ order-reversing monotone transformation, for instance by taking
61
+ $d(x, y) = 1/(1 + s(x, y))$. This provides the user some flexibility in
62
+ the choice of dissimilarity (e.g., triangle inequality is not required)
63
+ and care should be taken at this stage.
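
The order-reversing conversion described above can be checked on a few made-up numbers. The sketch below is only an illustration: the similarity scores and the names `s` and `d` are assumptions, not part of the package.

``` r
# Hypothetical similarity scores for three pairs of points (illustrative only).
s <- c(ab = 0.9, ac = 0.5, bc = 0.1)

# Order-reversing monotone transformation from the text: d = 1 / (1 + s).
d <- 1 / (1 + s)

# The most similar pair becomes the least dissimilar pair, and so on.
order_s <- order(s, decreasing = TRUE)
order_d <- order(d)
identical(order_s, order_d)

# A monotone increasing transformation such as log2 leaves the ordering of
# the dissimilarities unchanged, so all pairwise comparisons are preserved.
identical(order(d), order(log2(d)))
```

Both comparisons should return `TRUE`, which is the sense in which only the relative ordering of the distances matters.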
64
64
65
65
The function `dist()` from the `stats` package converts an input data
66
- frame (with *n* rows) into an *n* × *n* distance matrix. In Euclidean
66
+ frame (with $n$ rows) into an $n \times n$ distance matrix. In Euclidean
67
67
examples here, we will use the default Euclidean distance.
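
As a small illustration of this conversion, the following sketch builds a three-row data frame and inspects the resulting distances. The data frame `toy` and its values are assumptions made for the example.

``` r
# A hypothetical data frame with n = 3 rows (points in the plane).
toy <- data.frame(x = c(0, 3, 0), y = c(0, 4, 1))

# dist() uses Euclidean distance by default and returns a "dist" object
# that stores only the lower triangle of the distance matrix.
D_toy <- dist(toy)

# as.matrix() expands it to the full n x n symmetric matrix.
M <- as.matrix(D_toy)
dim(M)
M[1, 2]  # distance between (0, 0) and (3, 4)
```

Here `M` has dimension 3 × 3, and the distance between the first two rows is 5.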
68
68
69
69
## A Small Example
@@ -85,6 +85,11 @@ par(mfrow = c(1, 2), pty = "s")
85
85
86
86
D <- dist(exdata1)
87
87
pald_results <- pald(D, emph_strong = 1, vertex.label.cex = 3)
88
+ ```
89
+
90
+ <img src="man/figures/README-pald-1.png" width="100%" />
91
+
92
+ ``` r
88
93
89
94
###
90
95
@@ -104,7 +109,7 @@ text(exdata1 + .23,
104
109
cex = .8)
105
110
```
106
111
107
- <img src="man/figures/README-pald-1.png" width="100%" />
112
+ <img src="man/figures/README-pald-2.png" width="100%" />
108
113
109
114
The wrapper function `pald()` returns a list containing: the cohesion
110
115
matrix, local depths, (community) clusters, the threshold for
@@ -126,10 +131,10 @@ Cohesion reflects relationship strength from the perspective of relative
126
131
position, see (Berenhaut, Moore, and Melvin 2022). To begin PaLD
127
132
analysis, we must first compute the matrix of cohesion values from the
128
133
input distance matrix or `dist` object. Note that cohesion is not
129
- symmetric. The values, *C*\[*x*, *w*\], in the cohesion matrix are
134
+ symmetric. The values, $C[x, w]$, in the cohesion matrix are
130
135
interpretable probabilities which capture the strength of the alignment
131
- of *w* to *x*. The sum of the cohesion matrix is always equal to *n*/2
132
- (where *n* is the number of data points).
136
+ of $w$ to $x$. The sum of the cohesion matrix is always equal to $n/2$
137
+ (where $n$ is the number of data points).
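
The $n/2$ property can be checked numerically. The following is a minimal base-R sketch written from the published definition of cohesion (local foci with ties split evenly); it is an assumption-laden illustration, not the package's implementation, and the names `cohesion_sketch`, `pts`, and `C_toy` are hypothetical.

``` r
# Sketch of a cohesion computation from a distance matrix.
# C[x, z] accumulates support for z being aligned with x.
cohesion_sketch <- function(D) {
  D <- as.matrix(D)
  n <- nrow(D)
  C <- matrix(0, n, n)
  for (x in seq_len(n - 1)) {
    for (y in (x + 1):n) {
      # Local focus: points at least as close to x or to y as x and y are
      # to each other.
      in_focus <- which(D[, x] <= D[x, y] | D[, y] <= D[x, y])
      w <- 1 / length(in_focus)
      for (z in in_focus) {
        if (D[z, x] < D[z, y]) {
          C[x, z] <- C[x, z] + w          # z is closer to x
        } else if (D[z, y] < D[z, x]) {
          C[y, z] <- C[y, z] + w          # z is closer to y
        } else {
          C[x, z] <- C[x, z] + w / 2      # tie: split the support evenly
          C[y, z] <- C[y, z] + w / 2
        }
      }
    }
  }
  C / (n - 1)
}

set.seed(1)
pts <- matrix(rnorm(20), ncol = 2)   # 10 hypothetical points in the plane
C_toy <- cohesion_sketch(dist(pts))

# Each pair (x, y) contributes a total weight of 1 before normalization,
# so the entries sum to choose(n, 2) / (n - 1) = n / 2.
all.equal(sum(C_toy), nrow(pts) / 2)
isSymmetric(C_toy)   # typically FALSE: cohesion is not symmetric
```

The double loop makes the cost cubic in $n$, which is fine for a toy check but is not meant as an efficient implementation.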
133
138
134
139
``` r
135
140
D <- dist(exdata1)
@@ -184,10 +189,9 @@ strong_threshold(C)
184
189
```
185
190
186
191
Pairs of points for which mutual cohesion (i.e.,
187
- min{*C*<sub>*x*,*w*</sub>, *C*<sub>*w*,*x*</sub>}) is greater than
188
- the above threshold are considered to be \`\` strongly cohesive.” The
189
- thresholded and symmetrized cohesion matrix can be obtained using the
190
- function ‘cohesion_strong.’
192
+ $\min\{C_{x, w}, C_{w, x}\}$) is greater than the above threshold are
193
+ considered to be “strongly cohesive.” The thresholded and symmetrized
194
+ cohesion matrix can be obtained using the function `cohesion_strong()`.
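
To make the thresholding and symmetrizing step concrete, here is a base-R sketch on a made-up 3 × 3 matrix. The matrix `C_small` and the threshold value `thr` are assumptions chosen for illustration; in practice the threshold comes from `strong_threshold(C)`, and this sketch is not the package's code.

``` r
# Hypothetical (non-symmetric) cohesion values for three points.
C_small <- matrix(c(0.30, 0.20, 0.01,
                    0.15, 0.30, 0.02,
                    0.01, 0.05, 0.25),
                  nrow = 3, byrow = TRUE)

# Hypothetical threshold, standing in for the value of strong_threshold(C).
thr <- 0.10

# Mutual cohesion: elementwise minimum of C[x, w] and C[w, x].
mutual <- pmin(C_small, t(C_small))

# Keep only the entries whose mutual cohesion exceeds the threshold.
C_strong_small <- ifelse(mutual > thr, mutual, 0)
C_strong_small
```

In this toy case only the pair (1, 2) is strongly cohesive, with mutual cohesion 0.15; the other off-diagonal entries are set to zero.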
191
195
192
196
``` r
193
197
round(cohesion_strong(C), 4)
@@ -209,11 +213,10 @@ round(cohesion_strong(C), 4)
209
213
The overall structure of the data can be observed via the networks
210
214
obtained from cohesion (referred to here as “community graphs”). The
211
215
community graph is a symmetric, weighted graph which is obtained from
212
- symmetrizing the cohesion matrix (using
213
- min{*C*<sub>*x*,*w*</sub>, *C*<sub>*w*,*x*</sub>}) and removing
214
- self-loops. The “community cluster graph” is the subgraph consisting of
215
- only the edges for which mutual cohesion greater than the above
216
- threshold.
216
+ symmetrizing the cohesion matrix (using $\min\{C_{x, w}, C_{w, x}\}$)
217
+ and removing self-loops. The “community cluster graph” is the subgraph
218
+ consisting of only the edges for which mutual cohesion is greater than the
219
+ above threshold.
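
The step from a thresholded, symmetrized matrix to clusters can be sketched with `igraph`, which this README already uses. The matrix `S` below is a made-up stand-in for the output of `cohesion_strong(C)`, and the variable names are hypothetical.

``` r
library(igraph)

# Hypothetical thresholded and symmetrized cohesion matrix for four points:
# only the pairs (a, b) and (c, d) are strongly cohesive.
S <- matrix(0, 4, 4, dimnames = list(letters[1:4], letters[1:4]))
S["a", "b"] <- S["b", "a"] <- 0.15
S["c", "d"] <- S["d", "c"] <- 0.12
diag(S) <- 0.30

# Build a weighted, undirected graph; diag = FALSE drops the self-loops.
g <- graph_from_adjacency_matrix(S, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

# The connected components play the role of the (community) clusters.
comp <- components(g)
comp$no
comp$membership
```

In this toy graph there are two components, {a, b} and {c, d}, so two clusters would be reported.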
217
220
218
221
The connected components of the community cluster graph, `G_strong`, are
219
222
referred to as the (community) clusters of the data. Note that no
@@ -374,7 +377,7 @@ plot_community_graphs(
374
377
<img src="man/figures/README-lang-1.png" width="100%" />
375
378
376
379
One could alternatively use the wrapper function:
377
- `pald(cognate_dist, emph_strong = 3, edge_width_factor = 30, vertex.label = lang_lab_subset, vertex.label.cex = .65, vertex.size = 3)`.
380
+ $\texttt{pald(cognate_dist, emph_strong = 3, edge_width_factor = 30, vertex.label = lang_lab_subset, vertex.label.cex = .65, vertex.size = 3)}$.
378
381
It will return a list containing: the cohesion matrix, local depths,
379
382
(community) clusters, the threshold for identifying strong ties, the
380
383
thresholded and symmetrized cohesion matrix, the community graph whose
@@ -390,7 +393,7 @@ cohesion) and can be found directly from the cohesion matrix.
390
393
library(igraph)
391
394
G_strong_lang <- community_graphs(C_lang)$G_strong
392
395
neighbors(G_strong_lang, "French")
393
- #> + 8/87 vertices, named, from c8a0516:
396
+ #> + 8/87 vertices, named, from 8cc26e0:
394
397
#> [1] Italian Ladin Provencal Walloon
395
398
#> [5] French_Creole_C French_Creole_D Spanish Catalan
396
399
@@ -409,7 +412,7 @@ density, see discussion in (Berenhaut, Moore, and Melvin 2022). Note
409
412
that PaLD was able to detect the eight natural groups within the data
410
413
without the use of any additional inputs (e.g., number of clusters) nor
411
414
optimization criteria. Despite providing the “correct” number of
412
- clusters (i.e., *k* = 8) both *k*-means and hierarchical clustering did
415
+ clusters (i.e., $k = 8$) both *k*-means and hierarchical clustering did
413
416
not give the desired result.
414
417
415
418
``` r
@@ -432,11 +435,6 @@ plot_community_graphs(
432
435
edge_width_factor = 2,
433
436
vertex.size = 5
434
437
)
435
- ```
436
-
437
- <img src="man/figures/README-vary-d-1.png" width="100%" />
438
-
439
- ``` r
440
438
### The cluster vector is provided by `pald` and may also be computed via:
441
439
library(igraph)
442
440
cluster_graph <- community_graphs(C3)$G_strong
@@ -447,8 +445,10 @@ table(clusters(cluster_graph)$membership)
447
445
#> 40 40 60 20 20 20 20 20
448
446
```
449
447
448
+ <img src="man/figures/README-vary-d-1.png" width="100%" />
449
+
450
450
Here are the results for the data obtained from *k*-means and
451
- hierarchical clustering when *k* = 8.
451
+ hierarchical clustering when $k = 8$.
452
452
453
453
``` r
454
454
par(mfrow = c(1, 2), pty = "s")