\documentclass[10pt]{report}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{geometry}
\geometry{margin=3cm}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{natbib}
\usepackage{scrextend}
\usepackage{graphicx}
\usepackage{placeins}
\usepackage{booktabs}
\usepackage{ltablex}
%\usepackage[table]{xcolor} % see global chunk options
\usepackage{soul} % include for kable
\usepackage{longtable}
\usepackage{array}
\usepackage{multirow}
\usepackage{wrapfig}
\usepackage{float}
\usepackage{colortbl}
\usepackage{pdflscape}
\usepackage{tabu}
\usepackage{threeparttable}
\usepackage[normalem]{ulem}
\usepackage[export]{adjustbox} % to add boxes around screenshots
\usepackage{tikz}
\usepackage{tikz-3dplot}
\graphicspath{ {Figures/} }
\newcommand{\dna}{\texttt{DNA}}
\newcommand{\rdna}{\texttt{rDNA}}
\newcommand{\rstudio}{\texttt{RStudio}}
\newcommand{\java}{\texttt{Java}}
\newcommand{\rjava}{\texttt{rJava}}
\newcommand{\R}{\texttt{R}}
\newcommand{\ucinet}{\texttt{Ucinet}}
\newcommand{\netminer}{\texttt{NetMiner}}
\newcommand{\gephi}{\texttt{Gephi}}
\newcommand{\visone}{\texttt{visone}}
\newcommand{\win}{\raisebox{-0.1em}{\includegraphics[height=1.5\fontcharht\font`\B]{03-6-winbutton}}}
\newcommand{\rrun}{\raisebox{-0.1em}{\includegraphics[height=1.5\fontcharht\font`\B]{03-5-rrun}}}
\newcommand{\github}{\href{https://github.com/leifeld/dna}{\texttt{GitHub}}}
\newcommand*{\fullref}[1]{\hyperref[{#1}]{ \nameref*{#1}}} % One single link
\newcommand{\code}[1]{% same color and decoration as knitr bash command
\textcolor{codecolort}{%
\sethlcolor{codecolorbg}\hl{%
\texttt{#1}%
}%
}%
}
\newcommand{\infobox}[2]{%
\begin{center}\fbox{%
\parbox{#1}{#2}%
}\end{center}%
}
\definecolor{codecolorbg}{rgb}{0.969, 0.969, 0.969}
\definecolor{codecolort}{rgb}{0.345, 0.345, 0.345}
\definecolor{black}{RGB}{0,0,0}
\definecolor{grey}{RGB}{240,240,240}
\definecolor{white}{RGB}{255,255,255}
\usepackage{suffix}
\newcommand\chapterauthor[1]{\authortoc{#1}\printchapterauthor{#1}}
\WithSuffix\newcommand\chapterauthor*[1]{\printchapterauthor{#1}}
\makeatletter
\newcommand{\printchapterauthor}[1]{%
{\parindent0pt\vspace*{-25pt}%
\linespread{1.1}\large\scshape#1%
\par\nobreak\vspace*{35pt}}
\@afterheading%
}
\newcommand{\authortoc}[1]{%
\addtocontents{toc}{\vskip-10pt}%
\addtocontents{toc}{%
\protect\contentsline{chapter}%
{\hskip1.3em\mdseries\scshape\protect\small#1}{}{}}
\addtocontents{toc}{\vskip5pt}%
}
\makeatother
\newcommand{\Autorname}{\addtocontents{toc}{\hspace{0.47cm}\emph{\aut}\par}}
\setlength{\parindent}{0em}
\setlength{\parskip}{0.5em}
\bibpunct[: ]{(}{)}{;}{a}{}{,}
\emergencystretch 1.5em
\widowpenalty=10000
\clubpenalty=10000
\raggedbottom
\pagenumbering{roman} % roman numbering in frontmatter
\usepackage[
unicode=true,
pdfusetitle,
bookmarks=true,
bookmarksnumbered=true,
bookmarksopen=true,
bookmarksopenlevel=2,
breaklinks=true,
colorlinks=true,
pdfstartview={XYZ null null 1},
citecolor={blue}
]{hyperref}
\begin{document}
<<setup, include=FALSE, cache=FALSE, results='hide', message=FALSE, warning=FALSE>>=
library("knitr")
# get latest R version (falls back to a default version when offline)
site <- tryCatch({
  readLines("https://cran.r-project.org/bin/windows/base/", n = 10)
},
error = function(e) {
  warning("Internet connection necessary to check for newest R version")
  character(0)
})
R_vers <- site[grepl("^<title>", site)]
R_vers <- head(regmatches(R_vers, regexpr("\\d+\\.\\d+\\.\\d+", R_vers)), 1)
if (length(R_vers) == 0) {
  R_vers <- "3.5.0"
}
# get latest RStudio version (falls back to a default version when offline)
site <- tryCatch({
  readLines("https://www.rstudio.com/products/rstudio/download/")
},
error = function(e) {
  warning("Internet connection necessary to check for newest RStudio version")
  character(0)
})
RS_vers <- site[grepl("<h4 id=\"download\"><strong>RStudio Desktop", site)]
RS_vers <- head(regmatches(RS_vers, regexpr("\\d+\\.\\d+\\.\\d+", RS_vers)), 1)
if (length(RS_vers) == 0) {
  RS_vers <- "1.1.447"
}
# set global chunk options
opts_chunk$set(fig.path = 'figure/workshop-', fig.align = 'center', fig.show = 'hold', error = FALSE)
options(formatR.arrow = TRUE, width = 90, knitr.table.format = "latex")
knit_hooks$set(crop=hook_pdfcrop,
document = function(x) {
x <- gsub('$RS_vers$', RS_vers, x, fixed = TRUE)
x <- sub('\\usepackage[]{color}', '\\usepackage[table]{xcolor}', x, fixed = TRUE)
x
})
@
\title{Discourse Network Analyzer Manual}
\date{\footnotesize{Last update: DNA 2.0 beta 22 with rDNA \Sexpr{packageVersion("rDNA")} on \today.}}
\author{Philip Leifeld, Johannes Gruber and Felix Rolf Bossner}
\maketitle
\setcounter{tocdepth}{1}
% change linkcolor in TOC
{\hypersetup{linkcolor=black}
\tableofcontents
}
\chapter{Introduction} \label{chp:intro}
\chapterauthor{Philip Leifeld and Johannes Gruber}
\FloatBarrier
\pagenumbering{arabic}
This manual demonstrates how to install, set up, and use the open-source standalone software \texttt{Discourse Network Analyzer} (\dna) and its companion \R\ package \rdna\ \citep{leifeld2018rdna}, which are designed for researchers using the method \emph{discourse network analysis}.%
\footnote{\emph{This manual is a work in progress and will be continuously updated during the year 2018.
See \url{https://github.com/leifeld/dna/blob/master/manual/} for the most recent version}.}
By combining content analysis and dynamic network analysis, this method can reveal the structure and dynamics of policy debates.
The method comprises three basic steps:
\begin{enumerate}
\item annotating statements of actors in unstructured (text) sources,
\item creating networks from the resulting structured data,
\item analysing and interpreting the results by employing the toolbox of network analysis.
\end{enumerate}
The results can take a number of different forms, such as so-called congruence or conflict networks of actors or of concepts, affiliation networks of actors and concepts, and longitudinal versions of these networks (see Chapter~\ref{chp:algorithms} and \citealt{leifeld2017discourse} for a comprehensive overview of the method).
The benefit of using the \java\ software \dna\ is that it is specifically designed to aid the user in the first two of these basic steps of discourse network analysis.
Its main purpose is the qualitative annotation of actors' statements in order to structure the text.
The program can also create different kinds of network matrices based on these structured data and export them to other programs for further analysis and plotting.
Additionally, while the software is primarily designed for discourse network analysis of actors and concepts, it is also flexible with regard to the definition of new statement types, for example using user-defined variables like ``location'' or ``addressee'' (see Section~\ref{sec:stattype}).
While there are numerous software packages for qualitative content analysis, very few of them were specifically developed with discourse network analysis in mind, and most of them therefore lack the functionality necessary for exporting network data.
The companion package \rdna\ for the statistical computing environment \R\ additionally helps with the third step mentioned above: analysis of the annotated statements.
\rdna\ takes the structured data from \dna\ and permits further in-depth analysis using network analysis.
While data can also be exported to other software such as \ucinet, \visone, \netminer, and \gephi, \R\ is the preferred choice as it facilitates reproducible research, is free and open source, and has a large community of users and developers who are engaged in all kinds of data analysis tasks.
\R\ has several packages developed specifically for network analysis, such as \texttt{statnet} \citep{handcock2008statnet}, \texttt{igraph} \citep{csardi2006igraph}, \texttt{xergm} \citep{leifeld2018temporal, leifeld2017xergm}, \texttt{sna} \citep{butts2016sna}, \texttt{network} \citep{butts2008network}, \texttt{ggraph} \citep{linpedersen2017ggraph}, and \texttt{tidygraph} \citep{linpedersen2017tidygraph}.
Most of these packages work seamlessly with data processed by \rdna\ and therefore add a myriad of possibilities to the native functions of our own \R\ package.
In recent years, discourse network analysis has been employed by a growing number of scholars in a wide field of policy sectors, such as
pension politics \citep{leifeld2013reconceptualising, leifeld2016policy},
climate politics \citep{fisher2013mapping, fisher2013where, broadbent2014inter, gkiouzepas2015climate, manfredo2014society, schneider2014punctuations, stoddart2015canada, wagner2017trends, yun2014framing},
software patents and property rights \citep{leifeld2012software},
internet policy \citep{breindl2013discourse, haunss2009ip},
infrastructure projects \citep{nagel2016polarisierung},
energy policy \citep{brutschin2013dynamics, haunss2017ausstieg, imbert2017inquiry, rinscheid2015crisis},
shooting rampages \citep{hurka2013framing},
abortion \citep{muller2014beleidscontroverse, muller2014discourscoalities, muller2015discourse},
outdoor sports \citep{stoddartetal2015environmentalists},
deforestation \citep{rantala2014multistakeholder},
higher education \citep{naegler2015partner},
international financial politics \citep{haunss2017finace},
and online deception \citep{wu2015dobnet}.
While a default toolbox of methods is available for discourse network analysis, new methods are currently being developed.
For example, one promising approach is the application of inferential network models to the temporal network structure produced by \dna\ in order to model policy debates at the micro level.
This will soon allow us to develop and test theories on how actors contribute statements to policy debates, and to forecast debates based on these theories \citep[for an outlook, see][]{leifeld2017discourse}.
The outline of this manual is as follows.
Chapter~\ref{chp:algorithms} describes the types of networks \dna\ can export.
Chapter~\ref{chp:installation} explains how to install \dna\ and \rdna, which both rely on a correctly configured \java\ runtime environment.
You only need to consult this chapter if you run into problems installing the software on your own.
The following four chapters describe the usage of \dna\ in detail:
Chapter~\ref{chp:dna-prep} describes how to set up a project in \dna, including the creation of a database, adding and managing users, and how to set up or edit statement types and variables.
Chapter~\ref{chp:dna-import} explains how you can import and organise your raw data (i.\,e., documents).
Chapter~\ref{chp:dna-coding} provides an overview of how you annotate statements in \dna.
Even though this process is very straightforward, the chapter also introduces some functions that can help you to annotate material faster and more reliably.
Chapter~\ref{chp:dna-export} explains how data can be exported to other programs for further analysis.
What may have seemed abstract in Chapter~\ref{chp:algorithms} quickly becomes clear at this point---once you have exported a few example networks yourself.
Chapter~\ref{chp:rdna} is an introductory tutorial on using the \rdna\ package to perform additional analysis and plotting tasks using the infrastructure provided by \R.
Both \dna\ and \rdna\ can be downloaded from \github\ (see Chapter~\ref{chp:installation}).
Please feel free to post questions and bug reports to the issue tracker on \github.
\chapter{Methods for Network Construction} \label{chp:algorithms}
\chapterauthor{Philip Leifeld}
\FloatBarrier
This chapter summarises the main network algorithms implemented in \dna\ graphically and using mathematical notation.
\section{Graphical Intuition}
Figure~\ref{fig:algo_aff} illustrates how actors (as yellow nodes on the left) and concepts (as blue nodes on the right) are connected by dashed lines.
These dashed lines represent the edges of a bipartite graph, also called an affiliation network.
Substantively, these edges represent statements that were annotated in a policy debate.
For example, actor~5 refers to concepts~3 and~4 in the debate.
\tikzset{
actor/.style ={circle,draw=red!50!yellow,fill=red!20!yellow,thick,inner sep=1pt},
category/.style ={circle,draw=blue!70,fill=blue!40,thick,inner sep=1pt},
grey/.style ={line width=0.5mm, dashed,color=gray,inner sep=0pt},
black/.style ={line width=0.5mm},
annotation/.style = {fill=gray!50, rounded corners=3}
}
\begin{figure}[tbp]
\begin{center}
\begin{tikzpicture}
\node [actor] (a1) at (0,4) {$a_1$};
\node [actor] (a2) at (1,3) {$a_2$};
\node [actor] (a3) at (0,2) {$a_3$};
\node [actor] (a4) at (1,1) {$a_4$};
\node [actor] (a5) at (0,0) {$a_5$};
\node [category] (c1) at (5,4) {$c_1$};
\node [category] (c2) at (7,3) {$c_2$};
\node [category] (c3) at (6,2) {$c_3$};
\node [category] (c4) at (5,0) {$c_4$};
\node [category] (c5) at (7,1) {$c_5$};
\draw [grey] (a1) to (c1);
\draw [grey] (a2) to (c1);
\draw [grey] (a2) to (c2);
\draw [grey] (a2) to (c3);
\draw [grey] (a3) to (c3);
\draw [grey] (a3) to (c5);
\draw [grey] (a4) to (c3);
\draw [grey] (a5) to (c3);
\draw [grey] (a5) to (c4);
\draw [black] (a1) to (a2);
\draw [black] (a2) to (a3);
\draw [black] (a2) to (a5);
\draw [black] (a3) to (a4);
\draw [black] (a3) to (a5);
\draw [black] (a2) to (a4);
\draw [black] (a4) to (a5);
\draw [black] (c1) to (c2);
\draw [black] (c1) to (c3);
\draw [black] (c2) to (c3);
\draw [black] (c3) to (c4);
\draw [black] (c3) to (c5);
\node at (0,-2) {};
\node at (7,5.5) {};
\node [annotation] at (0,5) {actors};
\node [annotation] at (6,5) {concepts};
\node [annotation, text width=1.5cm, right=-3mm] at (2.5,-1) {affiliation network};
\node [annotation, text width=1.5cm, right=-6mm] at (0,-1) {actor network};
\node [annotation, text width=1.5cm, right=-6mm] at (5.5,-1) {concept network};
\end{tikzpicture}
\caption{Illustration: Affiliation network (dashed lines) between actors (yellow nodes, variable~1) and concepts (blue nodes, variable~2) and their induced actor congruence network (solid lines on the left) and concept congruence network (solid lines on the right).}
\label{fig:algo_aff}
\end{center}
\end{figure}
Based on this bipartite graph, an actor congruence network and a concept congruence network can be inferred.
For example, actors~1 and~2 jointly refer to the same concept~1, hence they are directly connected by an edge in the actor congruence network illustrated on the left.
If actors~1 and~2 shared more than one concept, their edge weight would be proportional to the number of concepts they shared.
Substantively, the strength of connection between two actors can be interpreted as their similarity in terms of the concepts they employ in the policy debate.
Conversely, concepts~1 and~3 are jointly referred to by the same actor~2, hence they are directly connected by an edge in the concept congruence network illustrated on the right.
If concepts~1 and~3 were jointly referred to by more than one actor, their similarity would be greater than one.
Substantively, the edge weights between concepts can be interpreted as their similarity in terms of the actors that employ them in the policy debate.
Actors and concepts are merely one substantive application; other variables could be encoded as variables~1 and~2 instead, such as persons and locations or speakers and addressees.
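The projections in Figure~\ref{fig:algo_aff} can be reproduced with matrix multiplication in base \R. The following chunk is a minimal sketch, not a description of how \dna\ computes networks internally: it assumes that the affiliation network is stored as a binary actor-by-concept incidence matrix (called \code{A} here purely for illustration). Multiplying \code{A} with its transpose counts the concepts shared by each pair of actors, and the reverse product counts the actors shared by each pair of concepts.

<<sketch-projection, eval=FALSE>>=
# Illustrative incidence matrix for the affiliation network in the figure:
# rows are actors, columns are concepts, 1 = the actor refers to the concept
A <- matrix(0, nrow = 5, ncol = 5,
            dimnames = list(paste0("a", 1:5), paste0("c", 1:5)))
A["a1", "c1"] <- 1
A["a2", c("c1", "c2", "c3")] <- 1
A["a3", c("c3", "c5")] <- 1
A["a4", "c3"] <- 1
A["a5", c("c3", "c4")] <- 1

# Actor network: number of concepts each pair of actors shares
actors <- A %*% t(A)
# Concept network: number of actors each pair of concepts shares
concepts <- t(A) %*% A
# The diagonal only counts each node's own degree, so drop it (no self-loops)
diag(actors) <- 0
diag(concepts) <- 0
actors
concepts
@

With the data from the figure, every non-zero edge weight in both one-mode networks equals one, because no two actors share more than one concept and no two concepts share more than one actor.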
Simply modelling referral of a concept by an actor, however, is insufficient to capture agreement and opposition in policy debates.
For example, actor~1 and actor~2 may either both support concept~1, they may both reject concept~1, or one of them may refer to concept~1 in a positive way while the other one may refer to concept~1 in a negative way.
Depending on these configurations, one would infer a congruent or a conflictual relationship between actors~1 and~2.
Figure~\ref{fig:algo_binarycongruence} illustrates how two types of networks can be generated, congruence and conflict networks.
In a congruence network, edges are counted when both actors co-support or co-reject a concept.
In a conflict network, edges are counted when one actor supports and the other actor rejects the concept.
\begin{figure}[tbp]
\begin{center}
\begin{tikzpicture}
\node [actor] (a1) at (0,1) {$a_1$};
\node [actor] (a2) at (0,0) {$a_2$};
\node [category] (c1) at (3,0.5) {$c_1$};
\draw [grey] (a1) to node [above=1mm,circle,solid,line width=0.2mm] {\textbf{+}} (c1);
\draw [grey] (a2) to node [below=1mm,circle,solid,line width=0.2mm] {\textbf{+}} (c1);
\draw [black] (a1) to (a2);
\node [annotation,right=-3mm] at (0,2.3) {congruence networks};
\node [actor] (a1) at (5.5,1) {$a_1$};
\node [actor] (a2) at (5.5,0) {$a_2$};
\node [category] (c1) at (8.5,0.5) {$c_1$};
\draw [grey] (a1) to node [above=1mm,circle,solid,line width=0.2mm] {\textbf{+}} (c1);
\draw [grey] (a2) to node [below=1mm,circle,solid,line width=0.2mm] {\textbf{--}} (c1);
\draw [black] (a1) to (a2);
\node [annotation,right=-3mm] at (5.5,2.3) {conflict networks};
\node [actor] (a1) at (0,-1.5) {$a_1$};
\node [actor] (a2) at (0,-2.5) {$a_2$};
\node [category] (c1) at (3,-2) {$c_1$};
\draw [grey] (a1) to node [above=1mm,circle,solid,line width=0.2mm] {\textbf{--}} (c1);
\draw [grey] (a2) to node [below=1mm,circle,solid,line width=0.2mm] {\textbf{--}} (c1);
\draw [black] (a1) to (a2);
\node [actor] (a1) at (5.5,-1.5) {$a_1$};
\node [actor] (a2) at (5.5,-2.5) {$a_2$};
\node [category] (c1) at (8.5,-2) {$c_1$};
\draw [grey] (a1) to node [above=1mm,circle,solid,line width=0.2mm] {\textbf{--}} (c1);
\draw [grey] (a2) to node [below=1mm,circle,solid,line width=0.2mm] {\textbf{+}} (c1);
\draw [black] (a1) to (a2);
\end{tikzpicture}
\caption{Illustration: congruence and conflict networks with a binary qualifier variable.}
\label{fig:algo_binarycongruence}
\end{center}
\end{figure}
Qualifiers like ``agreement'' need not be binary (e.\,g., positive or negative).
They could be represented by integer weights, such as intensity of agreement on a scale from $-5$ to $+5$.
In this case, the edge weight between two nodes in a congruence network can be defined as the maximum possible difference minus the absolute difference between their two weights in the affiliation network, divided by the maximum possible difference.
Conversely, the edge weight between two nodes in a conflict network can be defined as the absolute difference between their two weights in the affiliation network divided by the maximum possible difference, such that a value of 0 represents no conflict and 1 represents maximal conflict.
In either case, these fractions need to be summed over all concepts (or, more generally, over all nodes of the second variable).
This calculation is illustrated in Figure~\ref{fig:algo_integercongruence}.
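The arithmetic in Figure~\ref{fig:algo_integercongruence} can be checked with a small helper function. The function \code{pairweight} and its arguments are hypothetical and serve only this illustration; the sketch assumes a qualifier scale from $-5$ to $+5$, so that the maximum possible difference is $10$.

<<sketch-integer-pair, eval=FALSE>>=
# Hypothetical helper: congruence and conflict weight for one pair of
# integer qualifier values k1 and k2 on a scale from kmin to kmax
pairweight <- function(k1, k2, kmin = -5, kmax = 5) {
  maxdiff <- kmax - kmin              # maximum possible difference
  conflict <- abs(k1 - k2) / maxdiff  # 0 = no conflict, 1 = maximal conflict
  c(congruence = 1 - conflict, conflict = conflict)
}
pairweight(2, 5)   # congruence 0.7, conflict 0.3
pairweight(-4, 2)  # congruence 0.4, conflict 0.6
@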
\begin{figure}[tbp]
\begin{center}
\begin{tikzpicture}
\node [actor] (a1) at (0,1) {$a_1$};
\node [actor] (a2) at (0,0) {$a_2$};
\node [category] (c1) at (3,0.5) {$c_1$};
\draw [grey] (a1) to node [above=1mm,circle,solid,line width=0.2mm] {$+2$} (c1);
\draw [grey] (a2) to node [below=1mm,circle,solid,line width=0.2mm] {$+5$} (c1);
\draw [black] (a1) to node [left=1mm,circle,solid,line width=0.2mm] {$\frac{7}{10}$} (a2);
\node [annotation,right=-3mm] at (0,2.3) {congruence networks};
\node [actor] (a1) at (5.5,1) {$a_1$};
\node [actor] (a2) at (5.5,0) {$a_2$};
\node [category] (c1) at (8.5,0.5) {$c_1$};
\draw [grey] (a1) to node [above=1mm,circle,solid,line width=0.2mm] {$+2$} (c1);
\draw [grey] (a2) to node [below=1mm,circle,solid,line width=0.2mm] {$+5$} (c1);
\draw [black] (a1) to node [left=1mm,circle,solid,line width=0.2mm] {$\frac{3}{10}$} (a2);
\node [annotation,right=-3mm] at (5.5,2.3) {conflict networks};
\node [actor] (a1) at (0,-1.5) {$a_1$};
\node [actor] (a2) at (0,-2.5) {$a_2$};
\node [category] (c1) at (3,-2) {$c_1$};
\draw [grey] (a1) to node [above=1mm,circle,solid,line width=0.2mm] {$-4$} (c1);
\draw [grey] (a2) to node [below=1mm,circle,solid,line width=0.2mm] {$+2$} (c1);
\draw [black] (a1) to node [left=1mm,circle,solid,line width=0.2mm] {$\frac{4}{10}$} (a2);
\node [actor] (a1) at (5.5,-1.5) {$a_1$};
\node [actor] (a2) at (5.5,-2.5) {$a_2$};
\node [category] (c1) at (8.5,-2) {$c_1$};
\draw [grey] (a1) to node [above=1mm,circle,solid,line width=0.2mm] {$-4$} (c1);
\draw [grey] (a2) to node [below=1mm,circle,solid,line width=0.2mm] {$+2$} (c1);
\draw [black] (a1) to node [left=1mm,circle,solid,line width=0.2mm] {$\frac{6}{10}$} (a2);
\end{tikzpicture}
\caption{Illustration: congruence and conflict networks with an integer qualifier variable.}
\label{fig:algo_integercongruence}
\end{center}
\end{figure}
Finally, it may be necessary to normalise the resulting affiliation, congruence, or conflict network.
Normalisation may be necessary to avoid a core--periphery structure in which the actors who refer to the most concepts end up at the center of the network.
Normalisation corrects for the verbosity of actors (or, more generally, for the centrality of a node in the affiliation network).
Several normalisation methods are available, and they will be described below.
Graph clustering can then be applied to the normalised networks to identify coalitions in policy debates.
More details, especially on the topic of normalisation, can be found in \citet{leifeld2017discourse}.
The next section will introduce some formal notation to represent the data structures and transformations introduced above; then these transformations will be re-introduced more formally using mathematical notation, and finally normalisation methods will be proposed.
\section{Notation}\label{sec:notation}
$X$ is a three-dimensional array representing statement counts.
$x_{ijk}$ is a specific count value in this array, with the first index $i$ denoting an instance of the first variable (e.\,g., organization or actor $i$), the second index $j$ denoting an instance of the second variable (e.\,g., concept $j$), and the third index $k$ denoting a level on the qualifier variable (e.\,g., agreement = $1$).
For example, $x_{ijk} = 5$ could mean that actor $i$ mentions concept $j$ with intensity $k$ five times.
$X$ can be represented as a cuboid, as illustrated in Figure~\ref{fig:algo_cuboid}.
\begin{figure}
\begin{center}
% diagram adjusted from https://latex.org/know-how/440-tikz-3dplot
\tdplotsetmaincoords{60}{125}
\begin{tikzpicture}
[tdplot_main_coords,
grid/.style={very thin,gray},
axis/.style={->,blue,thick},
cube/.style={opacity=.5,very thick,fill=red}]
%draw a grid in the x-y plane
\foreach \x in {-0.5,0,...,2.5}
\foreach \y in {-0.5,0,...,2.5}
{
\draw[grid] (\x,-0.5) -- (\x,2.5);
\draw[grid] (-0.5,\y) -- (2.5,\y);
}
%draw the axes
\draw[axis] (0,0,0) -- (3,0,0) node[anchor=west]{$i$};
\draw[axis] (0,0,0) -- (0,3,0) node[anchor=west]{$j$};
\draw[axis] (0,0,0) -- (0,0,3) node[anchor=west]{$k$};
%draw the bottom of the cube
\draw[cube] (0,0,0) -- (0,2,0) -- (2,2,0) -- (2,0,0) -- cycle;
%draw the back-right of the cube
\draw[cube] (0,0,0) -- (0,2,0) -- (0,2,2) -- (0,0,2) -- cycle;
%draw the back-left of the cube
\draw[cube] (0,0,0) -- (2,0,0) -- (2,0,2) -- (0,0,2) -- cycle;
%draw the front-right of the cube
\draw[cube] (2,0,0) -- (2,2,0) -- (2,2,2) -- (2,0,2) -- cycle;
%draw the front-left of the cube
\draw[cube] (0,2,0) -- (2,2,0) -- (2,2,2) -- (0,2,2) -- cycle;
%draw the top of the cube
\draw[cube] (0,0,2) -- (0,2,2) -- (2,2,2) -- (2,0,2) -- cycle;
\end{tikzpicture}
\caption{Statements in the $X$ array can be represented in a cuboid data structure.}
\label{fig:algo_cuboid}
\end{center}
\end{figure}
Where the qualifier variable is binary, \emph{false} values are represented as $0$ and \emph{true} values as $1$ on the $k$ index, i.\,e., $K^\text{binary} = \{ 0; 1 \}$.
Where the qualifier variable is integer, the respective integer value is used as the level.
This implies that $k$ can take positive or negative values or 0, i.\,e., $K^\text{integer} \subseteq \mathbb{Z}$.
Note that all $k$ levels of the scale are included in $K$, not just those values that are empirically observed.
Indices with a prime denote a second instance of an element, e.\,g., $i'$ may denote another organization.
$Y$ denotes the output matrix to be obtained by applying a transformation to $X$.
Several transformations are possible and will be described below.
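In \R, the array $X$ corresponds naturally to a three-way contingency table. The following sketch assumes that coded statements are available as a data frame with one row per statement; the column names and the data are invented for this illustration. Declaring the factor levels explicitly ensures that every actor, concept, and level of $K$ receives a slot in $X$, even where a combination is never observed (here, actor \code{a3} never refers to concept \code{c1}).

<<sketch-array, eval=FALSE>>=
# Invented example statements: one row per coded statement
statements <- data.frame(
  actor     = c("a1", "a1", "a2", "a2", "a3"),
  concept   = c("c1", "c1", "c1", "c2", "c2"),
  agreement = c(1, 1, 0, 1, 1)
)
# Cross-tabulate into the three-dimensional count array X[i, j, k];
# explicit levels keep unobserved actors, concepts, or qualifier levels
X <- table(i = factor(statements$actor, levels = c("a1", "a2", "a3")),
           j = factor(statements$concept, levels = c("c1", "c2")),
           k = factor(statements$agreement, levels = c(0, 1)))
dim(X)              # 3 actors, 2 concepts, 2 qualifier levels
X["a1", "c1", "1"]  # a1 supports c1 twice
@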
\section{Construction of One-Mode Networks} \label{sec:onemode}
\subsection{Congruence Networks} \label{subsec:congruence}
In a congruence network, the edge weight between nodes $i$ and $i'$ represents the number of times they co-support or co-reject second-variable nodes (if a binary qualifier is used) or the cumulative similarity between $i$ and $i'$ over their assessments of second-variable nodes (in the case of an integer qualifier variable).
In the integer case, the similarity between nodes $i$ and $i'$ is defined as the cumulative similarity over levels $k$ of the qualifier variable:
\begin{equation} \label{eq:congruence_integer}
y_{ii'}^\text{congruence} = \Phi_{ii'}\left( \sum_{j = 1}^n \sum_{k} \sum_{k'} x_{ijk} x_{i'jk'} \left( 1 - \frac{\vert k - k' \vert}{\vert K \vert - 1} \right) \right)
\end{equation}
where $\Phi_{ii'}(\cdot)$ denotes a normalisation function (to be specified below) and $n$ is the number of second-variable nodes.
Here, $\vert k - k' \vert$ is the difference in the assessment of second-variable node $j$ (e.\,g., a concept) by the two first-variable nodes $i$ and $i'$.
$\vert K \vert - 1$ is the maximum possible difference, with $\vert K \vert$ indicating the number of levels of qualifier variable $K$.
For example, if the qualifier scale is $[-5; 5]$, $\vert K \vert - 1 = 10$.
The subtraction in the parentheses serves to convert distances (as in a conflict network) to similarities.
The similarities are accumulated over all statements: for each second-variable node $j$ (e.\,g., each concept), nodes $i$ and $i'$ are compared over all combinations of the levels $k$ (for node $i$) and $k'$ (for node $i'$), and each combination is weighted by how often it occurs.
This weighting occurs in the $x_{ijk} x_{i'jk'}$ part.
For example, if $i$ mentions $j$ at intensity $-4$ twice and $i'$ mentions $j$ at intensity $+2$ three times on an intensity scale $[-5; +5]$, this contributes $2 \cdot 3 \cdot (1 - \frac{\vert-4 - 2\vert}{11 - 1}) = 2.4$ to the edge weight between $i$ and $i'$ in the congruence network.
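The following sketch evaluates the sum inside $\Phi_{ii'}(\cdot)$ in Equation~\ref{eq:congruence_integer} for a single pair of first-variable nodes, that is, without normalisation. It assumes that $X$ is stored as a numeric array whose third dimension is named after all integer levels of $K$; the function \code{congruence} is hypothetical. For the example just given (two statements at $-4$ versus three at $+2$ on the scale $[-5; +5]$), it returns $2 \cdot 3 \cdot 0.4 = 2.4$.

<<sketch-congruence-integer, eval=FALSE>>=
# Hypothetical function: unnormalised integer congruence between rows i1 and i2
# of a count array X[i, j, k] whose third dimension is named by integer levels
congruence <- function(X, i1, i2) {
  kvals <- as.numeric(dimnames(X)[[3]])
  maxdiff <- length(kvals) - 1  # |K| - 1, assuming all levels are present
  # similarity weight for every combination of levels k (rows) and k' (columns)
  sim <- 1 - abs(outer(kvals, kvals, "-")) / maxdiff
  total <- 0
  for (j in seq_len(dim(X)[2])) {
    # x_ijk * x_i'jk' for all k, k', weighted by the similarity of k and k'
    total <- total + sum(outer(X[i1, j, ], X[i2, j, ]) * sim)
  }
  total
}

# Worked example: i mentions j twice at -4, i' three times at +2, scale [-5; +5]
K <- -5:5
X <- array(0, dim = c(2, 1, length(K)),
           dimnames = list(c("i", "i'"), "j", as.character(K)))
X["i", "j", "-4"] <- 2
X["i'", "j", "2"] <- 3
congruence(X, "i", "i'")  # 2 * 3 * (1 - 6/10) = 2.4
@

Dropping the subtraction from one in the definition of \code{sim} would turn the sketch into the integer conflict variant described in the next subsection.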
The binary case, in which the $\vert K \vert = 2$ levels represent, for example, the rejection or support of a concept by an actor, is a special case of the integer congruence network.
In the binary case, congruent opinions always reduce to $1 - \frac{\vert k - k' \vert}{\vert K \vert - 1} = 1$, and differences in opinion always reduce to $1 - \frac{\vert k - k' \vert}{\vert K \vert - 1} = 0$.
Hence the binary case can be more easily expressed by counting the matches on the $k$ qualifier for all $j$ items without computing any distances:
\begin{equation} \label{eq:congruence_binary}
y_{ii'}^\text{congruence binary} = \Phi_{ii'}\left( \sum_{j = 1}^n \sum_{k} x_{ijk} x_{i'jk} \right).
\end{equation}
\subsection{Conflict Networks} \label{subsec:conflict}
The same logic as for the congruence network can be applied to produce conflict networks.
In the integer case, Equation~\ref{eq:congruence_integer} must be modified such that the relative distances are not subtracted from one, while everything else stays the same:
\begin{equation}
y_{ii'}^\text{conflict} = \Phi_{ii'}\left( \sum_{j = 1}^n \sum_{k} \sum_{k'} x_{ijk} x_{i'jk'} \left( \frac{\vert k - k' \vert}{\vert K \vert - 1} \right) \right)
\end{equation}
In the binary case, Equation~\ref{eq:congruence_binary} must be modified such that contradictions instead of matches are counted.
In other words, instead of counting $x_{ijk} x_{i'jk}$, $x_{ijk} x_{i',j,(1-k)}$ must be counted:
\begin{equation}\label{eq:conflict_binary}
y_{ii'}^\text{conflict binary} = \Phi_{ii'}\left( \sum_{j = 1}^n \sum_{k} x_{ijk} x_{i',j,(1-k)} \right).
\end{equation}
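Both binary equations can be evaluated for all pairs of first-variable nodes at once using matrix products. In the following sketch, the data are invented and $\Phi_{ii'}$ is assumed to be the identity (no normalisation). The two slices of $X$ for $k = 0$ and $k = 1$ are treated as actor-by-concept matrices: congruence multiplies each slice with its own transpose, whereas conflict multiplies each slice with the transpose of the opposite slice, which is the matrix form of $x_{ijk} x_{i',j,(1-k)}$.

<<sketch-binary, eval=FALSE>>=
# Invented binary count array X[i, j, k] with k = 0 (reject) and k = 1 (support)
X <- array(0, dim = c(3, 2, 2),
           dimnames = list(c("a1", "a2", "a3"), c("c1", "c2"), c("0", "1")))
X["a1", "c1", "1"] <- 2  # a1 supports c1 twice
X["a2", "c1", "1"] <- 1  # a2 supports c1 once
X["a2", "c2", "0"] <- 1  # a2 rejects c2 once
X["a3", "c1", "0"] <- 1  # a3 rejects c1 once
X["a3", "c2", "0"] <- 1  # a3 rejects c2 once
reject <- X[, , "0"]
support <- X[, , "1"]

# Congruence: matches on the same level k, summed over concepts j and levels k
congruence <- reject %*% t(reject) + support %*% t(support)
# Conflict: level k for i against level 1 - k for i'
conflict <- reject %*% t(support) + support %*% t(reject)
diag(congruence) <- 0  # discard self-loops
diag(conflict) <- 0
congruence
conflict
@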
\subsection{The Subtract Method}
In many empirical applications, it might make sense to combine the notions of congruence and conflict in a single signed and weighted network.
If only congruence is considered, for example, one misses the possibility that two actors contradict each other on more concepts than they agree on.
For this reason, it might make sense to subtract conflict edge weights from congruence edge weights and thereby construct a signed, weighted graph using the \emph{subtract} method as follows:
\begin{equation}
y_{ii'}^\text{subtract} = y_{ii'}^\text{congruence} - y_{ii'}^\text{conflict}
\end{equation}
Here, positive $y_{ii'}^\text{subtract}$ values indicate congruence in excess of conflict while negative values indicate conflict in excess of congruence.
In some practical applications, for example when visualising the congruence network, it may make sense to discard all negative values or to introduce some other threshold value $c$ and recode all values $y_{ii'}^\text{subtract} < c$ as $0$.
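A minimal sketch of the subtract method for the binary case in plain \R, including the optional threshold $c$ (here the argument \code{threshold}). It assumes, for illustration only, that $X$ is stored as an array \code{x} with dimensions actors $\times$ concepts $\times$ levels, where the third index \code{1} stands for $k = 0$ and \code{2} for $k = 1$. The function name is hypothetical and not part of \rdna.
<<eval=FALSE, results = 'tex'>>=
# Hypothetical helper (not part of rDNA); binary case, no normalisation
subtract_binary <- function(x, i, i2, threshold = NULL) {
  congruence <- sum(x[i, , ] * x[i2, , ])  # matches
  conflict <- sum(x[i, , 1] * x[i2, , 2] + x[i, , 2] * x[i2, , 1])  # contradictions
  y <- congruence - conflict
  # optional: recode values below the threshold c as 0
  if (!is.null(threshold) && y < threshold) y <- 0
  y
}

x <- array(0, dim = c(2, 3, 2))  # 2 actors, 3 concepts, k = 0/1
x[1, , 2] <- 1                   # actor 1: +, +, +
x[2, 1, 2] <- 1                  # actor 2: +, -, -
x[2, 2:3, 1] <- 1
subtract_binary(x, 1, 2)                 # 1 match - 2 contradictions = -1
subtract_binary(x, 1, 2, threshold = 0)  # recoded to 0
@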
\subsection{The Ignore Method} \label{subsec:ignore}
In some applications, qualifiers do not matter substantively, or there is only one level on the qualifier variable.
In such applications, it is possible to just count all referrals of $j$ by $i$ across levels of $k$ to get the number of times $i$ mentions $j$ in any way, then do the same for $i'$, and multiply both to yield the similarity between $i$ and $i'$ in terms of overlap in $j$, disregarding the levels of $k$:
\begin{equation}\label{eq:ignore}
y_{ii'}^\text{ignore} = \Phi_{ii'}\left( \sum_{j = 1}^n \left( \left( \sum_{k} x_{ijk} \right) \left( \sum_{k} x_{i'jk} \right) \right) \right)
\end{equation}
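Equation~\ref{eq:ignore} can be sketched in plain \R\ as follows, without normalisation. The array layout \code{x} (actors $\times$ concepts $\times$ levels) and the function name are illustrative assumptions, not part of \rdna. The key step is summing over the third dimension first, which collapses the qualifier levels.
<<eval=FALSE, results = 'tex'>>=
# Hypothetical helper (not part of rDNA)
ignore_weight <- function(x, i, i2) {
  # a[i, j] = number of times actor i mentions concept j, across all levels k
  a <- apply(x, c(1, 2), sum)
  sum(a[i, ] * a[i2, ])
}

x <- array(0, dim = c(2, 2, 2))  # 2 actors, 2 concepts, 2 levels
x[1, 1, ] <- c(1, 2)  # actor 1 mentions concept 1 three times
x[2, 1, ] <- c(2, 0)  # actor 2 mentions concept 1 twice
x[2, 2, 2] <- 4       # only actor 2 mentions concept 2
ignore_weight(x, 1, 2)  # 3 * 2 + 0 * 4 = 6
@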
\section{Normalisation for One-Mode Networks}\label{sec:normalis}
\citet{leifeld2017discourse} discusses the normalisation of congruence networks.
Normalisation, however, is also possible for affiliation networks, as will be demonstrated below.
Normalisation can be necessary to correct networks for the activity or popularity of nodes.
For example, if some first-variable nodes refer to a substantial number of second-variable nodes while others refer to few, the former will be more likely to be connected to many other nodes and especially those with similar levels of activity, which leads to a core--periphery structure of the discourse network.
Normalisation corrects for this pattern by cancelling out the effect of activity or popularity of nodes.
This will often lead to a clear cluster structure based on the similarity of node profiles, instead of a core--periphery structure.
In the simplest case, normalisation can be switched off, in which case
\begin{equation}
\Phi_{ii'}^\text{no}(\omega) = \omega.
\end{equation}
\subsection{Average Activity Normalisation of One-Mode Networks}
Edge weights can be divided by the \emph{average activity} of nodes $i$ and $i'$:
\begin{equation}\label{eq:activity}
\Phi_{ii'}^\text{avg} (\omega) = \frac{\omega}{ \frac{1}{2} \left( \sum_{j = 1}^n \sum_{k} x_{ijk} + \sum_{j = 1}^n \sum_{k} x_{i'jk} \right) }.
\end{equation}
\emph{Average activity normalisation} is the most commonly applied form of normalisation and works both with binary and weighted $X$ arrays, i.\,e., with or without duplicate statements.
It divides each weight by the mean of the number of second-variable referrals of nodes $i$ and $i'$.
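A minimal sketch of Equation~\ref{eq:activity} in plain \R. It assumes, for illustration only, that $X$ is stored as an array \code{x} with dimensions actors $\times$ concepts $\times$ levels and that \code{omega} is an unnormalised edge weight such as a binary congruence count; the function names are hypothetical and not part of \rdna.
<<eval=FALSE, results = 'tex'>>=
# Hypothetical helpers (not part of rDNA)
# activity of actor i: total number of referrals across all j and k
activity <- function(x, i) sum(x[i, , ])
normalise_avg <- function(omega, x, i, i2) {
  omega / (0.5 * (activity(x, i) + activity(x, i2)))
}

x <- array(0, dim = c(2, 3, 2))  # 2 actors, 3 concepts, k = 0/1
x[1, , 2] <- c(1, 1, 1)          # actor 1: three referrals
x[2, 1, 2] <- 1                  # actor 2: one referral
omega <- sum(x[1, , ] * x[2, , ])  # binary congruence: 1 match
normalise_avg(omega, x, 1, 2)      # 1 / ((3 + 1) / 2) = 0.5
@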
\subsection{Jaccard Normalisation for One-Mode Networks}
With \emph{Jaccard normalisation}, we do not just count $i$'s and $i'$'s activity and sum them up independently, but we add up both their independent activities and their joint activity, i.\,e., both matches and non-matches:
\begin{equation}\label{eq:jaccard}
\Phi_{ii'}^\text{Jaccard} (\omega) = \frac{\omega}{ \sum_{j = 1}^n \sum_{k} x_{ijk} [x_{i'jk} = 0] + \sum_{j = 1}^n \sum_{k} x_{i'jk}[x_{ijk} = 0] + \sum_{j = 1}^n \sum_{k} x_{ijk} x_{i'jk} }.
\end{equation}
Jaccard normalisation works best with binary $X$ arrays, i.\,e., if duplicate statements are not possible in the data structure.
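The denominator of Equation~\ref{eq:jaccard} can be computed in plain \R\ as sketched below. The array layout \code{x} (actors $\times$ concepts $\times$ levels) and the function name are illustrative assumptions, not part of \rdna. With binary $X$, the three terms together count the union of both nodes' referrals, so the result is the familiar Jaccard index when \code{omega} counts the matches.
<<eval=FALSE, results = 'tex'>>=
# Hypothetical helper (not part of rDNA)
normalise_jaccard <- function(omega, x, i, i2) {
  xi <- x[i, , ]
  xi2 <- x[i2, , ]
  only_i <- sum(xi * (xi2 == 0))   # referrals by i not matched by i'
  only_i2 <- sum(xi2 * (xi == 0))  # referrals by i' not matched by i
  both <- sum(xi * xi2)            # joint referrals (matches)
  omega / (only_i + only_i2 + both)
}

x <- array(0, dim = c(2, 3, 2))  # 2 actors, 3 concepts, k = 0/1
x[1, , 2] <- c(1, 1, 1)          # actor 1 supports concepts 1 to 3
x[2, 1:2, 2] <- 1                # actor 2 supports concepts 1 and 2
x[2, 3, 1] <- 1                  # ... and rejects concept 3
omega <- sum(x[1, , ] * x[2, , ])  # 2 matches
normalise_jaccard(omega, x, 1, 2)  # 2 / (1 + 1 + 2) = 0.5
@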
\subsection{Cosine Normalisation for One-Mode Networks}
With \emph{cosine normalisation}, we modify Equation~\ref{eq:activity} to divide by the product of the Euclidean norms of the two nodes' profiles instead of their mean activity:
\begin{equation}\label{eq:cosine}
\Phi_{ii'}^\text{cosine} (\omega) = \frac{\omega}{ \sqrt{\sum_{j = 1}^n \sum_{k} x_{ijk}^2} \sqrt{\sum_{j = 1}^n \sum_{k} x_{i'jk}^2} }.
\end{equation}
This works best when duplicates are admitted but can also be applied to binary $X$ arrays.
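The sketch below computes cosine normalisation in plain \R\ with the squares taken inside the sums, so that each node is scaled by the Euclidean norm of its profile over all $(j, k)$ cells, as in ordinary cosine similarity. The array layout \code{x} (actors $\times$ concepts $\times$ levels) and the function name are illustrative assumptions, not part of \rdna.
<<eval=FALSE, results = 'tex'>>=
# Hypothetical helper (not part of rDNA)
normalise_cosine <- function(omega, x, i, i2) {
  # Euclidean norm of each actor's profile over all (j, k) cells
  norm_i <- sqrt(sum(x[i, , ]^2))
  norm_i2 <- sqrt(sum(x[i2, , ]^2))
  omega / (norm_i * norm_i2)
}

x <- array(0, dim = c(2, 2, 1))  # 2 actors, 2 concepts, a single level
x[1, , 1] <- c(3, 4)             # duplicates admitted: counts above 1
x[2, , 1] <- c(4, 3)
omega <- sum(x[1, , ] * x[2, , ])  # 3 * 4 + 4 * 3 = 24
normalise_cosine(omega, x, 1, 2)   # 24 / (5 * 5) = 0.96
@
Two nodes with proportional profiles receive a normalised weight of $1$, regardless of how active they are.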
\section{Affiliation Networks}\label{sec:twomode}
While one-mode networks as portrayed in Section~\ref{sec:onemode} are most useful for analysing coalition structure in policy debates, affiliation networks retain more of the information contained in the data.
This makes them harder to interpret as the data become more complex, but they can be more informative for less complex discourse networks.
The simplest case is to ignore the qualifier variable:
\begin{equation}
y_{ij}^\text{affiliation ignore} = \Phi_{ij}\left(\sum_{k} x_{ijk} \right)
\end{equation}
This only makes sense if there is only one level in $K$ or if the qualifier variable does not matter substantively.
More interestingly, negative edges (e.\,g., rejection of concepts by actors) can be subtracted from positive edges (e.\,g., support of concepts by actors).
This yields a signed, weighted affiliation network.
In the integer case, the respective cells in $X$ can just be weighted by the respective level $k$.
If the weight is negative, this will subtract $x_{ijk}$ from the count:
\begin{equation}
y_{ij}^\text{affiliation subtract integer} = \Phi_{ij}\left(\sum_{k} k x_{ijk} \right)
\end{equation}
In the binary case (assuming $K = \{0; 1\}$), $0$ values need to be transformed into $-1$ before they can be subtracted:
\begin{equation}
y_{ij}^\text{affiliation subtract binary} = \Phi_{ij}\left(\sum_{k} \left( k x_{ijk} - (1 - k) x_{ijk} \right) \right)
\end{equation}
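The three affiliation equations above can be sketched in plain \R\ as follows, without normalisation. As an illustrative assumption (not the internal \dna\ data structure), $X$ is stored as an array \code{x} with dimensions actors $\times$ concepts $\times$ levels and \code{lev} holds the level values; in the binary case the third index \code{1} stands for $k = 0$ and \code{2} for $k = 1$. The function names are hypothetical.
<<eval=FALSE, results = 'tex'>>=
# Hypothetical helpers (not part of rDNA); each returns an actors x concepts matrix
affiliation_ignore <- function(x) apply(x, c(1, 2), sum)

affiliation_subtract_integer <- function(x, lev) {
  # weight each count by its level k and sum over k
  apply(x, c(1, 2), function(counts) sum(lev * counts))
}

affiliation_subtract_binary <- function(x) {
  # counts for k = 1 minus counts for k = 0 (i.e., k = 0 is weighted by -1)
  x[, , 2] - x[, , 1]
}

x <- array(0, dim = c(2, 2, 2))  # 2 actors, 2 concepts, k = 0/1
x[1, 1, ] <- c(1, 3)  # actor 1, concept 1: one rejection, three supports
x[2, 2, 1] <- 2       # actor 2, concept 2: two rejections
affiliation_ignore(x)                            # 4 and 2 on the diagonal
affiliation_subtract_binary(x)                   # 2 and -2 on the diagonal
affiliation_subtract_integer(x, lev = c(-1, 1))  # same as the binary result
@
Coding the binary levels as $-1$ and $+1$ makes the integer variant coincide with the binary variant, which is exactly the transformation of $0$ into $-1$ described above.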
Alternatively, in the binary case (assuming $K = \{0; 1\}$), it is possible to map all combinations of $k$ for each $(ij)$ dyad into a multiplex network with three distinct types of edges, where $0$ represents neither agreement nor disagreement, $1$ represents agreement, $2$ represents disagreement, and $3$ represents a mix of both agreement and disagreement.
This can be useful, for example, for visualising agreement, disagreement, and ambiguity/ambivalence in the same affiliation network using different colours.
More formally:
\begin{equation}
y_{ij}^\text{affiliation combine binary} =
\begin{cases}
0 & \text{if } \sum_k x_{ijk} = 0 \\
1 & \text{if } x_{i,j,k=0} = 0 \wedge x_{i,j,k=1} > 0 \\
2 & \text{if } x_{i,j,k=1} = 0 \wedge x_{i,j,k=0} > 0 \\
3 & \text{if } x_{i,j,k=0} > 0 \wedge x_{i,j,k=1} > 0
\end{cases}
\end{equation}
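A minimal sketch of the combine mapping in plain \R. It assumes, for illustration only, that $X$ is stored as an array \code{x} with dimensions actors $\times$ concepts $\times$ levels, where the third index \code{1} holds the counts for $k = 0$ and \code{2} those for $k = 1$. The function name is hypothetical and not part of \rdna.
<<eval=FALSE, results = 'tex'>>=
# Hypothetical helper (not part of rDNA)
affiliation_combine_binary <- function(x) {
  # as.vector keeps column-major order even if a dimension of length 1 is dropped
  neg <- as.vector(x[, , 1]) > 0  # at least one referral with k = 0
  pos <- as.vector(x[, , 2]) > 0  # at least one referral with k = 1
  y <- integer(length(pos))       # 0: no referral at all
  y[pos & !neg] <- 1L             # agreement only
  y[neg & !pos] <- 2L             # disagreement only
  y[pos & neg] <- 3L              # both agreement and disagreement
  matrix(y, nrow = dim(x)[1], ncol = dim(x)[2])
}

x <- array(0, dim = c(1, 4, 2))  # 1 actor, 4 concepts, k = 0/1
x[1, 2, 2] <- 1                  # concept 2: agreement
x[1, 3, 1] <- 2                  # concept 3: disagreement
x[1, 4, ] <- c(1, 1)             # concept 4: both
affiliation_combine_binary(x)    # edge types 0, 1, 2, 3
@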
\section{Normalisation of Affiliation Networks}
Like one-mode networks, affiliation networks can be normalised.
With \emph{activity normalisation}, ties from more active nodes receive lower weights:
\begin{equation}
\Phi_{ij}^\text{activity}(\omega) = \frac{\omega}{\sum_{j = 1}^n \sum_k x_{ijk}}
\end{equation}
With \emph{prominence normalisation}, ties to more prominent nodes receive lower weights:
\begin{equation}
\Phi_{ij}^\text{prominence}(\omega) = \frac{\omega}{\sum_{i = 1}^m \sum_k x_{ijk}}
\end{equation}
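Both normalisations divide a whole row or column of the affiliation matrix by the same number, as the following plain \R\ sketch shows. It assumes, for illustration only, that \code{a} is an unnormalised affiliation matrix (actors $\times$ concepts) and that \code{x} is the underlying array with dimensions actors $\times$ concepts $\times$ levels; the function names are hypothetical and not part of \rdna.
<<eval=FALSE, results = 'tex'>>=
# Hypothetical helpers (not part of rDNA)
# Nodes without any referral yield NaN (0 / 0).
normalise_activity <- function(a, x) {
  act <- apply(x, 1, sum)      # referrals made by each actor i
  sweep(a, 1, act, "/")        # divide row i by act[i]
}
normalise_prominence <- function(a, x) {
  prom <- apply(x, 2, sum)     # referrals received by each concept j
  sweep(a, 2, prom, "/")       # divide column j by prom[j]
}

x <- array(0, dim = c(2, 2, 1))  # 2 actors, 2 concepts, a single level
x[, , 1] <- matrix(c(2, 1, 2, 0), nrow = 2)
a <- apply(x, c(1, 2), sum)      # affiliation network ignoring qualifiers
normalise_activity(a, x)         # rows divided by 4 and 1
normalise_prominence(a, x)       # columns divided by 3 and 2
@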
\section{Temporal Aggregation: Time Windows and Attenuation}\label{sec:longi}
Networks can be temporally smoothed.
For example, it is possible to create a series of temporally overlapping time slices and aggregate these slices into a single network to limit the temporal scope of congruence edges (\emph{time window algorithm}).
Using the same algorithm, it is possible to visualise change over time using animations.
Alternatively, it is possible to make edge weights decrease with the time that has passed between the relevant statements of $i$ and $i'$ (\emph{attenuation algorithm}).
These methods are more advanced and are introduced in \citet{leifeld2016policy}.
\chapter{Installation of \dna\ and \rdna} \label{chp:installation}
\chapterauthor{Johannes Gruber and Philip Leifeld}
\FloatBarrier
This chapter explains how \dna\ and \rdna\ can be installed on common desktop operating systems.
As \dna\ is written in \java, both \dna\ and \rdna\ rely on \java\ to work on your computer properly.
Installing and configuring a valid \java\ Runtime Environment on your machine will thus be the first and only complicated step of the installation.
Following the simple steps below, one should not run into problems while setting up \java.
The advantage of the \java\ programming language for academic software is that, once the Runtime Environment is set up, it runs on different operating systems without changes to the source code, and that it is, for the most part, open source.
Besides setting up the \java\ Runtime Environment, the installation of \dna\ and \rdna\ is identical on different operating systems.
If you feel confident that \java\ is already correctly set up on your computer, you can therefore skip to Section~\ref{sec:installdna} if you like.
Otherwise please continue to the section for the operating system you wish to install \dna\ and \rdna\ on:
\fullref{sec:windows},
\fullref{sec:mac} or
\fullref{sec:linux}.
\enlargethispage{1cm}
For more experienced users, here is a short version of the steps described below:
\begin{enumerate}
\item (On Mac: install \href{https://support.apple.com/downloads/DL1572/en_US/javaforosx.dmg}{Apple's legacy version of \java}, even though we will never use it.)
\item Install the \java\ Runtime Environment (JRE) or Development Kit (JDK), version~8, on your computer.
\item (On Windows and Mac: set the \code{JAVA\_HOME} environment variable to the path of your \java\ installation.)
\item Download the newest executable JAR from \url{https://github.com/leifeld/dna/releases}.
\item (On Linux: make the JAR file executable.) \\
(On Mac: allow executing apps from an unidentified developer.)
\item You can now run the standalone \dna\ or continue to install \rdna\ as well.
\item Download and install \R\ (and \rstudio).
\item In \R: install the necessary \R\ packages \texttt{rJava} and \texttt{devtools}.
\item In \R: install \rdna\ via
<<eval=FALSE, results = 'tex', message = FALSE>>=
devtools::install_github("leifeld/dna/rDNA", args = "--no-multiarch")
@
\end{enumerate}
\section{Windows} \label{sec:windows}
\subsection{Installing \java\ on Windows}
To install the necessary \java\ Runtime Environment on your Windows computer, go to \url{https://www.java.com/en/download/manual.jsp}, scroll down, and download \code{Windows Offline (64-bit)} (see Figure~\ref{fig:downljava}; download \code{Windows Offline} instead if you are using a 32-bit version of Windows).
During the installation, you can accept all the default options, including the installation path.
\begin{figure}[tbp]
\includegraphics[frame, width=\textwidth]{03-1-downljava}
\caption{Downloading JDK from Oracle}
\label{fig:downljava}
\end{figure}
Next, you should set \code{JAVA\_HOME} in your environment variables to tell your Windows PC where your \java\ installation lives.
This step is optional, but it can prevent many of the problems users have had with \java\ in the past.
To set \code{JAVA\_HOME}, you need to navigate to the menu \code{edit the system environment variables}.
The easiest way to get there is to hit the \win\ button on your keyboard and enter \code{environment}.
Windows will then search for programs and settings menus that include this title and should usually display the menu we are looking for on top.\footnote{On older versions of Windows, this might not work.
On Windows~7 you can alternatively right-click on \code{My Computer} and select \code{Properties} $\rightarrow$ \code{Advanced}.
On Windows~8, go to \code{Control Panel} $\rightarrow$ \code{System} $\rightarrow$ \code{Advanced System Settings}.}
In this menu, you have to find the button \code{Environment variables...}.
Clicking this button should open the window shown in Figure~\ref{fig:javahome}.
Under \code{User Variables}, click \code{New}.\footnote{This sets \code{JAVA\_HOME} just for the current user.
If you want to make \java\ available for all users on the computer you are working on, you can create a \code{System Variable} instead.}
Enter the variable name \code{JAVA\_HOME} and the path to your Java installation in the field \code{Variable value}.
If you have not altered the default installation location, you should find \java\ in \code{"C:\textbackslash Program Files\textbackslash Java\textbackslash jre1.8.0\_151"} or, if you chose to install a 32-bit version of \java, in \code{"C:\textbackslash Program Files (x86)\textbackslash Java\textbackslash jre1.8.0\_151"} (which will cause problems, though, if you try to use it with a 64-bit version of \R).\footnote{Note that you have to repeat this procedure whenever the installation path of \java\ changes, for example whenever \java\ is updated.}
\begin{figure}
\centering
\includegraphics[width=0.6\textwidth]{03-2-javahome}
\caption{Edit JAVA\_HOME to tell Windows where your \java\ lives.}
\label{fig:javahome}
\end{figure}
Windows should now recognise \java\ and be able to run \java\ commands.
To test this, we can open the command prompt (press the \win\ button on your keyboard and simply enter \code{cmd} and then hit \code{Enter}) and type a \java\ command, e.\,g., \code{java -version}.
If the installation was successful, the output should display information about the \java\ version and build, as depicted in Figure~\ref{fig:javvers}.
\begin{figure}
\centering
\includegraphics[width=0.65\linewidth]{03-3-javaVersionCommand}
\caption{Testing Java installation in Windows command prompt}
\label{fig:javvers}
\end{figure}
After installing \java, you are ready to use \dna\ and could skip to Section~\ref{sec:installdna} if you are not interested in installing \rdna\ as well.
In order to use \rdna, the rest of this section will explain how to install \R\ and the recommended \href{https://en.wikipedia.org/wiki/Integrated_development_environment}{integrated development environment (IDE)} \href{https://www.rstudio.com/products/RStudio/}{\rstudio}, which makes working with \R\ a lot easier and also looks a lot better than the default interface.
\subsection{Installing \R\ on Windows} \label{subsec:installr-win}
\begin{enumerate}
\item First, you need to download \R\ from \url{https://cran.r-project.org/bin/windows/base/}.
\item At the top of the page, click on \code{Download R \Sexpr{R_vers} for Windows} (or a newer version if available).
\item Install the downloaded file, e.\,g., \code{R-\Sexpr{R_vers}-win.exe}.
Usually, it is fine to leave all default settings in the installation options.
\item Go to \url{https://www.rstudio.com/products/rstudio/download/}.
\item At the bottom of the page, under \code{Installers for Supported Platforms}, click on the link \code{RStudio \Sexpr{RS_vers} -- Windows Vista/7/8/10} (or a newer version if available).
Again, the default installation options are fine in most cases and can be accepted without changes.
\item After installation, you can use \R\ by opening \rstudio.
\end{enumerate}
\subsection{Testing the Installation of \rstudio} \label{subsec:rtest}
Traditionally, the first test you perform in a new programming language is to write a ``Hello, World!'' program.
To do this in \R, you simply type \code{print("Hello World!")} in the console (in the lower left corner of \rstudio).
Alternatively, you can make \R\ perform a simple mathematical operation.
If everything is set up correctly, the output should look like this:
<<eval=TRUE, results = 'tex'>>=
print("Hello World!")
# You can also use R as a calculator
2 * 3
@
The chunk of code above marks the first time we are using \R\ commands in this manual.
It might be worth explaining what this means for users who are not familiar with documents containing \R\ code.
Whenever code is shown in this manual it is decorated with a light grey background.
Comments in \R\ code (i.\,e., text targeted at the user to explain what is happening in a specific line) are marked with a \code{\#} and are formatted in italic font and in dark grey.
The output, which is generated by running a command, is marked by two \code{\#} and formatted in black.
This means that any line that does not start with \code{\#\#} contains \R\ code you can copy and paste to the console in \rstudio\ and run.
Alternatively, you can copy the code into an \R\ script and execute it either by clicking on the \rrun\ button in the upper right corner of the script editor in \rstudio\ or by using the shortcut \code{Ctrl+Enter}.
Either way, the highlighted code, or the line in which the cursor is currently placed, is sent to the console and executed.
If this works fine, you should be able to continue to Section~\ref{sec:installdna}.
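Before you continue, you can optionally check whether \R\ sees the \code{JAVA\_HOME} variable set earlier. The following minimal sketch uses only the base \R\ functions \code{Sys.getenv}, \code{file.path}, and \code{file.exists}; it assumes the standard Windows layout in which \code{java.exe} lives in the \code{bin} subfolder of the \java\ installation. Because programs read environment variables when they start, you may need to restart \rstudio\ after changing \code{JAVA\_HOME}.
<<eval=FALSE, results = 'tex'>>=
# JAVA_HOME as seen by R (an empty string means the variable is not set)
java_home <- Sys.getenv("JAVA_HOME")
java_home
# TRUE if java.exe is found in the bin subfolder of that path
# (assumes the standard Windows layout of a Java installation)
file.exists(file.path(java_home, "bin", "java.exe"))
@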
\section{macOS} \label{sec:mac}
\subsection{Installing \java\ on macOS}
On macOS, you have to install two versions of \java\ in order for \rdna\ to work properly.
The reasons behind this are too complicated to cover here.
Basically, Apple built its own version of \java, which needs to be on your machine, even though it is outdated.
Therefore, we first need to install the legacy \java~6 (which we will never use) before installing the correct \java\ Development Kit version~8.\footnote{If you do not wish to ever use \rdna\ or any other \R\ package that relies on \java, you might not need both versions and can just download the newest \java\ Runtime Environment.
However, installing \java\ version~8 before the legacy \java\ will cause problems if you ever change your mind.}
First, please download the file \url{https://support.apple.com/downloads/DL1572/en_US/javaforosx.dmg} and install it, accepting all defaults.
After this has finished, we can proceed to get the new version of the \java\ Development Kit.
Go to \url{http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html}, scroll down to \code{Java SE Development Kit 8u162}, accept the License Agreement, and then click on \code{jdk-8u162-macosx-x64.dmg} to download the file (see Figure~\ref{fig:downljava2}).
Again, install the program accepting all defaults.
\begin{figure}[tbp]
\centering
\includegraphics[frame, width=0.75\textwidth]{03-1-downljava2}
\caption{Downloading JDK from Oracle.}
\label{fig:downljava2}
\end{figure}
After installing \java, you are ready to use \dna\ and could skip to Section~\ref{sec:installdna} if you are not interested in installing \rdna\ as well.
In order to use \rdna, the rest of this section will explain how to install \R\ and the recommended \href{https://en.wikipedia.org/wiki/Integrated_development_environment}{integrated development environment (IDE)} \href{https://www.rstudio.com/products/RStudio/}{\rstudio}, which makes working with \R\ a lot easier and also looks a lot better than \R's default interface.
\subsection{Installing \R\ on macOS} \label{subsec:installr-mac}
\begin{enumerate}
\item First, you need to download \R\ from \url{https://cran.r-project.org/bin/macosx/}.
\item At the top of the page, click on \code{R-\Sexpr{R_vers}.pkg} (or a newer version if available).
\item Install the downloaded file.
Usually, it is fine to leave all default settings in the installation options.
\item Go to \url{https://www.rstudio.com/products/rstudio/download/}.
\item At the bottom of the page, under \code{Installers for Supported Platforms}, click on the link \code{RStudio \Sexpr{RS_vers} -- Mac OS X 10.6+ (64-bit)} (or a newer version if available).
Install RStudio by simply dragging the application icon in the downloaded \code{.dmg} file to your Applications folder.
\item Then you need to install the program \texttt{Xcode} from the App Store. The program is very large and will take a while to install.
\item After installation, you can use \R\ by opening \rstudio.
\end{enumerate}
To test your installation of \R, follow the instructions in Section~\ref{subsec:rtest}.
Working with \java\ from within \R\ on a Mac is a bit messy.
Apple's own version of \java, although important to have installed, does not run in combination with \R.
That is why we have to tell your system which version of \java\ to use by default.
To do this, we have to enter a few system commands, which you can either do in the Terminal app or directly from within \R\ using the \code{system} function:
<<eval=FALSE, results = 'tex'>>=
# list all installed Java virtual machines (JVMs)
system("/usr/libexec/java_home -V")
##Matching Java Virtual Machines (3):
## 1.8.0_162, x86_64: "Java SE 8" /Library/Java/JavaVirtualMachines/jdk1.8.0...
## 1.6.0_65-b14-468, x86_64: "Java SE 6" /Library/Java/JavaVirtualMachines/...
## 1.6.0_65-b14-468, i386: "Java SE 6" /Library/Java/JavaVirtualMachines/1....
# see default version of Java
system("java -version")
##java version "1.8.0_162"
##Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
##Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
@
If your output looks like the output above, you are almost ready to install \texttt{rJava}.
The only thing left to do is to associate \java\ with \R.
To do this, you can either use the terminal app, or you can invoke a system command directly from within \R\ using the \code{system} function:
<<eval=FALSE, engine = 'bash', results = 'tex'>>=
sudo R CMD javareconf
@
Or in \R:
<<eval=FALSE, results = 'tex'>>=
system("sudo R CMD javareconf")
@
If \code{/usr/libexec/java\_home -V} does not show \code{1.8.0\_162} (or any other version starting with \code{1.8.}), you need to install \java\ version~8 again (see above) and possibly reboot your computer.
If \code{java -version} shows \code{java version "1.6.0\_65"}, but version~1.8 is listed in the output of the first command, you can point \code{JAVA\_HOME} to version~1.8 for the current \R\ session by executing the following command:
<<eval=FALSE, results = 'tex'>>=
# Set JAVA_HOME for the current R session; system("export ...") has no effect
# because the variable would only be set in a short-lived subshell
Sys.setenv(JAVA_HOME = system("/usr/libexec/java_home -v 1.8", intern = TRUE))
@
After this, you should be able to continue to Section~\ref{sec:installdna}. However, depending on prior installations and the configuration of your machine, other problems can arise. You can find a helpful tutorial and troubleshooting guide \href{https://github.com/MTFA/CohortEx/wiki/Run-rJava-with-RStudio-under-OSX-10.10,-10.11-(El-Capitan)-or-10.12-(Sierra)#using-rstudioapp-or-rapp}{here}.
\section{Linux} \label{sec:linux}
\subsection{Installing \java\ on Linux}
%To Do: Add Suse and Debian commands where different
Since you are using Linux, we assume that you are sufficiently comfortable with using the terminal.
First, check if \java\ might already be installed:
<<eval=FALSE, engine = 'bash', results = 'tex'>>=
java -version
@
If not, install it, e.\,g., via \code{APT}:
<<eval=FALSE, engine = 'bash', results = 'tex'>>=
sudo apt-get install default-jdk
@
After installing \java, you are ready to use \dna\ and could skip to Section~\ref{sec:installdna} if you are not interested in installing \rdna\ as well.
In order to use \rdna, the rest of this section will explain how to install \R\ and the recommended \href{https://en.wikipedia.org/wiki/Integrated_development_environment}{integrated development environment (IDE)} \href{https://www.rstudio.com/products/RStudio/}{\rstudio}, which makes working with \R\ a lot easier and also looks better than the default user interface.
\subsection{Installing \R\ on Linux}\label{subsec:installr-linux}
\begin{enumerate}
\item Since the version of \R\ in the default repositories tends to be fairly outdated, we add the repository of the Comprehensive R Archive Network (CRAN) to our \code{sources.list}:
<<eval=FALSE, engine = 'bash', results = 'tex'>>=
sudo add-apt-repository \
"deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu \
$(lsb_release -cs)/"
@
Note that \code{lsb\_release -cs} returns the codename of your Ubuntu release, so the command automatically selects the matching repository on the CRAN server.
Visit \href{https://cran.rstudio.com/bin/linux/}{CRAN} to see for which Linux distributions \R\ is available.
\code{cran.rstudio.com} is also just one of several \href{https://cran.r-project.org/mirrors.html}{CRAN mirrors}, so you could replace it with a different one if you prefer.
\item Next, you need to add \R\ to your keyring.
Here is how you would accomplish this in Ubuntu:
<<eval=FALSE, engine = 'bash', results = 'tex'>>=
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
@
\item Update apt and install \R\ (or \code{r-base-dev} if you wish to compile packages from source):
<<eval=FALSE, engine = 'bash', results = 'tex'>>=
sudo apt-get update
sudo apt-get install r-base
@
\item Now install \rstudio\ via gdebi (and install gdebi first if you do not already have it):%
\footnote{Alternatively, you can download an installation file from \url{https://www.rstudio.com/products/rstudio/download/}.}
<<eval=FALSE, engine = 'bash', results = 'tex'>>=
sudo apt-get install gdebi-core
# set RS_vers to the RStudio version listed on the download page
RS_vers="X.Y.Z"
wget https://download1.rstudio.org/rstudio-${RS_vers}-amd64.deb
sudo gdebi -n rstudio-${RS_vers}-amd64.deb
rm rstudio-${RS_vers}-amd64.deb
@
Note that, as of version \Sexpr{RS_vers}, \rstudio\ depends on an outdated version of \texttt{libgstreamer}.
This version has already been deprecated in some Linux distributions, which can lead to an error during the installation of \rstudio.
If you run into trouble while installing \rstudio, you should try installing the old version of \texttt{libgstreamer} side by side with the newer library:
<<eval=FALSE, engine = 'bash', results = 'tex'>>=
# Download files with wget
wget http://ftp.ca.debian.org/debian/pool/main/g/gstreamer0.10/libgstreamer\
0.10-0_0.10.36-1.5_amd64.deb
wget http://ftp.ca.debian.org/debian/pool/main/g/gst-plugins-base0.10/libgs\
treamer-plugins-base0.10-0_0.10.36-2_amd64.deb
# Now install with gdebi
sudo gdebi libgstreamer0.10-0_0.10.36-1.5_amd64.deb
sudo gdebi libgstreamer-plugins-base0.10-0_0.10.36-2_amd64.deb
# And then clean up
rm libgstreamer0.10-0_0.10.36-1.5_amd64.deb libgstreamer-plugins-base0.10-0_0.10.36-2_amd64.deb
@
\item For Linux, there are a few other system dependencies for \rdna. You should install these using:
<<eval=FALSE, engine = 'bash', results = 'tex'>>=
sudo apt-get install libudunits2-dev
sudo apt-get build-dep libcurl4-gnutls-dev
sudo apt-get install libcurl4-gnutls-dev
@
\item After the installation has finished, you can use \R\ by opening \rstudio.
\end{enumerate}
To test your installation of \R, follow the instructions in Section~\ref{subsec:rtest}.
Before we can actually run \rdna, we need to associate \java\ with \R.
To do this, you should go back to the terminal:
<<eval=FALSE, engine = 'bash', results = 'tex'>>=
sudo apt-get install r-cran-rjava
sudo R CMD javareconf
@
If this finishes without errors, you are ready to start installing \dna\ and the \rdna\ package as described in Section~\ref{sec:installdna}.
\section{Installing \dna\ and \rdna} \label{sec:installdna}
Once \java\ is set up correctly, you can simply download the latest version of \dna\ as a JAR file from \url{https://github.com/leifeld/dna/releases} (see Figure~\ref{fig:downloadjar}).
JAR or \texttt{.jar} files are technically self-contained and executable archive files, which usually contain a computer program written in \java, along with all the files necessary to run the program.
Once the download is finished, you can start the program by double-clicking on the downloaded file.
However, on Linux, it is sometimes necessary to make the file executable first (e.\,g., via \code{chmod +x /path/to/your/dna.jar} or using \href{https://askubuntu.com/a/484719/570716}{a GUI method}).
On newer versions of macOS, a security exception needs to be made before you can run a program from an ``unidentified developer'' (i.\,e., if the program has not been registered with Apple).
To do so for \dna, control-click the program's icon, then choose \code{Open} from the shortcut menu.
If clicking on the file does not open the program on a Windows machine, right-click on the \texttt{.jar} file $\rightarrow$ \code{Open with} $\rightarrow$ \code{Use another app} and then navigate to the file \code{"C:\textbackslash Program Files\textbackslash Java\textbackslash jre1.8.0\_151\textbackslash bin\textbackslash javaw.exe"}.
If you are not interested in using \rdna, you can now skip to Chapter~\ref{chp:dna-prep}.
\begin{figure}
\includegraphics[frame, width=\textwidth]{03-4-downloadjar}
\caption{Download \dna\ jar file from GitHub releases page.}
\label{fig:downloadjar}
\end{figure}
At this point, it is assumed that you have installed \R\ and have at least a minimal understanding of how the program works (see Section~\ref{subsec:rtest}).
If that is the case, we can go ahead and install \rdna\ from within \R.
First, we need to install the package \rjava\ \citep{urbanek2017rjava}, which is the most important dependency of \rdna:%
\footnote{Again, this sometimes does not work that easily on macOS. If the installation fails, you could try to install the package from source using \code{install.packages("rJava", type = "source")}.}\textsuperscript{,}%
\footnote{Alternatively, it can make sense on Linux systems to install \rjava\ via apt: \code{sudo apt-get install r-cran-rjava}.}
<<eval=FALSE, results = 'tex', message = FALSE>>=
install.packages("rJava")
@
To see if this worked, or to troubleshoot potential problems, we can run a few \java\ commands from within \R:
\footnote{Loading \rjava\ for the first time frequently fails on macOS with the warning \code{...}.
If this is the case, try the command \code{sudo ln -s \$(/usr/libexec/java\_home)/jre/lib/server/libjvm.dylib /usr/local/lib} in your terminal app.}
<<eval=TRUE, results = 'tex', message = FALSE>>=
library("rJava")
# 1. initialize JVM
.jinit()
# 2. retrieve the Java-version
.jcall("java/lang/System", "S", "getProperty", "java.version")
# 3. retrieve JAVA_HOME location
.jcall("java/lang/System", "S", "getProperty", "java.home")
# 4. retrieve Java architecture
.jcall("java/lang/System", "S", "getProperty", "sun.arch.data.model")
# 5. retrieve architecture of OS (this should contain "64" if step 4 displays
# "64")
.jcall("java/lang/System", "S", "getProperty", "os.arch")
# 6. retrieve architecture of R as well (this should again contain "64" if
# steps 4 and 5 display "64")
R.Version()$arch
@
For \rdna\ to work properly, you need to ensure that \rjava\ works correctly.
In particular, it is essential that the architectures of \java, your operating system, and your version of \R\ match (see comments~4, 5, and 6 in the code chunk above).
Once this is done, you should install the package \texttt{devtools} \citep{wickham2018devtools}, which permits installing \R\ packages from \github.
<<eval=FALSE, results = 'tex', message = FALSE>>=
install.packages("devtools")
@
Since we only need one function from the package \texttt{devtools} at this point, it is not necessary to invoke the \code{library} command to load the whole package.
Instead, you can write \code{devtools::} and then type the function you want to use.\footnote{The option \code{args = "--no-multiarch"} should normally not be necessary, but it prevents errors on some operating systems.
Since \texttt{devtools} tries to test both the 32-bit and the 64-bit versions of a package during installation, the process fails if only one architecture of \java\ is available.}
<<eval=FALSE, results = 'tex', message = FALSE>>=
devtools::install_github("leifeld/dna/rDNA", args = "--no-multiarch")
@
After this is done as well, the final step of the installation is to test if \rdna\ can be loaded into \R\ correctly and to perform a basic operation with it---opening \dna\ from within \R.
In order to do so, you first need to download \dna, which can also be done in \R\ with the \code{dna\_downloadJar} command (see Chapter~\ref{chp:rdna} for more details on what these commands mean).
<<eval=TRUE, echo = FALSE, warning = FALSE>>=
rDNA::dna_downloadJar()
@
<<eval=FALSE, results = 'tex', message = FALSE>>=
# load the rDNA package first so that its functions are available
library("rDNA")
# download the DNA JAR file
dna_downloadJar()
# initialise DNA with the JAR file you just downloaded
dna_init()
# start up DNA from R with the sample file to see if everything worked
dna_gui(infile = dna_sample())
@
If these commands can be executed correctly, you are ready to use both \dna\ and \rdna.
\chapter{Preparation of your \dna\ Workspace} \label{chp:dna-prep}
\chapterauthor{Felix Rolf Bossner and Johannes Gruber}
\FloatBarrier
After installing the program (see Chapter~\ref{chp:installation}), you can now create your first DNA database for your own research project.
How you set up a DNA database will mainly depend on the needs of your personal research design---which should usually be clear before you start analysing data.
Therefore, \dna\ can be customised during the creation of a new database in accordance with how you are planning to use the tool.
\section{Creating a new DNA Database}\label{sec:createnewdb}
In order to create a new DNA database file, click on the \code{File} menu (in the upper left corner of your DNA program window) and select the option \code{New DNA database} (see Figure~\ref{fig:newdb}).
A new window will then open (see Figure~\ref{fig:dbchoose}) with a menu that guides you step by step through the configuration of your personal DNA database.
\begin{figure}
\includegraphics[frame, width=\linewidth]{04-1-newDatabase}
\caption{Starting a new Database}