---
title: "More Loops and Conditionals"
output:
  html_document:
    df_print: paged
---
### Using multiple flow controls
Loops can get complicated quickly as the complexity of your question grows. For example, here is how to find the letter paired with the largest number in a simple data frame (in a deliberately inefficient way).
```{r}
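loop.df <- data.frame(lets = LETTERS[1:5],
                      nums = c(10, 3, 36, 12, 9))
x <- NULL  # letter of the best row seen so far
y <- NULL  # largest number seen so far
for (i in seq_len(nrow(loop.df))) {
  if (is.null(x)) {
    # first row: nothing to compare against yet
    x <- loop.df[i, 1]
    y <- loop.df[i, 2]
  } else {
    if (y < loop.df[i, 2]) {
      x <- loop.df[i, 1]
      y <- loop.df[i, 2]
    }
  }
}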
x
```
Now for only odd numbers.
```{r}
loop.df <- data.frame(lets = LETTERS[1:5],
                      nums = c(10, 3, 36, 12, 39))
x <- NULL
y <- NULL
for (i in seq_len(nrow(loop.df))) {
  # only consider rows whose number is odd
  if (loop.df[i, 2] %% 2 != 0) {
    if (is.null(x)) {
      x <- loop.df[i, 1]
      y <- loop.df[i, 2]
    } else {
      if (y < loop.df[i, 2]) {
        x <- loop.df[i, 1]
        y <- loop.df[i, 2]
      }
    }
  }
}
x
```
Now for only odd numbers less than 30.
```{r}
loop.df <- data.frame(lets = LETTERS[1:5],
                      nums = c(10, 3, 36, 12, 39))
x <- NULL
y <- NULL
for (i in seq_len(nrow(loop.df))) {
  if (loop.df[i, 2] > 30) {
    print("too big")
    # break exits the loop entirely; later rows are never examined
    break
  }
  if (loop.df[i, 2] %% 2 != 0) {
    if (is.null(x)) {
      x <- loop.df[i, 1]
      y <- loop.df[i, 2]
    } else {
      if (y < loop.df[i, 2]) {
        x <- loop.df[i, 1]
        y <- loop.df[i, 2]
      }
    }
  }
}
x
```
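`break` exits the loop as soon as it is reached, so it can restrict the range of rows a loop examines. Here the loop stops at the first value above 30 (row 3) and never looks at rows 4 and 5.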
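If the goal is to skip the large values but keep examining the rest, `next` is the usual alternative to `break`. The sketch below contrasts the two on a plain numeric vector; the values in `nums` and the names `best.next` and `best.break` are made up for illustration and are not part of the examples above.

```{r}
nums <- c(10, 3, 36, 12, 39, 7)  # hypothetical values for illustration

# Largest odd number below 30, skipping large values with `next`
best.next <- NULL
for (n in nums) {
  if (n >= 30) next  # skip this value, keep looping
  if (n %% 2 != 0 && (is.null(best.next) || n > best.next)) {
    best.next <- n
  }
}

# Same loop, but `break` stops at the first large value
best.break <- NULL
for (n in nums) {
  if (n >= 30) break  # leave the loop; remaining values are ignored
  if (n %% 2 != 0 && (is.null(best.break) || n > best.break)) {
    best.break <- n
  }
}

best.next
best.break
```

With `next` the loop still reaches the 7 at the end of the vector; with `break` it stops at 36, so only the first two values are ever considered.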
***
For reference, here is a more efficient way of solving the first example:
```{r}
loop.df <- data.frame(lets = LETTERS[1:5],
                      nums = c(10, 3, 36, 12, 9))
# sort rows from largest to smallest number
loop.df <- loop.df[order(loop.df$nums, decreasing = TRUE), ]
loop.df[1,1]
```
And for the second example (odd numbers only):
```{r}
loop.df <- data.frame(lets = LETTERS[1:5],
                      nums = c(10, 2, 36, 7, 9))
# keep only the rows with odd numbers
loop.df <- loop.df[which(loop.df$nums %% 2 != 0), ]
loop.df
# sort the remaining rows from largest to smallest
loop.df <- loop.df[order(loop.df$nums, decreasing = TRUE), ]
loop.df
loop.df[1, 1]
```
This introduces a couple of new pieces of syntax and a very important concept: break your problem into manageable sub-problems for which an efficient solution exists. Here we use the functions `which` and `order`, both of which we have used previously when manipulating data frames. It also illustrates that a loop is sometimes not the best solution to your problem, particularly if you have a large amount of data.
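The same idea can be applied to the third example (odd numbers less than 30). The following is a minimal sketch, not the only way to do it: it combines two logical conditions with `&` and uses `which.max` instead of sorting. The names `keep`, `odd.small` and `best` are introduced here for illustration. Unlike the `break` loop, it filters every row rather than stopping at the first large value.

```{r}
loop.df <- data.frame(lets = LETTERS[1:5],
                      nums = c(10, 3, 36, 12, 39))

# sub-problem 1: keep rows that are odd AND below 30
keep <- loop.df$nums %% 2 != 0 & loop.df$nums < 30
odd.small <- loop.df[keep, ]

# sub-problem 2: which.max() returns the position of the largest value
best <- odd.small$lets[which.max(odd.small$nums)]
best
```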
#### Exercise
Select a BLAST/DIAMOND report you have previously generated that has more than 100 hits. Get the first 100 hits in the report and save them to a new file. Then import this new file into R and, ***using loops and conditionals***, find the overall best hit (regardless of query or sequence) in the file based on 1) percent ID, 2) bit score, and 3) e-value (three separate loops). Next, design a loop that will 1) first check that the total length of the alignment is at least 95% of the length of the shorter sequence, then 2) return the best hit (by percent ID) that passes this test.
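As a starting point for the import step, here is a hedged sketch that assumes your report is in the default 12-column tabular format (`-outfmt 6` in BLAST+); adjust the column names if your report was generated differently. The two rows in `toy.report` are made up so that the chunk runs on its own, and the names `blast.cols`, `toy.report` and `hits` are illustrative.

```{r}
# Default column order of BLAST/DIAMOND tabular output (outfmt 6)
blast.cols <- c("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                "qstart", "qend", "sstart", "send", "evalue", "bitscore")

# Two made-up rows stand in for a real report. With a real file, pass the
# path of your own 100-hit file to read.table() instead of `text = ...`.
toy.report <- "q1\ts1\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t350
q2\ts7\t91.0\t150\t13\t1\t5\t154\t10\t159\t2e-30\t210"

hits <- read.table(text = toy.report, sep = "\t", col.names = blast.cols,
                   stringsAsFactors = FALSE)
str(hits)
```

Note that the default columns do not include the query and subject sequence lengths, so the 95% check will likely need those added when the report is generated (for example the `qlen` and `slen` fields) or looked up separately.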