-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathloops.Rmd
More file actions
158 lines (102 loc) · 3.52 KB
/
loops.Rmd
File metadata and controls
158 lines (102 loc) · 3.52 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
---
title: "Loops"
description: |
Over and Over again...
output:
distill::distill_article:
toc: true
toc_float: true
toc_depth: 4
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
We could explicitly call the function as many times need be to clean the tables we're interested in.
```{r eval=FALSE}
table9 <- clean_table(file_01)
table10 <- clean_table(file_02)
table11 <- clean_table(file_03)
```
But that would be redundant! Imagine if we were reading in 15 of those csvs! There would be extra lines of code and as many extra variable names in your global environment to keep track of. Let's see how loops paired with lists can help us!
# Anatomy
A `for` loop has three parts:
1. The Output: stuff can be stored or perform an action
2. The Sequence: code within `()` that shows what to loop over
3. The Body: code within `{}` that does the work
## Sequence
The sequence lies within the `()`. It will follow this structure:
([variable name of your choice] in [list or vector])
The sequence tells the machine what to loop over. The variable name of your choice will represent a single element within the list or vector in a loop.
In the example below, with every iteration, `df` will be a counter and represent a different data frame in `l`
```{r eval=FALSE}
l <- list(file_01, file_02, file_03)
# for every data frame (df) in list (l)...
for(df in l) {
}
```
Try printing the `head()` of each data frame in our list
```{r eval=FALSE}
l <- list(file_01, file_02, file_03)
for(df in l) {
print(head(df))
}
```
Try printing a version of each data frame with `clean_table()`
```{r eval=FALSE}
for(df in l) {
print(clean_table(df))
}
```
## Output
Now instead of printing stuff, let's store stuff in a list!
A way to use both lists and loops is to read data. Let's create a loop to read-in csvs 1 through 11 and store them into a list.
- Initiate the Output (`dfs`) to store the end result (data frames from csvs)
- Construct the sequence you'll be looping over (`csv`) (file names of csvs)
- Read csv (`t`)
```{r eval=FALSE}
dfs <- list()
csv <- paste0('Table', 1:11, '.csv')
for(c in csv) {
t <- read_csv(here('data', '050', c))
dfs[[c]] <- t
}
```
Rename the elements in the list
```{r eval=FALSE}
names(dfs) <- paste0('Table', 1:11)
```
# Put it all together
Edit the loop so that we clean the tables as we're reading in the csvs.
```{r eval=FALSE}
for(c in csv) {
t <- read_csv(here('data', '050', c))
ct <- clean_table(t)
dfs[[c]] <- ct
}
```
With loops and a list to store the output, we've removed code redundancy. Instead of calling `clean_table()` for every table in our list, it just required editing a couple lines within the loop to make that adjustment.
# Alter the Flow
Not everything has to be uniformly applied within a loop. Exceptions are allowed and part of control flow includes skipping to the next element or breaking out of the loop altogether.
## Break
To exit the loop entirely, use `break` with an if statement.
```{r eval=FALSE}
dfs <- list()
csv <- paste0('Table', 1:11, '.csv')
for(c in csv) {
if(c == 'Table6.csv') break
t <- read_csv(here('data', '050', c))
dfs[[c]] <- t
}
```
## Next
To exit the current iteration, use `next` with an if statement. The code below only reads in odd numbered tables.
```{r eval=FALSE}
dfs <- list()
csv <- paste0('Table', 1:11, '.csv')
even_num <- seq(from = 2, to = length(csv), by = 2)
for(c in csv) {
if(c %in% paste0('Table', even_num, '.csv')) next
t <- read_csv(here('data', '050', c))
dfs[[c]] <- t
}
```