POL 478H1 F

Loops

Olga Chyzh [www.olgachyzh.com]

Writing Reproducible Code

Why is it important to write easy-to-follow code?
What are some ways to enhance the readability of your code?

Why Use Loops?

Loops are a way to shorten repetitive code like this:

ALA<-read.table("ALA_PctResults20161108.txt", sep="\t", header=FALSE, fill=TRUE)
BRA<-read.table("BRA_PctResults20161108.txt", sep="\t", header=FALSE, fill=TRUE)
CAL<-read.table("CAL_PctResults20161108.txt", sep="\t", header=FALSE, fill=TRUE)
mydata<-rbind(ALA,BRA,CAL)

With a loop, the same task becomes:

myfilenames<-c("ALA_PctResults20161108.txt","BRA_PctResults20161108.txt","CAL_PctResults20161108.txt")
mydata<-NULL
for (i in myfilenames){
  d<-read.table(i, sep="\t", header=FALSE, fill=TRUE)
  mydata<-rbind(mydata,d)
}

Loops Help

Shorten/clarify the code
Reduce the probability of typing errors
Speed up coding
Can loop over indices, names, and values
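The three kinds of loops in the last point can be sketched as follows. This is a minimal illustration: the vector `votes` and its numbers are made up, and only the file-name suffix is borrowed from the earlier example.

```r
# A small named vector of made-up values, used only for illustration
votes <- c(ALA = 10, BRA = 20, CAL = 30)

# 1. Loop over indices: i takes the values 1, 2, 3
total_by_index <- 0
for (i in seq_along(votes)) {
  total_by_index <- total_by_index + votes[[i]]
}

# 2. Loop over names: nm takes the values "ALA", "BRA", "CAL"
labels <- NULL
for (nm in names(votes)) {
  labels <- c(labels, paste0(nm, "_PctResults20161108.txt"))
}

# 3. Loop over values: v takes the values 10, 20, 30
doubled <- NULL
for (v in votes) {
  doubled <- c(doubled, v * 2)
}
```

Looping over indices is handy when you need the position (for example, to fill the i-th slot of a result), while looping over names or values keeps the code shorter when the position does not matter.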

Loop Components

1. The wrapper for (variable in vector){}
2. An (initially empty) object to store the result, usually created outside the loop
3. A series of commands that will be applied to each element in the vector indexed by variable
4. The last line usually (but not always) appends the result to the empty object we created in step 2.
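The four components can be labeled in a small loop like the one below. It is only a sketch: the `counties` vector and the `n_chars` column are invented for illustration and are not part of the course data.

```r
counties <- c("ALA", "BRA", "CAL")   # the vector we loop over (made-up)

result <- NULL                       # 2. empty object to store the result

for (county in counties) {           # 1. the wrapper
  # 3. commands applied to each element
  d <- data.frame(county = county, n_chars = nchar(county))
  # 4. append this pass's result to the storage object
  result <- rbind(result, d)
}
```

After the loop, `result` has one row per element of `counties`.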

Example 1: Florida Elections Returns 2016

Download the zip file that contains the Florida 2016 election returns.
Unzip the data and set your working directory to the folder where you saved it.
We are going to write a loop that opens each of the 68 files in this folder and combines them into a single dataset.

myfilenames<-list.files()
mydata<-NULL
for (i in myfilenames){
  d<-read.table(i, quote="", comment.char="", sep="\t", header=FALSE)
  mydata<-rbind(mydata,d)
}
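A variation on this loop, sketched under some assumptions: instead of the real county files, it writes three tiny made-up tab-separated files to a temporary folder, so it runs anywhere. It also restricts `list.files()` to `.txt` files and stores each file in a pre-allocated list before combining, which is one common alternative to growing `mydata` with `rbind` on every pass.

```r
# Create a temporary folder with three tiny tab-separated files
# (made-up contents standing in for the county files).
tmp <- file.path(tempdir(), "fl_demo")
dir.create(tmp, showWarnings = FALSE)
for (county in c("ALA", "BRA", "CAL")) {
  writeLines(
    c(paste(county, "Trump", "100", sep = "\t"),
      paste(county, "Clinton", "90", sep = "\t")),
    file.path(tmp, paste0(county, "_demo.txt"))
  )
}

# pattern keeps only .txt files; full.names = TRUE returns full paths,
# so the loop does not depend on the working directory.
myfilenames <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)

pieces <- vector("list", length(myfilenames))   # pre-allocated storage
for (k in seq_along(myfilenames)) {
  pieces[[k]] <- read.table(myfilenames[k], quote = "", comment.char = "",
                            sep = "\t", header = FALSE)
}

# Stack all pieces into one data frame
mydata <- do.call(rbind, pieces)
```

Filtering by pattern guards against accidentally reading a stray file (such as the zip archive itself) that happens to sit in the same folder.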

Options for `read.table`

header--whether the first row of the data contains variable names.
quote--the set of characters treated as quotation marks. By default, read.table reads text between quotes (" or ') as a single character string; setting quote="" turns this off, so that quotes are treated as part of the text (e.g., a name with an apostrophe).

d<-read.table(myfilenames[18], quote="",sep="\t", header=FALSE, fill=TRUE)

comment.char--by default, read.table treats # as the beginning of a comment and ignores what follows. We need to turn this option off, as some of the files use # to denote the district number.

d<-read.table(myfilenames[62], quote="", comment.char="",sep="\t", header=FALSE, fill=TRUE)

sep--the column separator; the default is white space, but in this case it is tab.
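To see what these options do, here is a small sketch that reads a single made-up line through the `text` argument of `read.table` instead of a file. The line contents (a name with an apostrophe and a district label with #) are invented to mimic the issues described above.

```r
# One made-up tab-separated line: an apostrophe in a name, a "#" in a label
raw <- "O'Brien\tDistrict #5\t120"

# With quoting and comments turned off, all three fields are read as typed
ok <- read.table(text = raw, sep = "\t", header = FALSE,
                 quote = "", comment.char = "")

# Leaving comment.char at its default ("#") drops everything after the "#"
truncated <- read.table(text = raw, sep = "\t", header = FALSE, quote = "")

ncol(ok)         # number of columns read with comments turned off
ncol(truncated)  # number of columns read with the default comment.char
```

Comparing the two column counts shows why the Florida files need `comment.char=""`.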

Your Turn

Edit our loop, so that we only keep data on Trump Vote and Clinton Vote for each county. Use a pipe.

Example 3: World Bank Data

Change your working directory to where you stored WDI data from last week.
We can get these data into long form the way we did in the homework, or by using a loop with the indicator name as our loop variable.

The old way:

library(tidyverse)  # loads readr, dplyr and tidyr, which provide the functions used below
d<-read_csv("WDIData.csv", col_names=TRUE) %>%
  filter(`Indicator Name` %in% c("GDP (constant 2010 US$)","Foreign direct investment, net inflows (% of GDP)")) %>%
  select(-`Indicator Code`,-`Country Code`,-`2020`,-X66) %>%
  slice(-(1:94)) %>%
  pivot_longer(`1960`:`2019`,names_to="year", values_to="Indicator") %>%
  pivot_wider(names_from="Indicator Name", values_from="Indicator")

The new way:

indname<-c("GDP (constant 2010 US$)","Foreign direct investment, net inflows (% of GDP)")
mydata<-NULL
for (ind in indname){
  d<-read_csv("WDIData.csv", col_names=TRUE) %>%
    filter(`Indicator Name`==ind) %>%
    select(-`Indicator Code`,-`Country Code`,-`2020`,-X66) %>%
    slice(-(1:47)) %>%
    pivot_longer(`1960`:`2019`,names_to="year", values_to="Indicator")
  mydata<-rbind(mydata,d)
}
mydata<-mydata %>% pivot_wider(names_from="Indicator Name", values_from="Indicator")

Automate Indicator Names

d<-read_csv("WDIData.csv", col_names=T)
indname<-unique(d$`Indicator Name`)
mydata<-NULL
for (ind in indname[1:5]){
  d<-read_csv("WDIData.csv", col_names=TRUE) %>%
    filter(`Indicator Name`==ind) %>%
    select(-`Indicator Code`,-`Country Code`,-`2020`,-X66) %>%
    slice(-(1:47)) %>%
    pivot_longer(`1960`:`2019`,names_to="year", values_to="Indicator")
  mydata<-rbind(mydata,d)
}
mydata<-mydata %>% pivot_wider(names_from="Indicator Name", values_from="Indicator")
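The loop above re-reads WDIData.csv on every pass. One possible way around that, sketched here in base R, is to load the data once before the loop and only filter inside it. The small data frame `wdi` and the `share` column are made up for illustration; they stand in for the real file and for whatever per-indicator processing is needed.

```r
# Made-up stand-in for the WDI file, already in long form
wdi <- data.frame(
  country   = rep(c("A", "B"), times = 2),
  indicator = rep(c("GDP", "FDI"), each = 2),
  value     = c(1, 2, 3, 4)
)

indname <- unique(wdi$indicator)   # indicator names taken from the data
mydata <- NULL
for (ind in indname) {
  d <- wdi[wdi$indicator == ind, ]       # filter the already-loaded data
  d$share <- d$value / sum(d$value)      # hypothetical per-indicator step
  mydata <- rbind(mydata, d)
}
```

Because the data are read only once, the cost of each pass is just the filtering and the per-indicator step.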

A Word of Caution

Loops are not always faster: in the example above, the loop re-reads WDIData.csv on every iteration.
R's built-in looping functions, such as apply, are often more compact and can be faster, but (beyond the simplest cases) they require more advanced programming.
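As a rough comparison, the sketch below computes column means of a small made-up matrix twice: once with an explicit loop and once with `apply`. The matrix `m` is invented for illustration only.

```r
# Made-up 2 x 3 matrix; its columns are (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2)

# Explicit loop: one pass per column, result stored by position
loop_means <- numeric(ncol(m))
for (j in seq_len(ncol(m))) {
  loop_means[j] <- mean(m[, j])
}

# apply: the second argument, 2, means "apply the function to each column"
apply_means <- apply(m, 2, mean)
```

Both versions return the same three means; the `apply` call is shorter, but the loop makes each step explicit.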

Loops Are Invaluable

Long repetitive scripts
Working with network data

Your Turn 1

Convert the following repeated code into a loop:

library(classdata)
data("terr_attacks.wide")
a<-mean(terr_attacks.wide[,5],na.rm=T)
b<-mean(terr_attacks.wide[,6],na.rm=T)
d<-mean(terr_attacks.wide[,7],na.rm=T)
e<-mean(terr_attacks.wide[,8],na.rm=T)
f<-mean(terr_attacks.wide[,9],na.rm=T)
g<-mean(terr_attacks.wide[,10],na.rm=T)
h<-mean(terr_attacks.wide[,11],na.rm=T)
i<-mean(terr_attacks.wide[,12],na.rm=T)
j<-mean(terr_attacks.wide[,13],na.rm=T)
k<-mean(terr_attacks.wide[,14],na.rm=T)
l<-mean(terr_attacks.wide[,15],na.rm=T)
m<-mean(terr_attacks.wide[,16],na.rm=T)
mymeans<-c(a,b,d,e,f,g,h,i,j,k,l,m)

Now get the means of these variables using summarise.

Which one is easier/faster?

Your Turn 2

Convert the following repeated code into a loop:

library(classdata)
data("terr_attacks.wide")
a<-mean(terr_attacks.wide$GDPpc,na.rm=T)
b<-mean(terr_attacks.wide$population,na.rm=T)
d<-mean(terr_attacks.wide$tradeofgdp,na.rm=T)
e<-mean(terr_attacks.wide$`Hostage Taking (Kidnapping)`,na.rm=T)
f<-mean(terr_attacks.wide$Hijacking,na.rm=T)
mymeans<-c(a,b,d,e,f)
