
POL 478H1 F

Loops

Olga Chyzh [www.olgachyzh.com]


Writing Reproducible Code

  1. Why is it important to write easy-to-follow code?

  2. What are some ways to enhance the readability of your code?

More Reading:


Why Use Loops?

Loops are a way to shorten repetitive code like this:

ALA<-read.table("ALA_PctResults20161108.txt", sep="\t", header=FALSE, fill=TRUE)
BRA<-read.table("BRA_PctResults20161108.txt", sep="\t", header=FALSE, fill=TRUE)
CAL<-read.table("CAL_PctResults20161108.txt", sep="\t", header=FALSE, fill=TRUE)
mydata<-rbind(ALA,BRA,CAL)

into a loop like this:

myfilenames<-c("ALA_PctResults20161108.txt","BRA_PctResults20161108.txt","CAL_PctResults20161108.txt")
mydata<-NULL
for (i in myfilenames){
d<-read.table(i, sep="\t", header=FALSE, fill=TRUE)
mydata<-rbind(mydata,d)
}

Loops Help

  • Shorten/clarify the code

  • Reduce the probability of typing errors

  • Speed up coding

  • Can loop over indices, names, and values
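
The three ways of looping mentioned in the last bullet can be sketched as follows. This is a minimal illustration on a toy named vector; the names and values are made up and are not taken from the election data.

```r
# Toy named vector (hypothetical values)
votes <- c(ALA = 120, BRA = 45, CAL = 80)

# 1. Loop over indices: i takes the values 1, 2, 3
for (i in seq_along(votes)) {
  print(votes[i])
}

# 2. Loop over names: nm takes the values "ALA", "BRA", "CAL"
for (nm in names(votes)) {
  print(paste(nm, votes[[nm]]))
}

# 3. Loop over values: v takes the values 120, 45, 80
total <- 0
for (v in votes) {
  total <- total + v
}
total
```

Looping over indices or names is usually the safer choice when you need to know which element you are working on; looping over values is enough when you only need the values themselves.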


Loop Components

  1. The wrapper for (variable in vector){}

  2. An (initially empty) object to store the result, usually outside the loop

  3. A series of commands that will be applied to each element in the vector indexed by variable

  4. The last line usually (but not always) appends the result to the empty object we started with in 2.
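
The four components above can be seen in a very small loop. This is only a sketch: the vector x and the object names squares and sq are invented for illustration.

```r
x <- c(2, 4, 6)              # the vector we loop over

squares <- NULL              # 2. an (initially empty) object to store the result

for (value in x) {           # 1. the wrapper: for (variable in vector){}
  sq <- value^2              # 3. command(s) applied to each element
  squares <- c(squares, sq)  # 4. append the result to the object from step 2
}

squares
```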


Example 1: Florida Elections Returns 2016

  1. Download the zip file that contains Florida 2016 Elections Returns here;

  2. Unzip the data and set your working directory to the folder where you saved it;

  3. We are going to write a loop that opens each of the 68 files in this folder and combines them into a single dataset.

myfilenames<-list.files()
mydata<-NULL
for (i in myfilenames){
d<-read.table(i, quote="",comment.char="", sep="\t", header=FALSE)
mydata<-rbind(mydata,d)
}
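
The loop above assumes that the working directory contains only the data files. A hedged variation, for when that is not the case, uses the pattern and full.names arguments of list.files. The sketch below builds three toy files in a temporary folder so that it runs on its own; the folder, the file names, and the values are hypothetical stand-ins for the county files.

```r
# Create three small tab-separated files in a temporary folder
# (hypothetical stand-ins for the county files)
tmp <- tempfile("returns")
dir.create(tmp)
for (cty in c("ALA", "BRA", "CAL")) {
  toy <- data.frame(county = cty, votes = c(10, 20))
  write.table(toy, file.path(tmp, paste0(cty, "_toy.txt")),
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# pattern keeps only .txt files; full.names = TRUE returns full paths,
# so the loop works without changing the working directory
myfilenames <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)

mydata <- NULL
for (i in myfilenames) {
  d <- read.table(i, quote = "", comment.char = "", sep = "\t", header = FALSE)
  mydata <- rbind(mydata, d)
}
dim(mydata)
```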

Options for read.table

  • header--whether the first row of the data contains variable names.

  • quote--the set of characters treated as quotation marks. By default, read.table treats text inside single or double quotes as one character string, so an apostrophe in a name can be misread as the start of a quoted string. Setting quote="" turns quoting off.

d<-read.table(myfilenames[18], quote="",sep="\t", header=FALSE, fill=TRUE)

  • comment.char--by default, read.table treats # as the beginning of a comment and ignores the rest of the line. We need to turn this off with comment.char="", because some of the files use # to denote the district number.

d<-read.table(myfilenames[62], quote="", comment.char="",sep="\t", header=FALSE, fill=TRUE)

  • sep--the column separator; the default is white space, but in these files it is a tab.
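
The effect of comment.char can be checked without the data files by passing a string to the text argument of read.table. The two lines of toy data below are made up for illustration; they are not from the Florida returns.

```r
# Two tab-separated toy lines (hypothetical): each contains a # sign,
# and the first also contains an apostrophe
toy <- "O'Brien\tDistrict #5\t100\nSmith\tDistrict #7\t200"

# With quote="" and comment.char="", both lines are read in full
ok <- read.table(text = toy, sep = "\t", header = FALSE,
                 quote = "", comment.char = "")
dim(ok)

# With the default comment.char="#", everything after # is dropped,
# so the third column is lost
cut <- read.table(text = toy, sep = "\t", header = FALSE,
                  quote = "", fill = TRUE)
dim(cut)
```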

Your Turn

Edit our loop so that we keep only the Trump and Clinton vote data for each county. Use a pipe.


Example 2: World Bank Data

  1. Change your working directory to where you stored WDI data from last week.

  2. We can put these data in long form the way we did in the homework, or by using a loop with the indicator name as our loop variable.

The old way:

library(tidyverse) # provides read_csv, %>%, filter, select, slice, pivot_longer, pivot_wider
d<-read_csv("WDIData.csv", col_names=TRUE) %>%
  filter(`Indicator Name` %in% c("GDP (constant 2010 US$)","Foreign direct investment, net inflows (% of GDP)")) %>%
  select(-`Indicator Code`,-`Country Code`,-`2020`,-X66) %>%
  slice(-(1:94)) %>%
  pivot_longer(`1960`:`2019`,names_to="year", values_to="Indicator") %>%
  pivot_wider(names_from="Indicator Name", values_from="Indicator")

The new way:

indname<-c("GDP (constant 2010 US$)","Foreign direct investment, net inflows (% of GDP)")
mydata<-NULL
for (ind in indname){
  d<-read_csv("WDIData.csv", col_names=TRUE) %>%
    filter(`Indicator Name`==ind) %>%
    select(-`Indicator Code`,-`Country Code`,-`2020`,-X66) %>%
    slice(-(1:47)) %>%
    pivot_longer(`1960`:`2019`,names_to="year", values_to="Indicator")
  mydata<-rbind(mydata,d)
}
mydata<-mydata %>% pivot_wider(names_from="Indicator Name", values_from="Indicator")

Automate Indicator Names

d<-read_csv("WDIData.csv", col_names=T)
indname<-unique(d$`Indicator Name`)
mydata<-NULL
for (ind in indname[1:5]){
  d<-read_csv("WDIData.csv", col_names=TRUE) %>%
    filter(`Indicator Name`==ind) %>%
    select(-`Indicator Code`,-`Country Code`,-`2020`,-X66) %>%
    slice(-(1:47)) %>%
    pivot_longer(`1960`:`2019`,names_to="year", values_to="Indicator")
  mydata<-rbind(mydata,d)
}
mydata<-mydata %>% pivot_wider(names_from="Indicator Name", values_from="Indicator")
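
The loop above re-reads WDIData.csv on every pass. One possible alternative is to read the data once and subset the object already in memory inside the loop. The sketch below shows that idea in base R on a toy data frame; the data frame wdi and its values are hypothetical and only mimic the shape of the indicator data.

```r
# Toy long-format data (hypothetical values, not WDI)
wdi <- data.frame(
  country   = rep(c("A", "B"), times = 3),
  indicator = rep(c("GDP", "FDI", "Population"), each = 2),
  value     = 1:6,
  stringsAsFactors = FALSE
)

# Indicator names are taken from the data instead of being typed by hand
indname <- unique(wdi$indicator)

mydata <- NULL
for (ind in indname[1:2]) {
  d <- wdi[wdi$indicator == ind, ]  # subset the data already in memory
  mydata <- rbind(mydata, d)
}
mydata
```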

A Word of Caution

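  • Loops are not always faster: the World Bank loop above re-reads the file on every iteration, so it does more work than the single pipeline.

  • R's built-in looping functions, such as apply, are often faster or more compact, but (beyond the simplest cases) require more advanced programming.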
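
As a small illustration of the second point, the same computation can be written as a for loop, with sapply, or as a single vectorized call. This is a sketch on a made-up vector and makes no claim about timing on real data.

```r
x <- 1:10

# for loop: grow the result one element at a time
loop_result <- NULL
for (v in x) {
  loop_result <- c(loop_result, sqrt(v))
}

# sapply: apply sqrt to each element and collect the results
sapply_result <- sapply(x, sqrt)

# vectorized: sqrt already works on the whole vector at once
vector_result <- sqrt(x)

all.equal(loop_result, sapply_result)
all.equal(loop_result, vector_result)
```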


Loops Are Invaluable

  • Long repetitive scripts

  • Working with network data


Your Turn 1

  • Convert the following repeated code into a loop:
library(classdata)
data("terr_attacks.wide")
a<-mean(terr_attacks.wide[,5],na.rm=T)
b<-mean(terr_attacks.wide[,6],na.rm=T)
d<-mean(terr_attacks.wide[,7],na.rm=T)
e<-mean(terr_attacks.wide[,8],na.rm=T)
f<-mean(terr_attacks.wide[,9],na.rm=T)
g<-mean(terr_attacks.wide[,10],na.rm=T)
h<-mean(terr_attacks.wide[,11],na.rm=T)
i<-mean(terr_attacks.wide[,12],na.rm=T)
j<-mean(terr_attacks.wide[,13],na.rm=T)
k<-mean(terr_attacks.wide[,14],na.rm=T)
l<-mean(terr_attacks.wide[,15],na.rm=T)
m<-mean(terr_attacks.wide[,16],na.rm=T)
mymeans<-c(a,b,d,e,f,g,h,i,j,k,l,m)
  • Now get the means of these variables using summarise
  • Which one is easier/faster?

Your Turn 2

Convert the following repeated code into a loop:

library(classdata)
data("terr_attacks.wide")
a<-mean(terr_attacks.wide$GDPpc,na.rm=T)
b<-mean(terr_attacks.wide$population,na.rm=T)
d<-mean(terr_attacks.wide$tradeofgdp,na.rm=T)
e<-mean(terr_attacks.wide$`Hostage Taking (Kidnapping)`,na.rm=T)
f<-mean(terr_attacks.wide$Hijacking,na.rm=T)
mymeans<-c(a,b,d,e,f)