POL 478H1 F

POL 478H1 FReshapingOlga Chyzh [www.olgachyzh.com]1 / 12

Reshaping Datatidyr: pivot_longer and pivot_wider (former gather and spread)
2 / 12

Multiple ways to store the same information

Option #1

##                    province X2017 X2010 X2000
## 2                   Alberta  81.5  81.3  79.4
## 3          British Columbia  82.2  82.1  80.3
## 4                  Manitoba  80.0  79.9  78.0
## 5             New Brunswick  80.7  80.7  78.9
## 6 Newfoundland and Labrador  79.8  79.6  77.3
## 7     Northwest Territories  77.0  77.8  75.5

Option #2

##           province year life_exp
## 1          Alberta 2017     81.5
## 2          Alberta 2010     81.3
## 3          Alberta 2000     79.4
## 4          Alberta 1990     77.8
## 5          Alberta 1980     75.0
## 6 British Columbia 2017     82.2

Source: Wikipedia

3 / 12

Example: Canada Life Expectancy Data

Copy the data on Canada's life expectancy available here into a text doc.
Since this dataset is stored in txt format, you can load it using read.table()

canada_le<-read.table("canada_le_long.txt", header=T)

Or, you can skip the copy/paste:

myurl<-"https://raw.githubusercontent.com/ochyzh/POLS478/master/public/materials/canada_le_long.txt"
canada_le<-read.table(myurl, header=T)

4 / 12

Reshape from long to wide

##           province year life_exp
## 1          Alberta 2017     81.5
## 2          Alberta 2010     81.3
## 3          Alberta 2000     79.4
## 4          Alberta 1990     77.8
## 5          Alberta 1980     75.0
## 6 British Columbia 2017     82.2

canada_le %>% pivot_wider(names_from=year, values_from=life_exp)

Or:

canada_le %>% spread(key=year, value=life_exp)

##                     province 1980 1990 2000 2010 2017
## 1                    Alberta 75.0 77.8 79.4 81.3 81.5
## 2           British Columbia 76.0 78.1 80.3 82.1 82.2
## 3                   Manitoba 75.3 77.4 78.0 79.9 80.0
## 4              New Brunswick 74.5 77.3 78.9 80.7 80.7
## 5  Newfoundland and Labrador 74.8 76.1 77.3 79.6 79.8
## 6      Northwest Territories 68.7 72.4 75.5 77.8 77.0
## 7                Nova Scotia 74.6 76.7 78.5 80.3 80.5
## 8                    Nunavut 62.4 65.5 67.5 70.4 71.6
## 9                    Ontario 75.4 77.8 79.4 81.7 82.4
## 10      Prince Edward Island 76.2 76.9 78.4 81.3 81.8
## 11                    Quebec 74.4 77.0 79.0 81.4 82.6
## 12              Saskatchewan 75.9 77.9 78.5 79.6 80.3
## 13                     Yukon 69.8 74.4 75.0 77.3 77.8

canada_le.wide<-canada_le %>% spread(key=year, value=life_exp)

5 / 12

Reshape from wide to long

##                    province 1980 1990 2000 2010 2017
## 1                   Alberta 75.0 77.8 79.4 81.3 81.5
## 2          British Columbia 76.0 78.1 80.3 82.1 82.2
## 3                  Manitoba 75.3 77.4 78.0 79.9 80.0
## 4             New Brunswick 74.5 77.3 78.9 80.7 80.7
## 5 Newfoundland and Labrador 74.8 76.1 77.3 79.6 79.8
## 6     Northwest Territories 68.7 72.4 75.5 77.8 77.0

canada_le.wide %>% pivot_longer(cols=c(2:6),names_to="year",values_to="life_exp")

Or:

canada_le.wide %>% gather(key="year",value="life_exp",-province)

##                     province year life_exp
## 1                    Alberta 1980     75.0
## 2           British Columbia 1980     76.0
## 3                   Manitoba 1980     75.3
## 4              New Brunswick 1980     74.5
## 5  Newfoundland and Labrador 1980     74.8
## 6      Northwest Territories 1980     68.7
## 7                Nova Scotia 1980     74.6
## 8                    Nunavut 1980     62.4
## 9                    Ontario 1980     75.4
## 10      Prince Edward Island 1980     76.2
## 11                    Quebec 1980     74.4
## 12              Saskatchewan 1980     75.9
## 13                     Yukon 1980     69.8
## 14                   Alberta 1990     77.8
## 15          British Columbia 1990     78.1
## 16                  Manitoba 1990     77.4
## 17             New Brunswick 1990     77.3
## 18 Newfoundland and Labrador 1990     76.1
## 19     Northwest Territories 1990     72.4
## 20               Nova Scotia 1990     76.7
## 21                   Nunavut 1990     65.5
## 22                   Ontario 1990     77.8
## 23      Prince Edward Island 1990     76.9
## 24                    Quebec 1990     77.0
## 25              Saskatchewan 1990     77.9
## 26                     Yukon 1990     74.4
## 27                   Alberta 2000     79.4
## 28          British Columbia 2000     80.3
## 29                  Manitoba 2000     78.0
## 30             New Brunswick 2000     78.9
## 31 Newfoundland and Labrador 2000     77.3
## 32     Northwest Territories 2000     75.5
## 33               Nova Scotia 2000     78.5
## 34                   Nunavut 2000     67.5
## 35                   Ontario 2000     79.4
## 36      Prince Edward Island 2000     78.4
## 37                    Quebec 2000     79.0
## 38              Saskatchewan 2000     78.5
## 39                     Yukon 2000     75.0
## 40                   Alberta 2010     81.3
## 41          British Columbia 2010     82.1
## 42                  Manitoba 2010     79.9
## 43             New Brunswick 2010     80.7
## 44 Newfoundland and Labrador 2010     79.6
## 45     Northwest Territories 2010     77.8
## 46               Nova Scotia 2010     80.3
## 47                   Nunavut 2010     70.4
## 48                   Ontario 2010     81.7
## 49      Prince Edward Island 2010     81.3
## 50                    Quebec 2010     81.4
## 51              Saskatchewan 2010     79.6
## 52                     Yukon 2010     77.3
## 53                   Alberta 2017     81.5
## 54          British Columbia 2017     82.2
## 55                  Manitoba 2017     80.0
## 56             New Brunswick 2017     80.7
## 57 Newfoundland and Labrador 2017     79.8
## 58     Northwest Territories 2017     77.0
## 59               Nova Scotia 2017     80.5
## 60                   Nunavut 2017     71.6
## 61                   Ontario 2017     82.4
## 62      Prince Edward Island 2017     81.8
## 63                    Quebec 2017     82.6
## 64              Saskatchewan 2017     80.3
## 65                     Yukon 2017     77.8

6 / 12

Your Turn

Use pivot_wider or spread to reshape terr_attacks data into a wide format.
Use pivot_longer or gather to reshape terr_attacks.wide into a long format.
Is terr_attacks.wide the only way to reshape terr_attacks into a wide form? What are some other ways? Try them.

7 / 12

Tips for Reshaping Wide (Advanced)

Identify the reason for repeating observations (e.g., repeating countries over multiple years, repeating country-years for each attack type).
Determine what variables vary and what variables stay the same. For example, in the terr_data dataset, variables country, ccode, and cabb do not vary by year (for the same country), but variables num_attacks,GDPpc, population, tradeofgdp, and polity do. So if you wanted to spread values across columns so that each year is in its own column, you would have to spread values of not just num_attacks, but all those other time-varying variables as well.
Decide what variable you would like to spread across multiple columns (e.g., num_attacks so that each year is in its own column )---then the variable you are spreading becomes your values_from argument, and the variable that describes these new columns is your names_from argument. If more than one variable varies by year, in this example, you must specify all of the varying variables as values_from (e.g., values_from=c(num_attacks,population,GDPpc,tradeofgdp,polity2)) or remove them from the data (select(-population,-GDPpc,-tradeofgdp,-polity2)). If you do this incorrectly, there will be a lot of missing values in the new dataset.

8 / 12

Tips for Reshaping Long

Determine the reason for repeating columns (e.g., each attack type is in a separate column).
Specify the repeating columns as your cols argument, your names_to argument is just the name you want to give to the variable that will contain the names of the repeating columns, and your values_to argument is the name you want to give to the variable that will contain all the values from these columns.
Try your code without saving the output into a new object (view it on the screen only) until it works.

9 / 12

Messy Data 1: World Development Indicators

Download the World Development Indicators Data directly from the World Bank. Click on CSV download under Bulk Downloads. The file is quite large, do give it some time. The file you need is WDIdata.csv (feel free to delete the rest).
As you can see if you open it, this file is very large and unwieldy, yet it contains information on key variables used in Social Sciences.
We are going to clean up these data in the following ways:

Extract just the data on GDP (constant 2010 USD). Remove all the unnecessary rows and columns.
Convert these data from wide to long form.
Add a variable ccode that will allow us to merge these data with other datasets.

10 / 12

Your Turn

Take a minute to open the file and look at it. Can you complete steps 1 and 2?

11 / 12

Adding `ccode`

Package countrycode

library(countrycode)
d$ccode<-countrycode(d$`Country Name`,'country.name','cown')

Your Turn

Edit the code we used for the previous exercise, so as to extract GDP and Foreign direct investment, net inflows (% of GDP).
Use pivot_wider and/or pivot_longer functions to transform these data into long format (i.e., country-years as rows; country names, country codes, year, GDP and FDI as columns)

↑, ←, Pg Up, k	Go to previous slide
↓, →, Pg Dn, Space, j	Go to next slide
Home	Go to first slide
End	Go to last slide
Number + Return	Go to specific slide
b / m / f	Toggle blackout / mirrored / fullscreen mode
c	Clone slideshow
p	Toggle presenter mode
t	Restart the presentation timer
?, h	Toggle this help

POL 478H1 F

Reshaping

Olga Chyzh [www.olgachyzh.com]

Reshaping Data

Multiple ways to store the same information

Example: Canada Life Expectancy Data

Reshape from long to wide

Reshape from wide to long

Your Turn

Tips for Reshaping Wide (Advanced)

Tips for Reshaping Long

Messy Data 1: World Development Indicators

Your Turn

Adding ccode

Your Turn

Reshaping Data

Help

Adding `ccode`