tidyr: pivot_longer
and pivot_wider
(former gather
and spread
)Option #1
## province X2017 X2010 X2000## 2 Alberta 81.5 81.3 79.4## 3 British Columbia 82.2 82.1 80.3## 4 Manitoba 80.0 79.9 78.0## 5 New Brunswick 80.7 80.7 78.9## 6 Newfoundland and Labrador 79.8 79.6 77.3## 7 Northwest Territories 77.0 77.8 75.5
Option #2
## province year life_exp## 1 Alberta 2017 81.5## 2 Alberta 2010 81.3## 3 Alberta 2000 79.4## 4 Alberta 1990 77.8## 5 Alberta 1980 75.0## 6 British Columbia 2017 82.2
Source: Wikipedia
read.table()
canada_le<-read.table("canada_le_long.txt", header=T)
Or, you can skip the copy/paste:
myurl<-"https://raw.githubusercontent.com/ochyzh/POLS478/master/public/materials/canada_le_long.txt"canada_le<-read.table(myurl, header=T)
## province year life_exp## 1 Alberta 2017 81.5## 2 Alberta 2010 81.3## 3 Alberta 2000 79.4## 4 Alberta 1990 77.8## 5 Alberta 1980 75.0## 6 British Columbia 2017 82.2
canada_le %>% pivot_wider(names_from=year, values_from=life_exp)
Or:
canada_le %>% spread(key=year, value=life_exp)
## province 1980 1990 2000 2010 2017## 1 Alberta 75.0 77.8 79.4 81.3 81.5## 2 British Columbia 76.0 78.1 80.3 82.1 82.2## 3 Manitoba 75.3 77.4 78.0 79.9 80.0## 4 New Brunswick 74.5 77.3 78.9 80.7 80.7## 5 Newfoundland and Labrador 74.8 76.1 77.3 79.6 79.8## 6 Northwest Territories 68.7 72.4 75.5 77.8 77.0## 7 Nova Scotia 74.6 76.7 78.5 80.3 80.5## 8 Nunavut 62.4 65.5 67.5 70.4 71.6## 9 Ontario 75.4 77.8 79.4 81.7 82.4## 10 Prince Edward Island 76.2 76.9 78.4 81.3 81.8## 11 Quebec 74.4 77.0 79.0 81.4 82.6## 12 Saskatchewan 75.9 77.9 78.5 79.6 80.3## 13 Yukon 69.8 74.4 75.0 77.3 77.8
canada_le.wide<-canada_le %>% spread(key=year, value=life_exp)
## province 1980 1990 2000 2010 2017## 1 Alberta 75.0 77.8 79.4 81.3 81.5## 2 British Columbia 76.0 78.1 80.3 82.1 82.2## 3 Manitoba 75.3 77.4 78.0 79.9 80.0## 4 New Brunswick 74.5 77.3 78.9 80.7 80.7## 5 Newfoundland and Labrador 74.8 76.1 77.3 79.6 79.8## 6 Northwest Territories 68.7 72.4 75.5 77.8 77.0
canada_le.wide %>% pivot_longer(cols=c(2:6),names_to="year",values_to="life_exp")
Or:
canada_le.wide %>% gather(key="year",value="life_exp",-province)
## province year life_exp## 1 Alberta 1980 75.0## 2 British Columbia 1980 76.0## 3 Manitoba 1980 75.3## 4 New Brunswick 1980 74.5## 5 Newfoundland and Labrador 1980 74.8## 6 Northwest Territories 1980 68.7## 7 Nova Scotia 1980 74.6## 8 Nunavut 1980 62.4## 9 Ontario 1980 75.4## 10 Prince Edward Island 1980 76.2## 11 Quebec 1980 74.4## 12 Saskatchewan 1980 75.9## 13 Yukon 1980 69.8## 14 Alberta 1990 77.8## 15 British Columbia 1990 78.1## 16 Manitoba 1990 77.4## 17 New Brunswick 1990 77.3## 18 Newfoundland and Labrador 1990 76.1## 19 Northwest Territories 1990 72.4## 20 Nova Scotia 1990 76.7## 21 Nunavut 1990 65.5## 22 Ontario 1990 77.8## 23 Prince Edward Island 1990 76.9## 24 Quebec 1990 77.0## 25 Saskatchewan 1990 77.9## 26 Yukon 1990 74.4## 27 Alberta 2000 79.4## 28 British Columbia 2000 80.3## 29 Manitoba 2000 78.0## 30 New Brunswick 2000 78.9## 31 Newfoundland and Labrador 2000 77.3## 32 Northwest Territories 2000 75.5## 33 Nova Scotia 2000 78.5## 34 Nunavut 2000 67.5## 35 Ontario 2000 79.4## 36 Prince Edward Island 2000 78.4## 37 Quebec 2000 79.0## 38 Saskatchewan 2000 78.5## 39 Yukon 2000 75.0## 40 Alberta 2010 81.3## 41 British Columbia 2010 82.1## 42 Manitoba 2010 79.9## 43 New Brunswick 2010 80.7## 44 Newfoundland and Labrador 2010 79.6## 45 Northwest Territories 2010 77.8## 46 Nova Scotia 2010 80.3## 47 Nunavut 2010 70.4## 48 Ontario 2010 81.7## 49 Prince Edward Island 2010 81.3## 50 Quebec 2010 81.4## 51 Saskatchewan 2010 79.6## 52 Yukon 2010 77.3## 53 Alberta 2017 81.5## 54 British Columbia 2017 82.2## 55 Manitoba 2017 80.0## 56 New Brunswick 2017 80.7## 57 Newfoundland and Labrador 2017 79.8## 58 Northwest Territories 2017 77.0## 59 Nova Scotia 2017 80.5## 60 Nunavut 2017 71.6## 61 Ontario 2017 82.4## 62 Prince Edward Island 2017 81.8## 63 Quebec 2017 82.6## 64 Saskatchewan 2017 80.3## 65 Yukon 2017 77.8
Use pivot_wider
or spread
to reshape terr_attacks
data into a wide format.
Use pivot_longer
or gather
to reshape terr_attacks.wide
into a long format.
Is terr_attacks.wide
the only way to reshape terr_attacks
into a wide form? What are some other ways? Try them.
Identify the reason for repeating observations (e.g., repeating countries over multiple years, repeating country-years for each attack type).
Determine what variables vary and what variables stay the same. For example, in the terr_data
dataset, variables country
, ccode
, and cabb
do not vary by year
(for the same country), but variables num_attacks
,GDPpc
, population
, tradeofgdp
, and polity
do. So if you wanted to spread values across columns so that each year is in its own column, you would have to spread values of not just num_attacks
, but all those other time-varying variables as well.
Decide what variable you would like to spread across multiple columns (e.g., num_attacks
so that each year is in its own column )---then the variable you are spreading becomes your values_from
argument, and the variable that describes these new columns is your names_from
argument. If more than one variable varies by year, in this example, you must specify all of the varying variables as values_from
(e.g., values_from=c(num_attacks,population,GDPpc,tradeofgdp,polity2)
) or remove them from the data (select(-population,-GDPpc,-tradeofgdp,-polity2)
). If you do this incorrectly, there will be a lot of missing values in the new dataset.
Determine the reason for repeating columns (e.g., each attack type is in a separate column).
Specify the repeating columns as your cols
argument, your names_to
argument is just the name you want to give to the variable that will contain the names of the repeating columns, and your values_to
argument is the name you want to give to the variable that will contain all the values from these columns.
Try your code without saving the output into a new object (view it on the screen only) until it works.
Download the World Development Indicators Data directly from the World Bank. Click on CSV download
under Bulk Downloads.
The file is quite large, do give it some time. The file you need is WDIdata.csv
(feel free to delete the rest).
As you can see if you open it, this file is very large and unwieldy, yet it contains information on key variables used in Social Sciences.
We are going to clean up these data in the following ways:
ccode
that will allow us to merge these data with other datasets.Take a minute to open the file and look at it. Can you complete steps 1 and 2?
ccode
Package countrycode
library(countrycode)d$ccode<-countrycode(d$`Country Name`,'country.name','cown')
Foreign direct investment, net inflows (% of GDP)
.pivot_wider
and/or pivot_longer
functions to transform these data into long format (i.e., country-years as rows; country names, country codes, year, GDP and FDI as columns)tidyr: pivot_longer
and pivot_wider
(former gather
and spread
)Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |