class: center, middle, inverse, title-slide

# POL 478H1 F
## Intro to Data Management
### Olga Chyzh [www.olgachyzh.com]

---

## Outline

- Logical operators
- Subsetting the data: `filter` and `subset`
- The pipe operator
- Other `dplyr` functions

---

## Logical Operators

- `<`, `>`, `==`, `!=`, `<=`, `>=`
- `x %in% c("a","b","c")`
- Combine logical expressions with `&` (*and*) and `|` (*or*)
- `!` stands for negation
- Use with `filter` and `subset`

---

## `filter {dplyr}`

- `filter(data, ...)` returns the rows of `data` that meet the conditions given in `...`

```r
filter(terr_attacks.wide, country=="Canada")
filter(terr_attacks.wide, year==2014)
# Comma-separated conditions must all hold (and)
filter(terr_attacks.wide, year==2014, country=="Canada")
# At least one condition must hold (or)
filter(terr_attacks.wide, year==2014 | country=="Canada")
```

- Multiple expressions are implicitly combined with a logical `&`
- Can also use `subset {base}`

```r
subset(terr_attacks.wide, country=="Canada")
subset(terr_attacks.wide, year==2014)
# subset takes a single condition, so write & explicitly
subset(terr_attacks.wide, year==2014 & country=="Canada")
subset(terr_attacks.wide, year==2014 | country=="Canada")
```

---

## The Pipe Operator `%>%`

Lets you apply functions to an object one after another

- read `%>%` as "then do"

```r
library(classdata)
library(ggplot2)
library(tidyverse)
library(magrittr)
data(terr_attacks.wide)
```

```r
ggplot(data=filter(terr_attacks.wide, country=="Canada"),
       aes(x=year,y=`Bombing/Explosion`))+geom_point()
```

becomes:

```r
terr_attacks.wide %>%
  filter(country=="Canada") %>%
  ggplot(aes(x=year,y=`Bombing/Explosion`))+geom_point()
```

---

## Your Turn

- Use a pipe operator to first subset `terr_attacks` to just Hijackings or Kidnappings, then aggregate to the highest number of attacks by type in each year, then make a bar graph of the number of attacks of each type (y-axis) by year (x-axis), using fill to denote the attack type.

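
---

## Checking Filters on a Toy Data Frame

The claims on the earlier slides (comma-separated conditions act like `&`, and a piped call matches the nested call) can be checked directly. This is a minimal sketch: the data frame `toy` and all of its values are invented for illustration and are not part of `classdata`.

```r
library(dplyr)  # provides filter() and re-exports the %>% pipe

# Toy data: invented values, for illustration only
toy <- data.frame(
  country = c("Canada", "Canada", "Mexico", "Mexico"),
  year    = c(2013, 2014, 2013, 2014),
  attacks = c(1, 3, 2, 5)
)

# Comma-separated conditions are combined with & (and)
both_comma <- filter(toy, year == 2014, country == "Canada")
both_and   <- filter(toy, year == 2014 & country == "Canada")
identical(both_comma, both_and)

# | (or) keeps rows that meet at least one condition
either <- filter(toy, year == 2014 | country == "Canada")
nrow(either)

# %in% tests membership in a vector; ! negates the result
not_listed <- toy %>% filter(!(country %in% c("Canada", "Cuba")))

# The piped call returns the same rows as the nested call
piped  <- toy %>% filter(country == "Canada")
nested <- filter(toy, country == "Canada")
identical(piped, nested)
```

Comparing results with `identical` is a quick way to confirm that two ways of writing a filter select the same rows before using either on the real data.
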
---

## `mutate {dplyr}`

`mutate` is a function from the `{dplyr}` package

It lets us generate or recode variables in a dataset

Unlike `summarise`, which collapses a column into a summary value, `mutate` computes a new value for every row.

Can be used in pipes:

```r
# Add a column that sums the two hostage-taking columns, row by row
terr_attacks.wide <- terr_attacks.wide %>%
  mutate(`Hostage Taking`=`Hostage Taking (Barricade Incident)`+`Hostage Taking (Kidnapping)`)
```

---

## Your Turn

Use a pipe operator and `mutate` to add the following variables to the `terr_attacks.wide` dataset:

1. `num_attks`, which contains the total number of attacks (all types);
2. `pop_thou`, which contains the total population in thousands;
3. `GDP` (can you figure out how to calculate this with the variables we have?)

---

## Other `dplyr` verbs

- `arrange`: reorder (sort) the rows by a column
- `select`: select specific columns/variables of a data frame (as opposed to `filter`, which selects rows)

```r
data("terr_attacks")
terr_attacks %>%
  filter(country=="Canada") %>%
  select(year,type,num_attacks) %>%
  arrange(-num_attacks)  # minus sign sorts from largest to smallest
```
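
---

## `mutate`, `summarise`, `select`, `arrange` on a Toy Data Frame

To make the difference between `mutate` and `summarise` concrete, here is a minimal sketch on a small data frame. The data frame `toy`, its column names (`bombings`, `kidnappings`, `total`), and all values are invented for illustration and do not come from `classdata`.

```r
library(dplyr)

# Toy data: invented values, for illustration only
toy <- data.frame(
  country     = c("Canada", "Canada", "Mexico"),
  year        = c(2013, 2014, 2014),
  bombings    = c(1, 0, 4),
  kidnappings = c(0, 2, 3)
)

# mutate() adds a column computed row by row; the number of rows is unchanged
toy_total <- toy %>%
  mutate(total = bombings + kidnappings)

# summarise() collapses each group to a single row
by_country <- toy_total %>%
  group_by(country) %>%
  summarise(total = sum(total))

# select() keeps columns; arrange(desc()) sorts from largest to smallest
ranked <- toy_total %>%
  select(country, year, total) %>%
  arrange(desc(total))
```

For a numeric column, `arrange(desc(total))` gives the same order as `arrange(-total)`; `desc` also works on character columns, where a minus sign does not.
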