Recreating the New York Times mask utilization survey data with the R opensource Leaflet package

Recreating the New York Times mask utilization survey data with the R opensource Leaflet package
visualization
Author

Michael Luu

Published

October 4, 2020

We’re going to recreate the NY Times mask-use survey data using R and the leaflet open source interactive mapping package. We can start off by loading the data from the New York Times github repository found here

url <- 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/mask-use/mask-use-by-county.csv'

df <- read_csv(url)

Now that we have the data loaded, lets have a look at the data to see what we’re working with

glimpse(df)
Rows: 3,142
Columns: 6
$ COUNTYFP   <chr> "01001", "01003", "01005", "01007", "01009", "01011", "0101…
$ NEVER      <dbl> 0.053, 0.083, 0.067, 0.020, 0.053, 0.031, 0.102, 0.152, 0.1…
$ RARELY     <dbl> 0.074, 0.059, 0.121, 0.034, 0.114, 0.040, 0.053, 0.108, 0.0…
$ SOMETIMES  <dbl> 0.134, 0.098, 0.120, 0.096, 0.180, 0.144, 0.257, 0.130, 0.1…
$ FREQUENTLY <dbl> 0.295, 0.323, 0.201, 0.278, 0.194, 0.286, 0.137, 0.167, 0.1…
$ ALWAYS     <dbl> 0.444, 0.436, 0.491, 0.572, 0.459, 0.500, 0.451, 0.442, 0.5…
df
# A tibble: 3,142 × 6
   COUNTYFP NEVER RARELY SOMETIMES FREQUENTLY ALWAYS
   <chr>    <dbl>  <dbl>     <dbl>      <dbl>  <dbl>
 1 01001    0.053  0.074     0.134      0.295  0.444
 2 01003    0.083  0.059     0.098      0.323  0.436
 3 01005    0.067  0.121     0.12       0.201  0.491
 4 01007    0.02   0.034     0.096      0.278  0.572
 5 01009    0.053  0.114     0.18       0.194  0.459
 6 01011    0.031  0.04      0.144      0.286  0.5  
 7 01013    0.102  0.053     0.257      0.137  0.451
 8 01015    0.152  0.108     0.13       0.167  0.442
 9 01017    0.117  0.037     0.15       0.136  0.56 
10 01019    0.135  0.027     0.161      0.158  0.52 
# ℹ 3,132 more rows

According to the repository, the definitions of the variables are as follows:

They are also plotting the probability of encountering a mask usage among 5 random encounters in the county.

The chance all five people are wearing masks in five random encounters is calculated by assuming that survey respondents who answered ‘Always’ were wearing masks all of the time, those who answered ‘Frequently’ were wearing masks 80 percent of the time, those who answered ‘Sometimes’ were wearing masks 50 percent of the time, those who answered ‘Rarely’ were wearing masks 20 percent of the time and those who answered ‘Never’ were wearing masks none of the time.

We can calculate this simply by using the supplied weights (1, .8, .5, .2, and 0) among ALWAYS, FREQUENTLY, SOMETIMES, RARELY, and NEVER mask usage, and taking the sum of the proportion of mask usage among all 5 different types of individuals that have equal probability of encountering.

df <- df %>%
  mutate(
    prob = ((ALWAYS * 1) + (FREQUENTLY * .8) + (SOMETIMES * .5) + (RARELY * .2) + (NEVER * 0)) 
  )

Since we have the county FIPS code data available, we’ll need to merge this data with county geojson data for the United States which I was able to obtain from here

counties <- rgdal::readOGR('https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_5m.json')
OGR data source with driver: GeoJSON 
Source: "https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_5m.json", layer: "gz_2010_us_050_00_5m"
with 3221 features
It has 6 fields

After reading in the US counties data, we can merge the mask usage survey data with the geojson file, by the state and FIPS code. We can create a COUNTYFP variable by pasting together the STATE and COUNTY code

counties@data <- counties@data %>%
  mutate(
    COUNTYFP = paste0(STATE, COUNTY)
  ) %>%
  left_join(
    df
  )

Furthermore after merging the data, we can create a label by merging together the % mask usage data into a HTML string

counties@data <- counties@data %>%
  mutate(
    label = glue::glue(
      '<b>{NAME}</b><br>
      {paste0(format(round(NEVER*100, 1), 1), "%")} estimated NEVER wear a mask <br>
      {paste0(format(round(RARELY*100, 1), 1), "%")} estimated RARELY wear a mask <br>
      {paste0(format(round(SOMETIMES*100, 1), 1), "%")} estimated SOMETIMES wear a mask <br>
      {paste0(format(round(FREQUENTLY*100, 1), 1), "%")} estimated FREQUENTLY wear a mask <br>
      {paste0(format(round(ALWAYS*100, 1), 1), "%")} estimated ALWAYS wear a mask <br><br>
      This translates to a <b>{paste0(format(round(prob*100, 1), 1), "%")}</b> chance that everyone is masked in five random encounters'
    ),
    label = map(label, ~ htmltools::HTML(.x))
  )

Finally, let’s put this all together and create a Chloropleth map using the leaflet package

color_pal <- colorNumeric('plasma', counties$prob)

map <- leaflet(counties) %>%
  addTiles() %>%
  fitBounds(
    lng1 = -131.519605,
    lng2 = -64.312607,
    lat1 = 50.623510,
    lat2 = 23.415249
  ) %>%
  addPolygons(
    fillColor = ~ color_pal(prob),
    fillOpacity = .75,
    weight = 1,
    color = 'white',
    label = ~ label,
    labelOptions = labelOptions(
      style = list("font-weight" = "normal", padding = "3px 8px"),
      textsize = "15px",
      direction = "auto"
    )
  ) %>%
  addLegend(
    position = 'bottomright',
    pal = color_pal,
    values = ~ counties$prob,
    title = '% Mask Usage',
    labFormat = labelFormat(
      suffix = '%',
      transform = function(x)
        x * 100
    )
  )

map

Session info

sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] leaflet_2.1.2   lubridate_1.9.2 forcats_1.0.0   stringr_1.5.0  
 [5] dplyr_1.1.1     purrr_1.0.1     readr_2.1.4     tidyr_1.3.0    
 [9] tibble_3.2.1    ggplot2_3.4.2   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0   xfun_0.38          lattice_0.20-45    colorspace_2.1-0  
 [5] vctrs_0.6.1        generics_0.1.3     viridisLite_0.4.1  htmltools_0.5.4   
 [9] yaml_2.3.6         utf8_1.2.3         rlang_1.1.0        pillar_1.9.0      
[13] glue_1.6.2         withr_2.5.0        RColorBrewer_1.1-3 sp_1.6-0          
[17] bit64_4.0.5        lifecycle_1.0.3    munsell_0.5.0      gtable_0.3.3      
[21] htmlwidgets_1.6.2  evaluate_0.19      knitr_1.41         tzdb_0.3.0        
[25] fastmap_1.1.0      crosstalk_1.2.0    parallel_4.2.2     curl_5.0.0        
[29] fansi_1.0.4        renv_0.16.0        scales_1.2.1       vroom_1.6.1       
[33] jsonlite_1.8.4     farver_2.1.1       bit_4.0.5          gridExtra_2.3     
[37] hms_1.1.3          digest_0.6.31      stringi_1.7.8      grid_4.2.2        
[41] rgdal_1.6-5        cli_3.6.1          tools_4.2.2        magrittr_2.0.3    
[45] crayon_1.5.2       pkgconfig_2.0.3    ellipsis_0.3.2     timechange_0.2.0  
[49] viridis_0.6.2      rmarkdown_2.19     rstudioapi_0.14    R6_2.5.1          
[53] compiler_4.2.2    

Reuse

Citation

BibTeX citation:
@online{luu2020,
  author = {Luu, Michael},
  title = {Recreating the {New} {York} {Times} Mask Utilization Survey
    Data with the {R} Opensource {Leaflet} Package},
  date = {2020-10-04},
  langid = {en}
}
For attribution, please cite this work as:
Luu, Michael. 2020. “Recreating the New York Times Mask Utilization Survey Data with the R Opensource Leaflet Package.” October 4, 2020.