<- 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/mask-use/mask-use-by-county.csv'
url
<- read_csv(url) df
We’re going to recreate the NY Times mask-use survey data using R and the leaflet open source interactive mapping package. We can start off by loading the data from the New York Times github repository found here
Now that we have the data loaded, lets have a look at the data to see what we’re working with
glimpse(df)
Rows: 3,142
Columns: 6
$ COUNTYFP <chr> "01001", "01003", "01005", "01007", "01009", "01011", "0101…
$ NEVER <dbl> 0.053, 0.083, 0.067, 0.020, 0.053, 0.031, 0.102, 0.152, 0.1…
$ RARELY <dbl> 0.074, 0.059, 0.121, 0.034, 0.114, 0.040, 0.053, 0.108, 0.0…
$ SOMETIMES <dbl> 0.134, 0.098, 0.120, 0.096, 0.180, 0.144, 0.257, 0.130, 0.1…
$ FREQUENTLY <dbl> 0.295, 0.323, 0.201, 0.278, 0.194, 0.286, 0.137, 0.167, 0.1…
$ ALWAYS <dbl> 0.444, 0.436, 0.491, 0.572, 0.459, 0.500, 0.451, 0.442, 0.5…
df
# A tibble: 3,142 × 6
COUNTYFP NEVER RARELY SOMETIMES FREQUENTLY ALWAYS
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 01001 0.053 0.074 0.134 0.295 0.444
2 01003 0.083 0.059 0.098 0.323 0.436
3 01005 0.067 0.121 0.12 0.201 0.491
4 01007 0.02 0.034 0.096 0.278 0.572
5 01009 0.053 0.114 0.18 0.194 0.459
6 01011 0.031 0.04 0.144 0.286 0.5
7 01013 0.102 0.053 0.257 0.137 0.451
8 01015 0.152 0.108 0.13 0.167 0.442
9 01017 0.117 0.037 0.15 0.136 0.56
10 01019 0.135 0.027 0.161 0.158 0.52
# ℹ 3,132 more rows
According to the repository, the definitions of the variables are as follows:
- COUNTYFP: The county FIPS code.
- NEVER: The estimated share of people in this county who would say never in response to the question “How often do you wear a mask in public when you expect to be within six feet of another person?”
- RARELY: The estimated share of people in this county who would say rarely
- SOMETIMES: The estimated share of people in this county who would say sometimes
- FREQUENTLY: The estimated share of people in this county who would say frequently
- ALWAYS: The estimated share of people in this county who would say always
They are also plotting the probability of encountering a mask usage among 5 random encounters in the county.
The chance all five people are wearing masks in five random encounters is calculated by assuming that survey respondents who answered ‘Always’ were wearing masks all of the time, those who answered ‘Frequently’ were wearing masks 80 percent of the time, those who answered ‘Sometimes’ were wearing masks 50 percent of the time, those who answered ‘Rarely’ were wearing masks 20 percent of the time and those who answered ‘Never’ were wearing masks none of the time.
We can calculate this simply by using the supplied weights (1, .8, .5, .2, and 0) among ALWAYS, FREQUENTLY, SOMETIMES, RARELY, and NEVER mask usage, and taking the sum of the proportion of mask usage among all 5 different types of individuals that have equal probability of encountering.
<- df %>%
df mutate(
prob = ((ALWAYS * 1) + (FREQUENTLY * .8) + (SOMETIMES * .5) + (RARELY * .2) + (NEVER * 0))
)
Since we have the county FIPS code data available, we’ll need to merge this data with county geojson data for the United States which I was able to obtain from here
<- rgdal::readOGR('https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_5m.json') counties
OGR data source with driver: GeoJSON
Source: "https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_5m.json", layer: "gz_2010_us_050_00_5m"
with 3221 features
It has 6 fields
After reading in the US counties data, we can merge the mask usage survey data with the geojson file, by the state and FIPS code. We can create a COUNTYFP variable by pasting together the STATE and COUNTY code
@data <- counties@data %>%
countiesmutate(
COUNTYFP = paste0(STATE, COUNTY)
%>%
) left_join(
df )
Furthermore after merging the data, we can create a label by merging together the % mask usage data into a HTML string
@data <- counties@data %>%
countiesmutate(
label = glue::glue(
'<b>{NAME}</b><br>
{paste0(format(round(NEVER*100, 1), 1), "%")} estimated NEVER wear a mask <br>
{paste0(format(round(RARELY*100, 1), 1), "%")} estimated RARELY wear a mask <br>
{paste0(format(round(SOMETIMES*100, 1), 1), "%")} estimated SOMETIMES wear a mask <br>
{paste0(format(round(FREQUENTLY*100, 1), 1), "%")} estimated FREQUENTLY wear a mask <br>
{paste0(format(round(ALWAYS*100, 1), 1), "%")} estimated ALWAYS wear a mask <br><br>
This translates to a <b>{paste0(format(round(prob*100, 1), 1), "%")}</b> chance that everyone is masked in five random encounters'
),label = map(label, ~ htmltools::HTML(.x))
)
Finally, let’s put this all together and create a Chloropleth map using the leaflet package
<- colorNumeric('plasma', counties$prob)
color_pal
<- leaflet(counties) %>%
map addTiles() %>%
fitBounds(
lng1 = -131.519605,
lng2 = -64.312607,
lat1 = 50.623510,
lat2 = 23.415249
%>%
) addPolygons(
fillColor = ~ color_pal(prob),
fillOpacity = .75,
weight = 1,
color = 'white',
label = ~ label,
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "15px",
direction = "auto"
)%>%
) addLegend(
position = 'bottomright',
pal = color_pal,
values = ~ counties$prob,
title = '% Mask Usage',
labFormat = labelFormat(
suffix = '%',
transform = function(x)
* 100
x
)
)
map
Session info
sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] leaflet_2.1.2 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0
[5] dplyr_1.1.1 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
[9] tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 xfun_0.38 lattice_0.20-45 colorspace_2.1-0
[5] vctrs_0.6.1 generics_0.1.3 viridisLite_0.4.1 htmltools_0.5.4
[9] yaml_2.3.6 utf8_1.2.3 rlang_1.1.0 pillar_1.9.0
[13] glue_1.6.2 withr_2.5.0 RColorBrewer_1.1-3 sp_1.6-0
[17] bit64_4.0.5 lifecycle_1.0.3 munsell_0.5.0 gtable_0.3.3
[21] htmlwidgets_1.6.2 evaluate_0.19 knitr_1.41 tzdb_0.3.0
[25] fastmap_1.1.0 crosstalk_1.2.0 parallel_4.2.2 curl_5.0.0
[29] fansi_1.0.4 renv_0.16.0 scales_1.2.1 vroom_1.6.1
[33] jsonlite_1.8.4 farver_2.1.1 bit_4.0.5 gridExtra_2.3
[37] hms_1.1.3 digest_0.6.31 stringi_1.7.8 grid_4.2.2
[41] rgdal_1.6-5 cli_3.6.1 tools_4.2.2 magrittr_2.0.3
[45] crayon_1.5.2 pkgconfig_2.0.3 ellipsis_0.3.2 timechange_0.2.0
[49] viridis_0.6.2 rmarkdown_2.19 rstudioapi_0.14 R6_2.5.1
[53] compiler_4.2.2
Reuse
Citation
@online{luu2020,
author = {Luu, Michael},
title = {Recreating the {New} {York} {Times} Mask Utilization Survey
Data with the {R} Opensource {Leaflet} Package},
date = {2020-10-04},
langid = {en}
}