KCRUG: Obtaining, wrangling, and mapping U.S. Census data using tidycensus

David W. Body

June 11, 2022

Introduction

  • Independent software developer and consultant (Big Creek Software, LLC)

  • Currently working primarily on a machine learning project with video data

  • Casual user of Census data – Not an expert!

  • Agreed to give this talk as an incentive to finish reading Kyle Walker’s book

Outline

Today we will cover

  • How US Census data is organized
  • Obtaining the data you want
  • Wrangling the data
  • Visualizing the data
    • Plots
    • Choropleth maps

Some relevant R packages

The first 4 are by Kyle Walker.

Analyzing US Census Data: Methods, Maps, and Models in R

by Kyle Walker

Available free online: https://walker-data.com/census-r/

Hard copy available for pre-order from CRC Press.

Census API key

To use tidycensus, you will need a Census API key, which you can get for free from http://api.census.gov/data/key_signup.html.


Run the following once to install your key in your .Renviron file so it will automatically be available for future sessions.

library(tidycensus)
census_api_key("YOUR KEY GOES HERE", install = TRUE)
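To confirm the key is visible to your session, you can check the CENSUS_API_KEY environment variable, which is where tidycensus stores it. This is only a quick sketch; right after installing the key for the first time you may need to restart R or reload .Renviron before it shows up.

```r
# Check whether a Census API key is visible to the current R session.
# tidycensus reads the key from the CENSUS_API_KEY environment variable.
key <- Sys.getenv("CENSUS_API_KEY")

if (nzchar(key)) {
  message("Census API key found.")
} else {
  # Reload .Renviron (or restart R) if the key was just installed.
  message("No key found. Try readRenviron(\"~/.Renviron\") or restart R.")
}
```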

How is US Census data organized?

US Census data is organized by

  • Datasets
  • Variables
  • Geographies

Today we will use just two datasets, but many more are available.

  • 2020 Decennial Census
  • 2016-2020 American Community Survey

Decennial Census

The 2020 Decennial Census is intended to be a complete count of the US population. It collects a limited number of variables on race, ethnicity, age, sex, and housing tenure.

Note

The limited data currently available was released under P.L. 94-171 for political redistricting purposes.
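As a rough illustration, the redistricting data can be requested with get_decennial(). The sketch below asks for total population (P1_001N) by state; it assumes an API key is installed and a network connection is available, and it spells out sumfile = "pl" to make the P.L. 94-171 file explicit.

```r
library(tidycensus)

# Total population (P1_001N) for every state from the 2020
# P.L. 94-171 redistricting file. Requires a Census API key.
state_pop <- get_decennial(
  geography = "state",
  variables = "P1_001N",
  year = 2020,
  sumfile = "pl"
)

# One row per state, with GEOID, NAME, variable, and value columns.
head(state_pop)
```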

American Community Survey

  • The ACS is sent to about 3.5 million households per year
  • Asks many more questions than the Decennial Census, on topics including income, education, and language
  • ACS data are estimates. The uncertainty in each estimate is expressed as a margin of error.

Note

The ACS had a lower-than-usual response rate in 2020 due to the COVID-19 pandemic. As a result of the smaller sample size, the margins of error are larger for 2020 data.
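To see what an estimate and its margin of error look like, here is a minimal sketch using get_acs(). The variable B19013_001 (median household income) is chosen only for illustration and is not from the slides above; the call assumes an installed API key and network access. By default tidycensus reports margins of error at the 90% confidence level.

```r
library(tidycensus)

# Median household income (B19013_001) by state, 2016-2020 5-year ACS.
# Requires a Census API key.
median_income <- get_acs(
  geography = "state",
  variables = "B19013_001",
  year = 2020,
  survey = "acs5"
)

# Each row has an estimate and a margin of error (moe).
# The interval estimate +/- moe gives a rough range for the true value.
median_income$lower <- median_income$estimate - median_income$moe
median_income$upper <- median_income$estimate + median_income$moe

head(median_income)
```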

Geographies and variables

Census datasets are broken down into

  • Geographies
  • Variables

Census Geographies

US Census Geographies, Source: US Census
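In tidycensus, the level of the hierarchy is selected with the geography argument, and smaller geographies can be limited to a state or county. A hedged sketch: the county (Jackson County, Missouri) and the variable (B01003_001, total population) are picked purely as an example, and the call assumes an installed API key.

```r
library(tidycensus)

# Total population (B01003_001) for every census tract in one county.
# Jackson County, MO is an arbitrary example; swap in any state/county.
tract_pop <- get_acs(
  geography = "tract",
  variables = "B01003_001",
  state = "MO",
  county = "Jackson",
  year = 2020
)

# Changing geography to "county" or "state" returns the same variable
# at a coarser level of the hierarchy.
head(tract_pop)
```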

Census variables

  • Different census datasets have different variables
  • 2020 Decennial Census currently has relatively few variables (~300)
    • More variables will be released in the future
    • Example: P1_001N = Total population
  • 2016-2020 ACS has around 28,000 variables
    • Example: B15002_016 = Number of males with a master’s degree in the population age 25 and over
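Variable codes do not need to be memorized: load_variables() returns a searchable table of names, labels, and concepts for a dataset. The sketch below pulls the tables for both datasets used today and looks up the two example codes; it needs network access, and the exact row counts may differ from the approximate figures above.

```r
library(tidycensus)

# Variable tables for the 2020 redistricting file and the 2016-2020 ACS.
pl_vars  <- load_variables(2020, "pl")
acs_vars <- load_variables(2020, "acs5")

# How many variables does each dataset have?
nrow(pl_vars)
nrow(acs_vars)

# Look up the two example variables by name.
pl_vars[pl_vars$name == "P1_001N", ]
acs_vars[acs_vars$name == "B15002_016", ]

# List every variable in ACS table B15002.
acs_vars[grepl("^B15002_", acs_vars$name), ]
```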

Demo

Other Resources