Trump Votes vs. Vaccination Status by U.S. County
Background
During my time at LBS, our professor provided us with an article (The Racial Factor: There’s 77 Counties Which Are Deep Blue But Also Low-Vaxx. Guess What They Have In Common?) that compared the % of Trump votes in each US county with the respective % of population (by county) that is fully vaccinated against Covid-19. The graph was intended to generate some insights regarding the willingness of Trump voters towards being vaccinated against Covid-19. Furthermore, the graph looked at the population of each county, as larger counties usually have a bigger influence on elections, as well as on fighting Covid-19.
The graph generated some really interesting insights and I decided to reproduce the graph by using R coding.
Data Analysis
Before I was able to conduct any analysis, I had to collect data by using three different sources:
1. Vaccination by county: CDC data
2. Election Results: County Presidential Election Returns 2000-2020
3. Population estimate: Population by county
Loading Libraries
library(tidyverse)
library(ggthemes)
library(skimr)
library(janitor)
library(vroom)
Downloading data
# Download CDC vaccination by county
cdc_url <- "https://data.cdc.gov/api/views/8xkx-amqh/rows.csv?accessType=DOWNLOAD"
vaccinations <- vroom(cdc_url) %>%
janitor::clean_names() %>%
filter(fips != "UNK") # remove counties that have an unknown (UNK) FIPS code
# Download County Presidential Election Returns
# https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ
election2020_results <- vroom(here::here("data/countypres_2000-2020.csv")) %>%
janitor::clean_names() %>%
# just keep the results for the 2020 election
filter(year == "2020") %>%
# change original name county_fips to fips, to be consistent with the other two files
rename (fips = county_fips)
# Download county population data
population_url <- "https://www.ers.usda.gov/webdocs/DataFiles/48747/PopulationEstimates.csv?v=2232"
population <- vroom(population_url) %>%
janitor::clean_names() %>%
# select the latest data, namely 2019
select(fips = fip_stxt, pop_estimate_2019) %>%
# pad FIPS codes with leading zeros, so they are always made up of 5 characters
mutate(fips = stringi::stri_pad_left(fips, width=5, pad = "0"))
The plan is to setup 3 separate dataframes for population, vaccination rate and election results and ultimately merging the dataframe by using the fips variable. Before joining the dataframes, I had to clean the data and make sure it has exactly the information I need for my analysis.
Clean and prepare election data
First, a new dataframe needs to be created to only include Donald Trump votes. Subsequently, the number of votes needs to be expressed as a percentage of the overall votes for each district/county.
# First we filter the dataset only for Donald Trump votes
election_trump <- election2020_results%>%
filter(candidate =="DONALD J TRUMP")
#Let's have a look at the dataset to scan for duplicates
skim(election_trump)
#By looking at the dataset, we can discover that there are only 3153 unique fips,
#even though the dataset hat more than 5000 rows. By looking at the mode,
#we can see that there are 16 different ways of voting,
#which is where the duplicate entries are coming from. We need to add them up.
election_unique <- aggregate(candidatevotes~fips,data=election_trump, FUN="sum")
#As we lost the total votes for each county by filtering for, we need to add them back again by using left_join().
election_unique_join <- unique(inner_join(election_unique,
election2020_results[c("fips", "totalvotes")], by="fips",
copy=FALSE))
#Calculate the percentage of people voting for Trump
election_percentage<- election_unique_join%>%
mutate(
vote_percentage=candidatevotes/totalvotes
)
#Slice dataframe to only include Fips and the vote percentage per county
election_trump_filtered <- election_percentage %>%
select(fips, vote_percentage)
#Quick look at the finished dataframe
skim(election_trump_filtered)
Clean and prepare vaccination data
A new dataframe is created that only contains the county and percentage of population being fully vaccinated on September 2 (last available date).
#First look at the dataset shows us that there are counties with no
#vaccinated people at all, which is strange.
skim(vaccinations)
#Let's find out where those zeros are coming from
vaccinations %>%
filter(series_complete_pop_pct==0) %>%
count(recip_state, sort=TRUE)
#Most of the zeros seem to be coming from Texas, so let's have a closer look at
#the state Texas
vaccinations %>%
filter(recip_state=="TX")
#The rows for Texas are filled with zeros, implying no data is being recorded for
#Texas. By looking at the CDC database online, it can be seen that Texas provides
#aggregate dose count on a national level instead of individual doses.
#Therefore, no data is available for Texas. As other counties also had no data
#available, we'll simply take out all counties with a vaccination rate of 0%.
#Filter for the last available date and take out rows were no data was recorded.
vaccinations_recent <- vaccinations %>%
filter(date=="09/02/2021", series_complete_pop_pct!=0)
#Check for values that don't seem reasonable where vaccination percentage is
#comparatively low
vaccinations_below10 <- vaccinations_recent %>%
filter(series_complete_pop_pct <=10)
#By looking at the skim function below, we can see that 102 states have less than or
#equal to 10% of the population vaccinated, which is quite low. But those values
#are still valid and need to be included.
skim(vaccinations_below10)
#Select only fips and vaccination percentage, as we only need those to construct
#the dataframe that will ultimately build the graph
vaccinations_fips <- vaccinations_recent %>%
select(fips, series_complete_pop_pct)
#Convert vaccination percentage in numerical percentage to adjust it to the other
#dataframe with the election results
vaccinations_fips$series_complete_pop_pct <-
vaccinations_fips$series_complete_pop_pct/100
#Quick look at the final dataframe
skim(vaccinations_fips)
Merging the 3 dataframes (tibbles) into 1
As all dataframes have been prepared to include the necessary data, they can now be joined by using inner_join().
#Merge vaccination and trump votes
vaccination_vote <- inner_join(vaccinations_fips, election_trump_filtered,
by="fips", copy=FALSE)
#Add population numbers
vacc_vote_pop <- inner_join(vaccination_vote, population, by="fips", copy=FALSE)
#Omit all rows with any NA values
vacc_vote_pop <- na.omit(vacc_vote_pop)
Draw the graph
ggplot(vacc_vote_pop, aes(x=vote_percentage, y=series_complete_pop_pct,
size=pop_estimate_2019))+
geom_point(alpha=0.3)+
geom_smooth(method = "lm")+
theme_bw()+
#Add 5 horizontal lines
geom_hline(yintercept=0.539, linetype="dashed",
color = "black", size=1)+
geom_hline(yintercept=0.85, linetype="dashed",
color = "black", size=1)+
geom_hline(yintercept=0.508, linetype="dashed",
color = "black", size=1)+
geom_hline(yintercept=0.4, size=1)+
geom_hline(yintercept=0.6,size=1)+
#Add 2 vertical lines
geom_vline(xintercept=0.4,size=1)+
geom_vline(xintercept=0.6,size=1)+
#Put labels according to graph pictured
annotate("text", x=0.08, y=0.56, label="TARGET: 53.9%", color = "blue", size=3)+
annotate("text", x=0.13, y=0.87, label="Herd immunity threshold (?)",
color = "blue", size=3)+
annotate("text", x=0.08 , y=0.52, label="ACTUAL: 50.8%", color = "blue", size=3)+
annotate("text", x=0.5, y=0.95, label="EVERY U.S. COUNTY", color="black", size=8)+
theme(panel.grid.major = element_line(colour = "grey80"), legend.position = "none", plot.title=element_text(hjust=0.5))+
labs(
title="COVID-19 VACCINATION LEVELS OUT OF POPULATION BY COUNTY",
x="2020 Trump Vote %",
y="% of Total Population Vaccinated"
,size=7)+
#Change scale to comply with picture
scale_y_continuous(breaks=seq(0,1,0.05),labels=function(x) paste0(x*100, "%"),
expand=c(0,0), limits=c(0,1))+
scale_x_continuous(breaks=seq(0,1,0.05),labels = function(x) paste0(x*100, "%"),
expand=c(0,0), limits=c(0,1))

Summary of findings
The graph shows a negative correlation between Trump votes and the vaccination status. More precise, in counties with more trump voters, a comparably lower proportion of the population is fully vaccinated against Covid-19. Hence, Trump voters seem to be rather reluctant to Covid-19 vaccinations.
Furthermore, we can see that only very few counties are above the herd immunity threshold, while many counties still have less than 5% of the population vaccinated.
The points are proportional in size to the county population, therefore, larger points display larger counties. Looking at the graph, we see a lot of large counties are clustered around 50% vaccination status and therefore often above the target of 53.9%. Also, most of the large counties were rather against Trumpm with often less than 40% of the population voting for Trump.