Why Men Quit and Women Don’t. The tweet caught my eye.
Boston Marathon 2018: For men, the dropout rate was up almost 80 percent from 2017; for women, it was up only about 12 percent. https://t.co/dZyUUBS3VX
— Tracy Chou (@triketora) April 21, 2018
I wanted to see for myself the percentage of women and men that finished the Boston Marathon. I found the 2018 data here: http://registration.baa.org/2018/cf/Public/iframe_Statistics.htm. I replaced the year in the URL to get earlier years back to 2010. Then I scraped the data using R and the package rvest.
library(tidyverse)
library(rvest)
library(binom)
library(DT)
prefix <- "http://registration.baa.org/"
suffix <- "/cf/Public/iframe_Statistics.htm"
Year <- 2010:2018
# create empty list for results
results <- setNames(vector("list", length(Year)), Year)
for(i in 1:9) {
# the 2018 data were in different rows than the other years
getrows <- if(i<9) 4:5 else 6:7
df <- read_html(paste0(prefix, Year[i], suffix)) %>%
html_nodes("table") %>%
# grab the 4th table
.[4] %>%
html_table(fill=TRUE) %>%
# extract the first element from the list
.[[1]] %>%
# extract given rows and columns from the data frame
.[getrows, c(1, 3:4)]
# assign names (again, to deal with the different 2018 data)
names(df) <- c("Gender", "Starters", "Finishers")
results[[i]] <- df
}
# convert list to data frame
race <- bind_rows(results, .id="Year") %>%
# convert numbered columns to numeric
type_convert() %>%
# convert Gender to a factor
mutate(
Gender = factor(Gender, level=c("female", "male"),
label=c("Female", "Male"))
)
I used the binom.confint()
function from the binom package to calculate exact binomial confidence intervals around the percentage of starters that finished the race. Then I plotted the percentage of finishers, by gender, over time. (I did not include the data from 2013, when the race was cut short by the bombing.)
prop <- with(race, binom.confint(Finishers, Starters, method="exact"))
raceprop <- bind_cols(race, 100*prop[, 4:6]) %>%
rename(
PctFinishers=mean,
PctFinLo=lower,
PctFinHi=upper
) %>%
mutate(
# set 2013 finishers to missing, marathon was cut short by bombing
Finishers = ifelse(Year==2013, NA, Finishers),
PctFinishers = ifelse(Year==2013, NA, PctFinishers),
PctFinLo = ifelse(Year==2013, NA, PctFinLo),
PctFinHi = ifelse(Year==2013, NA, PctFinHi)
)
ggplot(raceprop, aes(Year, PctFinishers, color=Gender)) +
geom_errorbar(aes(ymin=PctFinLo, ymax=PctFinHi), width=0.2, alpha=0.5,
position=position_dodge(width=0.2)) +
geom_point() +
geom_line(size=1) +
labs(y="Finishers (% of starters)", title="Boston Marathon Finishers") +
theme_bw(base_size=14)
In most years, a higher percentage of males finish the race than females. The two exceptions were 2012, which was “an unusually hot 86-degree day” and 2018, which had “horizontal rain and freezing temperatures.” Interesting.
datatable(raceprop, options=list(pageLength=25)) %>%
formatCurrency(c("Starters", "Finishers"), currency="", digits=0) %>%
formatRound(c("PctFinishers", "PctFinLo", "PctFinHi"), 1)