An engineering department gave 10 awards in 2022 and 9 awards in 2021 to high-achieving students. “The prerequisite to receive the award is a grade of 6.0 [highest possible grade in Switzerland] for the thesis, an average final grade of at least 5.25 in the Master’s program as well as a written endorsement by the thesis supervisor” (https://mavt.ethz.ch/news-and-events/d-mavt-news/2022/06/eth-medal-and-willi-studer-prize-2022.html).

All awardees appeared to be male though the department has roughly 12% female students. **So what are the chances of this happening by chance**, i.e., with no bias involved and assuming that male and female enrolled students have on average equal true abilities?

**About 7%**. Here is the simulation in R:

```
## number of runs through the loop
runs <- 10000
## number of men and women
# (choose numbers so that the proportions are accurate;
# although total class size also matters a bit:
# smaller means even less probable that no women get an award)
NM <- 88
NW <- 12
## total N
N <- NM + NW
## number of awards
nawards <- 19
## set up empty data frame for each student's merit value and indication of getting award
df <- as.data.frame(matrix(nrow = N, ncol = 3))
names(df) <- c("gender", "merit", "award")
## fill up the data frame, start with no awards
df$gender <- as.factor(c(rep("M", NM), rep("W", NW)))
df$award <- FALSE
## set up the sampling from a normal distribution of merit
## (assumptions: merit is quantifiable, normally distributed, same distribution for M and W)
meritmean <- 80 # set the overall mean merit at 80
meritsd <- 10 # standard deviation 10
meritmin <- 30 # assume lowest merit is 30
meritmax <- 100 # max merit is 100
## sample merit values for all students from the specified normal distribution
library(truncnorm)
df$merit <- rtruncnorm(N, mean = meritmean,
sd = meritsd,
a = meritmin,
b = meritmax)
## look at the sampled values, by gender
hist(df$merit)
par(mfrow=c(2,1))
hist(df$merit[df$gender=="M"]); hist(df$merit[df$gender=="W"])
par(mfrow=c(1,1))
plot(df$merit ~ df$gender)
## give award to those 19 students with the highest merit values
df$award[df$merit >= sort(df$merit, decreasing = T)[nawards]] <- TRUE
table(df$award, df$gender)
## set up an empty awards data frame to collect results of doing this many times
awards <- as.data.frame((matrix(nrow = runs, ncol = 2)))
names(awards) <- c("M", "W")
## Loop: run the merit sampling a bunch of times and save the data about who gets the award
for (i in 1:runs){
df$award <- FALSE
require(truncnorm)
df$merit <- rtruncnorm(N, mean = meritmean,
sd = meritsd,
a = meritmin,
b = meritmax)
df$award[df$merit >= sort(df$merit, decreasing = T)[nawards]] <- TRUE
awards[i, ] <- table(df$award, df$gender)[2,]
}
## Look at the outputs
hist(awards$M) # number of awards (out of 19) given to men
hist(awards$W) # number of awards (out of 19) given to women
table(awards$M)
table(awards$W)
## extract the relevant figure: number of times 0 women got an award
P <- table(awards$W)[1]/runs * 100
paste0("0 woman awardees: ",
P,
"%")
```