# Bias in Merit Awards?

An engineering department gave 10 awards in 2022 and 9 awards in 2021 to high-achieving students. “The prerequisite to receive the award is a grade of 6.0 [highest possible grade in Switzerland] for the thesis, an average final grade of at least 5.25 in the Master’s program as well as a written endorsement by the thesis supervisor” (https://mavt.ethz.ch/news-and-events/d-mavt-news/2022/06/eth-medal-and-willi-studer-prize-2022.html).

All awardees appeared to be male though the department has roughly 12% female students. So what are the chances of this happening by chance, i.e., with no bias involved and assuming that male and female enrolled students have on average equal true abilities?

About 7%. Here is the simulation in R:

```## number of runs through the loop
runs <- 10000

## number of men and women
# (choose numbers so that the proportions are accurate;
# although total class size also matters a bit:
# smaller means even less probable that no women get an award)
NM <- 88
NW <- 12

## total N
N <- NM + NW

## number of awards
nawards <- 19

## set up empty data frame for each student's merit value and indication of getting award
df <- as.data.frame(matrix(nrow = N, ncol = 3))
names(df) <- c("gender", "merit", "award")

df\$gender <- as.factor(c(rep("M", NM), rep("W", NW)))
df\$award <- FALSE

## set up the sampling from a normal distribution of merit
## (assumptions: merit is quantifiable, normally distributed, same distribution for M and W)
meritmean <- 80 # set the overall mean merit at 80
meritsd <- 10 # standard deviation 10
meritmin <- 30 # assume lowest merit is 30
meritmax <- 100 # max merit is 100

## sample merit values for all students from the specified normal distribution
library(truncnorm)
df\$merit <- rtruncnorm(N, mean = meritmean,
sd = meritsd,
a = meritmin,
b = meritmax)

## look at the sampled values, by gender
hist(df\$merit)
par(mfrow=c(2,1))
hist(df\$merit[df\$gender=="M"]); hist(df\$merit[df\$gender=="W"])
par(mfrow=c(1,1))
plot(df\$merit ~ df\$gender)

## give award to those 19 students with the highest merit values
df\$award[df\$merit >= sort(df\$merit, decreasing = T)[nawards]] <- TRUE
table(df\$award, df\$gender)

## set up an empty awards data frame to collect results of doing this many times
awards <- as.data.frame((matrix(nrow = runs, ncol = 2)))
names(awards) <- c("M", "W")

## Loop: run the merit sampling a bunch of times and save the data about who gets the award
for (i in 1:runs){
df\$award <- FALSE
require(truncnorm)
df\$merit <- rtruncnorm(N, mean = meritmean,
sd = meritsd,
a = meritmin,
b = meritmax)
df\$award[df\$merit >= sort(df\$merit, decreasing = T)[nawards]] <- TRUE
awards[i, ] <- table(df\$award, df\$gender)[2,]
}

## Look at the outputs
hist(awards\$M) # number of awards (out of 19) given to men
hist(awards\$W) # number of awards (out of 19) given to women
table(awards\$M)
table(awards\$W)

## extract the relevant figure: number of times 0 women got an award
P <- table(awards\$W)[1]/runs * 100
paste0("0 woman awardees: ",
P,
"%")

```