# Bias in Merit Awards?

An engineering department gave 10 awards in 2022 and 9 awards in 2021 to high-achieving students. “The prerequisite to receive the award is a grade of 6.0 [highest possible grade in Switzerland] for the thesis, an average final grade of at least 5.25 in the Master’s program as well as a written endorsement by the thesis supervisor” (https://mavt.ethz.ch/news-and-events/d-mavt-news/2022/06/eth-medal-and-willi-studer-prize-2022.html). All awardees appeared to be male though the department has roughly 12% female students. So what are the chances of this happening by chance, i.e., with no bias involved and assuming that male and female enrolled students have equal abilities?

About 7%. Here is the simulation in R:

```## number of runs through the loop
runs <- 10000

## number of men and women
# (choose numbers so that the proportions are accurate;
# although total class size also matters a bit:
# smaller means even less probable that no women get an award)
NM <- 88
NW <- 12

## total N
N <- NM + NW

## number of awards
nawards <- 19

## set up empty data frame for each student's merit value and indication of getting award
df <- as.data.frame(matrix(nrow = N, ncol = 3))
names(df) <- c("gender", "merit", "award")

df\$gender <- as.factor(c(rep("M", NM), rep("W", NW)))
df\$award <- FALSE

## set up the sampling from a normal distribution of merit
## (assumptions: merit is quantifiable, normally distributed, same distribution for M and W)
meritmean <- 80 # set the overall mean merit at 80
meritsd <- 10 # standard deviation 10
meritmin <- 30 # assume lowest merit is 30
meritmax <- 100 # max merit is 100

## sample merit values for all students from the specified normal distribution
library(truncnorm)
df\$merit <- rtruncnorm(N, mean = meritmean,
sd = meritsd,
a = meritmin,
b = meritmax)

## look at the sampled values, by gender
hist(df\$merit)
par(mfrow=c(2,1))
hist(df\$merit[df\$gender=="M"]); hist(df\$merit[df\$gender=="W"])
par(mfrow=c(1,1))
plot(df\$merit ~ df\$gender)

## give award to those 19 students with the highest merit values
df\$award[df\$merit >= sort(df\$merit, decreasing = T)[nawards]] <- TRUE
table(df\$award, df\$gender)

## set up an empty awards data frame to collect results of doing this many times
awards <- as.data.frame((matrix(nrow = runs, ncol = 2)))
names(awards) <- c("M", "W")

## Loop: run the merit sampling a bunch of times and save the data about who gets the award
for (i in 1:runs){
df\$award <- FALSE
require(truncnorm)
df\$merit <- rtruncnorm(N, mean = meritmean,
sd = meritsd,
a = meritmin,
b = meritmax)
df\$award[df\$merit >= sort(df\$merit, decreasing = T)[nawards]] <- TRUE
awards[i, ] <- table(df\$award, df\$gender)[2,]
}

## Look at the outputs
hist(awards\$M) # number of awards (out of 19) given to men
hist(awards\$W) # number of awards (out of 19) given to women
table(awards\$M)
table(awards\$W)

## extract the relevant figure: number of times 0 women got an award
P <- table(awards\$W)/runs * 100
paste0("0 woman awardees: ",
P,
"%")

```