Bias in Merit Awards?

An engineering department gave 10 awards in 2022 and 9 awards in 2021 to high-achieving students. “The prerequisite to receive the award is a grade of 6.0 [highest possible grade in Switzerland] for the thesis, an average final grade of at least 5.25 in the Master’s program as well as a written endorsement by the thesis supervisor” (https://mavt.ethz.ch/news-and-events/d-mavt-news/2022/06/eth-medal-and-willi-studer-prize-2022.html). All awardees appeared to be male though the department has roughly 12% female students. So what are the chances of this happening by chance, i.e., with no bias involved and assuming that male and female enrolled students have equal abilities?

About 7%. Here is the simulation in R:

## number of runs through the loop
runs <- 10000

## number of men and women
# (choose numbers so that the proportions are accurate;
# although total class size also matters a bit:
# smaller means even less probable that no women get an award)
NM <- 88
NW <- 12

## total N
N <- NM + NW

## number of awards
nawards <- 19

## set up empty data frame for each student's merit value and indication of getting award
df <- as.data.frame(matrix(nrow = N, ncol = 3))
names(df) <- c("gender", "merit", "award")

## fill up the data frame, start with no awards
df$gender <- as.factor(c(rep("M", NM), rep("W", NW)))
df$award <- FALSE

## set up the sampling from a normal distribution of merit
## (assumptions: merit is quantifiable, normally distributed, same distribution for M and W)
meritmean <- 80 # set the overall mean merit at 80
meritsd <- 10 # standard deviation 10
meritmin <- 30 # assume lowest merit is 30
meritmax <- 100 # max merit is 100

## sample merit values for all students from the specified normal distribution
library(truncnorm)
df$merit <- rtruncnorm(N, mean = meritmean, 
                       sd = meritsd, 
                       a = meritmin, 
                       b = meritmax)

## look at the sampled values, by gender
hist(df$merit)
par(mfrow=c(2,1))
hist(df$merit[df$gender=="M"]); hist(df$merit[df$gender=="W"])
par(mfrow=c(1,1))
plot(df$merit ~ df$gender)

## give award to those 19 students with the highest merit values
df$award[df$merit >= sort(df$merit, decreasing = T)[nawards]] <- TRUE
table(df$award, df$gender)

## set up an empty awards data frame to collect results of doing this many times
awards <- as.data.frame((matrix(nrow = runs, ncol = 2)))
names(awards) <- c("M", "W")

## Loop: run the merit sampling a bunch of times and save the data about who gets the award
for (i in 1:runs){
  df$award <- FALSE
  require(truncnorm)
  df$merit <- rtruncnorm(N, mean = meritmean, 
                         sd = meritsd, 
                         a = meritmin, 
                         b = meritmax)
  df$award[df$merit >= sort(df$merit, decreasing = T)[nawards]] <- TRUE
  awards[i, ] <- table(df$award, df$gender)[2,]
}

## Look at the outputs
hist(awards$M) # number of awards (out of 19) given to men
hist(awards$W) # number of awards (out of 19) given to women
table(awards$M)
table(awards$W)

## extract the relevant figure: number of times 0 women got an award
P <- table(awards$W)[1]/runs * 100
paste0("0 woman awardees: ",
       P,
       "%")



Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s