Opendata.swiss currently provides about 8000 open government data sets from agriculture to health to culture. Here, I'll be looking at a data set from the Federal Statistical Office containing the 500 most successful Swiss films by theater admissions. This post is mostly about preparing data for ggplot and customizing figures in R. You can download … Continue reading Citizenship, Bees, and Zucchini (ggplot2)
Category: Stuff with R
R, RStudio, ggplot / tidyverse, methodology, statistics, simulation
Media Technology Adoption in Europe
This post is inspired by the #tidytuesday CHAT data set and focuses on the diffusion and adoption rates of media technologies since 1992. Most interesting is probably the current data about internet users, whereas the statistics on radios, television sets, and newspaper circulation are only available up to about the year 2000. On the technical …
False Positive Results Visualized: A Simulation in R
If 10,000 studies are run and only those that find substantive and significant results get published, there is evidently a problem. The probability of finding the "truth", i.e., a non-substantive effect, in a given study is only about 50%.
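The excerpt doesn't include the post's code, so here is a minimal R sketch of the general idea rather than the post's own simulation. It assumes a true effect of exactly zero, two groups of 50 per study, a Welch t-test at α = .05, and that only significant studies get "published". All of these values are illustrative assumptions. Because the post's definition of a "substantive" result isn't shown here, the sketch only illustrates the publication filter and does not try to reproduce the 50% figure.

```r
# Sketch: publication bias under a true null effect (illustrative assumptions)
set.seed(42)
n_studies   <- 10000
n_per_group <- 50  # assumed sample size per group
true_effect <- 0   # assumed: there is no real difference between groups

# Each study draws two groups from distributions with the same mean
results <- t(replicate(n_studies, {
  control   <- rnorm(n_per_group, mean = 0, sd = 1)
  treatment <- rnorm(n_per_group, mean = true_effect, sd = 1)
  c(diff = mean(treatment) - mean(control),
    p    = t.test(treatment, control)$p.value)
}))

# "Publish" only the studies with p < .05
published <- results[results[, "p"] < 0.05, , drop = FALSE]

share_significant  <- nrow(published) / n_studies      # false-positive rate, near 0.05
mean_abs_all       <- mean(abs(results[, "diff"]))     # typical observed difference
mean_abs_published <- mean(abs(published[, "diff"]))   # inflated among published studies

print(c(share_significant = share_significant,
        mean_abs_all = mean_abs_all,
        mean_abs_published = mean_abs_published))
```

Every "published" study here is a false positive, and the published studies also report a larger average difference than the full set of studies, even though the true difference is zero.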
Bias in Merit Awards?
An engineering department gave 10 awards in 2022 and 9 awards in 2021 to high-achieving students. "The prerequisite to receive the award is a grade of 6.0 [highest possible grade in Switzerland] for the thesis, an average final grade of at least 5.25 in the Master’s program as well as a written endorsement by the …
Analyzing Between-Person and Within-Person Associations
Explanations and implied causal mechanisms for digital media use often operate at the individual level. For example, the hypothesis that photo sharing with friends increases social connectedness implies that when people share more photos, they will feel more connected. A typical test of such a hypothesis might rely on a linear regression with a count measure …
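One common way to separate the two kinds of association is person-mean centering. Each predictor is split into a person's own average (the between-person part) and the wave-specific deviation from that average (the within-person part), and both are entered into the model. The sketch below uses simulated data. The variable names `photos` and `connected` are hypothetical, and the true effects (−0.5 between, +0.3 within) are chosen only so that the two parts differ. For brevity it uses base `lm()`. With real panel data, a multilevel model (for example `lme4::lmer()` with a random intercept per person) would be the more usual choice.

```r
# Sketch: separating between- and within-person associations (simulated data)
set.seed(123)
n_persons <- 100
n_waves   <- 10
id <- rep(seq_len(n_persons), each = n_waves)

# Hypothetical predictor: a stable personal level plus wave-to-wave fluctuation
person_level <- rnorm(n_persons, mean = 10, sd = 3)
photos <- person_level[id] + rnorm(n_persons * n_waves, sd = 2)

# Person-mean centering
photos_between <- ave(photos, id)           # each person's mean across waves
photos_within  <- photos - photos_between   # deviation from the person's own mean

# Assumed true effects: -0.5 between persons, +0.3 within persons
connected <- 5 - 0.5 * photos_between + 0.3 * photos_within +
  rnorm(n_persons * n_waves, sd = 1)

d <- data.frame(id, photos, photos_between, photos_within, connected)

# A single pooled slope blends both sources of variation
pooled <- lm(connected ~ photos, data = d)
# The decomposed model estimates each association separately
decomposed <- lm(connected ~ photos_between + photos_within, data = d)

print(coef(pooled))
print(coef(decomposed))
```

Because the between and within parts are uncorrelated by construction, the pooled slope is a weighted average of the two. It can suggest a negative association even though sharing more photos than usual goes with feeling more connected within each person.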
What’s in a Library?
After not using a reference manager at first (2014–2016) and then growing very frustrated with Mendeley over a couple of years, I switched to Zotero in 2018. I am extremely happy with the software and its features; it just works very well for everything I do. The browser plugin to import the full citation …
Success and Luck
If there are 5 jobs available for a pool of N candidates, the larger N is, the more important luck becomes in determining success, i.e., being selected for one of the jobs. Here is a simulation example in R adapted from this video. Let's assume a candidate's "true score" can be objectively assessed …
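The excerpt breaks off before the post's own code, so this is only a sketch of the general setup under assumed parameters. True scores and luck are both drawn uniformly from 0 to 100, the observed score weights them 95% to 5%, and the 5 candidates with the highest observed scores get the jobs. The weights, distributions, and pool sizes are illustrative choices and are not taken from the post or the video. For each pool size, the sketch estimates how many of the 5 hires are also among the 5 best candidates by true score.

```r
# Sketch: how pool size changes the role of luck in hiring (assumed parameters)
set.seed(7)
n_jobs       <- 5
skill_weight <- 0.95  # assumed weight of the "true score"
luck_weight  <- 0.05  # assumed weight of luck

# Share of the hired candidates who are also among the top n_jobs by true score
overlap_with_best <- function(N) {
  true_score <- runif(N, min = 0, max = 100)
  luck       <- runif(N, min = 0, max = 100)
  observed   <- skill_weight * true_score + luck_weight * luck
  hired <- order(observed, decreasing = TRUE)[seq_len(n_jobs)]
  best  <- order(true_score, decreasing = TRUE)[seq_len(n_jobs)]
  length(intersect(hired, best)) / n_jobs
}

pool_sizes <- c(10, 100, 1000, 10000)
reps <- 500
mean_overlap <- sapply(pool_sizes, function(N) mean(replicate(reps, overlap_with_best(N))))

print(data.frame(pool_size = pool_sizes, mean_overlap = mean_overlap))
```

As N grows, the true scores of the top candidates sit ever closer together, so even a small luck component is enough to reshuffle who ends up hired.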
Simulating Sample Size Effects
Simulate and plot data in R to see the effects of sample size differences. Results: https://twitter.com/MoritzBuchi/status/1394967444209471488
library(truncnorm) # modified version of rnorm() to allow min and max specification
n <- 20 # base n
f <- 1:75 # sample size multiplication vector
N <- n * f # vector of 75 different sample sizes (20 …
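The excerpt stops partway through the code. The sketch below is a hedged illustration of the same idea and is not the post's actual code. It reuses `n`, `f`, and `N`, and for each of the 75 sample sizes it estimates how much the sample mean varies across repeated samples. So that it runs with base R only, it draws from `rnorm()` with an assumed mean of 3 and SD of 1. The original loads truncnorm, whose `rtruncnorm()` would additionally bound the values. The simulated spread should track the theoretical standard error `1 / sqrt(N)`.

```r
# Sketch: sampling variability of the mean across 75 sample sizes (assumed population)
set.seed(2021)
n <- 20      # base n
f <- 1:75    # sample size multiplication vector
N <- n * f   # 75 sample sizes: 20, 40, ..., 1500
reps <- 200  # assumed number of repeated samples per sample size

# Standard deviation of the sample mean across repeated samples, per sample size
se_sim <- sapply(N, function(size) {
  sd(replicate(reps, mean(rnorm(size, mean = 3, sd = 1))))
})
se_theory <- 1 / sqrt(N)  # expected standard error when sd = 1

plot(N, se_sim, xlab = "Sample size", ylab = "SD of sample mean")
lines(N, se_theory)
```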
Quantifying Internet Use
This post summarizes key findings from our article "How Long and What For? Tracking a Nationally Representative Sample to Quantify Internet Use," published in the Journal of Quantitative Description: Digital Media. Read more about this new journal here. The internet is increasingly used across multiple devices, often on the go, and very much integrated into …
How the Participation Gap Biases Group Evaluation
It is misleading to use the top-performing individuals to compare groups of unequal sizes. Say you wanted to know whether men or women are better at chess, or which country has the best athletes; using the top performers as representatives of each group (gender or country) would bias the evaluation simply because of group …
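Here is a minimal R sketch of this argument under assumed numbers: two groups whose performance comes from the same standard normal distribution, one with 1,000 members and one with 100. The group sizes and the top-10 cutoff are illustrative assumptions. Because the groups are identical in distribution, each of the 1,100 individuals is equally likely to be the overall best. The best performer therefore comes from the larger group with probability 1000 / 1100 (about 0.91), and the larger group's top 10 also score higher on average, purely because of group size.

```r
# Sketch: comparing equally able groups of unequal size by their top performers
set.seed(99)
size_large <- 1000  # assumed size of the larger group
size_small <- 100   # assumed size of the smaller group
reps <- 2000

# Both groups draw performance from the same N(0, 1) distribution
best_from_large  <- replicate(reps, max(rnorm(size_large)) > max(rnorm(size_small)))
share_best_large <- mean(best_from_large)
expected_share   <- size_large / (size_large + size_small)

# Average of each group's top k performers
top_k_mean <- function(x, k = 10) mean(sort(x, decreasing = TRUE)[seq_len(k)])
top_gap <- replicate(reps, top_k_mean(rnorm(size_large)) - top_k_mean(rnorm(size_small)))

print(c(share_best_large = share_best_large,
        expected_share = expected_share,
        mean_top10_gap = mean(top_gap)))
```

The positive gap appears even though neither group is better on average. Comparing the full distributions, for example the group means, does not have this artifact.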