Media Technology Adoption in Europe

This post is inspired by the #tidytuesday CHAT data set and focuses on the diffusion and adoption rates of media technologies since 1992. Most interesting is probably the current data about internet users, whereas the statistics on radios, television sets, and newspaper circulation are only available up to about the year 2000.

On the technical side, this post shows how to combine information from different data sources in R / RStudio and visualize the results using tidyverse and ggplot2 in particular.

Importing, merging, and reshaping data from three different sources

# install.packages("tidytuesdayR")
library(tidytuesdayR)
library(tidyverse)
library(scales)
library(janitor)
theme_set(theme_light())

# the main data set
tuesdata <- tidytuesdayR::tt_load(2022, week = 29)
tech <- tuesdata$technology
tech
rm(tuesdata)
comm <- tech %>% filter(category == "Communications") %>% select(-category,-group)
comm %>% count(variable)

# extract the variable descriptions for plot labeling later
labels <- tech %>% filter(variable %in%
                  c("internetuser", "radio", "tv", "newspaper", "label")) %>% 
  select(label) %>% unique()
(labels <- labels$label)

In the communications category of the raw data, there are 12 different media technologies of which we’ll select the four mentioned above. The data has quite an odd format in that it has a generic ‘value’ column paired with the name of the media technologies in the ‘variable’ column in a long format, rather than having separate columns for each technology.

# A tibble: 70,858 x 5
   variable label                              iso3c  year    value
   <chr>    <chr>                              <chr> <dbl>    <dbl>
 1 cabletv  Households that subscribe to cable AFG    1992       0 
 2 cabletv  Households that subscribe to cable AFG    1993       0 
 3 cabletv  Households that subscribe to cable AFG    1994       0 
 4 cabletv  Households that subscribe to cable AGO    1999    6319.
 5 cabletv  Households that subscribe to cable AGO    2000   11396.
 6 cabletv  Households that subscribe to cable ALB    1996       0 
 7 cabletv  Households that subscribe to cable ALB    2002    7160.
 8 cabletv  Households that subscribe to cable ARG    1993 3605527 
 9 cabletv  Households that subscribe to cable ARG    1994 4159364 
10 cabletv  Households that subscribe to cable ARG    1995 4713815 
# ... with 70,848 more rows

# A tibble: 12 x 2
   variable                  n
   <chr>                 <int>
 1 cabletv                1451
 2 cell_subsc             7855
 3 computer               2144
 4 internetuser           4883
 5 mail                   6270
 6 newspaper              5964
 7 radio                 10344
 8 servers                2333
 9 telegram               6183
10 telephone              7297
11 telephone_canning_wdi 11143
12 tv                     4991

This data on media technology adoption basically only has absolute numbers for a lot of years and a lot of countries, but it is difficult to compare across countries since we don’t have the population sizes yet. For example, comparing the number of internet users across Switzerland, the United States, and China, we can’t see that the adoption is in fact highest in Switzerland:

The population data is available to download from the World Bank and can be saved locally as a CSV file and then imported into R. I also found a table that lists the region and subregion of each country online which might be useful for grouping and can be directly imported into R from the web. I am using dplyr’s inner_join() function here to merge the new data with the tech adoption data and matching the three-letter country codes and then pivot_wider() to get the data into a more useful shape.

## Data on regions from web ----
countries <- read.csv("https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv",
                      encoding = "UTF-8")
countries <- countries %>% select(name, alpha.3, region, sub.region)

comm <- comm %>%
  inner_join(countries, by = c("iso3c" = "alpha.3")) %>%
  janitor::clean_names()

## World Bank population data (local CSV downloaded from https://data.worldbank.org/indicator/SP.POP.TOTL) ----

wb_pop <- read_csv("WB_pop.csv", skip = 4) # from readr/tidyverse, not base read.csv
#wb_pop <- wb_pop %>% janitor::clean_names()
wb_pop <- wb_pop %>%
  select(-`Country Name`, -`Indicator Name`, -`Indicator Code`,
         -`2021`, -X67)

wb_pop_long <- wb_pop %>% pivot_longer(`1960`:`2020`,
                        names_to = "year",
                        values_to = "population") %>%
  rename(country_code = `Country Code`) %>% 
  mutate(year = as.numeric(year))

comm <- comm %>% inner_join(wb_pop_long, by = c("iso3c" = "country_code",
                                        "year" = "year"))
# For pivot_wider, need to exclude the label variable,
# otherwise we get a separate row for each unique label
comm <- comm %>% select(-label) %>%
  pivot_wider(names_from = variable, values_from = value)

A last data cleaning step takes care of some remaining issues such as factor orders and naming in order to produce nicer plots:

comm_plots <- comm %>% 
  mutate(country_code = as_factor(iso3c),
         iso3c = NULL,
         country = name,
         name = NULL, 
         country = str_trunc(country, 20, "right"),
         country = as_factor(country),
         region = as_factor(region),
         sub_region = as_factor(sub_region)) %>%
  group_by(region) %>% # for the order of facet_wrap
  mutate(N = n()) %>% 
  ungroup() %>% 
  mutate(country = fct_reorder(country, N)) %>% 
  filter(year > 1992) %>%
  filter(country_code != "PRK")# North Korea has bad data 

With this data in this form, we can for example look at the diffusion of the internet in 200+ countries:

A comparative analysis for Europe

First, we’ll create a separate data frame for ease of use, filtering for European countries, and precalculate the average levels (later used as reference lines) of the four media technologies in the last year they were measured for a good chunk of the countries:

comm_plots_eu <- comm_plots %>% 
  group_by(sub_region) %>% # for the order of facet_wrap
  mutate(N = n()) %>% 
  ungroup() %>% 
  mutate(country = fct_reorder(country, N)) %>% 
  filter(region == "Europe")

ref_internetuser <- comm_plots_eu %>%
  filter(year == 2020) %>% 
  summarise(M_internetuser = mean(internetuser/population, na.rm = TRUE)) %>% 
  unlist() %>% unname()
ref_internetuser <- data.frame(yintercept = ref_internetuser,
                               desc = "European average\n in 2020")

ref_radio <- comm_plots_eu %>%
  filter(year == 1999) %>% 
  summarise(M_radio = mean(radio/population, na.rm = TRUE)) %>% 
  unlist() %>% unname()
ref_radio <- data.frame(yintercept = ref_radio,
                               desc = "European average\n in 1999")

ref_tv <- comm_plots_eu %>%
  filter(year == 1999) %>% 
  summarise(M_tv = mean(tv/population, na.rm = TRUE)) %>% 
  unlist() %>% unname()
ref_tv <- data.frame(yintercept = ref_tv,
                        desc = "European average\n in 1999")

ref_newspaper <- comm_plots_eu %>%
  filter(year == 1999) %>% 
  summarise(M_newspaper = mean(newspaper/population, na.rm = TRUE)) %>% 
  unlist() %>% unname()
ref_newspaper <- data.frame(yintercept = ref_newspaper,
                        desc = "European average\n in 1999")

Now to the plots, producing a smoothed diffusion line for each country and colored by European regions:

#### Internet
comm_plots_eu %>% 
  filter(!is.na(internetuser), internetuser > 100000) %>%
  ggplot(aes(x = year, y = internetuser/population, color = sub_region)) +
  geom_hline(aes(yintercept = yintercept, linetype = desc),
             data = ref_internetuser, color = "gray70") +
  geom_smooth(se = F, span = 0.7) +
  facet_wrap(~country) +
  scale_y_continuous(labels = scales::percent) +#, breaks = c(0,.25,.5,.75,1)
  theme(legend.title = element_blank()) +
  labs(title = labels[1], y = NULL, x = NULL)

#### Radio
comm_plots_eu %>% filter(!is.na(radio), radio > 100000) %>%
  ggplot(aes(x = year, y = radio/population, color = sub_region)) +
  geom_hline(aes(yintercept = yintercept, linetype = desc),
             data = ref_radio, color = "gray70") +
  geom_smooth(se = F, span = 0.7) +
  geom_hline(yintercept = ref_radio, lty = 2, color = "gray50") +
  facet_wrap(~country) +
  #scale_y_continuous(labels = scales::percent) +
  theme(legend.title = element_blank()) +
  labs(title = paste(labels[3], "per capita from 1992 to 1999"), y = NULL, x = NULL)

#### TV
comm_plots_eu %>% filter(!is.na(tv), tv > 100000) %>%
  filter(!country_code %in% c("ISL")) %>% 
  ggplot(aes(x = year, y = tv/population, color = sub_region)) +
  geom_hline(aes(yintercept = yintercept, linetype = desc),
             data = ref_tv, color = "gray70") +
  geom_smooth(se = F, span = 0.7) +
  facet_wrap(~country) +
  #scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = seq(1992, 2002, 2)) +
  theme(legend.title = element_blank()) +
  labs(title = paste(labels[4], "per capita from 1992 to 2002"), y = NULL, x = NULL)

#### Newspaper
comm_plots_eu %>% filter(!is.na(newspaper), newspaper > 100000) %>%
  filter(!country_code %in% c("BLR", "LTU", "SVK", "EST", "BIH")) %>% # those with no plot data
  ggplot(aes(x = year, y = newspaper/population, color = sub_region)) +
  geom_hline(aes(yintercept = yintercept, linetype = desc),
             data = ref_newspaper, color = "gray70") +
  geom_smooth(se = F, span = 0.7) +
  facet_wrap(~country) +
  #scale_y_continuous(labels = scales::percent) +
  theme(legend.title = element_blank())  +
  labs(title = paste(labels[2], "per capita from 1992 to 1999"), y = NULL, x = NULL)

Click on images to enlarge:

The awards go to…

  • Luxembourg for most online country
  • Bosnia and Herzegovina for fastest adoption in the past 10 years
  • Czechia for most theory-conforming diffusion curve
  • Finland for 90s radio nation
  • Latvia for 90s TV nation
  • Norway for 90s newspaper nation

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s