04 Introduction to PISA

1 Introduction to PISA

1.1 Pre-session tasks

1.1.1 Pre-reading

Please read section 1 (“What is PISA?”) of the PISA 2022 Assessment and Analytical Framework: PISA Assessment Framework

1.1.2 Getting set up

Remember to load the PISA 2022 data set:

library(arrow)
library(tidyverse)

PISA_2022 <- read_parquet(r"[<folder>PISA_2022_student_subset.parquet]")

1.2 The PISA assessments

The first International Large-Scale Assessment (ILSA) comparing the learning outcomes of school students between countries was attempted in the 1960s. However, ILSAs only became established and regular in the late 1990s and 2000s.

The Organisation for Economic Co-operation and Development’s (OECD) Programme for International Student Assessment (PISA) has tested 15-year-old students in a range of “literacies” or “competencies” every three years since 2000. The main focus rotates between reading, mathematics and science. The cycle planned for 2021, which focused on mathematics, was delayed by the global pandemic until 2022, and its results were only published in December 2023. Until then, PISA 2018, with a focus on reading, was the most recently available cycle, and PISA 2015 remains the most recent cycle focusing on science.

In addition to reading, mathematics and science, PISA has tested students on a range of “novel” competencies including problem-solving, global competence, financial literacy, and creative thinking. In addition to these tests, PISA also administers questionnaires to students, teachers and parents to identify “factors” which explain test score differences within and between countries.

Since 2000, more than 90 “countries and economies” and around 3,000,000 students have participated in PISA. The growth in the number of countries participating in each cycle of PISA is reflected in the growth in the number of students taking the PISA tests and responding to the PISA questionnaires, as shown in Table 1.

Table 1: Number of students participating in PISA by year

Year   Number completing assessment
2000   265,000
2003   275,000
2006   400,000
2009   470,000
2012   510,000
2015   540,000
2018   600,000
2022   690,000

There is a degree of inherent error in all educational and psychological assessments - and indeed in all social or physical measurement. ILSAs such as PISA may be more prone to error because their comparisons across large and diverse populations make them particularly complex. However, it is particularly important to minimise the error in ILSAs because they influence education policy and practice across a large number of education systems, impacting a vast population of students beyond those sampled for the assessments.

According to the OECD (2019), three sources of error are worth considering. The first is sampling error: uncertainty in the degree to which results from the sample generalise to the wider population. In 2018, the OECD average sampling error was 0.4 of a PISA score point (the value was not reported for 2022). The second is measurement error: uncertainty in the extent to which test items measure proficiency. In 2018, the measurement error was around 0.8 of a score point in mathematics and science and 0.5 of a score point in reading (again, the value was not reported for 2022). The third is link error: uncertainty in comparisons between scores from different years. For comparisons of science scores between 2015 and 2018, the link error is 1.5 points. For comparisons between 2018 and 2022, the link errors are 1.47 points for reading, 2.24 for mathematics and 1.61 for science (OECD 2022, 293).
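To make the role of link error concrete, here is a minimal sketch of how the uncertainty in a change between two cycles might be combined. It assumes the two standard errors and the link error are independent, so that their variances add. The means and standard errors are hypothetical values chosen for illustration, not PISA results; only the mathematics link error (2.24) is taken from the text above.

```r
# Hypothetical country means and standard errors (illustration only,
# NOT real PISA results)
mean_2018 <- 500
se_2018   <- 2.5
mean_2022 <- 490
se_2022   <- 2.7

# 2018-2022 mathematics link error quoted in the text above
link_error_math <- 2.24

# Change in mean score between the two cycles
change <- mean_2022 - mean_2018

# Assumption: the three error sources are independent, so their
# variances add
se_change <- sqrt(se_2018^2 + se_2022^2 + link_error_math^2)

# Approximate 95% confidence interval for the change
ci_95 <- change + c(-1, 1) * qnorm(0.975) * se_change

round(c(change = change, se = se_change,
        lower = ci_95[1], upper = ci_95[2]), 2)
```

Leaving the link error out of the sum would give a narrower interval, which is one reason trend comparisons that ignore it can overstate how certain a change is.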

PISA uses a probabilistic, stratified clustered survey design (Jerrim et al. 2017). However, sampling issues including sample representativeness, non-response rates and population coverage have been identified (Zieger et al. 2022; Rutkowski and Rutkowski 2016; Gillis, Polesel, and Wu 2016; Hopmann, Brinek, and Retzl 2007). Furthermore, Anders et al. (2021) and Jerrim (2021) have shown that the assumptions used to impute values for non-participating students when constructing the sample may have significant impacts on achievement scores. (Imputing means estimating missing values based on existing data, for example by substituting a mean or mode score for a missing test.)
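As a rough illustration of what imputation involves, the sketch below fills the missing values in a small made-up vector of scores with the mean of the observed scores. This is a deliberately naive approach chosen for clarity; it is not the procedure PISA uses, and the numbers are invented.

```r
# Made-up test scores with two missing values (illustration only)
scores <- c(400, 450, NA, 500, NA, 550)

# Mean of the observed (non-missing) scores
observed_mean <- mean(scores, na.rm = TRUE)

# Replace each missing value with the observed mean
imputed <- ifelse(is.na(scores), observed_mean, scores)

# The mean is unchanged, but the spread shrinks because the filled-in
# values all sit exactly at the mean
mean(imputed)
sd(scores, na.rm = TRUE)
sd(imputed)
```

The smaller standard deviation after imputation shows one way in which the choice of imputation assumptions can feed through into reported results.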

Since PISA 2015, the majority of participating countries have switched from paper-based assessment to computer-based assessment (Jerrim 2016). A randomised controlled trial conducted by the OECD prior to the switch indicated a difference in score between the two modes of delivery. The OECD introduced an adjustment to compensate for this difference, but the adjustment does not entirely remove it (Jerrim et al. 2018), with implications for any time series comparisons between PISA cycles. Nonetheless, Jerrim (2016) notes that “in terms of cross-country rankings, there remains a high degree of consistency… the vast majority of countries are simply ‘shifted’ by a uniform amount” (pp. 508-509).

In summary, comparisons within and between countries and comparisons over time using ILSAs need careful interpretations that bear in mind the specific design of each ILSA. In practice, this means considering a range of potential explanations for score differences. Does a difference in science ranking between two countries simply reflect sampling error? Does the same parental occupation or home possessions amount to the same economic, social and cultural status in different countries (e.g. the social status of a parent as a teacher or the economic status of the number of cars a family owns)? Does a difference in mathematical self-efficacy (i.e. student self-confidence in mathematics) between the USA and Japan reflect sociocultural differences in self-enhancement and modesty, respectively? How do score differences between boys and girls indicate gender inequalities in education that reflect wider society?

Tip

For a useful critique and discussion of how the measure of socio-economic status in PISA data is constructed, see Avvisati (2020).

1.3 A reminder about summarising data, graphing and categorising

1.3.1 Summarising data

Recall that you can use group_by and summarise to group individual student measures and find means and standard deviations for countries. For example, to find the mean wealth score for each country and rank the countries in descending order, we first select the variables of interest, CNT and HOMEPOS (home possessions, a proxy for wealth), then group_by CNT and summarise to get the mean. As there are some NA values, we need to include na.rm = TRUE to tell mean to ignore the missing values. Finally, we arrange in descending order by the new variable we create, meanwealth. We can do the same and add a calculation to get the standard deviation.

# Create a data frame of PISA 2022 data of country mean wealth

PISA2022WealthRank <- PISA_2022 %>%
  # select the variables of interest
  select(CNT, HOMEPOS) %>%
  # treat the data as grouped by country
  group_by(CNT) %>%
  # calculate the mean of HOMEPOS in a new column, meanwealth,
  # setting na.rm = TRUE to ignore NA values
  summarise(meanwealth = mean(HOMEPOS, na.rm = TRUE)) %>%
  # arrange in descending order by meanwealth
  arrange(desc(meanwealth))

PISA2022WealthRank

# With standard deviations

PISA2022WealthRank <- PISA_2022 %>%
  select(CNT, HOMEPOS) %>%
  group_by(CNT) %>%
  summarise(meanwealth = mean(HOMEPOS, na.rm = TRUE),
            sdwealth = sd(HOMEPOS, na.rm = TRUE)) %>%
  arrange(desc(meanwealth))

PISA2022WealthRank
# A tibble: 80 × 2
   CNT                          meanwealth
   <fct>                             <dbl>
 1 Norway                          0.547  
 2 Australia                       0.483  
 3 Korea                           0.371  
 4 New Zealand                     0.367  
 5 Canada                          0.348  
 6 Iceland                         0.346  
 7 Sweden                          0.327  
 8 Ireland                         0.318  
 9 Malta                           0.308  
10 Austria                         0.280  
11 Netherlands                     0.255  
12 Denmark                         0.237  
13 Switzerland                     0.221  
14 Czech Republic                  0.194  
15 Slovenia                        0.186  
16 Estonia                         0.178  
17 Finland                         0.162  
18 Germany                         0.149  
19 Singapore                       0.124  
20 United Kingdom                  0.116  
21 United States                   0.115  
22 Hungary                         0.104  
23 Italy                           0.0887 
24 Belgium                         0.0866 
25 Poland                          0.0825 
26 Portugal                        0.0755 
27 Spain                           0.0739 
28 Israel                          0.0499 
29 Latvia                          0.00480
30 Lithuania                      -0.0451 
31 Croatia                        -0.117  
32 France                         -0.139  
33 United Arab Emirates           -0.157  
34 Slovak Republic                -0.187  
35 Hong Kong (China)              -0.198  
36 Japan                          -0.226  
37 Serbia                         -0.229  
38 Macao (China)                  -0.253  
39 Greece                         -0.264  
40 Montenegro                     -0.276  
41 Brunei Darussalam              -0.356  
42 Bulgaria                       -0.368  
43 Chile                          -0.388  
44 Chinese Taipei                 -0.395  
45 Romania                        -0.399  
46 Qatar                          -0.442  
47 North Macedonia                -0.490  
48 Ukrainian regions (18 of 27)   -0.550  
49 Kosovo                         -0.621  
50 Saudi Arabia                   -0.689  
51 Uruguay                        -0.747  
52 Argentina                      -0.806  
53 Georgia                        -0.809  
54 Jamaica                        -0.834  
55 Republic of Moldova            -0.846  
56 Albania                        -0.859  
57 Kazakhstan                     -0.870  
58 Malaysia                       -0.908  
59 Costa Rica                     -0.979  
60 Baku (Azerbaijan)              -0.980  
61 Mexico                         -1.07   
62 Türkiye                        -1.08   
63 Thailand                       -1.17   
64 Brazil                         -1.22   
65 Colombia                       -1.26   
66 Viet Nam                       -1.29   
67 Uzbekistan                     -1.30   
68 Dominican Republic             -1.31   
69 Mongolia                       -1.31   
70 Panama                         -1.32   
71 Jordan                         -1.38   
72 Peru                           -1.40   
73 Palestinian Authority          -1.49   
74 Paraguay                       -1.52   
75 Guatemala                      -1.52   
76 El Salvador                    -1.57   
77 Indonesia                      -1.58   
78 Philippines                    -1.75   
79 Morocco                        -1.77   
80 Cambodia                       -2.41   
# A tibble: 80 × 3
   CNT                          meanwealth sdwealth
   <fct>                             <dbl>    <dbl>
 1 Norway                          0.547      0.970
 2 Australia                       0.483      0.861
 3 Korea                           0.371      1.01 
 4 New Zealand                     0.367      0.862
 5 Canada                          0.348      0.867
 6 Iceland                         0.346      0.805
 7 Sweden                          0.327      0.878
 8 Ireland                         0.318      0.818
 9 Malta                           0.308      0.857
10 Austria                         0.280      0.938
11 Netherlands                     0.255      0.802
12 Denmark                         0.237      0.815
13 Switzerland                     0.221      0.895
14 Czech Republic                  0.194      0.852
15 Slovenia                        0.186      0.833
16 Estonia                         0.178      0.740
17 Finland                         0.162      0.862
18 Germany                         0.149      0.946
19 Singapore                       0.124      0.840
20 United Kingdom                  0.116      0.919
21 United States                   0.115      0.927
22 Hungary                         0.104      0.914
23 Italy                           0.0887     0.803
24 Belgium                         0.0866     0.869
25 Poland                          0.0825     0.794
26 Portugal                        0.0755     0.901
27 Spain                           0.0739     0.805
28 Israel                          0.0499     1.03 
29 Latvia                          0.00480    0.774
30 Lithuania                      -0.0451     0.832
31 Croatia                        -0.117      0.719
32 France                         -0.139      0.972
33 United Arab Emirates           -0.157      1.04 
34 Slovak Republic                -0.187      0.990
35 Hong Kong (China)              -0.198      0.881
36 Japan                          -0.226      0.761
37 Serbia                         -0.229      0.768
38 Macao (China)                  -0.253      0.845
39 Greece                         -0.264      0.820
40 Montenegro                     -0.276      0.917
41 Brunei Darussalam              -0.356      1.01 
42 Bulgaria                       -0.368      1.04 
43 Chile                          -0.388      0.959
44 Chinese Taipei                 -0.395      0.985
45 Romania                        -0.399      0.991
46 Qatar                          -0.442      1.07 
47 North Macedonia                -0.490      0.915
48 Ukrainian regions (18 of 27)   -0.550      0.785
49 Kosovo                         -0.621      0.941
50 Saudi Arabia                   -0.689      1.04 
51 Uruguay                        -0.747      0.924
52 Argentina                      -0.806      1.01 
53 Georgia                        -0.809      0.972
54 Jamaica                        -0.834      1.12 
55 Republic of Moldova            -0.846      0.890
56 Albania                        -0.859      1.01 
57 Kazakhstan                     -0.870      0.832
58 Malaysia                       -0.908      0.962
59 Costa Rica                     -0.979      1.27 
60 Baku (Azerbaijan)              -0.980      0.971
61 Mexico                         -1.07       1.00 
62 Türkiye                        -1.08       1.02 
63 Thailand                       -1.17       1.13 
64 Brazil                         -1.22       0.956
65 Colombia                       -1.26       1.12 
66 Viet Nam                       -1.29       0.922
67 Uzbekistan                     -1.30       0.960
68 Dominican Republic             -1.31       0.977
69 Mongolia                       -1.31       0.949
70 Panama                         -1.32       1.21 
71 Jordan                         -1.38       1.18 
72 Peru                           -1.40       1.20 
73 Palestinian Authority          -1.49       1.25 
74 Paraguay                       -1.52       1.13 
75 Guatemala                      -1.52       1.31 
76 El Salvador                    -1.57       1.08 
77 Indonesia                      -1.58       0.911
78 Philippines                    -1.75       1.13 
79 Morocco                        -1.77       1.19 
80 Cambodia                       -2.41       1.08 
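When comparing country means, it can help to know how many valid responses each mean is based on. The sketch below adds a count of non-missing values to the same group_by and summarise pattern. So that it runs on its own, it uses a small invented data frame (toy) in place of PISA_2022; the column name n_valid is an illustrative choice, not a PISA variable.

```r
library(dplyr)

# Toy stand-in for PISA_2022 (invented values, illustration only)
toy <- data.frame(
  CNT     = c("A", "A", "A", "B", "B", "B"),
  HOMEPOS = c(0.5, 1.0, NA, -1.0, -0.5, 0.0)
)

toy_summary <- toy %>%
  group_by(CNT) %>%
  summarise(
    # number of non-missing HOMEPOS responses in each country
    n_valid    = sum(!is.na(HOMEPOS)),
    meanwealth = mean(HOMEPOS, na.rm = TRUE),
    sdwealth   = sd(HOMEPOS, na.rm = TRUE)
  ) %>%
  arrange(desc(meanwealth))

toy_summary
```

A mean based on only a handful of valid responses deserves more caution than one based on thousands, so a count like this is a cheap check to run alongside the ranking.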

1.3.2 Bar charts

Recall that you can use geom_bar to plot a bar graph. For example, to plot the PISA2022WealthRank data frame we just created, we pass the data to ggplot. If you are passing geom_bar the exact values you want to plot, rather than asking it to count rows (as it would if given the original data set with all student entries), you need to specify geom_bar(stat='identity').

I have added +theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) which rotates the text on the x-axis.

# Plot a bar graph of wealth by country

# pass PISA2022WealthRank to ggplot and set the x and y variables
ggplot(PISA2022WealthRank, aes(x = CNT, y = meanwealth)) +
  # the data are already summarised, so we don't want geom_bar to count
  # items; tell it to plot the values as they are
  geom_bar(stat = 'identity') +
  # rotate the x-axis text
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
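As an aside, ggplot2 also provides geom_col(), which plots the values as given and so behaves like geom_bar(stat = 'identity'). The sketch below shows the two side by side on a small invented summary table (toy_rank), used here so that the example runs without the PISA data.

```r
library(ggplot2)

# Toy summary table (invented values, illustration only)
toy_rank <- data.frame(
  CNT        = c("A", "B", "C"),
  meanwealth = c(0.5, -0.2, -1.1)
)

# geom_bar() with stat = "identity" plots the values as given ...
p_bar <- ggplot(toy_rank, aes(x = CNT, y = meanwealth)) +
  geom_bar(stat = "identity")

# ... and geom_col() is a shorthand for the same thing
p_col <- ggplot(toy_rank, aes(x = CNT, y = meanwealth)) +
  geom_col()

p_col
```

Either form is fine; the rest of this section keeps geom_bar(stat = 'identity') for consistency.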

We can improve this plot by reordering the x-axis to rank the countries. We switch x=CNT to x=reorder(CNT, -meanwealth); that is, we reorder the x-axis by meanwealth in descending order (indicated by the minus sign in -meanwealth).

# Plot the wealth data frame as a bar graph, reordering the x axis by wealth

# rather than simply specifying the x axis (e.g. x = CNT), use
# x = reorder(CNT, -meanwealth) to order the countries by meanwealth;
# the - before meanwealth makes the order descending
ggplot(PISA2022WealthRank, aes(x = reorder(CNT, -meanwealth), y = meanwealth)) +
  geom_bar(stat = 'identity') +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

If you like, you can add colour, tidy up the axis labels, and give a title:

# Plot the wealth data frame as a bar graph, reordering the x axis by wealth

ggplot(PISA2022WealthRank, aes(x = reorder(CNT, -meanwealth),
                               y = meanwealth)) +
  # set the bar fill colour to sky blue
  geom_bar(stat = 'identity', fill = "skyblue") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  # give the plot a title
  ggtitle("Countries ranked by HOMEPOS") +
  # set the x-axis title
  xlab("Country") +
  # set the y-axis title
  ylab("Mean HOMEPOS")
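With 80 countries on the x-axis, rotated labels can still be hard to read. One alternative, sketched below on a small invented table (toy_rank), is to flip the axes with coord_flip() so that the country names run down the side. Note that the reorder call here uses meanwealth without the minus sign: after flipping, the last level is drawn at the top, so ascending order puts the highest value first.

```r
library(ggplot2)

# Toy summary table (invented values, illustration only)
toy_rank <- data.frame(
  CNT        = c("A", "B", "C", "D"),
  meanwealth = c(0.5, -0.2, -1.1, 0.1)
)

# reorder(CNT, meanwealth) sorts the countries in ascending order, so after
# flipping the axes the highest value appears at the top of the chart
p_flip <- ggplot(toy_rank, aes(x = reorder(CNT, meanwealth), y = meanwealth)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  coord_flip() +
  xlab("Country") +
  ylab("Mean HOMEPOS")

p_flip
```

Whether this reads better than rotated labels is a matter of taste; it is offered as an option rather than a replacement for the plot above.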

1.3.3 Scatter plots

To plot a scatter plot, recall we use geom_point. For example, to plot reading scores against science scores in the UK we: a) create a data set of reading and science scores after filtering for the UK; b) pass the data to ggplot; c) use aes to specify the x and y variables; and d) plot with geom_point().

# Create a data.frame of the UK's science and reading scores

UKplot <- PISA_2022 %>%
  select(CNT, PV1READ, PV1SCIE) %>%
  filter(CNT == "United Kingdom")

# Plot the data on a scatter graph using geom_point

ggplot(UKplot, aes(x = PV1READ, y = PV1SCIE)) +
  geom_point()

That graph is quite dense, so we can use the alpha argument to make the points slightly transparent, size to make them smaller, and colour to set their colour. I will also tidy up the axis names and add a line with geom_smooth(method = "lm", colour = "black"), where method = "lm" makes the line straight (lm stands for linear model).

# Create a data.frame of the UK's science and reading scores

UKplot <- PISA_2022 %>%
  select(CNT, PV1READ, PV1SCIE) %>%
  filter(CNT == "United Kingdom")

# Plot the data on a scatter graph using geom_point

# set the data to plot and which variable goes on the x and y axis
ggplot(UKplot, aes(x = PV1READ, y = PV1SCIE)) +
  # set the point size (size = 0.1), colour (colour = "red") and
  # opacity (alpha = 0.6)
  geom_point(alpha = 0.6, size = 0.1, colour = "red") +
  # set the x-axis title
  xlab("Reading score") +
  # set the y-axis title
  ylab("Science score") +
  # plot a straight line (method = "lm") and set its colour to black
  geom_smooth(method = "lm", colour = "black")
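The straight line on the plot can also be described numerically. The sketch below uses simulated scores (invented, not PISA data) to show how the correlation and the intercept and slope of the fitted line might be obtained with base R's cor and lm. The data frame name toy_scores and the simulation settings are assumptions made for illustration.

```r
# Simulated reading and science scores (invented, illustration only)
set.seed(1)
reading <- rnorm(200, mean = 500, sd = 100)
science <- 50 + 0.9 * reading + rnorm(200, mean = 0, sd = 40)
toy_scores <- data.frame(PV1READ = reading, PV1SCIE = science)

# Strength of the linear association; use = "complete.obs" drops any rows
# with missing values before calculating the correlation
r <- cor(toy_scores$PV1READ, toy_scores$PV1SCIE, use = "complete.obs")

# Fit the same kind of model that method = "lm" draws: science score
# predicted from reading score
fit <- lm(PV1SCIE ~ PV1READ, data = toy_scores)

r
coef(fit)  # intercept and slope of the fitted line
```

To apply this to the real data, UKplot could be used in place of toy_scores, bearing in mind that a single plausible value such as PV1READ gives only an approximate picture.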

1.3.4 Density plots

An alternative type of plot is the density plot, which is a kind of continuous histogram. The density plot can be useful for visualising the achievement scores of students, for example the mathematics scores of girls and boys in the US (recall that the gender variable is ST004D01T). We use na.omit to drop rows containing NAs. Notice that, for the plot, I use aes to set my x variable and then specify that the plot should be filled by gender (fill=ST004D01T). Finally, in geom_density(alpha=0.6) I set alpha to 0.6 to make the fill areas partially transparent.

Tip

The y-axis on a density plot is scaled so that the total area under the curve adds up to 1.

# Create a data.frame of US Math data including gender

USMathplot <- PISA_2022 %>%
  select(CNT, PV1MATH, ST004D01T) %>%
  filter(CNT == "United States") %>%
  na.omit()

# Plot a density chart, setting the fill by gender, and setting the opacity to
# 0.6 to show both gender plots

ggplot(USMathplot, aes(x = PV1MATH, fill = ST004D01T)) +
  geom_density(alpha = 0.6)
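The claim that the area under a density curve is 1 can be checked numerically. The sketch below uses base R's density function on simulated scores (invented, not PISA data) and approximates the area by summing the curve heights over its evenly spaced grid. The variable name toy_math is an assumption made for illustration.

```r
# Simulated mathematics scores (invented, illustration only)
set.seed(1)
toy_math <- rnorm(500, mean = 470, sd = 90)

# density() is base R's kernel density estimator; it returns the curve as
# evenly spaced x values and the matching heights y
d <- density(toy_math)

# Approximate the area under the curve: sum of heights times grid spacing
area <- sum(d$y) * diff(d$x)[1]
area
```

The result should be very close to 1. This is why the y-axis values on a density plot are small numbers rather than counts of students.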

1.3.5 Facet wrapping - producing the same graph for multiple countries.

A powerful feature of ggplot is being able to produce the same graph for multiple values of a variable, for example for multiple countries. We may want to produce the density graph of PV1MATH score by gender for several countries in the data set. To do that, we produce a data set of PV1MATH scores and gender (ST004D01T) and filter for four countries (Philippines, UK, Bulgaria and Germany). We use the same code as above to plot the graphs but add +facet_wrap(.~CNT). facet_wrap tells ggplot to produce a multi-panel plot, and the formula .~CNT tells it to draw one panel for each value of CNT: the . on the left-hand side is a placeholder meaning no additional faceting variable, so the plot specified above is simply repeated for each country.

# Create a data.frame of the maths scores for the 4 countries
Mathplot <- PISA_2022 %>%
  select(CNT, PV1MATH, ST004D01T) %>%
  filter(CNT == "Philippines"|CNT == "United Kingdom"|CNT == "Bulgaria" |
           CNT == "Germany")

# Plot the data, changing colour by gender, and faceting for the countries

# pass the data to plot (Mathplot) and set the x axis (no y is needed for a
# density plot); fill = ST004D01T gives two series, coloured by gender
ggplot(Mathplot, aes(x = PV1MATH, fill = ST004D01T)) +
  # set the opacity (alpha = 0.6) so both gender plots are visible where
  # they overlap
  geom_density(alpha = 0.6) +
  # repeat the graph in a separate panel for each country
  facet_wrap(. ~ CNT)
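When filtering for several countries, a chain of == comparisons joined by | can get long. The sketch below shows a more compact alternative using %in%, together with the shorter formula ~ CNT (which facet_wrap treats the same as . ~ CNT) and the ncol argument to control the panel layout. It uses a small simulated data frame (toy) in place of PISA_2022 so that it runs on its own; all values are invented.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for PISA_2022 (invented values, illustration only)
set.seed(1)
toy <- data.frame(
  CNT       = rep(c("Philippines", "United Kingdom", "Bulgaria",
                    "Germany", "France"), each = 40),
  PV1MATH   = rnorm(200, mean = 470, sd = 90),
  ST004D01T = rep(c("Female", "Male"), times = 100)
)

wanted <- c("Philippines", "United Kingdom", "Bulgaria", "Germany")

# %in% keeps the rows whose CNT matches any of the listed countries
toy_subset <- toy %>%
  filter(CNT %in% wanted)

# ncol = 2 arranges the four panels in a 2 x 2 grid
p_facet <- ggplot(toy_subset, aes(x = PV1MATH, fill = ST004D01T)) +
  geom_density(alpha = 0.6) +
  facet_wrap(~ CNT, ncol = 2)

p_facet
```

Keeping the country names in a vector such as wanted also makes it easy to change the selection later without rewriting the filter.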

1.3.6 Categorising responses

A useful analytical choice is to categorise a numerical variable into ordinal classes. For example, rather than treating HOMEPOS as a continuous scale, you might want to split it into high and low wealth groups (for example, those above and below the mean value).

To do this, we add a new column, which we will call wealthclass, using the mutate function, and set its value using case_when. If HOMEPOS is more than the mean score, we set wealthclass to High, and if it is less than the mean, we set it to Low. We do that using mutate(wealthclass = case_when(HOMEPOS > mean(HOMEPOS, na.rm =TRUE) ~ "High", HOMEPOS < mean(HOMEPOS, na.rm =TRUE) ~ "Low", .default = NA)). In the case when HOMEPOS is more than the mean (note the na.rm =TRUE, which ignores missing values when calculating the mean), the value of the new column wealthclass is set to High. When HOMEPOS is less than mean(HOMEPOS, na.rm =TRUE), wealthclass is set to Low. The .default sets what to return if neither of those conditions is met, for example when HOMEPOS is missing.
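To see how each branch of case_when behaves, the sketch below applies the same logic to four invented HOMEPOS values whose mean is 0: one below the mean, one exactly equal to it, one above it, and one missing. The data frame name toy is an assumption made for illustration.

```r
library(dplyr)

# Toy HOMEPOS values (invented): the mean of the non-missing values is 0
toy <- data.frame(HOMEPOS = c(-1, 0, 1, NA))

toy <- toy %>%
  mutate(wealthclass = case_when(
    HOMEPOS > mean(HOMEPOS, na.rm = TRUE) ~ "High",
    HOMEPOS < mean(HOMEPOS, na.rm = TRUE) ~ "Low",
    .default = NA   # the .default argument needs dplyr 1.1.0 or later
  ))

toy

# Count how many rows fall in each class, including NA
toy %>% count(wealthclass)
```

Both the missing value and the value exactly equal to the mean fall through to .default and so receive NA. With real HOMEPOS scores an exact tie with the mean is unlikely, but it is worth knowing which rows end up unclassified.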

For example, create a data frame of UK participants’ HOMEPOS scores sorted into High and Low categories.

# Create a data frame of UK responses
UKPISA2022 <- PISA_2022 %>%
  select(CNT, HOMEPOS) %>%
  filter(CNT == "United Kingdom") %>%
  # create a new column, wealthclass: "High" if HOMEPOS is above the UK mean,
  # "Low" if it is below, and NA otherwise (e.g. when HOMEPOS is missing)
  mutate(wealthclass = case_when(HOMEPOS > mean(HOMEPOS, na.rm = TRUE) ~ "High",
                                 HOMEPOS < mean(HOMEPOS, na.rm = TRUE) ~ "Low",
                                 .default = NA))
UKPISA2022
# A tibble: 12,972 × 3
    CNT            HOMEPOS wealthclass
    <fct>            <dbl> <chr>      
  1 United Kingdom -1.09   Low        
  2 United Kingdom -0.418  Low        
  3 United Kingdom  1.13   High       
  4 United Kingdom -0.829  Low        
  5 United Kingdom -0.274  Low        
  6 United Kingdom NA      <NA>       
  7 United Kingdom -0.606  Low        
  8 United Kingdom NA      <NA>       
  9 United Kingdom  0.425  High       
 10 United Kingdom  0.998  High       
 11 United Kingdom  1.73   High       
 12 United Kingdom -1.20   Low        
 13 United Kingdom  1.81   High       
 14 United Kingdom NA      <NA>       
 15 United Kingdom -0.452  Low        
 16 United Kingdom -0.626  Low        
 17 United Kingdom -0.171  Low        
 18 United Kingdom  1.40   High       
 19 United Kingdom -0.720  Low        
 20 United Kingdom -0.930  Low        
 21 United Kingdom -0.517  Low        
 22 United Kingdom -0.840  Low        
 23 United Kingdom -0.741  Low        
 24 United Kingdom -1.66   Low        
 25 United Kingdom  0.42   High       
 26 United Kingdom -0.637  Low        
 27 United Kingdom  1.94   High       
 28 United Kingdom NA      <NA>       
 29 United Kingdom  0.964  High       
 30 United Kingdom -0.259  Low        
 31 United Kingdom -0.599  Low        
 32 United Kingdom -0.088  Low        
 33 United Kingdom  0.553  High       
 34 United Kingdom -0.168  Low        
 35 United Kingdom  0.158  High       
 36 United Kingdom  1.28   High       
 37 United Kingdom -0.312  Low        
 38 United Kingdom -0.434  Low        
 39 United Kingdom  0.420  High       
 40 United Kingdom  0.149  High       
 41 United Kingdom  0.855  High       
 42 United Kingdom -0.700  Low        
 43 United Kingdom  0.606  High       
 44 United Kingdom  0.233  High       
 45 United Kingdom -0.518  Low        
 46 United Kingdom  0.0376 Low        
 47 United Kingdom  1.50   High       
 48 United Kingdom NA      <NA>       
 49 United Kingdom  0.488  High       
 50 United Kingdom -0.266  Low        
 51 United Kingdom -0.0388 Low        
 52 United Kingdom NA      <NA>       
 53 United Kingdom -1.28   Low        
 54 United Kingdom  0.473  High       
 55 United Kingdom  0.415  High       
 56 United Kingdom  0.831  High       
 57 United Kingdom  0.033  Low        
 58 United Kingdom -0.190  Low        
 59 United Kingdom -0.885  Low        
 60 United Kingdom  2.75   High       
 61 United Kingdom  0.323  High       
 62 United Kingdom  1.62   High       
 63 United Kingdom  0.861  High       
 64 United Kingdom  1.20   High       
 65 United Kingdom -0.332  Low        
 66 United Kingdom NA      <NA>       
 67 United Kingdom -0.0787 Low        
 68 United Kingdom -0.414  Low        
 69 United Kingdom  0.243  High       
 70 United Kingdom -1.01   Low        
 71 United Kingdom -0.910  Low        
 72 United Kingdom NA      <NA>       
 73 United Kingdom NA      <NA>       
 74 United Kingdom  0.866  High       
 75 United Kingdom -0.481  Low        
 76 United Kingdom  1.22   High       
 77 United Kingdom  0.921  High       
 78 United Kingdom  1.56   High       
 79 United Kingdom NA      <NA>       
 80 United Kingdom NA      <NA>       
 81 United Kingdom -0.168  Low        
 82 United Kingdom -0.137  Low        
 83 United Kingdom -0.0073 Low        
 84 United Kingdom -1.35   Low        
 85 United Kingdom -0.656  Low        
 86 United Kingdom  1.09   High       
 87 United Kingdom  0.131  High       
 88 United Kingdom  0.806  High       
 89 United Kingdom -0.508  Low        
 90 United Kingdom  2.08   High       
 91 United Kingdom  1.10   High       
 92 United Kingdom NA      <NA>       
 93 United Kingdom  0.870  High       
 94 United Kingdom NA      <NA>       
 95 United Kingdom -0.392  Low        
 96 United Kingdom NA      <NA>       
 97 United Kingdom  0.781  High       
 98 United Kingdom -0.765  Low        
 99 United Kingdom -0.680  Low        
100 United Kingdom -0.505  Low        
101 United Kingdom  0.124  High       
102 United Kingdom NA      <NA>       
103 United Kingdom  1.67   High       
104 United Kingdom -0.468  Low        
105 United Kingdom -0.266  Low        
106 United Kingdom NA      <NA>       
107 United Kingdom  0.609  High       
108 United Kingdom -1.34   Low        
109 United Kingdom  0.422  High       
110 United Kingdom  0.732  High       
111 United Kingdom -0.119  Low        
112 United Kingdom  1.04   High       
113 United Kingdom  1.27   High       
114 United Kingdom -0.408  Low        
115 United Kingdom  1.12   High       
116 United Kingdom -0.652  Low        
117 United Kingdom  0.130  High       
118 United Kingdom -0.888  Low        
119 United Kingdom -0.219  Low        
120 United Kingdom  0.884  High       
121 United Kingdom  0.682  High       
122 United Kingdom  1.02   High       
123 United Kingdom NA      <NA>       
124 United Kingdom  1.21   High       
125 United Kingdom -0.452  Low        
126 United Kingdom -1.06   Low        
127 United Kingdom  0.230  High       
128 United Kingdom  0.594  High       
129 United Kingdom  0.837  High       
130 United Kingdom -0.455  Low        
131 United Kingdom -0.608  Low        
132 United Kingdom -0.367  Low        
133 United Kingdom -1.44   Low        
134 United Kingdom  0.356  High       
135 United Kingdom NA      <NA>       
136 United Kingdom -1.44   Low        
137 United Kingdom  0.800  High       
138 United Kingdom  1.12   High       
139 United Kingdom -0.121  Low        
140 United Kingdom -0.862  Low        
141 United Kingdom  0.357  High       
142 United Kingdom  0.0831 Low        
143 United Kingdom -0.754  Low        
144 United Kingdom  1.68   High       
145 United Kingdom  0.608  High       
146 United Kingdom -0.184  Low        
147 United Kingdom -0.431  Low        
148 United Kingdom -1.32   Low        
149 United Kingdom  1.21   High       
150 United Kingdom -0.0912 Low        
151 United Kingdom -0.0866 Low        
152 United Kingdom  0.818  High       
153 United Kingdom  0.764  High       
154 United Kingdom -0.111  Low        
155 United Kingdom  1.45   High       
156 United Kingdom -0.936  Low        
157 United Kingdom  1.56   High       
158 United Kingdom -0.0655 Low        
159 United Kingdom  0.163  High       
160 United Kingdom -0.0348 Low        
161 United Kingdom  0.672  High       
162 United Kingdom NA      <NA>       
163 United Kingdom  0.799  High       
164 United Kingdom -0.0587 Low        
165 United Kingdom -1.14   Low        
166 United Kingdom -0.991  Low        
167 United Kingdom -0.468  Low        
168 United Kingdom NA      <NA>       
169 United Kingdom -1.15   Low        
170 United Kingdom  0.0185 Low        
171 United Kingdom  1.87   High       
172 United Kingdom  1.61   High       
173 United Kingdom NA      <NA>       
174 United Kingdom  0.553  High       
175 United Kingdom  1.38   High       
176 United Kingdom  1.65   High       
177 United Kingdom  1.31   High       
178 United Kingdom -2.19   Low        
179 United Kingdom -1.39   Low        
180 United Kingdom  0.461  High       
181 United Kingdom -1.28   Low        
182 United Kingdom -0.0547 Low        
183 United Kingdom  1.35   High       
184 United Kingdom -0.285  Low        
185 United Kingdom -0.558  Low        
186 United Kingdom  0.165  High       
187 United Kingdom  0.0727 Low        
188 United Kingdom  0.380  High       
189 United Kingdom  0.832  High       
190 United Kingdom  0.306  High       
191 United Kingdom  0.475  High       
192 United Kingdom -0.0706 Low        
193 United Kingdom  1.24   High       
194 United Kingdom -1.42   Low        
195 United Kingdom  0.0354 Low        
196 United Kingdom -0.311  Low        
197 United Kingdom  0.234  High       
198 United Kingdom -0.838  Low        
199 United Kingdom -1.17   Low        
200 United Kingdom NA      <NA>       
201 United Kingdom -0.303  Low        
202 United Kingdom  0.927  High       
203 United Kingdom  0.257  High       
204 United Kingdom  0.281  High       
205 United Kingdom  0.903  High       
206 United Kingdom  1.17   High       
207 United Kingdom -1.14   Low        
208 United Kingdom  1.16   High       
209 United Kingdom  0.768  High       
210 United Kingdom -0.0763 Low        
211 United Kingdom  0.337  High       
212 United Kingdom -0.324  Low        
213 United Kingdom  0.435  High       
214 United Kingdom -0.592  Low        
215 United Kingdom  0.167  High       
216 United Kingdom NA      <NA>       
217 United Kingdom -1.37   Low        
218 United Kingdom -0.238  Low        
219 United Kingdom NA      <NA>       
220 United Kingdom  0.202  High       
221 United Kingdom -0.623  Low        
222 United Kingdom -1.01   Low        
223 United Kingdom -1.42   Low        
224 United Kingdom -0.487  Low        
225 United Kingdom  0.0919 Low        
226 United Kingdom  1.91   High       
227 United Kingdom  0.368  High       
228 United Kingdom NA      <NA>       
229 United Kingdom  0.0909 Low        
230 United Kingdom -1.50   Low        
231 United Kingdom  0.161  High       
232 United Kingdom NA      <NA>       
233 United Kingdom -0.818  Low        
234 United Kingdom  1.33   High       
235 United Kingdom -1.01   Low        
236 United Kingdom -0.671  Low        
237 United Kingdom -0.161  Low        
238 United Kingdom -0.0321 Low        
239 United Kingdom NA      <NA>       
240 United Kingdom -0.340  Low        
241 United Kingdom -0.426  Low        
242 United Kingdom  0.0556 Low        
243 United Kingdom -0.436  Low        
244 United Kingdom  0.524  High       
245 United Kingdom NA      <NA>       
246 United Kingdom -1.12   Low        
247 United Kingdom NA      <NA>       
248 United Kingdom -0.394  Low        
249 United Kingdom -0.202  Low        
250 United Kingdom  1.29   High       
251 United Kingdom -0.810  Low        
252 United Kingdom -0.782  Low        
253 United Kingdom  0.632  High       
254 United Kingdom NA      <NA>       
255 United Kingdom -0.576  Low        
256 United Kingdom  1.08   High       
257 United Kingdom  0.795  High       
258 United Kingdom NA      <NA>       
259 United Kingdom  0.991  High       
260 United Kingdom NA      <NA>       
261 United Kingdom  0.400  High       
262 United Kingdom  0.243  High       
263 United Kingdom -1.31   Low        
264 United Kingdom -0.614  Low        
265 United Kingdom  0.533  High       
266 United Kingdom  0.0851 Low        
267 United Kingdom  0.257  High       
268 United Kingdom -0.746  Low        
269 United Kingdom  0.818  High       
270 United Kingdom  0.0313 Low        
271 United Kingdom  0.418  High       
272 United Kingdom  0.526  High       
273 United Kingdom  1.66   High       
274 United Kingdom  0.788  High       
275 United Kingdom  0.0385 Low        
276 United Kingdom -0.163  Low        
277 United Kingdom -1.19   Low        
278 United Kingdom  0.0895 Low        
279 United Kingdom -0.765  Low        
280 United Kingdom -1.27   Low        
281 United Kingdom  0.0851 Low        
282 United Kingdom  0.834  High       
283 United Kingdom  1.02   High       
284 United Kingdom -0.779  Low        
285 United Kingdom -0.268  Low        
286 United Kingdom  0.221  High       
287 United Kingdom  0.160  High       
288 United Kingdom  0.111  Low        
289 United Kingdom  0.920  High       
290 United Kingdom NA      <NA>       
291 United Kingdom  1.49   High       
292 United Kingdom -0.748  Low        
293 United Kingdom -0.0714 Low        
294 United Kingdom NA      <NA>       
295 United Kingdom  0.197  High       
296 United Kingdom  1.47   High       
297 United Kingdom  1.25   High       
298 United Kingdom -1.14   Low        
299 United Kingdom NA      <NA>       
300 United Kingdom  0.320  High       
301 United Kingdom -0.204  Low        
302 United Kingdom  0.179  High       
303 United Kingdom  1.94   High       
304 United Kingdom NA      <NA>       
305 United Kingdom  0.350  High       
306 United Kingdom -0.842  Low        
307 United Kingdom  0.308  High       
308 United Kingdom -0.722  Low        
309 United Kingdom NA      <NA>       
310 United Kingdom -0.809  Low        
311 United Kingdom -0.0455 Low        
312 United Kingdom  1.14   High       
313 United Kingdom  0.161  High       
314 United Kingdom -0.858  Low        
315 United Kingdom  0.128  High       
316 United Kingdom -0.610  Low        
317 United Kingdom  0.513  High       
318 United Kingdom NA      <NA>       
319 United Kingdom -0.227  Low        
320 United Kingdom  1.20   High       
321 United Kingdom -0.762  Low        
322 United Kingdom -1.91   Low        
323 United Kingdom NA      <NA>       
324 United Kingdom -0.300  Low        
325 United Kingdom  1.21   High       
326 United Kingdom NA      <NA>       
327 United Kingdom  1.21   High       
328 United Kingdom -0.368  Low        
329 United Kingdom  0.712  High       
330 United Kingdom  0.908  High       
331 United Kingdom NA      <NA>       
332 United Kingdom NA      <NA>       
333 United Kingdom -0.530  Low        
334 United Kingdom -1.22   Low        
335 United Kingdom NA      <NA>       
336 United Kingdom -0.510  Low        
337 United Kingdom -0.511  Low        
338 United Kingdom NA      <NA>       
339 United Kingdom NA      <NA>       
340 United Kingdom  0.728  High       
341 United Kingdom  0.993  High       
342 United Kingdom -1.33   Low        
343 United Kingdom -0.743  Low        
344 United Kingdom -0.238  Low        
345 United Kingdom  1.85   High       
346 United Kingdom  0.931  High       
347 United Kingdom NA      <NA>       
348 United Kingdom  0.563  High       
349 United Kingdom  0.0552 Low        
350 United Kingdom -0.901  Low        
351 United Kingdom NA      <NA>       
352 United Kingdom NA      <NA>       
353 United Kingdom NA      <NA>       
354 United Kingdom NA      <NA>       
355 United Kingdom -0.0691 Low        
356 United Kingdom NA      <NA>       
357 United Kingdom  0.705  High       
358 United Kingdom -0.087  Low        
359 United Kingdom  0.488  High       
360 United Kingdom -0.887  Low        
361 United Kingdom  1.32   High       
362 United Kingdom  0.840  High       
363 United Kingdom -0.972  Low        
364 United Kingdom -0.581  Low        
365 United Kingdom  0.383  High       
366 United Kingdom -0.202  Low        
367 United Kingdom  0.600  High       
368 United Kingdom  0.256  High       
369 United Kingdom -1.19   Low        
370 United Kingdom  1.13   High       
371 United Kingdom  0.246  High       
372 United Kingdom  2.65   High       
373 United Kingdom -0.517  Low        
374 United Kingdom  0.500  High       
375 United Kingdom  0.252  High       
376 United Kingdom  0.301  High       
377 United Kingdom -0.719  Low        
378 United Kingdom  0.227  High       
379 United Kingdom -0.114  Low        
380 United Kingdom  0.105  Low        
381 United Kingdom  0.0123 Low        
382 United Kingdom -1.17   Low        
383 United Kingdom NA      <NA>       
384 United Kingdom  0.217  High       
385 United Kingdom  0.348  High       
386 United Kingdom  0.559  High       
387 United Kingdom -0.0607 Low        
388 United Kingdom NA      <NA>       
389 United Kingdom NA      <NA>       
390 United Kingdom  0.355  High       
391 United Kingdom -0.621  Low        
392 United Kingdom  0.121  High       
393 United Kingdom  0.765  High       
394 United Kingdom  0.0722 Low        
395 United Kingdom -0.795  Low        
396 United Kingdom -0.439  Low        
397 United Kingdom -1.44   Low        
398 United Kingdom -0.0748 Low        
399 United Kingdom -0.514  Low        
400 United Kingdom  2.14   High       
401 United Kingdom -1.23   Low        
402 United Kingdom  0.297  High       
403 United Kingdom  0.704  High       
404 United Kingdom  0.336  High       
405 United Kingdom -1.00   Low        
406 United Kingdom  1.12   High       
407 United Kingdom  0.407  High       
408 United Kingdom  1.10   High       
409 United Kingdom -0.366  Low        
410 United Kingdom NA      <NA>       
411 United Kingdom  1.82   High       
412 United Kingdom  1.68   High       
413 United Kingdom  1.05   High       
414 United Kingdom -0.0146 Low        
415 United Kingdom  0.406  High       
416 United Kingdom -0.428  Low        
417 United Kingdom  0.264  High       
418 United Kingdom -0.584  Low        
419 United Kingdom -0.932  Low        
420 United Kingdom -0.972  Low        
421 United Kingdom  0.671  High       
422 United Kingdom -0.0961 Low        
423 United Kingdom  0.562  High       
424 United Kingdom  2.07   High       
425 United Kingdom  0.328  High       
426 United Kingdom -0.420  Low        
427 United Kingdom -0.682  Low        
428 United Kingdom  1.23   High       
429 United Kingdom NA      <NA>       
430 United Kingdom -0.111  Low        
431 United Kingdom -1.35   Low        
432 United Kingdom  0.335  High       
433 United Kingdom -1.17   Low        
434 United Kingdom  0.334  High       
435 United Kingdom  0.777  High       
436 United Kingdom  0.531  High       
437 United Kingdom  0.394  High       
438 United Kingdom  1.28   High       
439 United Kingdom -0.849  Low        
440 United Kingdom  0.952  High       
441 United Kingdom  1.66   High       
442 United Kingdom  0.0159 Low        
443 United Kingdom -0.660  Low        
444 United Kingdom  0.595  High       
445 United Kingdom -0.425  Low        
446 United Kingdom -1.93   Low        
447 United Kingdom -0.501  Low        
448 United Kingdom -0.761  Low        
449 United Kingdom  0.513  High       
450 United Kingdom NA      <NA>       
451 United Kingdom  1.05   High       
452 United Kingdom -0.758  Low        
453 United Kingdom -0.451  Low        
454 United Kingdom -0.249  Low        
455 United Kingdom -0.485  Low        
456 United Kingdom  1.18   High       
457 United Kingdom NA      <NA>       
458 United Kingdom -0.0599 Low        
459 United Kingdom -0.817  Low        
460 United Kingdom -0.337  Low        
461 United Kingdom NA      <NA>       
462 United Kingdom -0.122  Low        
463 United Kingdom  1.67   High       
464 United Kingdom -0.0447 Low        
465 United Kingdom -1.15   Low        
466 United Kingdom -0.134  Low        
467 United Kingdom -0.109  Low        
468 United Kingdom -0.393  Low        
469 United Kingdom  0.121  High       
470 United Kingdom -0.910  Low        
471 United Kingdom NA      <NA>       
472 United Kingdom  0.0138 Low        
473 United Kingdom -0.744  Low        
474 United Kingdom  1.02   High       
475 United Kingdom -0.916  Low        
476 United Kingdom -0.0975 Low        
477 United Kingdom  0.139  High       
478 United Kingdom NA      <NA>       
479 United Kingdom  0.604  High       
480 United Kingdom NA      <NA>       
481 United Kingdom  0.316  High       
482 United Kingdom NA      <NA>       
483 United Kingdom -0.176  Low        
484 United Kingdom -0.491  Low        
485 United Kingdom  1.32   High       
486 United Kingdom -0.213  Low        
487 United Kingdom  0.289  High       
488 United Kingdom  1.29   High       
489 United Kingdom NA      <NA>       
490 United Kingdom -0.175  Low        
491 United Kingdom -0.0729 Low        
492 United Kingdom  2.28   High       
493 United Kingdom  0.721  High       
494 United Kingdom  0.915  High       
495 United Kingdom -0.107  Low        
496 United Kingdom  0.173  High       
497 United Kingdom -1.39   Low        
498 United Kingdom  0.195  High       
499 United Kingdom NA      <NA>       
500 United Kingdom  1.88   High       
# ℹ 12,472 more rows

1.4 Seminar activities

1.4.1 Task 1 Discussion activity

  • Discuss the design features of PISA (for example, sampling, forms of tests etc.) and the sources of error that arise from them.
  • As researchers, what issues should we bear in mind when interpreting the data? (Consider, for example, measures of wealth, gender and “competency”)
  • What caveats should policy makers bear in mind when making high-stakes decisions based on the PISA measures (for example, what to include in curricula, where to target funding)?
Tip

Note that the PISA data collection protocol allows countries to exclude up to 5% of the relevant population (see the PISA 2018 technical report (OECD 2018), Annex A2), in particular allowing the exclusion from the data of either individual students by their disability status, or whole schools which provide specialist education (e.g. for blind students). Permitted exclusions include: “intellectual disability, i.e. a mental or emotional disability resulting in the student being so cognitively delayed that he/she could not perform in the PISA testing environment”, and “functional disability, i.e. a moderate to severe permanent physical disability resulting in the student being unable to perform in the PISA testing environment” along with other exclusions.

1.4.2 Task 2 Create a ranked list

Create a ranked list of countries by their mean science scores (PV1SCIE). What are the top five countries for science? Do the same for wealth (HOMEPOS). What patterns do you notice? Why might a researcher be critical of such rankings? [Extension: Include the standard deviation for each country (hint: use the sd function). Can you detect any patterns?]

Tip

Note that PISA 2022 links wealth to HOMEPOS (a self-reported measure of possessions in the home). You might want to consider the implications of that definition for interpreting the data.

Show the answer
# Create a ranked data frame for science

PISA2022SciRank <- PISA_2022 %>%
  select(CNT, PV1SCIE) %>% # Select variables of interest
  group_by(CNT) %>% # group by country
  summarise(meansci = mean(PV1SCIE)) %>% 
     # summarise  country data to find the mean Sci score
  arrange(desc(meansci)) # arrange in descending order based on the meansci score

print(PISA2022SciRank)
# A tibble: 80 × 2
   CNT                          meansci
   <fct>                          <dbl>
 1 Singapore                       561.
 2 Japan                           546.
 3 Macao (China)                   543.
 4 Korea                           531.
 5 Estonia                         527.
 6 Chinese Taipei                  527.
 7 Hong Kong (China)               525.
 8 Czech Republic                  511.
 9 Australia                       508.
10 Poland                          505.
11 New Zealand                     505.
12 Ireland                         504.
13 Switzerland                     501.
14 Canada                          499.
15 United States                   498.
16 Finland                         498.
17 Germany                         495.
18 Belgium                         495.
19 Sweden                          494.
20 Austria                         494.
21 Spain                           493.
22 Latvia                          493.
23 United Kingdom                  492.
24 Hungary                         492.
25 Portugal                        488.
26 Slovenia                        487.
27 Netherlands                     487.
28 Croatia                         483.
29 France                          481.
30 Italy                           481.
31 Denmark                         480.
32 Lithuania                       480.
33 Norway                          479.
34 Türkiye                         476.
35 Viet Nam                        473.
36 Malta                           470.
37 Slovak Republic                 467.
38 Israel                          464.
39 Chile                           463.
40 Ukrainian regions (18 of 27)    454.
41 Iceland                         448.
42 Serbia                          447.
43 Greece                          445.
44 Brunei Darussalam               445.
45 Kazakhstan                      441.
46 Romania                         436.
47 United Arab Emirates            436.
48 Uruguay                         433.
49 Thailand                        429.
50 Qatar                           429.
51 Bulgaria                        422.
52 Colombia                        421.
53 Malaysia                        417.
54 Republic of Moldova             417.
55 Argentina                       415.
56 Mongolia                        411.
57 Costa Rica                      411.
58 Peru                            411.
59 Mexico                          411.
60 Brazil                          406.
61 Montenegro                      405.
62 Jamaica                         396.
63 Indonesia                       395.
64 Saudi Arabia                    390.
65 Georgia                         386.
66 Panama                          385.
67 North Macedonia                 382.
68 Baku (Azerbaijan)               382.
69 Albania                         376.
70 El Salvador                     375.
71 Jordan                          375.
72 Guatemala                       375.
73 Paraguay                        372.
74 Palestinian Authority           367.
75 Morocco                         363.
76 Dominican Republic              362.
77 Uzbekistan                      355.
78 Philippines                     354.
79 Kosovo                          354.
80 Cambodia                        340.
Show the answer
# And repeat the ranking for wealth

PISA2022WealthRank <- PISA_2022 %>%
  select(CNT, HOMEPOS) %>% # Select variables of interest
  group_by(CNT) %>% # group by country
  summarise(meanwel = mean(HOMEPOS, na.rm=TRUE)) %>% 
     # summarise country data to find the mean wealth (HOMEPOS) score
  arrange(desc(meanwel)) # arrange in descending order based on the meanwel score

print(PISA2022WealthRank)
# A tibble: 80 × 2
   CNT                           meanwel
   <fct>                           <dbl>
 1 Norway                        0.547  
 2 Australia                     0.483  
 3 Korea                         0.371  
 4 New Zealand                   0.367  
 5 Canada                        0.348  
 6 Iceland                       0.346  
 7 Sweden                        0.327  
 8 Ireland                       0.318  
 9 Malta                         0.308  
10 Austria                       0.280  
11 Netherlands                   0.255  
12 Denmark                       0.237  
13 Switzerland                   0.221  
14 Czech Republic                0.194  
15 Slovenia                      0.186  
16 Estonia                       0.178  
17 Finland                       0.162  
18 Germany                       0.149  
19 Singapore                     0.124  
20 United Kingdom                0.116  
21 United States                 0.115  
22 Hungary                       0.104  
23 Italy                         0.0887 
24 Belgium                       0.0866 
25 Poland                        0.0825 
26 Portugal                      0.0755 
27 Spain                         0.0739 
28 Israel                        0.0499 
29 Latvia                        0.00480
30 Lithuania                    -0.0451 
31 Croatia                      -0.117  
32 France                       -0.139  
33 United Arab Emirates         -0.157  
34 Slovak Republic              -0.187  
35 Hong Kong (China)            -0.198  
36 Japan                        -0.226  
37 Serbia                       -0.229  
38 Macao (China)                -0.253  
39 Greece                       -0.264  
40 Montenegro                   -0.276  
41 Brunei Darussalam            -0.356  
42 Bulgaria                     -0.368  
43 Chile                        -0.388  
44 Chinese Taipei               -0.395  
45 Romania                      -0.399  
46 Qatar                        -0.442  
47 North Macedonia              -0.490  
48 Ukrainian regions (18 of 27) -0.550  
49 Kosovo                       -0.621  
50 Saudi Arabia                 -0.689  
51 Uruguay                      -0.747  
52 Argentina                    -0.806  
53 Georgia                      -0.809  
54 Jamaica                      -0.834  
55 Republic of Moldova          -0.846  
56 Albania                      -0.859  
57 Kazakhstan                   -0.870  
58 Malaysia                     -0.908  
59 Costa Rica                   -0.979  
60 Baku (Azerbaijan)            -0.980  
61 Mexico                       -1.07   
62 Türkiye                      -1.08   
63 Thailand                     -1.17   
64 Brazil                       -1.22   
65 Colombia                     -1.26   
66 Viet Nam                     -1.29   
67 Uzbekistan                   -1.30   
68 Dominican Republic           -1.31   
69 Mongolia                     -1.31   
70 Panama                       -1.32   
71 Jordan                       -1.38   
72 Peru                         -1.40   
73 Palestinian Authority        -1.49   
74 Paraguay                     -1.52   
75 Guatemala                    -1.52   
76 El Salvador                  -1.57   
77 Indonesia                    -1.58   
78 Philippines                  -1.75   
79 Morocco                      -1.77   
80 Cambodia                     -2.41   
Show the answer
# With standard deviations

PISA2022SciRank <- PISA_2022 %>%
  select(CNT, PV1SCIE) %>% # Select variables of interest
  group_by(CNT) %>% # group by country
  summarise(meansci = mean(PV1SCIE), 
            sdsci = sd(PV1SCIE)) %>% 
  # summarise country data to find the mean and standard deviation of the Sci score
  arrange(desc(meansci)) # arrange in descending order based on the meansci score

print(PISA2022SciRank)
# A tibble: 80 × 3
   CNT                          meansci sdsci
   <fct>                          <dbl> <dbl>
 1 Singapore                       561.  99.6
 2 Japan                           546.  92.7
 3 Macao (China)                   543.  86.6
 4 Korea                           531. 104. 
 5 Estonia                         527.  87.7
 6 Chinese Taipei                  527. 102. 
 7 Hong Kong (China)               525.  91.1
 8 Czech Republic                  511. 103. 
 9 Australia                       508. 107. 
10 Poland                          505.  94.2
11 New Zealand                     505. 108. 
12 Ireland                         504.  92.0
13 Switzerland                     501.  97.9
14 Canada                          499.  98.8
15 United States                   498. 109. 
16 Finland                         498. 111. 
17 Germany                         495. 105. 
18 Belgium                         495.  99.9
19 Sweden                          494. 108. 
20 Austria                         494.  99.1
21 Spain                           493.  90.1
22 Latvia                          493.  84.6
23 United Kingdom                  492. 102. 
24 Hungary                         492.  94.7
25 Portugal                        488.  89.7
26 Slovenia                        487.  93.9
27 Netherlands                     487. 112. 
28 Croatia                         483.  92.0
29 France                          481. 106. 
30 Italy                           481.  92.0
31 Denmark                         480.  96.9
32 Lithuania                       480.  92.5
33 Norway                          479. 106. 
34 Türkiye                         476.  89.1
35 Viet Nam                        473.  78.4
36 Malta                           470. 102. 
37 Slovak Republic                 467. 103. 
38 Israel                          464. 109. 
39 Chile                           463.  94.9
40 Ukrainian regions (18 of 27)    454.  88.7
41 Iceland                         448.  94.8
42 Serbia                          447.  88.3
43 Greece                          445.  89.0
44 Brunei Darussalam               445.  93.5
45 Kazakhstan                      441.  84.4
46 Romania                         436.  96.2
47 United Arab Emirates            436. 108. 
48 Uruguay                         433.  92.4
49 Thailand                        429.  93.1
50 Qatar                           429.  96.3
51 Bulgaria                        422.  94.7
52 Colombia                        421.  88.2
53 Malaysia                        417.  77.9
54 Republic of Moldova             417.  82.5
55 Argentina                       415.  86.3
56 Mongolia                        411.  77.7
57 Costa Rica                      411.  80.4
58 Peru                            411.  85.4
59 Mexico                          411.  75.0
60 Brazil                          406.  93.3
61 Montenegro                      405.  83.4
62 Jamaica                         396.  92.0
63 Indonesia                       395.  69.9
64 Saudi Arabia                    390.  72.2
65 Georgia                         386.  81.6
66 Panama                          385.  84.9
67 North Macedonia                 382.  82.8
68 Baku (Azerbaijan)               382.  78.7
69 Albania                         376.  81.2
70 El Salvador                     375.  73.4
71 Jordan                          375.  73.7
72 Guatemala                       375.  65.4
73 Paraguay                        372.  74.5
74 Palestinian Authority           367.  70.9
75 Morocco                         363.  66.2
76 Dominican Republic              362.  68.7
77 Uzbekistan                      355.  63.3
78 Philippines                     354.  77.0
79 Kosovo                          354.  64.8
80 Cambodia                        340.  50.3
Show the answer
PISA2022WealthRank <- PISA_2022%>%
  select(CNT, HOMEPOS)%>% # Select variables of interest
  group_by(CNT) %>% # group by country
  summarise(meanwel = mean(HOMEPOS, na.rm=TRUE),
            sdwel = sd(HOMEPOS, na.rm=TRUE)) %>% 
  # summarise  country data to find  mean wealth score
  arrange(desc(meanwel)) 
  # arrange in descending order based on the meanwel score
print(PISA2022WealthRank)
# A tibble: 80 × 3
   CNT                           meanwel sdwel
   <fct>                           <dbl> <dbl>
 1 Norway                        0.547   0.970
 2 Australia                     0.483   0.861
 3 Korea                         0.371   1.01 
 4 New Zealand                   0.367   0.862
 5 Canada                        0.348   0.867
 6 Iceland                       0.346   0.805
 7 Sweden                        0.327   0.878
 8 Ireland                       0.318   0.818
 9 Malta                         0.308   0.857
10 Austria                       0.280   0.938
11 Netherlands                   0.255   0.802
12 Denmark                       0.237   0.815
13 Switzerland                   0.221   0.895
14 Czech Republic                0.194   0.852
15 Slovenia                      0.186   0.833
16 Estonia                       0.178   0.740
17 Finland                       0.162   0.862
18 Germany                       0.149   0.946
19 Singapore                     0.124   0.840
20 United Kingdom                0.116   0.919
21 United States                 0.115   0.927
22 Hungary                       0.104   0.914
23 Italy                         0.0887  0.803
24 Belgium                       0.0866  0.869
25 Poland                        0.0825  0.794
26 Portugal                      0.0755  0.901
27 Spain                         0.0739  0.805
28 Israel                        0.0499  1.03 
29 Latvia                        0.00480 0.774
30 Lithuania                    -0.0451  0.832
31 Croatia                      -0.117   0.719
32 France                       -0.139   0.972
33 United Arab Emirates         -0.157   1.04 
34 Slovak Republic              -0.187   0.990
35 Hong Kong (China)            -0.198   0.881
36 Japan                        -0.226   0.761
37 Serbia                       -0.229   0.768
38 Macao (China)                -0.253   0.845
39 Greece                       -0.264   0.820
40 Montenegro                   -0.276   0.917
41 Brunei Darussalam            -0.356   1.01 
42 Bulgaria                     -0.368   1.04 
43 Chile                        -0.388   0.959
44 Chinese Taipei               -0.395   0.985
45 Romania                      -0.399   0.991
46 Qatar                        -0.442   1.07 
47 North Macedonia              -0.490   0.915
48 Ukrainian regions (18 of 27) -0.550   0.785
49 Kosovo                       -0.621   0.941
50 Saudi Arabia                 -0.689   1.04 
51 Uruguay                      -0.747   0.924
52 Argentina                    -0.806   1.01 
53 Georgia                      -0.809   0.972
54 Jamaica                      -0.834   1.12 
55 Republic of Moldova          -0.846   0.890
56 Albania                      -0.859   1.01 
57 Kazakhstan                   -0.870   0.832
58 Malaysia                     -0.908   0.962
59 Costa Rica                   -0.979   1.27 
60 Baku (Azerbaijan)            -0.980   0.971
61 Mexico                       -1.07    1.00 
62 Türkiye                      -1.08    1.02 
63 Thailand                     -1.17    1.13 
64 Brazil                       -1.22    0.956
65 Colombia                     -1.26    1.12 
66 Viet Nam                     -1.29    0.922
67 Uzbekistan                   -1.30    0.960
68 Dominican Republic           -1.31    0.977
69 Mongolia                     -1.31    0.949
70 Panama                       -1.32    1.21 
71 Jordan                       -1.38    1.18 
72 Peru                         -1.40    1.20 
73 Palestinian Authority        -1.49    1.25 
74 Paraguay                     -1.52    1.13 
75 Guatemala                    -1.52    1.31 
76 El Salvador                  -1.57    1.08 
77 Indonesia                    -1.58    0.911
78 Philippines                  -1.75    1.13 
79 Morocco                      -1.77    1.19 
80 Cambodia                     -2.41    1.08 
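
One way to compare the two rankings side by side is to put both means in one table and rank them directly. The sketch below is illustrative only: it uses a hand-typed tibble of six countries whose means are copied (as printed) from the output above, and the names toy_means, toy_ranks, sci_rank, wealth_rank and rank_gap are invented for this example. With the full data you could instead join PISA2022SciRank and PISA2022WealthRank by CNT.

```r
library(dplyr)

# Hand-typed subset of the country means printed above (not the full data set)
toy_means <- tibble(
  CNT     = c("Singapore", "Japan", "Korea", "Australia", "Norway", "Cambodia"),
  meansci = c(561, 546, 531, 508, 479, 340),
  meanwel = c(0.124, -0.226, 0.371, 0.483, 0.547, -2.41)
)

# Rank 1 = highest mean; min_rank() gives tied values the same rank
toy_ranks <- toy_means %>%
  mutate(sci_rank    = min_rank(desc(meansci)),
         wealth_rank = min_rank(desc(meanwel)),
         rank_gap    = wealth_rank - sci_rank) %>%
  arrange(sci_rank)

print(toy_ranks)

# Spearman correlation summarises how closely the two orderings agree
cor(toy_ranks$meansci, toy_ranks$meanwel, method = "spearman")
```

A positive rank_gap marks a country that places higher for science than for wealth within this subset. Six hand-picked countries are not representative, so treat the result as a demonstration of the method rather than a finding.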

1.4.3 Task 3 Plot distributions of wealth scores

Use a scatter plot to show the correlation between HOMEPOS and ESCS. Use a facet_wrap to show the charts for the UK, Japan, Colombia and Sweden. Discuss the different relationships between the two variables across the countries.

Tip

Note that the PISA variable Economic, Social and Cultural Status (ESCS) is based on highest parental occupation (‘HISEI’), highest parental education (‘PARED’), and home possessions (‘HOMEPOS’), including books in the home. Do consider the implications of this definition.

Show the answer
# Create a data frame with the ESCS, gender (ST004D01T) and HOMEPOS variables for the 4 countries 

WealthcompPISA<-PISA_2022 %>%
  select(CNT, ESCS, HOMEPOS, ST004D01T)%>%
  filter(CNT == "Japan" | CNT == "United Kingdom" | CNT == "Colombia" | CNT == "Sweden")

# Use ggplot to create a scatter graph
# Set the x variable to ESCS and the y to HOMEPOS, set the colour to gender
# Set point size and transparency
# Facet wrap to produce graphs for each country

ggplot(WealthcompPISA, aes(x = ESCS, y = HOMEPOS, colour=ST004D01T))+
  geom_point(size=0.1, alpha=0.5)+
  facet_wrap(.~CNT)
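
A scatter plot shows the shape of the relationship; a correlation coefficient per country gives a number to discuss alongside it. The sketch below is a minimal illustration, assuming invented values in a toy tibble (toy_wealth, "Country A" and "Country B" are not real PISA records). With the seminar data you would start from WealthcompPISA instead.

```r
library(dplyr)

# Invented values for illustration only; not real PISA records
toy_wealth <- tibble(
  CNT     = rep(c("Country A", "Country B"), each = 5),
  ESCS    = c(-1.2, -0.4, 0.1, 0.8, NA,   -2.0, -1.1, -0.3, 0.5, 1.0),
  HOMEPOS = c(-1.0, -0.6, 0.3, 0.9, 0.2,  -1.5, -1.6, 0.1, -0.2, 0.7)
)

toy_cor <- toy_wealth %>%
  group_by(CNT) %>%
  summarise(
    n_complete = sum(!is.na(ESCS) & !is.na(HOMEPOS)), # pairs actually used
    r = cor(ESCS, HOMEPOS, use = "complete.obs")      # Pearson by default
  )

print(toy_cor)
```

Because HOMEPOS is one of the components of ESCS, some positive correlation is built in by construction, which is worth keeping in mind when comparing countries.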

1.4.4 Task 4 Plot distributions of scores

  • Use geom_density to plot the distribution of Japanese and UK mathematics scores. What patterns do you notice?
Tip

To plot a distribution, you can use geom_density to draw a distribution curve. In ggplot you specify the data, and then in aes set the x-value (the variable of interest) and set the fill to change by different groups. Within the geom_density call you can specify the alpha, the opacity of the plot.

For example, to plot science scores in the UK by gender, you would use the code below:

# Create a data frame of UK science scores including gender

UKSci<-PISA_2022 %>%
  select(CNT, PV1SCIE, ST004D01T) %>%
  filter(CNT == "United Kingdom")

# Plot the density chart, changing colour by gender, and setting the alpha (opacity) to 0.5
ggplot(data = UKSci,
       aes(x = PV1SCIE, fill = ST004D01T)) +
  geom_density(alpha = 0.5)

Show the answer
# Create a data frame of UK and Japanese mathematics scores

JPUKMath<-PISA_2022 %>%
  select(CNT, PV1MATH) %>%
  filter(CNT == "United Kingdom"|CNT == "Japan")

# Plot the density chart, changing colour by country, and setting the alpha (opacity) to 0.5
ggplot(data = JPUKMath,
       aes(x = PV1MATH, fill = CNT)) +
  geom_density(alpha = 0.5)

1.4.5 Task 5 Plot distributions of scores by gender

  • Examine gender differences: Plot the distributions of mathematics achievement in the UK by gender. What patterns can you see?
Show the answer
UKMathGender <- PISA_2022 %>%
  select(CNT, PV1MATH, ST004D01T) %>%
  filter(CNT == "United Kingdom")

ggplot(data = UKMathGender,
       aes(x = PV1MATH, fill = ST004D01T)) +
  geom_density(alpha = 0.5)

1.4.6 Task 6 Facet wrap by country

Plot density graphs of gender differences in mathematics scores in the UK, Spain, Japan, Korea and Finland. Hint: use facet_wrap(.~CNT)

Show the answer
# Create a data frame of mathematics scores, gender and country
# Filter by the five countries of interest

MathGender <- PISA_2022 %>%
  select(CNT, PV1MATH, ST004D01T) %>%
  filter(CNT == "United Kingdom"|CNT == "Spain"|CNT == "Japan"
         | CNT=="Korea"|CNT == "Finland")

# Plot a density graph of mathematics scores, splitting into groups, with coloured fills by gender. Set transparency to 0.5 to show overlap 

ggplot(data = MathGender,
       aes(x = PV1MATH, fill = ST004D01T)) +
  geom_density(alpha = 0.5) +
  facet_wrap(.~CNT)

1.4.7 Task 7 Plot a scatter graph

Plot a scatter graph of mean mathematics achievement (y-axis) by mean wealth (x-axis) with each country as a single point. Hint: You will first need to use group_by and then summarise to create a data frame of mean scores.
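The group_by and summarise step in the hint can be tried on a small toy data frame first. This is a minimal sketch: the country codes and values below are made up for illustration and are not PISA data.

```r
library(dplyr)

# Toy data (hypothetical values, not taken from PISA_2022)
toy <- data.frame(
  CNT     = c("A", "A", "B", "B"),
  HOMEPOS = c(0.5, 1.5, -1.0, NA),
  PV1MATH = c(500, 520, 400, 420)
)

# One row per country: group_by sets the groups,
# summarise collapses each group to its mean values.
# na.rm = TRUE ignores missing values when averaging.
toy_means <- toy %>%
  group_by(CNT) %>%
  summarise(MeanWealth = mean(HOMEPOS, na.rm = TRUE),
            MeanMath   = mean(PV1MATH, na.rm = TRUE))

print(toy_means)
```

Without na.rm = TRUE, the mean wealth for country B would be NA, because one of its HOMEPOS values is missing.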

Tip

Note that the competency test scores for Vietnam in PISA are all NA at the student level. This is because many students finish compulsory schooling before 15. Hence, we filter out the data from Vietnam (filter(CNT != "Vietnam")) before summarising.

Show the answer
# Create a summary data frame
# Group by country, and then summarise the mean math and wealth scores

Wealthdata <- PISA_2022 %>%
  select(CNT, HOMEPOS, PV1MATH) %>%
  filter(CNT!="Vietnam")%>%  # To cut Vietnam due to lack of data
  group_by(CNT) %>%
  summarise(MeanWealth=mean(HOMEPOS, na.rm = TRUE),
            MeanMath=mean(PV1MATH, na.rm = TRUE))

# Use ggplot to create a scatter graph

ggplot(data = Wealthdata,
       aes(x = MeanWealth, y = MeanMath)) +
  geom_point(alpha = 0.5, colour="red") +
  xlab("Home Possessions (Wealth proxy)") +
  ylab("Mathematics score")

In the previous scatter graph of mathematics scores against wealth, highlight outlier countries (any mean mathematics score over 500) in a different colour. Hint: mutate the data frame to include a label column (set by the condition of the maths score being over 500). Then set the colour in ggplot by this label column.

Show the answer
# Create a summary data frame
# Group by country, and then summarise the mean math and wealth scores

Wealthdata <- PISA_2022 %>%
  select(CNT, HOMEPOS, PV1MATH) %>%
  group_by(CNT) %>%
  filter(CNT!="Vietnam")%>%
  summarise(MeanWealth = mean(HOMEPOS, na.rm = TRUE),
            MeanMath = mean(PV1MATH, na.rm = TRUE)) %>%
  mutate(label=ifelse(MeanMath > 500, "Red", "Blue")) # mutate to add a label
# the column label is "Red" if MeanMath > 500 and "Blue" otherwise

# Use ggplot to create a scatter graph

ggplot(data = Wealthdata,
       aes(x = MeanWealth, y = MeanMath, colour = label)) +
  geom_point() +
  xlab("Wealth") +
  ylab("Mathematics score")

Add the country names as labels to the outliers. Hint: add an additional column, labelname, which holds the country name as.character(CNT) if the MeanMath score is over 500. You can use geom_label_repel (from the ggrepel package) to add the labels. You can set aes(label = labelname) and colour = "black" to give the source of the labels (labelname) and their colour; geom_label_repel positions the labels so that they do not overlap.
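The reason for wrapping CNT in as.character can be seen on a toy example. This is a sketch with made-up country codes and means, not PISA results: CNT is stored as a factor, and ifelse applied directly to a factor returns its underlying integer codes rather than the country names.

```r
library(dplyr)

# Toy summary table (hypothetical values)
toy_means <- data.frame(
  CNT      = factor(c("A", "B", "C")),
  MeanMath = c(520, 480, 505)
)

# labelname holds the country name only where MeanMath is over 500,
# and NA otherwise, so only the outliers get a label on the plot
toy_labelled <- toy_means %>%
  mutate(labelname = ifelse(MeanMath > 500, as.character(CNT), NA))

print(toy_labelled)
```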

Show the answer
library(ggrepel) # provides geom_label_repel

# Mutate to give a new column labelname, set to the country name (CNT) if MeanMath is over 500, or NA if not.
Wealthdata <- PISA_2022 %>%
  select(CNT, HOMEPOS, PV1MATH) %>%
  group_by(CNT) %>%
  filter(CNT!="Vietnam")%>%
  summarise(MeanWealth = mean(HOMEPOS, na.rm = TRUE),
            MeanMath = mean(PV1MATH, na.rm = TRUE)) %>%
  mutate(label = ifelse(MeanMath>500, "Red", "Blue")) %>%
  mutate(labelname = ifelse(MeanMath>500, as.character(CNT), NA))
  
# Use geom_label_repel to add the labelname column to the graph
ggplot(data = Wealthdata,
       aes(x = MeanWealth, y = MeanMath, colour = label)) +
  geom_point() +
  # geom_label_repel (ggrepel) positions the labels so they do not overlap
  geom_label_repel(aes(label = labelname),
                   colour = "black") +
  xlab("Wealth") +
  ylab("Mathematics score") 

1.4.8 Task 8 Plot Likert responses using facet wrapping

Examine Likert responses by country using a facet plot.

For ST125Q01NA - How old were you when you started early childhood education? Plot responses, first, for the whole data set, then facet plot for the UK, Germany, Belgium, Austria, France, Poland, Estonia, Finland and Italy.

• What international differences can you note?

Show the answer
# Create a data frame of childhood education data for the whole data frame 
ChildhoodEd<-PISA_2022 %>%
  select(CNT, ST125Q01NA) %>%
  group_by(CNT)

# Plot a bar graph of responses  

ggplot(data = ChildhoodEd,
       aes(x = ST125Q01NA, fill = ST125Q01NA)) +
  geom_bar() +
  xlab("How old were you when you started early childhood education?") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

Then use faceting to split the plots by country

Show the answer
# Repeat filtering for UK, Germany, Belgium, Austria, France, Poland, Estonia, Finland and Italy

ChildhoodEd <- PISA_2022 %>%
  select(CNT, ST125Q01NA) %>%
  filter(CNT == "United Kingdom"|CNT == "Germany" | CNT == "Belgium"
         | CNT == "Austria"| CNT == "France" | CNT == "Poland"
         | CNT == "Estonia" | CNT=="Finland"| CNT=="Italy")

# Plot the data and facet wrap by country

ggplot(data = ChildhoodEd,
       aes(x = ST125Q01NA, fill = CNT))+
  geom_bar()+
  xlab("How old were you when you started early childhood education?") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  facet_wrap(. ~ CNT)

1.4.9 Task 9 Categorise HOMEPOS scores

Categorising Variables

Split the HOMEPOS variable for the UK and Germany into the following groups:

HOMEPOS Name of category
HOMEPOS ≥ 1 Very High
0 ≤ HOMEPOS < 1 High
HOMEPOS < 0 Low

Plot bar graphs of participants in these categories for both countries.

• What differences can you observe between the countries?

Hint: You can use mutate with case_when to do the categorisation. For example, in combination with mutate to create the new column maths_scores_category, we use case_when(PV1MATH < 400 ~ "Low") to set maths_scores_category to Low when PV1MATH is below 400. Then maths_scores_category becomes High if the score is between 400 and 500 (note the use of & and the repeat of PV1MATH: PV1MATH >= 400 & PV1MATH < 500. Here >= means greater than or equal to).
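The hint's maths example can be sketched on a toy data frame. The scores below are made up, and the third "Very High" band (500 and above) is an assumption added here to complete the example; the hint itself only describes the Low and High bands.

```r
library(dplyr)

# Toy scores (hypothetical values, not PISA data)
toy_scores <- data.frame(PV1MATH = c(350, 400, 450, 500, 620))

# case_when checks the conditions in order and
# uses the label of the first one that is TRUE
toy_scores <- toy_scores %>%
  mutate(maths_scores_category = case_when(
    PV1MATH < 400 ~ "Low",
    PV1MATH >= 400 & PV1MATH < 500 ~ "High",
    PV1MATH >= 500 ~ "Very High"))

print(toy_scores)
```

Any row that matches none of the conditions (for example, a missing score) is given NA.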

Show the answer
# Create a data frame for the UK and Germany
# Mutate the wealth_cat (wealth category) column by the boundaries of wealth categories
Wealth <- PISA_2022 %>%
  select(CNT, HOMEPOS) %>%
  filter(CNT == "United Kingdom" | CNT == "Germany") %>%
  mutate(wealth_cat = case_when(HOMEPOS < 0 ~ "Low",
                                HOMEPOS >= 0 & HOMEPOS < 1 ~ "High",
                                HOMEPOS >= 1 ~ "Very High",
                                .default = NA)) %>%
  group_by(CNT) %>%
  droplevels()

# You can set the factors to a logical order for plotting
# The default is alphabetical which gives High, Low, Very High which 
# doesn't make sense

Wealth$wealth_cat <- factor(Wealth$wealth_cat, levels = c("Low", "High", "Very High"))

# Plot the data
ggplot(data = Wealth, 
       aes(x = wealth_cat, fill = wealth_cat))+
  geom_bar()+
  facet_wrap(.~CNT)+
  xlab("Wealth grouping")
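The effect of setting the factor levels explicitly can be checked on a small vector. This is a base R sketch using made-up category values:

```r
# Toy category values (hypothetical)
cats <- c("High", "Low", "Very High", "Low")

# Default: levels are sorted alphabetically
default_order <- factor(cats)

# Explicit: levels follow the order given, which is the order used on the plot axis
logical_order <- factor(cats, levels = c("Low", "High", "Very High"))

print(levels(default_order))
print(levels(logical_order))
```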

1.4.10 Task 10 Compare the association between mathematics and science PV values across three diverse countries

Plot scatter plots of science versus mathematics achievement in United Kingdom, Qatar and Brazil. What differences can you see between the countries?

Show the answer
# Create a data frame of science and mathematics scores across the three countries (including gender)

SciMaths <- PISA_2022 %>%
  select(CNT, PV1MATH, PV1SCIE, ST004D01T) %>%
  filter(CNT == "United Kingdom" | CNT == "Qatar" |
           CNT == "Brazil") %>%
  droplevels()

# Scatter plot the data, faceting by country

ggplot(data = SciMaths, 
       aes(x = PV1MATH, y = PV1SCIE, colour = ST004D01T))+
  geom_point(size = 0.1, alpha = 0.5)+
  facet_wrap(.~CNT)

Show the answer
# Low achieving (filter for scores less than 400)

SciMaths <- PISA_2022 %>%
  select(CNT, PV1MATH, PV1SCIE, ST004D01T) %>%
  filter(CNT == "United Kingdom" | CNT == "Qatar" |
           CNT == "Brazil") %>%
  filter(PV1MATH < 400)%>%
  filter(PV1SCIE < 400)%>%
  droplevels()

ggplot(data = SciMaths, 
       aes(x = PV1MATH, y = PV1SCIE, colour = ST004D01T))+
  geom_point(size = 0.1, alpha = 0.5)+
  facet_wrap(.~CNT)

References

Anders, Jake, Silvan Has, John Jerrim, Nikki Shure, and Laura Zieger. 2021. “Is Canada Really an Education Superpower? The Impact of Non-Participation on Results from PISA 2015.” Educational Assessment, Evaluation and Accountability 33: 229–49.
Avvisati, Francesco. 2020. “The Measure of Socio-Economic Status in PISA: A Review and Some Suggested Improvements.” Large-Scale Assessments in Education 8 (1): 1–37.
Gillis, Shelley, John Polesel, and Margaret Wu. 2016. “PISA Data: Raising Concerns with Its Use in Policy Settings.” The Australian Educational Researcher 43: 131–46.
Hopmann, Stefan Thomas, Gertrude Brinek, and Martin Retzl. 2007. “PISA According to PISA: Does PISA Keep What It Promises.” Reihe Schulpädagogik Und Pädagogische Psychologie, Bd 6.
Jerrim, John. 2016. “PISA 2012: How Do Results for the Paper and Computer Tests Compare?” Assessment in Education: Principles, Policy & Practice 23 (4): 495–518.
———. 2021. “PISA 2018 in England, Northern Ireland, Scotland and Wales: Is the Data Really Representative of All Four Corners of the UK?” Review of Education 9 (3): e3270.
Jerrim, John, Luis Alejandro Lopez-Agudo, Oscar D Marcenaro-Gutierrez, and Nikki Shure. 2017. “To Weight or Not to Weight?: The Case of PISA Data.” In Proceedings of the XXVI Meeting of the Economics of Education Association, Murcia, Spain, 29–30.
Jerrim, John, John Micklewright, Jorg-Henrik Heine, Christine Salzer, and Caroline McKeown. 2018. “PISA 2015: How Big Is the ‘Mode Effect’ and What Has Been Done about It?” Oxford Review of Education 44 (4): 476–93.
OECD. 2018. “Technical Report.” OECD, Paris. https://www.oecd.org/pisa/data/pisa2018technicalreport/PISA2018-TecReport-Ch-01-Programme-for-International-Student-Assessment-An-Overview.pdf.
———. 2019. PISA 2018 Results (Volume I). https://doi.org/10.1787/5f07c754-en.
OECD. 2022. PISA 2022 Results (Volume I). OECD. https://www.oecd-ilibrary.org/docserver/53f23881-en.pdf.
Rutkowski, Leslie, and David Rutkowski. 2016. “A Call for a More Measured Approach to Reporting and Interpreting PISA Results.” Educational Researcher 45 (4): 252–57.
Zieger, Laura Raffaella, John Jerrim, Jake Anders, and Nikki Shure. 2022. “Conditioning: How Background Variables Can Influence PISA Scores.” Assessment in Education: Principles, Policy & Practice 29 (6): 632–52.