PISA

1 What is PISA

The Programme for International Student Assessment (PISA) is an OECD initiative that assesses the reading, mathematics and science abilities of 15-year-old students. Data is collected from the ~38 OECD member countries and other partner countries every three years.

Dataset    Description                                                                                            03 06 09 12 15 18 22
Student    demographic data on student participants                                                               x x x x x x x
School     descriptive data about schools                                                                         x x x x x x x
Parent     a survey for students’ parents, including information about home environments and parental education   x x
Teacher    demographic, teaching, qualification and training data                                                 x x x x
Cognitive  individual results for each exam-style question students took                                          x x x x x x

PISA datasets above can be found on the OECD website. The links in the table above will allow you to download .parquet versions of these files which we have created, though they might need additional editing, e.g. reducing the number of columns or changing the types of each column. If you want to find out more about what each field stores, take a look at the corresponding codebook: 2022, 2018, 2015.

2 How to use it

The PISA datasets come in SPSS or SAS formats. The data used in this course comes directly from downloading the SPSS .sav files and using the haven package to clean it into a native R format suitable for analysis, in most cases .parquet files (see: §.parquet files). There are a few quirks that you need to be aware of:

  • R uses levels (factors) instead of labelled data
  • All SPSS fields are labelled, and automatic conversion into the native R dataframe format would turn numeric fields into factors(!?). To avoid this confusion we have stripped out the no-response data for numeric fields and replaced it with NA values. This means that you won’t be able to tell the reason that a field is missing, but most of the original data doesn’t appear to use the majority of these levels, i.e. there are few reasons given for missing data. The following labels have all been set to NA (a sketch of this cleaning process is given after this list):
Labels set to NA in .parquet files
value label
95 Valid Skip
97 Not Applicable
98 Invalid
99 No Response
  • As the fields now use R’s native factor format, you might find that the data doesn’t quite match the codebook labels. For example, CNT is labelled “Country code 3-character”, but the data now holds the full country name instead.
  • The examples shown in the book use cut-down PISA datasets, where only a limited number of columns are included. The full datasets are linked in the table above.
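
For reference, this kind of conversion can be sketched with haven and arrow. This is an illustration of the approach rather than the exact script used to build the files; the file name, field names and missing-data codes below are all illustrative (the codes actually used vary from field to field):

library(haven)
library(dplyr)
library(arrow)

# read the raw SPSS file (file name illustrative)
pisa_raw <- read_sav("CY08MSP_STU_QQQ.sav")

# for fields that are genuinely numeric (e.g. scales and indices), strip
# the missing-data codes and the SPSS labels; deciding which labelled
# fields are "really" numeric is the fiddly part, glossed over here
numeric_flds <- c("HOMEPOS", "ESCS")

pisa_clean <- pisa_raw %>%
  mutate(across(all_of(numeric_flds),
                ~ zap_labels(replace(.x, .x %in% c(95, 97, 98, 99), NA)))) %>%
  as_factor(only_labelled = TRUE)  # remaining labelled fields become factors

write_parquet(pisa_clean, "PISA_student_2022.parquet")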

3 Common issues

The PISA datasets can be absolutely huge and might bring your computer to its knees; if you are using a computer with less than 16GB of RAM you might not be able to load some tables at all. Tables such as the Cognitive dataset have hundreds of thousands of rows and thousands of columns; loading them directly might lead to an error similar to this: Error: cannot allocate vector of size 2.1 Gb. This means that R can’t find enough RAM to load the dataset and has given up. You can see a rough estimate of how much RAM R is currently using in the Environment panel:

To get around this issue you can try to clean your current R environment using the brush tool:

This will drop all the current dataframes, objects, functions and packages that you have loaded, meaning you will have to reload packages such as library(tidyverse) and library(haven) before you can attempt to reload the PISA tables.
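
You can do the same clean-up from the console and, if you are working with the .parquet files, sidestep the problem entirely by reading only the columns you need. A minimal sketch (the file name and column choice are illustrative):

# clear the environment from the console (equivalent to the brush tool)
rm(list = ls())
gc()  # prompt R to release the freed memory

# read just a handful of columns from a large .parquet file
library(arrow)
PISA_2022 <- read_parquet("PISA_student_2022.parquet",
                          col_select = c("CNT", "ESCS", "HOMEPOS",
                                         "PV1MATH", "W_FSTUWT"))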

A lack of RAM might also be the result of lots of other programs running concurrently on your computer. Try to close anything that you don’t need; web browsers can be particularly RAM-hungry, so close them, or as many tabs as you can.

If none of the above works, then please get in touch with the team, letting them know which table you need from which year, with which fields and for which countries. We will be able to provide you with a cut-down dataset.

4 Questions

4.1 What are Plausible Values?

In the PISA dataset, the outcomes of student tests are reported as plausible values, for example, in the variables of the science test (PV1SCIE, PV2SCIE, PV3SCIE, PV4SCIE, and PV5SCIE). It might seem counterintuitive that there are five or ten values for a score on a test. (Note that PISA published five plausible values per domain before 2015, when the number was increased to ten.)

Plausible values (PVs) are a way of expressing the error in a measurement. The number of questions in the full PISA survey is very large, so students are randomly allocated a subset of questions (and even then, the test still takes two hours!). As no student completes the full set of questions (only 40% of students even answer questions in reading, science and mathematics (OECD 2014)), estimating how a student would have performed on the full question set involves some error. Plausible values are a way of expressing the uncertainty in the estimation of student scores.

One way of thinking of the PV scores is that they represent five or ten different estimates of a student’s ability based on the questions they have answered. Rather than reporting a single best estimate, several values are drawn from the modelled distribution of each student’s ability, and these draws are the PV scores.
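
A toy illustration of the idea (this is not PISA’s actual scaling model, which is based on item response theory): each student’s ability is only estimated with error, and each plausible value is one draw from that error distribution.

set.seed(42)
# five hypothetical students with "true" abilities on a PISA-like scale
true_ability <- rnorm(5, mean = 500, sd = 100)

# five plausible values per student: draws centred on each student's
# ability; the sd of 30 is an arbitrary stand-in for measurement error
pvs <- replicate(5, rnorm(length(true_ability), mean = true_ability, sd = 30))
colnames(pvs) <- paste0("PV", 1:5)
round(pvs)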

The PISA Data Analysis Manual suggests:

Population statistics should be estimated using each plausible value separately. The reported population statistic is then the average of each plausible value statistic. For instance, if one is interested in the correlation coefficient between the social index and the reading performance in PISA, then five correlation coefficients should be computed and then averaged.

Plausible values should never be averaged at the student level, i.e. by computing in the dataset the mean of the five plausible values at the student level and then computing the statistic of interest once using that average PV value. Doing so would be equivalent to an EAP estimate, with a bias as described in the previous section.

(Monseur et al. 2009, 100)

The actual PV values can differ substantially from each other. For example, if we work out the scaled (normalised) value for a PV score, that is, how many standard deviations a student’s score is from the mean of all students, we can see large fluctuations. For student 5, PV1SCIE is 0.147 standard deviations lower than the average of all students, whilst PV2SCIE is 1.10 standard deviations higher:

PISA_2022 %>% 
  mutate(PV1SCIE_z = scale(PV1SCIE),
         PV2SCIE_z = scale(PV2SCIE)) %>%
  select(PV1SCIE_z,
         PV2SCIE_z)
# A tibble: 613,744 × 2
   PV1SCIE_z[,1] PV2SCIE_z[,1]
           <dbl>         <dbl>
 1        -1.09         -1.47 
 2        -1.29         -1.17 
 3        -0.872        -1.00 
 4        -2.24         -2.45 
 5        -0.147         1.10 
 6         0.272         0.298
 7        -1.03         -1.52 
 8        -1.22         -0.865
 9        -0.854        -0.840
10        -0.614        -0.221
# ℹ 613,734 more rows

These differences make analysis at an individual student level hard to justify, though overall population or sub-population scores will be more reliable. As suggested above, we could use the average (here, taken to be the mean) of several PV scores. To calculate the mean of the ten PV values for each domain, do the following (it might take some time to complete this calculation!):

PISA_2022 %>% 
  rowwise() %>%
  mutate(PV_SCIE_mean = mean(c_across(matches("PV[0-9]+SCIE"))),
         PV_MATH_mean = mean(c_across(matches("PV[0-9]+MATH"))),
         PV_READ_mean = mean(c_across(matches("PV[0-9]+READ")))) %>%
  ungroup() %>% # drop the rowwise grouping
  select(PV_SCIE_mean, PV_MATH_mean, PV_READ_mean)

4.2 What are weights?

The PISA data set contains two weighting variables that allow you to look at student outcomes that represent a country population accurately, and to compare countries against each other.

4.2.1 Student weights

The first weight is the student weight W_FSTUWT, which reflects how many students in the population each sampled student represents (roughly, the inverse of that student’s probability of being selected within a given country). Taking mean results from the students who took the test in a country might not be fully representative of all the students in that country. For example, in the UK, proportionally more students are sampled in Scotland than in England, because the school population of England (~9.1 million) is much larger than that of Scotland (~790 thousand): 4,763 students were sampled in England and 3,257 in Scotland, so as a proportion the Scottish sample is too big. For any calculation, we would want to reduce the impact of each Scottish student on the overall UK results, otherwise they would be over-represented:

PISA_2022 %>% 
  filter(CNT == "United Kingdom") %>%
  group_by(REGION) %>%
  summarise(n=n(),
            W_FSTUWT_mean = mean(W_FSTUWT)) %>%
  gt() %>%
  fmt_number(columns = "n", decimals = 0) %>%
  fmt_number(columns = "W_FSTUWT_mean", decimals = 2) %>%
  cols_align(columns = "REGION", align="left")
REGION n W_FSTUWT_mean
Great Britain: England 4,763 131.04
Great Britain: Northern Ireland 2,384 9.47
Great Britain: Wales 2,568 12.73
Great Britain: Scotland 3,257 15.91

The ratio of students in England to students in Scotland is roughly 9,100,000 : 790,000, or 11.5. Applying the student weights for England and Scotland we get:

(4763 * 131.04) / (3257 * 15.91)
[1] 12.04471

This is close to, but not quite the same as, the population ratio. This is because other factors influence the student weighting, including situations where certain school types are under-represented and others over-represented. For example, if there were more urban schools in the sample than in the population, students in rural schools would need to be given a greater weighting than students in urban schools. The exact details of how these weights are calculated are hard to come by.

So how do you use student weights in a calculation? Let’s look at the example of finding the mean level of home possessions (HOMEPOS, a proxy for family wealth) for each country. Calculating this without the weights gives us the following:

PISA_2022 %>% 
  group_by(CNT) %>%
  summarise(HOMEPOS_m = mean(HOMEPOS, na.rm=TRUE),
            HOMEPOS_sd = sd(HOMEPOS, na.rm=TRUE)) %>%
  arrange(HOMEPOS_m)
# A tibble: 80 × 3
   CNT                   HOMEPOS_m HOMEPOS_sd
   <fct>                     <dbl>      <dbl>
 1 Cambodia                  -2.41      1.08 
 2 Morocco                   -1.77      1.19 
 3 Philippines               -1.75      1.13 
 4 Indonesia                 -1.58      0.911
 5 El Salvador               -1.57      1.08 
 6 Guatemala                 -1.52      1.31 
 7 Paraguay                  -1.52      1.13 
 8 Palestinian Authority     -1.49      1.25 
 9 Peru                      -1.40      1.20 
10 Jordan                    -1.38      1.18 
# ℹ 70 more rows

Now to apply the weights: each student’s HOMEPOS score is multiplied by their weight, and the mean of these weighted scores is divided by the mean weight, giving a weighted mean:

PISA_2022 %>% 
  group_by(CNT) %>%
  summarise(HOMEPOS_m = mean(HOMEPOS * W_FSTUWT, na.rm=TRUE) / mean(W_FSTUWT),
            HOMEPOS_sd = sd(HOMEPOS * W_FSTUWT, na.rm=TRUE) / mean(W_FSTUWT)) %>%
  arrange(HOMEPOS_m)
# A tibble: 80 × 3
   CNT                   HOMEPOS_m HOMEPOS_sd
   <fct>                     <dbl>      <dbl>
 1 Cambodia                  -2.34       1.90
 2 Philippines               -1.75       1.23
 3 Morocco                   -1.74       1.22
 4 Indonesia                 -1.70       2.31
 5 El Salvador               -1.63       1.58
 6 Paraguay                  -1.56       1.40
 7 Guatemala                 -1.54       1.60
 8 Palestinian Authority     -1.53       1.52
 9 Peru                      -1.44       1.38
10 Thailand                  -1.43       1.70
# ℹ 70 more rows
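
Note that R’s built-in weighted.mean() computes the same weighted mean more directly, and drops NA values and their weights consistently:

PISA_2022 %>% 
  group_by(CNT) %>%
  summarise(HOMEPOS_wm = weighted.mean(HOMEPOS, W_FSTUWT, na.rm = TRUE)) %>%
  arrange(HOMEPOS_wm)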

4.2.2 Senate weights

The data set contains another weighting variable, the senate weight [COL_ID]. To ensure that countries make equal contributions to regression models when they have different response rates, senate weights are published (Jerrim et al. 2017). Senate weights renormalise the weights within each country so that the total for a country sums to a constant value, giving each country the same weight in an overall analysis (Rijmen 2011).
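
If the senate weight column is not present in your file, it can be computed from the student weights. A sketch, assuming the usual convention of scaling each country’s weights to sum to a constant (5,000 is the value commonly quoted in the PISA literature):

PISA_2022 %>%
  group_by(CNT) %>%
  mutate(SENWT = W_FSTUWT * 5000 / sum(W_FSTUWT, na.rm = TRUE)) %>%
  ungroup() %>%
  count(CNT, wt = SENWT)  # every country now sums to the same total weight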

4.3 How do I use the plausible values and weights correctly?

To simplify our teaching, we focus on using a single PV value, PV1___, in our calculations. This is not the recommended approach, but it simplifies our introduction to the PISA data. Here we set out how to perform a more complete analysis.

The first step in the analysis is to apply the weights. In the code below, we select the columns we need: the country (CNT), the student weight (W_FSTUWT) and the ten PV columns (in this case for mathematics). The mutate(across(...)) line multiplies each of the ten PV values by the student weight and adds new columns with _weighted appended.

PISA_2022 %>%
  select(CNT, W_FSTUWT, PV1MATH:PV10MATH) %>%
  mutate(across(PV1MATH:PV10MATH, ~ .x * W_FSTUWT, .names = "{.col}_weighted"))

Next we group_by(CNT) and summarise to get a total_weight for each country, by adding all the individual student weights (W_FSTUWT). To get a mean PV1, PV2, etc. score for each country, we then sum all the weighted PV1 scores and divide by the total weight, and do the same for PV2, PV3, etc. These are given the names PV1MATH_weighted_mean, PV2MATH_weighted_mean, etc.

PISA_2022 %>%
  select(CNT, W_FSTUWT, PV1MATH:PV10MATH) %>%
  mutate(across(PV1MATH:PV10MATH, ~ .x * W_FSTUWT, .names = "{.col}_weighted")) %>%
  group_by(CNT) %>%
  summarise(
    total_weight = sum(W_FSTUWT, na.rm = TRUE),
    across(PV1MATH_weighted:PV10MATH_weighted, ~ sum(.x, na.rm = TRUE) / total_weight, .names = "{.col}_mean"))

Finally, we calculate the mean of the PV1MATH_weighted_mean, PV2MATH_weighted_mean, … PV10MATH_weighted_mean columns, arranging the results in descending order.

PISA_2022 %>%
  select(CNT, W_FSTUWT, PV1MATH:PV10MATH) %>%
  mutate(across(PV1MATH:PV10MATH, ~ .x * W_FSTUWT, .names = "{.col}_weighted")) %>%
  group_by(CNT) %>%
  summarise(
    total_weight = sum(W_FSTUWT, na.rm = TRUE),
    across(PV1MATH_weighted:PV10MATH_weighted, ~ sum(.x, na.rm = TRUE) / total_weight, .names = "{.col}_mean")) %>%
  rowwise() %>%
  mutate(PV_mean_weighted = mean(c_across(ends_with("_mean")), na.rm = TRUE)) %>%
  select(CNT, PV_mean_weighted) %>%
  mutate(PV_mean_weighted = round(PV_mean_weighted, digits = 1)) %>%
  arrange(desc(PV_mean_weighted))
# A tibble: 80 × 2
# Rowwise: 
   CNT               PV_mean_weighted
   <fct>                        <dbl>
 1 Singapore                     575.
 2 Macao (China)                 552.
 3 Chinese Taipei                547.
 4 Hong Kong (China)             540.
 5 Japan                         536.
 6 Korea                         527.
 7 Estonia                       510.
 8 Switzerland                   508 
 9 Canada                        497.
10 Netherlands                   493.
# ℹ 70 more rows

This output matches the PISA published values for 2022.

When performing tests (for example, t-tests or linear regressions), the recommended approach is to use Rubin’s rules for multiple imputation (Rubin 1987), as recommended by the OECD (2009a): a) estimate a statistic multiple times, once for each plausible value; b) average the values produced; c) estimate the magnitude of the imputation error; and d) calculate the final standard error from the sampling error and the imputation error.

Hence, if you were performing a linear model you should: a) weight the raw scores you intend to use; b) run the model ten times, once for each of the plausible values; c) calculate the average outcome metrics. For example, your code might look like this:

# a function that returns the models for a given formula
# formula must include exactly one PV
# requires library(tidyverse), library(glue) and library(broom)
multi_pv_lm <- function(model_data,
                        pv_formula,
                        pv_vals = 1:10,
                        weight_id = "W_FSTUWT"){

  pv_focus_orig <- str_extract(pv_formula, "PV[0-9]+[A-Z]{4}")
  pv_focus <- pv_focus_orig %>% str_remove("PV[0-9]*")
  message("focusing on ", pv_focus)

  if(weight_id %in% names(model_data)){
    message(glue("Weighting by {weight_id}"))
  } else {
    message(glue("No {weight_id} weight variable found"))
    return(NULL)
  }
  
  # List of plausible value column names
  pv_columns <- paste0("PV", pv_vals, pv_focus)
  
  results <- map_dfr(pv_columns, \(pv){
    pv_formula_map <- str_replace(pv_formula, pv_focus_orig, pv)

    message(pv)
    # run model
    model <- lm(as.formula(pv_formula_map), 
                data = model_data, 
                weights = model_data[[weight_id]])
    # extract summary
    model_summary <- summary(model)
    
    # return rows
    broom::tidy(model_summary) %>% 
      mutate(R2 = model_summary$r.squared) %>%
      mutate(PV = pv)
  })
  
  results_flat <- results %>% 
    mutate(term = ifelse(str_detect(term, "PV[0-9]+[A-Z]{4}"), 
                         glue("PV_{pv_focus}"), 
                         term)) %>%
    group_by(term) %>% 
    summarise(estimate_mean = mean(estimate),
              estimate_sd = sd(estimate),
              std_error_mean = mean(std.error),
              std_error_sd = sd(std.error),
              statistic_mean = mean(statistic),
              statistic_sd = sd(statistic),
              p_value_mean = mean(p.value),
              p_value_sd = sd(p.value),
              r2 = max(R2))
  
  return(list(results=results, 
              results_flat=results_flat))
}

# we can call the model comparison, like so:
mdl_pvs_math <- multi_pv_lm(PISA_2022, "PV1MATH ~ HOMEPOS + ESCS")

# You can then access the full model responses through:
mdl_pvs_math$results
# A tibble: 30 × 7
   term        estimate std.error statistic p.value    R2 PV     
   <chr>          <dbl>     <dbl>     <dbl>   <dbl> <dbl> <chr>  
 1 (Intercept)    459.      0.131    3515.        0 0.290 PV1MATH
 2 HOMEPOS         34.3     0.154     223.        0 0.290 PV1MATH
 3 ESCS            10.7     0.153      69.8       0 0.290 PV1MATH
 4 (Intercept)    459.      0.130    3524.        0 0.289 PV2MATH
 5 HOMEPOS         34.4     0.154     224.        0 0.289 PV2MATH
 6 ESCS            10.5     0.153      68.4       0 0.289 PV2MATH
 7 (Intercept)    459.      0.130    3524.        0 0.290 PV3MATH
 8 HOMEPOS         34.2     0.154     223.        0 0.290 PV3MATH
 9 ESCS            10.8     0.153      70.6       0 0.290 PV3MATH
10 (Intercept)    459.      0.130    3530.        0 0.291 PV4MATH
# ℹ 20 more rows
# or the mean values for model outcomes with:
mdl_pvs_math$results_flat
# A tibble: 3 × 10
  term      estimate_mean estimate_sd std_error_mean std_error_sd statistic_mean
  <chr>             <dbl>       <dbl>          <dbl>        <dbl>          <dbl>
1 (Interce…         459.       0.128           0.130     0.000189         3523. 
2 ESCS               10.7      0.140           0.153     0.000221           69.9
3 HOMEPOS            34.4      0.0896          0.154     0.000223          224. 
# ℹ 4 more variables: statistic_sd <dbl>, p_value_mean <dbl>, p_value_sd <dbl>,
#   r2 <dbl>
# the PV can also appear on the right-hand side of the formula:
mdl_pvs_read <- multi_pv_lm(PISA_2022, "ESCS ~ HOMEPOS + PV1READ")

# For a fuller treatment of Rubin's rules see:
# https://bookdown.org/mwheymans/bookmi/rubins-rules.html
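
# A sketch of the pooling step from Rubin's rules, applied to the
# `results` tibble returned by multi_pv_lm() above: the total variance
# is the within-imputation (sampling) variance plus (1 + 1/m) times the
# between-imputation variance. A fully correct PISA standard error
# would also use the BRR replicate weights for the sampling part.
mdl_pvs_math$results %>%
  group_by(term) %>%
  summarise(m = n(),                          # number of plausible values
            pooled_estimate = mean(estimate), # average over the PVs
            within_var = mean(std.error^2),   # sampling variance
            between_var = var(estimate),      # imputation variance
            pooled_se = sqrt(within_var + (1 + 1/m) * between_var))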

4.4 Where can I find examples of PISA test items?

The OECD do not release the full question set used in the PISA science, reading and mathematics tests, so that questions can be reused across cycles and valid comparisons made between them. However, you can find a document of items used in PISA 2000, 2003 and 2006, and some sample test items here, which give a flavour of the nature of the tests.

4.5 Why are some countries OECD countries and others aren’t?

The Organisation for Economic Co-operation and Development (OECD) has 38 member states. PISA is run by the OECD and its member states normally take part in each PISA cycle, but other countries are allowed to take part as Partners. You can find more details on participation here.

Results for OECD members are generally higher than for Partner countries:

PISA_2022 %>% 
  group_by(OECD) %>% 
  summarise(country_n = length(unique(CNT)),
            math_mean = mean(PV1MATH, na.rm=TRUE),
            math_sd = sd(PV1MATH, na.rm=TRUE),
            students_n = n())
# A tibble: 2 × 5
  OECD  country_n math_mean math_sd students_n
  <fct>     <int>     <dbl>   <dbl>      <int>
1 No           43      409.    97.8     318587
2 Yes          37      475.    95.0     295157

4.6 Why are the PV scores centred around the ~500 mark?

The scores for students in mathematics, reading and science are scaled so that the mean of students in OECD countries is roughly 500 points with a standard deviation of 100 points. To see this, run the following code:

PISA_2022 %>% 
  filter(OECD=="Yes") %>% 
  summarise(math_mean = mean(PV1MATH, na.rm=TRUE),
            math_sd = sd(PV1MATH, na.rm=TRUE),
            scie_mean = mean(PV1SCIE, na.rm=TRUE),
            scie_sd = sd(PV1SCIE, na.rm=TRUE),
            read_mean = mean(PV1READ, na.rm=TRUE),
            read_sd = sd(PV1READ, na.rm=TRUE))
# A tibble: 1 × 6
  math_mean math_sd scie_mean scie_sd read_mean read_sd
      <dbl>   <dbl>     <dbl>   <dbl>     <dbl>   <dbl>
1      475.    95.0      487.    101.      478.    104.

4.7 But the mean PV score isn’t 500?!

The OECD’s initial plan (in the 2000 study) was that the mean PV score for OECD countries should be 500 and the standard deviation 100 (OECD 2019a). However, after the 2000 study, scores were scaled to be comparable with the first cycle of data, resulting in means differing from 500 (Pulkkinen and Rautopuro 2022). For example, by 2015 the mean had fallen to 493 in science and reading, and 490 in mathematics.

4.8 Why are the letters TA and NA used in some field names?

4.9 How do I find fields that are numeric?

# numeric columns keep their SPSS description in a "label" attribute;
# this collects the name and label of every numeric field

nms <- PISA_2022 %>% select(where(is.numeric)) %>% names()
lbls <- map_dfr(nms, \(nme){
  lbl <- attr(PISA_2022[[nme]], "label")
  if (is.null(lbl)) lbl <- NA_character_  # some fields have no label
  tibble(name = nme, label = lbl)
})

4.10 How are students selected to take part in PISA?

The students who take part in the PISA study are aged between 15 years and 3 (completed) months and 16 years and 2 (completed) months at the beginning of the testing period (OECD 2018). A number of classes of students are excluded from data collection:

  • Students classed as ‘functionally disabled’, such that they cannot participate in the test
  • Students judged by teachers to have cognitive, emotional or behavioural difficulties that mean they cannot participate
  • Students who lack the language abilities to take the test in the assessment language
  • Students for whom no test materials are available in their language
  • Students excluded for another agreed reason

The OECD expect that 85% of schools in their original sample participate; non-participating schools can be replaced with a substitute, ‘replacement’ school. A minimum weighted response rate of 80% is required within schools.

The sampling strategy for PISA is a stratified two-stage sample design: first, schools are sampled with probability proportional to their size (the number of enrolled 15-year-olds); then, within each sampled school, students are sampled with equal probability.
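
A toy sketch of the two-stage design (the numbers are illustrative, not PISA’s actual sampling frame):

set.seed(1)
# stage 1: schools are sampled with probability proportional to the
# number of enrolled 15-year-olds
schools <- tibble(school_id = 1:1000,
                  n_15yo = rpois(1000, lambda = 150))
sampled_schools <- schools %>%
  slice_sample(n = 150, weight_by = n_15yo)

# stage 2: within each sampled school, students are sampled with equal
# probability (a target of ~42 students per school in recent cycles)
sampled_schools <- sampled_schools %>%
  mutate(n_sampled = pmin(42, n_15yo))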

Students are selected by … (ref?)

See also Christian Bokhove’s R materials on PISA: https://bokhove.net/r-materials/

From the data, you can see that 50% of schools entered 30 or fewer students into PISA.

PISA_2022 %>% 
  group_by(CNTSCHID) %>%
  summarise(size = n()) %>%
  mutate(quartile = ntile(size, 4)) %>%
  group_by(quartile) %>%
  summarise(Qmax = max(size),
            Qmedian = median(size),
            Qmean = mean(size))
# A tibble: 4 × 4
  quartile  Qmax Qmedian Qmean
     <int> <int>   <dbl> <dbl>
1        1    19      10  10.1
2        2    30      25  25.2
3        3    37      34  33.8
4        4   475      40  44.4
ggplot(PISA_2022 %>% 
  group_by(CNTSCHID) %>%
  summarise(size = n()), aes(x=size)) +
  geom_density()

4.11 What are the PISA test questions like?

You can view sample PISA science, reading and mathematics questions here.

4.12 How can I find the ethnicity or race of a student taking the PISA test?

This data isn’t collected by PISA. Instead, they collect information on the language spoken at home (LANGN) and the language of the test (LANGTEST_QQQ), as well as immigration status and country of birth (COBN_S student, COBN_M mother, COBN_F father). Details on ethnicity and outcomes in England are published in the country-specific research report for 2018. Note that Chinese students are categorised under “Other” rather than “Asian”.
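
As a sketch of what is available, the code below counts home language against the language of the test questionnaire for one country (the variable names are those given above; the country filter is illustrative):

PISA_2022 %>%
  filter(CNT == "United Kingdom") %>%
  count(LANGTEST_QQQ, LANGN, sort = TRUE)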

4.13 What are the PISA domains?

Every PISA test has included test items measuring reading, mathematics and science. In each cycle, one of these three areas is the focus of study (the major domain). In addition, extra domains have been added to individual cycles (for example, creative thinking and collaborative problem solving). The domains for each cycle are shown in the table below.

Year Major Domain Minor Domains
2000 Reading literacy Mathematics, Science
2003 Mathematics Reading literacy, Science, Cross-curricular problem solving
2006 Science Reading literacy, Mathematics
2009 Reading literacy Mathematics, Science
2012 Mathematics Reading literacy, Science, Creative problem solving
2015 Science Mathematics, Reading literacy, Collaborative problem solving
2018 Reading literacy Mathematics, Science, Global Competence
2022 Mathematics Reading literacy, Science, Creative thinking
2025 Science Mathematics, Reading literacy, Learning in the Digital World

4.14 Why is China given the CNT value B-S-J-Z (China) (2018) or B-S-J-G (China) (2015)?

B-S-J-G/Z (China) is an acronym for Beijing, Shanghai, Jiangsu and Guangdong/Zhejiang, the four provinces/municipalities of the People’s Republic of China that took part in PISA data collection; Zhejiang took the place of Guangdong in the 2018 dataset. Several authors (including Du and Wong (2019)) comment that sampling only from some of the most developed regions of China means the country’s data is unlikely to be nationally representative.

4.15 Where is mainland China in PISA 2022?

Chinese provinces/municipalities (Beijing, Shanghai, Jiangsu and Zhejiang) and Lebanon are participants in PISA 2022 but were unable to collect data because schools were closed during the intended data collection period. - PISA 2022 participants

4.16 How do I calculate weighted means of the PV scores?

You can use a function written by Miguel Diaz Kusztrick; here is his slightly tidied function for calculating weighted means and standard deviations (original link):

# Copyright Miguel Diaz Kusztrick
# sdata:  student data as a base data.frame
# pv:     character vector of plausible value column names
# weight: name of the student weight column (e.g. "W_FSTUWT")
# brr:    character vector of the balanced repeated replication (BRR)
#         weight column names
wght_meansd_pv <- function(sdata, pv, weight, brr) {
    mmeans <- c(0, 0, 0, 0)
    names(mmeans) <- c("MEAN", "SE-MEAN", "STDEV", "SE-STDEV")
    
    mmeanspv <- rep(0, length(pv))  # weighted mean for each PV
    stdspv   <- rep(0, length(pv))  # weighted sd for each PV
    mmeansbr <- rep(0, length(pv))  # BRR sampling variance of each mean
    stdsbr   <- rep(0, length(pv))  # BRR sampling variance of each sd
    sum_weight <- sum(sdata[, weight])
    
    for (i in 1:length(pv)) {
        mmeanspv[i] <- sum(sdata[, weight] * sdata[, pv[i]]) / sum_weight
        stdspv[i]   <- sqrt((sum(sdata[, weight] * (sdata[, pv[i]]^2)) / sum_weight) - mmeanspv[i]^2)
        # recompute the statistics under each replicate weight
        for (j in 1:length(brr)) {
            sbrr  <- sum(sdata[, brr[j]])
            mbrrj <- sum(sdata[, brr[j]] * sdata[, pv[i]]) / sbrr
            mmeansbr[i] <- mmeansbr[i] + (mbrrj - mmeanspv[i])^2
            stdsbr[i]   <- stdsbr[i] + (sqrt((sum(sdata[, brr[j]] * (sdata[, pv[i]]^2)) / sbrr) - mbrrj^2) - stdspv[i])^2
        }
    }
    # average each statistic over the plausible values
    mmeans[1] <- sum(mmeanspv) / length(pv)
    mmeans[2] <- sum((mmeansbr * 4) / length(brr)) / length(pv)
    mmeans[3] <- sum(stdspv) / length(pv)
    mmeans[4] <- sum((stdsbr * 4) / length(brr)) / length(pv)
    
    # imputation variance between the plausible values
    ivar <- c(0, 0)
    for (i in 1:length(pv)) {
        ivar[1] <- ivar[1] + (mmeanspv[i] - mmeans[1])^2
        ivar[2] <- ivar[2] + (stdspv[i] - mmeans[3])^2
    }
    ivar <- (1 + (1 / length(pv))) * (ivar / (length(pv) - 1))
    
    # final standard errors combine the sampling and imputation error
    mmeans[2] <- sqrt(mmeans[2] + ivar[1])
    mmeans[4] <- sqrt(mmeans[4] + ivar[2])
    return(mmeans)
}
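
A hypothetical call, assuming the 2022 column names for the ten mathematics PVs, the student weight, and the 80 BRR replicate weights W_FSTURWT1 to W_FSTURWT80; the function indexes with sdata[, col], so the data is passed as a base data.frame:

wght_meansd_pv(as.data.frame(PISA_2022),
               pv     = paste0("PV", 1:10, "MATH"),
               weight = "W_FSTUWT",
               brr    = paste0("W_FSTURWT", 1:80))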

5 PISA quirks

5.1 Empty fields

All 2022 school responses to questions about the clubs and extra-curricular activities run in a school (SC053Q____) are coded as NA, as are SC207____. It’s not clear why this data is included in the dataset, or whether it should have values but doesn’t. These (albeit empty) fields are included in the full PISA_school_2022.parquet file linked above.

club_flds <- c("SC053Q01TA","SC053Q02TA","SC053Q03TA","SC053Q04TA","SC053Q05NA",
               "SC053Q06NA","SC053Q07TA","SC053Q08TA","SC053Q09TA","SC053Q10TA")

PISA_2022_school %>% 
  select(c("CNT", starts_with("SC053Q"), starts_with("SC207"))) %>% 
  group_by(CNT) %>%
  pivot_longer(-CNT, 
               names_to = "club",
               values_to = "present") %>%
  filter(!is.na(present)) %>%
  pull(club) %>% 
  unique()

# Note: SC053D11TA is present:
# <This academic year>,follow. activities/school offers<national modal grade for 15-year-olds>? <country specific item>

Additionally, creativity fields stored in ST334_____, ST340_____, ST341_____, PA185_____ and CREA____ on the student questionnaire are missing answers for all countries:

PISA_2022 %>% 
  select(c("CNT", "IMAGINE", 
           starts_with("ST334"),
           starts_with("ST340"), 
           starts_with("ST341"),
           starts_with("PA185"),
           starts_with("CREA"))) %>% 
  mutate(across(everything(), as.numeric)) %>%
  group_by(CNT) %>%
  pivot_longer(-CNT, 
               names_to = "creativity",
               values_to = "present") %>%
  filter(!is.na(present)) %>%
  pull(creativity) %>% 
  unique()

5.2 Cyprus present but missing

Cyprus is still present in the levels of CNT even though PISA hasn’t recorded data on Cyprus since 2012. Other countries that didn’t participate in the 2022 round have been removed from the levels, e.g. China.

countries <- PISA_2022 %>% pull(CNT) %>% unique()
country_lvls <- PISA_2022 %>% pull(CNT) %>% levels()
setdiff(country_lvls, countries)

5.3 Great Britain vs the United Kingdom

The United Kingdom is the correct name for the country formed by combining the results of England, Scotland, Wales and Northern Ireland. However, the OECD lists the regions of the United Kingdom as “Great Britain:” followed by England, Scotland, Wales and Northern Ireland, even though Northern Ireland isn’t part of Great Britain.

PISA_2022 %>% select(CNT, REGION) %>% 
  filter(grepl("Great Britain", REGION)) %>% distinct()
# A tibble: 4 × 2
  CNT            REGION                         
  <fct>          <fct>                          
1 United Kingdom Great Britain: Wales           
2 United Kingdom Great Britain: Northern Ireland
3 United Kingdom Great Britain: England         
4 United Kingdom Great Britain: Scotland        

6 Interesting papers and reading on PISA

There are a number of useful OECD reports, including the PISA Data Analysis Manual (OECD 2009b) and the technical reports for each cycle (OECD 2014, 2018, 2019b).

John Jerrim and colleagues have written a number of papers which provide commentary on PISA analysis:

  • PISA 2012: how do results for the paper and computer tests compare? Jerrim (2016)
  • To weight or not to weight?: The case of PISA data Jerrim et al. (2017)
  • PISA 2015: how big is the ‘mode effect’ and what has been done about it? Jerrim et al. (2018)
  • PISA 2018 in England, Northern Ireland, Scotland and Wales: Is the data really representative of all four corners of the UK? Jerrim (2021)
  • Is Canada really an education superpower? The impact of non-participation on results from PISA 2015 Anders et al. (2021)
  • Has Peak PISA passed? An investigation of interest in International Large-Scale Assessments across countries and over time Jerrim (2023)
  • Conditioning: how background variables can influence PISA scores Zieger et al. (2022)

Other PISA Papers of Interest

  • PISA according to PISA: Does PISA keep what it promises? Cordingley (2008)
  • The policy impact of PISA: An exploration of the normative effects of international benchmarking in school system performance Breakspear (2012)
  • A call for a more measured approach to reporting and interpreting PISA results Rutkowski and Rutkowski (2016)
  • PISA data: Raising concerns with its use in policy settings Gillis, Polesel, and Wu (2016)
  • Differential item functioning in PISA due to mode effects Feskens, Fox, and Zwitser (2019)
  • The measure of socio-economic status in PISA: A review and some suggested improvements Avvisati (2020)

References

Anders, Jake, Silvan Has, John Jerrim, Nikki Shure, and Laura Zieger. 2021. “Is Canada Really an Education Superpower? The Impact of Non-Participation on Results from PISA 2015.” Educational Assessment, Evaluation and Accountability 33: 229–49.
Avvisati, Francesco. 2020. “The Measure of Socio-Economic Status in PISA: A Review and Some Suggested Improvements.” Large-Scale Assessments in Education 8 (1): 1–37.
Breakspear, Simon. 2012. “The Policy Impact of PISA: An Exploration of the Normative Effects of International Benchmarking in School System Performance.”
Cordingley, P. 2008. “Research and Evidence-Informed Practice: Focusing on Practice and Practitioners.” Cambridge Journal of Education 38 (1): 37–52. https://doi.org/10.1080/03057640801889964.
Du, Xin, and Billy Wong. 2019. “Science Career Aspiration and Science Capital in China and UK: A Comparative Study Using PISA Data.” International Journal of Science Education 41 (15): 2136–55.
Feskens, Remco, Jean-Paul Fox, and Robert Zwitser. 2019. “Differential Item Functioning in PISA Due to Mode Effects.” Theoretical and Practical Advances in Computer-Based Educational Measurement, 231–47.
Gillis, Shelley, John Polesel, and Margaret Wu. 2016. “PISA Data: Raising Concerns with Its Use in Policy Settings.” The Australian Educational Researcher 43: 131–46.
Jerrim, John. 2016. “PISA 2012: How Do Results for the Paper and Computer Tests Compare?” Assessment in Education: Principles, Policy & Practice 23 (4): 495–518.
———. 2021. “PISA 2018 in England, Northern Ireland, Scotland and Wales: Is the Data Really Representative of All Four Corners of the UK?” Review of Education 9 (3): e3270.
———. 2023. “Has Peak PISA Passed? An Investigation of Interest in International Large-Scale Assessments Across Countries and over Time.” European Educational Research Journal, 14749041231151793.
Jerrim, John, Luis Alejandro Lopez-Agudo, Oscar D Marcenaro-Gutierrez, and Nikki Shure. 2017. “To Weight or Not to Weight?: The Case of PISA Data.” In Proceedings of the XXVI Meeting of the Economics of Education Association, Murcia, Spain, 29–30.
Jerrim, John, John Micklewright, Jorg-Henrik Heine, Christine Salzer, and Caroline McKeown. 2018. “PISA 2015: How Big Is the ‘Mode Effect’ and What Has Been Done about It?” Oxford Review of Education 44 (4): 476–93.
Monseur, Christian et al. 2009. “PISA Data Analysis Manual: SPSS Second Edition.” https://www.oecd-ilibrary.org/docserver/9789264056275-en.pdf?expires=1672909117&id=id&accname=guest&checksum=3AD95B021546E6CB9B93D8895B011056.
OECD. 2009a. PISA 2006 Technical Report. OECD.
———. 2009b. “PISA Data Analysis Manual: SPSS, Second Edition.” PISA, March. https://doi.org/10.1787/9789264056275-en.
———. 2014. “PISA 2012 Technical Report.” OECD, Paris. https://www.oecd.org/pisa/pisaproducts/PISA-2012-technical-report-final.pdf.
———. 2018. “Technical Report.” OECD, Paris. https://www.oecd.org/pisa/data/pisa2018technicalreport/PISA2018-TecReport-Ch-01-Programme-for-International-Student-Assessment-An-Overview.pdf.
———. 2019a. “PISA 2018 Results (Volume I).” PISA, December. https://doi.org/10.1787/a9b5930a-en.
———. 2019b. “PISA 2018 Technical Background.” PISA, December. https://doi.org/10.1787/89178eb6-en.
Pulkkinen, Jonna, and Juhani Rautopuro. 2022. “The Correspondence Between PISA Performance and School Achievement in Finland.” International Journal of Educational Research 114: 102000.
Rijmen, Frank. 2011. “Hierarchical Factor Item Response Theory Models for PIRLS: Capturing Clustering Effects at Multiple Levels.” IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments 4: 59–74.
Rubin, D. B. 1987. Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons.
Rutkowski, Leslie, and David Rutkowski. 2016. “A Call for a More Measured Approach to Reporting and Interpreting PISA Results.” Educational Researcher 45 (4): 252–57.
Zieger, Laura Raffaella, John Jerrim, Jake Anders, and Nikki Shure. 2022. “Conditioning: How Background Variables Can Influence PISA Scores.” Assessment in Education: Principles, Policy & Practice 29 (6): 632–52.