Bayesian Statistics

0.1 The Frequentist Approach to Statistics

The statistical methods you have encountered to this point, the t-tests, chi-squared tests and linear models, are examples of an approach labelled frequentist statistics. Typically these methods report p-values, which indicate the probability, assuming that the null hypothesis is true, of obtaining results like those observed, or more extreme. Frequentist methods treat the probability of an event as its relative frequency over many trials.

However, frequentist statistics, because of the way the outcomes of tests are presented, can be hard to interpret. Consider the case of two sets of exam results for classes A and B, which are taught by different methods. We wish to determine if there is a statistically significant difference in means, to see if one approach to teaching is better than the other, so we perform a t-test.

ClassA<-c(54,53,72,31,25,34,87,45,26,42,54,58,80)
ClassB<-c(82,94,88,76,82,60,88,90,75,77,96,61,74)

t.test(ClassA, ClassB)

    Welch Two Sample t-test

data:  ClassA and ClassB
t = -4.6276, df = 19.013, p-value = 0.0001833
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -42.67437 -16.09486
sample estimates:
mean of x mean of y 
 50.84615  80.23077 

In this case, the p-value of the frequentist t-test is 0.0002, less than 0.05. We conclude that, if we assume the null hypothesis (that the means of the classes are the same), there is a probability of 0.0002 (roughly 1 in 5000) that results like this, or more extreme, could have arisen by chance.

The p-value here can be loosely thought of as indicating the probability of a false positive. That is, it reports the probability that a difference in means of this size or larger could have arisen by chance, assuming there is no true difference.

The trouble with the frequentist p-value is that it doesn’t tell us what we typically want to know - what is the probability that the scores in one group are different from the other (and hence that there is a difference in the teaching approaches)? The frequentist approach reports, over many hypothetical trials, the limit of the relative frequency of an event. For example, after 13 students in each class have sat the test, and assuming that the two classes have the same mean score, there is roughly a 1 in 5000 chance that results like these, or more extreme, would occur. This is not an intuitive way of reporting probability - the Bayesian approach offers an alternative.

0.2 The Bayesian Approach to Statistics

The Bayesian approach is named after the English statistician and Presbyterian minister Thomas Bayes. It proposes a different way of expressing the outcome of statistical tests: Bayesian statistics report a probability which expresses a degree of belief in an event. Central to Bayesian statistics is the idea that this degree of belief should be revised in the light of prior knowledge and new evidence.

“The key ideas of Bayesian statistics: that probability is orderly opinion, and that inference from data is nothing other than the revision of such opinion in the light of relevant new information” - (edwards1963?)

Bayes’ theorem can be stated mathematically as: \(P(A\vert B)=\frac{P(B \vert A)P(A)}{P(B)}, \text{ if } P(B) \neq 0\)

A and B represent two events. \(P(A)\) represents the probability of A occurring and \(P(B)\) the probability of B occurring. The formula assumes that \(P(B)\ne0\), that is, there is some probability that event B will happen.

\(P(A\vert B)\) is a conditional probability - that is, the probability that A will occur given that B has occurred.

\(P(B\vert A)\) is another conditional probability - this time, the probability that B will occur given that A has occurred.

We can see how the formula works in an example:

Imagine that the prevalence of students who have profound and multiple learning difficulties (PMLD) in state secondary schools in the UK is 1%. We might also know that the screening process for PMLD gives a false positive (i.e. incorrectly categorises a student with no learning difficulties with PMLD) 5% of the time.

A student receives a diagnosis of PMLD - what is the probability that they have PMLD?

We can use the formula to find the probability that the student has PMLD, given that they have a positive test \(P(HasPMLD\vert PositiveTest)\). But let us first consider the calculation intuitively.

\(P(HasPMLD\vert PositiveTest)=\frac{P(PositiveTest \vert HasPMLD)P(HasPMLD)}{P(PositiveTest)}\)

If we screen 1000 students at random, 10 will have PMLD (1%) and 990 will not. Of those 990 students, 5% will receive a positive screening: 5% of 990 is 49.5 students. Assuming the screening always detects PMLD when it is present, there are 10 (correct screenings) + 49.5 (false positives) = 59.5 students who test positive. The probability that a student has PMLD, given a positive test, \(P(HasPMLD\vert PositiveTest)\), is therefore \(10/59.5=0.17\) - surprisingly low.

We can use the formula to get to the same result:

\(P(HasPMLD\vert PositiveTest)=\frac{P(PositiveTest \vert HasPMLD)P(HasPMLD)}{P(PositiveTest)}\)

\(P(PositiveTest \vert HasPMLD)=1\) \(P(HasPMLD)=0.01\)

\(P(PositiveTest)=(1*10/1000)+(0.05*990/1000)=0.0595\)

\(P(HasPMLD\vert PositiveTest)=\frac{(1)(0.01)}{(0.0595)}=0.17\)

The example indicates the value of taking prior information into account when making inferences. Intuitively, it might be expected that, with the low rate of false positives (5%), a positive test would mean a high certainty of diagnosis. However, combined with knowledge of the prevalence of PMLD in the population, the probability of having PMLD after a positive test is only around 1 in 5.
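The calculation above can be written as a short R function. This is a sketch of our own (the function name and argument names are not from any package):

```r
# Posterior probability via Bayes' theorem
# sensitivity: P(PositiveTest | HasPMLD) - assumed to be 1 in the example above
# prevalence:  P(HasPMLD)
# false_pos:   P(PositiveTest | does not have PMLD)
bayes_posterior <- function(sensitivity, prevalence, false_pos) {
  # P(PositiveTest) by the law of total probability
  p_positive <- sensitivity * prevalence + false_pos * (1 - prevalence)
  (sensitivity * prevalence) / p_positive
}

bayes_posterior(sensitivity = 1, prevalence = 0.01, false_pos = 0.05)
# approximately 0.168, matching the calculation above
```

Changing the arguments lets you explore how the posterior shifts - for example, doubling the prevalence to 0.02 roughly doubles the probability that a positive test reflects a true diagnosis.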

0.2.1 The Bayesian t-test

The BayesFactor package can be used to perform a Bayesian version of the t-test.

library(BayesFactor)
library(tidyverse) # for the pipe (%>%) and mutate()
# Create two dummy sets of class exam scores, as data frames this time, as that is what the Bayesian t-test takes as an input
ClassA <- data.frame(Scores = c(54,53,72,31,25,34,87,45,26,42,54,58,80))
ClassB <- data.frame(Scores = c(82,94,88,76,82,60,88,90,75,77,96,61,74))

# Label the classes A and B
ClassA <- ClassA %>%
  mutate(Class="A")

ClassB <- ClassB %>%
  mutate(Class="B")

# Combine into one data frame
Totaldata <- rbind(ClassA, ClassB)

# Perform the Bayesian test
ttestBF(data = Totaldata, formula = Scores ~ Class)
Bayes factor analysis
--------------
[1] Alt., r=0.707 : 194.1877 ±0%

Against denominator:
  Null, mu1-mu2 = 0 
---
Bayes factor type: BFindepSample, JZS

The value of interest here is the Bayes factor, 194.1877 (the ±0% that follows it is the estimated proportional error in the Bayes factor, not a confidence interval). The Bayes factor can be thought of as a measure of the relative likelihood of a difference (in this case, a difference in scores between the groups). Here the factor can be interpreted to mean: the data are about 194 times more likely under the hypothesis that there is a difference between the two classes than under the hypothesis that there is none.

By comparison with a p-value, which acts only as a cut-off for significance, the Bayes factor tells you the relative strength of evidence, a more useful piece of information.

(Note that to calculate the Bayesian t-test, some assumptions have to be made. One is an assumed distribution of effect sizes, in this case a kind of distribution known as a Cauchy distribution; r=0.707 refers to the scale parameter of that distribution.)
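The width of that Cauchy prior can be adjusted through the rscale argument of ttestBF. As a sketch, re-running the test above (using the Totaldata frame already created) with the default and a wider prior:

```r
# "medium" (r = 0.707) is the default scale; "wide" (r = 1) expects
# larger effect sizes, which will change the resulting Bayes factor
ttestBF(data = Totaldata, formula = Scores ~ Class, rscale = "medium")
ttestBF(data = Totaldata, formula = Scores ~ Class, rscale = "wide")
```

Comparing the two outputs is a simple sensitivity check: if the conclusion changes dramatically with the prior scale, the evidence is less robust than a single Bayes factor suggests.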

Null, mu1-mu2 = 0 tells us that the null hypothesis is that the means (the mus) of the two groups are the same.

0.2.2 The Bayesian chi-square test

# Create a dataframe: 1 = has degree, 0 = has no degree
DegreeGender <- data.frame(Gender = c("Male","Male","Female","Male","Female","Female",
                                      "Female","Male","Male","Female","Male",
                                      "Female","Female","Male"),
                           Degree = c(1,0,1,1,0,0,0,1,0,0,1,0,1,0))

# Create a contingency table
conttab <- xtabs(data = DegreeGender, ~ Degree + Gender)

# Perform the Bayesian chi-square test
contingencyTableBF(conttab, sampleType = "jointMulti")
Bayes factor analysis
--------------
[1] Non-indep. (a=1) : 1.351891 ±0%

Against denominator:
  Null, independence, a = 1 
---
Bayes factor type: BFcontingencyTable, joint multinomial
# For comparison, perform the frequentist Chi square test
chisq.test(conttab)

    Pearson's Chi-squared test with Yates' continuity correction

data:  conttab
X-squared = 0.29167, df = 1, p-value = 0.5892

The chi-square test gives a Bayes factor of 1.351891. This value raises the question of how to interpret different values of Bayes factors. (kass1995?) have suggested these labels for interpreting Bayes factors:

Bayes Factor   Strength of evidence
<1             Evidence favours the null hypothesis
1              Presence or absence of a difference is equally likely
1 to 3.2       Not worth more than a bare mention
3.2 to 10      Substantial
10 to 100      Strong
>100           Decisive

In the case of the chi-square test above, the Bayes factor of 1.351891 falls into the category of ‘Not worth more than a bare mention’. This coheres with the frequentist chi-square result, p-value = 0.5892, which indicates no statistically significant difference between the two groups.
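To work with the table above programmatically, the numeric Bayes factor can be pulled out of a BayesFactor result with the extractBF function. A sketch, using the conttab contingency table created earlier:

```r
bf <- contingencyTableBF(conttab, sampleType = "jointMulti")

# extractBF returns a data frame; the bf column holds the Bayes factor
bf_value <- extractBF(bf)$bf

# Bayes factors below 1 favour the null (independence); the reciprocal
# expresses that evidence for the null on the same scale as the table
1 / bf_value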

0.2.3 The Bayesian linear model

We covered linear models in an earlier session using the lm function - there is an alternative Bayesian version in the brms package.

# Load the packages to perform the Bayesian linear model
library(brms)
library(tidyverse) # for mutate() and ggplot()
# brm is not the quickest Bayesian modelling function, but it allows the calculation of an R2 equivalent, which is really helpful

# Create a dummy dataframe with a random normally distributed variable, Var1. Then add a second random variable offset from the first
testdata <- data.frame(Var1 = rnorm(50, mean=25, sd=10))

testdata <- testdata %>%
  mutate(Var2 = Var1 + rnorm(50, mean=5, sd=2))

# Plot the data
ggplot(data=testdata, aes(x=Var1, y=Var2)) +
  geom_point()

# Perform the frequentist linear model
mod <- lm(data=testdata, Var1 ~ Var2)
summary(mod)

Call:
lm(formula = Var1 ~ Var2, data = testdata)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.0338 -1.5068 -0.2623  1.5565  6.5450 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.74345    1.07971  -3.467  0.00112 ** 
Var2         0.95988    0.03644  26.340  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.114 on 48 degrees of freedom
Multiple R-squared:  0.9353,    Adjusted R-squared:  0.9339 
F-statistic: 693.8 on 1 and 48 DF,  p-value: < 2.2e-16
# Perform the Bayesian linear model
mod <- brm(data=testdata, Var1 ~ Var2)

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 4.3e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.43 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.028 seconds (Warm-up)
Chain 1:                0.023 seconds (Sampling)
Chain 1:                0.051 seconds (Total)
Chain 1: 

(similar sampling output for chains 2 to 4 omitted)
summary(mod)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: Var1 ~ Var2 
   Data: testdata (Number of observations: 50) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -3.74      1.13    -5.99    -1.43 1.00     3673     2597
Var2          0.96      0.04     0.88     1.04 1.00     3623     2445

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     2.17      0.23     1.77     2.68 1.00     3314     2722

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
# Calculate the Bayesian equivalent of R2
bayes_R2(mod)
    Estimate   Est.Error      Q2.5     Q97.5
R2 0.9336623 0.005752511 0.9181952 0.9391946
# The R2 for the model is 0.93, so about 93% of the variance is explained

0.2.4 Using prior data in a Bayesian linear model

A central principle of Bayesian statistics is that a model is based on knowledge of prior outcomes. The Bayesian Regression Model function brm in the brms package extends the lm function by allowing us to provide a prior distribution for a model parameter. We do this with the set_prior function, specifying the shape of the distribution (in our case, normal, with a mean and standard deviation taken from earlier data), the class of parameter the prior applies to (class = "b", a regression coefficient) and the name of the predictor it belongs to (PV1MATH).
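Before setting a prior, it can help to see which parameters brm will accept priors for. The brms function get_prior lists them, together with their default priors; a sketch, assuming the UKPISA_2018 data frame created below:

```r
library(brms)

# List the parameters of the intended model and their default priors;
# rows with class "b" are the regression coefficients that set_prior
# can target by name (e.g. coef = "PV1MATH")
get_prior(PV1SCIE ~ PV1MATH, data = UKPISA_2018)
```

This is a useful check that the class and coef values passed to set_prior match what the model actually contains.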

# To predict 2018 PISA data, I am going to use prior information from the 2015 results
#PISA2015studentbayes <- read_parquet("/Users/k1765032/Library/CloudStorage/GoogleDrive-richardandrewbrock@gmail.com/.shortcut-targets-by-id/1c3CkaEBOICzepArDfjQUP34W2BYhFjM4/PISR/Data/PISA/2015/PISA2015studentbayes.parquet")
# PISA_2015 <- read_parquet(glue("{datafolder}/PISA_2015_student_subset.parquet"))

# Find the prior distribution of UK mathematics scores we expect from previous years 
PISA_2015 %>%
  select(PV1MATH, CNT) %>%
  filter(CNT == "United Kingdom") %>%
  summarise(mean = mean(PV1MATH, na.rm=TRUE), 
            sd = sd(PV1MATH, na.rm=TRUE))
# A tibble: 1 × 2
   mean    sd
  <dbl> <dbl>
1  489.  87.8
library(brms)
# Specify the prior distribution using set_prior from the brms package
prior2015 <- c(set_prior("normal(489, 87.8)", class = "b", coef = "PV1MATH"))

# Create the data frame of UK scores in 2018 to model
UKPISA_2018 <- PISA_2018 %>%
  filter(CNT=="United Kingdom")

# Use the Bayesian regression model function 'brm' to fit the model, specifying the prior
modbay <- brm(data=UKPISA_2018, 
              formula = PV1SCIE ~ PV1MATH, prior=prior2015)

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 8.9e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.89 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.501 seconds (Warm-up)
Chain 1:                0.278 seconds (Sampling)
Chain 1:                0.779 seconds (Total)
Chain 1: 

(similar sampling output for chains 2 to 4 omitted)
summary(modbay)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: PV1SCIE ~ PV1MATH 
   Data: UKPISA_2018 (Number of observations: 13818) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    76.97      2.72    71.61    82.30 1.00     4701     3333
PV1MATH       0.84      0.01     0.83     0.85 1.00     4721     3364

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma    57.03      0.34    56.37    57.72 1.00     4480     2896

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
bayes_R2(modbay)
    Estimate   Est.Error      Q2.5     Q97.5
R2 0.6377383 0.002939821 0.6318979 0.6434291
# Compare to the simple linear model
mod <- lm(data = UKPISA_2018, 
          formula = PV1SCIE ~ PV1MATH)
summary(mod)

Call:
lm(formula = PV1SCIE ~ PV1MATH, data = UKPISA_2018)

Residuals:
     Min       1Q   Median       3Q      Max 
-280.564  -36.100    0.702   36.408  240.894 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 76.943905   2.725385   28.23   <2e-16 ***
PV1MATH      0.842087   0.005399  155.97   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 57.03 on 13816 degrees of freedom
Multiple R-squared:  0.6378,    Adjusted R-squared:  0.6378 
F-statistic: 2.433e+04 on 1 and 13816 DF,  p-value: < 2.2e-16

Notice that we get fairly similar estimates from the frequentist and Bayesian models.

Another useful feature of Bayesian models is that the model parameters, e.g. the intercept and coefficients, are represented not as single values but as distributions. This gives greater insight into the uncertainty of the model.
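Those distributions can also be inspected directly. The as_draws_df function (re-exported by brms from the posterior package) returns one row per posterior draw; a sketch using the single-predictor modbay model fitted above (brms names coefficient columns with a b_ prefix):

```r
# Extract the post-warmup posterior draws as a data frame
draws <- as_draws_df(modbay)

# A 95% credible interval for the PV1MATH coefficient,
# computed directly from the draws
quantile(draws$b_PV1MATH, probs = c(0.025, 0.975))
```

Working with the raw draws lets you compute any summary you like - for instance, the proportion of draws in which the coefficient exceeds some value of interest.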

# What's nice with a Bayesian model is our model parameters are not simply values, but probability distributions which gives more insight into the confidence in the model
# For example, lets predict science scores based on mathematics scores and reading scores
# Find the prior distribution of UK mathematics scores we expect from previous years 
PISA_2015 %>%
  select(PV1MATH,CNT) %>%
  filter(CNT == "United Kingdom") %>%
  summarise(mean = mean(PV1MATH, na.rm=TRUE), 
            sd   = sd(PV1MATH, na.rm=TRUE))
# A tibble: 1 × 2
   mean    sd
  <dbl> <dbl>
1  489.  87.8
PISA_2015 %>%
  select(PV1READ, CNT) %>%
  filter(CNT == "United Kingdom") %>%
  summarise(mean = mean(PV1READ, na.rm=TRUE), 
            sd   = sd(PV1READ, na.rm=TRUE))
# A tibble: 1 × 2
   mean    sd
  <dbl> <dbl>
1  496.  92.7
# Specify the prior distributions using set_prior from the brms package
prior2015 <- c(set_prior("normal(489, 87.8)", class = "b", coef = "PV1MATH"),
               set_prior("normal(496, 92.7)", class = "b", coef = "PV1READ"))

# Create the data frame of UK scores in 2018 to model
UKPISA_2018 <- PISA_2018 %>%
  filter(CNT=="United Kingdom")

# Use the Bayesian regression model function 'brm' to fit the model, specifying the priors
modbay <- brm(data = UKPISA_2018, 
              formula = PV1SCIE ~ PV1MATH + PV1READ, prior=prior2015)

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 0.000113 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.13 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 1.244 seconds (Warm-up)
Chain 1:                0.617 seconds (Sampling)
Chain 1:                1.861 seconds (Total)
Chain 1: 

(similar sampling output for chains 2 to 4 omitted)
summary(modbay)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: PV1SCIE ~ PV1MATH + PV1READ 
   Data: UKPISA_2018 (Number of observations: 13818) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    35.07      2.30    30.67    39.68 1.00     5613     3032
PV1MATH       0.39      0.01     0.37     0.40 1.00     1958     2389
 [ reached getOption("max.print") -- omitted 1 row ]

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma    46.72      0.28    46.17    47.28 1.00     3558     2829

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
bayes_R2(modbay)
    Estimate   Est.Error     Q2.5     Q97.5
R2 0.7569889 0.001773055 0.753423 0.7603302
# Plot the distribution of the coefficients
plot(modbay)

0.2.5 Generating multiple models and comparing them

The BayesFactor package allows you to generate multiple models, based on different combinations of variables in a data set, and compare their predictive power. Note that automated model selection of this kind is not universally recommended:

Note

“Let the computer find out” is a poor strategy and usually reflects the fact that the researcher did not bother to think clearly about the problem of interest and its scientific setting (burnham2007?)

Nonetheless, once you have thought clearly about a range of models, ranking the candidate Bayesian models can provide insights. This can be done using the regressionBF function of the BayesFactor package. We call models <- regressionBF(PV1SCIE ~ ., data = PISAUK), which asks for regressions predicting science scores (PV1SCIE ~) from every combination of the other variables (.) in the data frame (data = PISAUK). The function returns an object ranking the models; the one with the highest Bayes factor bf is a candidate for the best fit.

# Create a UK subset
PISAUK <- PISA_2018 %>%
  select(PV1MATH, PV1READ, PV1SCIE, WEALTH, ESCS, CNT) %>%
  filter(CNT=="United Kingdom") %>%
  select(-CNT) %>%
  na.omit()
# Convert variables to factors
# PISAUK$WEALTH <- as.factor(PISAUK$WEALTH)
# PISAUK$PV1SCIE   <- as.factor(PISAUK$PV1SCIE)
# PISAUK$PV1READ   <- as.factor(PISAUK$PV1READ)
# PISAUK$ESCS   <- as.factor(PISAUK$ESCS)


# Use regressionBF to create a dataframe of the models
models <- regressionBF(PV1SCIE ~ ., data = PISAUK)
# Select only the relevant part of the output
output <- as.data.frame(models@bayesFactor)
# Arrange the output
output <- output %>%
  arrange(desc(bf))
output
                                        bf        error
PV1MATH + PV1READ + ESCS          9102.115 1.455009e-06
PV1MATH + PV1READ + WEALTH + ESCS 9101.061 2.469989e-06
PV1MATH + PV1READ + WEALTH        9048.201 1.432174e-06
                                                      time        code
PV1MATH + PV1READ + ESCS          Mon Nov 27 12:00:17 2023 2cd87427ae9
PV1MATH + PV1READ + WEALTH + ESCS Mon Nov 27 12:00:17 2023 2cd88602551
PV1MATH + PV1READ + WEALTH        Mon Nov 27 12:00:17 2023  2cd8f4409b
 [ reached 'max' / getOption("max.print") -- omitted 12 rows ]

Here the model with the highest Bayes factor is PV1SCIE~PV1MATH + PV1READ + ESCS.
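One detail worth knowing when reading this table: in the BayesFactor package, the bf column of the @bayesFactor slot is stored on the natural-log scale, so the values above are log Bayes factors (the ranking is unaffected, since the log is monotonic). The package's extractBF() helper returns them on either scale; a short sketch using the models object fitted above:

```r
library(BayesFactor)

# The bf column in models@bayesFactor is a natural-log Bayes factor;
# extractBF() converts it back to the raw scale (logbf = FALSE is the default)
raw <- extractBF(models, logbf = FALSE)
head(raw[order(-raw$bf), c("bf", "error")])
```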

We can compare the Bayes Factors of the models in a chart:

ggplot(output, aes(x=reorder(rownames(output), -bf), 
                   y=bf, fill=rownames(output))) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
  labs(x="model", y="Bayes Factor")

0.3 Tasks

0.3.1 Task 1 Bayesian Chi Square

Use a Bayesian chi square test to determine: a) In the PISA 2018 data, whether there are statistically significant differences between boys and girls for ST011Q03TA (In your home: A quiet place to study)

  b) For the UK, US, France and Germany, whether there are statistically significant differences in ST011Q04TA (In your home: A computer you can use for school work)

Hint

When analysing by country, after filtering for the countries of interest, it can be helpful to call droplevels(<dataframe>). After filtering, the other countries in the dataframe will remain as factor levels, and will create a contingency table with many zero entries if they are not dropped. For example:

# Create a data frame for having a desk to study at in the UK and the US
Desk <- PISA_2018 %>%
  select(CNT, ST011Q01TA) %>%
  filter(CNT == "United Kingdom" | CNT == "United States")

# Create a contingency table
conttab <- xtabs(data = Desk, ~ CNT + ST011Q01TA)
conttab
                        ST011Q01TA
CNT                        Yes    No Valid Skip Not Applicable Invalid
  Albania                    0     0          0              0       0
  United Arab Emirates       0     0          0              0       0
                        ST011Q01TA
CNT                      No Response
  Albania                          0
  United Arab Emirates             0
 [ reached getOption("max.print") -- omitted 80 rows ]

Adding droplevels() solves this issue:

# Create a data frame for having a desk to study at in the UK and the US
Desk <- PISA_2018 %>%
  select(CNT, ST011Q01TA) %>%
  filter(CNT=="United Kingdom"| CNT=="United States")

# Remove levels with 0 responses
Desk <- droplevels(Desk)
# Create a contingency table
conttab <- xtabs(data=Desk, ~ CNT + ST011Q01TA)
conttab
                ST011Q01TA
CNT                Yes    No
  United Kingdom 11696  1508
  United States   3712  1036


Answer A
# Create a dataframe for having a quiet place to study in the UK, including gender
Quiroom <- PISA_2018 %>%
  select(CNT, ST004D01T, ST011Q03TA) %>%
  filter(CNT=="United Kingdom")

# Create a contingency table
conttab <- xtabs(data=Quiroom, ~ ST004D01T + ST011Q03TA)

# Perform the Bayesian chi-square test
contingencyTableBF(conttab, sampleType = "jointMulti")
Bayes factor analysis
--------------
[1] Non-indep. (a=1) : 2.01724e-64 ±0%

Against denominator:
  Null, independence, a = 1 
---
Bayes factor type: BFcontingencyTable, joint multinomial
# Returns a Bayes factor of 2.01724e-64 - it is highly unlikely that there is
# a difference by gender in having a quiet place to study
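To make Bayes factors like these easier to communicate, they can be converted into a posterior probability of the alternative hypothesis. The small helper below is a sketch, assuming equal prior odds for the two hypotheses (i.e. both considered equally plausible before seeing the data); the function name is illustrative, not part of any package:

```r
# Posterior probability of the alternative hypothesis, assuming 1:1 prior odds
bf_to_posterior <- function(bf) bf / (1 + bf)

bf_to_posterior(2.01724e-64)  # effectively 0: the data support independence
```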
Answer B
# Create a dataframe for having a computer in the UK, US, France and Germany
Comp <- PISA_2018 %>%
  select(CNT, ST011Q04TA) %>%
  filter(CNT=="United Kingdom"|CNT=="United States"| CNT=="Germany"| CNT=="France")

# The other countries are included as levels so drop these:
Comp <- droplevels(Comp)

# Create a contingency table
conttab<-xtabs(data=Comp, ~ CNT + ST011Q04TA)

# Perform the Bayesian chi-square test
contingencyTableBF(conttab, sampleType = "jointMulti")

# Returns a Bayes factor of 1.871984e+14 - it is highly likely that there is a difference by country in having a computer

0.3.2 Task 2 Bayesian t-tests

Use a Bayesian t-test to determine: a) In the PISA 2018 data, are there statistically significant differences in reading scores (PV1READ) for boys and girls (see ST004D01T for gender) in the UK?

Answer
# Create a data frame of reading scores in the UK, including gender
Read <- PISA_2018 %>%
  select(CNT, ST004D01T, PV1READ) %>%
  filter(CNT == "United Kingdom")

# Perform the Bayesian test
ttestBF(data=Read, formula= PV1READ ~ ST004D01T)

# The Bayes factor is 8.194083e+39 - it is highly likely that there are differences between boys and girls in reading in the UK

Use a Bayesian t-test to determine: In the PISA 2018 data, are there statistically significant differences in mathematics scores (PV1MATH) between the UK and the US?

Answer
# Create a data frame of mathematics scores in the UK and US
PISAMath <- PISA_2018 %>%
  select(CNT, PV1MATH) %>%
  filter(CNT=="United Kingdom"| CNT=="United States")

# Drop levels of other countries
PISAMath <- droplevels(PISAMath)

# Perform the Bayesian test
ttestBF(data=PISAMath, formula = PV1MATH ~ CNT)

# The Bayes factor is 9.628524e+50 - it is highly likely that there is a difference in mean mathematics scores between the US and UK

0.3.3 Task 3 Bayesian linear models

Use a Bayesian linear model to determine: a) In the PISA 2018 data, build a model of mathematics scores (PV1MATH) in the UK. What percentage of the variance in mathematics scores is explained by the model PV1MATH ~ PV1READ + ST004D01T (gender) + WEALTH?

Answer
# Create a data frame of UK mathematics scores, including gender, wealth and reading
UKMath <- PISA_2018 %>%
  select(CNT, ST004D01T, PV1MATH, WEALTH, PV1READ) %>%
  filter(CNT=="United Kingdom")

# Perform the Bayesian linear model
mod <- brm(data=UKMath, PV1MATH ~ PV1READ + ST004D01T + WEALTH)
summary(mod)

# Calculate the Bayesian equivalent of R2
bayes_R2(mod)

# The model explains 64% of the variance

Use a Bayesian linear model to build the best model you can of the difference between boys' and girls' mathematics scores in the whole data set. a) Create a dataframe of boys' and girls' mathematics achievement across the world using summarise b) Use mutate to add a difference-in-score column c) Propose a model (consider e.g., PV1READ, WEALTH etc.), calculate the R2 value, and then tune the model by adding additional independent variables to increase its R2

If you aren’t feeling confident with your R data manipulation, you can use the hint below to start with a dataframe of gender differences and various variables by country.

Hint
# Creating the data frame for analysis
# Create a data frame for having a mathematics scores by  gender
MathScore <- PISA_2018 %>%
  select(CNT, ST004D01T, PV1MATH) %>%
  group_by(CNT, ST004D01T) %>%
  summarise(math=mean(PV1MATH)) %>%
  na.omit()

# Use pivot_wider to create side-by-side columns for mathematics score

MathScore <- pivot_wider(data=MathScore, names_from=ST004D01T, values_from=math)

MathScore <- MathScore %>%
  mutate(genddiff = Female - Male)
# Create a summary of means of the independent variables (e.g. WEALTH, reading etc.) by country
Independents <- PISA_2018 %>%
  select(CNT, PV1READ, WEALTH, ESCS) %>%
  group_by(CNT) %>%
  summarise(read=mean(PV1READ, na.rm=TRUE), 
            wealth=mean(WEALTH, na.rm=TRUE),
            ESCS=mean(ESCS, na.rm=TRUE)) %>%
  na.omit()
# join the independents to the maths scores gender differences
MathScore <- left_join(MathScore, Independents, by="CNT")
Full Answer
# Create a data frame for having a mathematics scores by  gender
MathScore <- PISA_2018 %>%
  select(CNT, ST004D01T, PV1MATH) %>%
  group_by(CNT, ST004D01T) %>%
  summarise(math=mean(PV1MATH)) %>%
  na.omit()

# Use pivot_wider to create side-by-side columns for mathematics score

MathScore <- pivot_wider(data=MathScore, names_from=ST004D01T, values_from=math)
MathScore <- MathScore %>%
  mutate(genddiff = Female - Male)

# Create a summary of means of the independent variables (e.g. WEALTH, reading etc) by country

Independents <- PISA_2018 %>%
  select(CNT, PV1READ, WEALTH, ESCS) %>%
  group_by(CNT) %>%
  summarise(read=mean(PV1READ, na.rm=TRUE), 
            wealth=mean(WEALTH, na.rm=TRUE),
            ESCS=mean(ESCS, na.rm=TRUE)) %>%
  na.omit()
# join the independents to the maths scores gender differences

MathScore <- left_join(MathScore, Independents, by="CNT")

# Perform the Bayesian linear model with reading alone
mod<-brm(data = MathScore, genddiff ~ read)
summary(mod)
# Calculate the Bayesian equivalent of R2
bayes_R2(mod)
# The reading only model explains 8% of the variance

# Perform the Bayesian linear model with reading and wealth
mod<-brm(data=MathScore, genddiff ~ read + wealth)
summary(mod)

# Calculate the Bayesian equivalent of R2
bayes_R2(mod)

# The reading and wealth model explains 19% of the variance

# Perform the Bayesian linear model with reading, wealth and ESCS
mod <- brm(data=MathScore, genddiff ~ read + wealth + ESCS)
summary(mod)

# Calculate the Bayesian equivalent of R2
bayes_R2(mod)

# The reading, wealth and ESCS model explains 20% of the variance
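The three fits above can also be run in a single loop, collecting the Bayesian R2 estimates side by side for comparison. A sketch, assuming MathScore has been built as in the hint:

```r
# Fit each candidate model and collect the Bayesian R2 estimates
formulas <- list(genddiff ~ read,
                 genddiff ~ read + wealth,
                 genddiff ~ read + wealth + ESCS)
fits <- lapply(formulas, function(f) brm(formula = f, data = MathScore))

# bayes_R2() returns a matrix; take the point estimate from each fit
sapply(fits, function(m) bayes_R2(m)[1, "Estimate"])
```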

0.3.4 Task 4 Bayesian linear models

Use a Bayesian linear model to determine: a) In the PISA 2018 data, build a model of mathematics scores (PV1MATH) in the UK based on reading scores (PV1READ) and science scores (PV1SCIE). Use the 2015 dataset to provide prior distributions for the reading and science coefficients. Plot the distributions of the model parameters.

Answer
# Determine prior distribution parameters
PISA_2015 %>%
  select(PV1SCIE, CNT) %>%
  filter(CNT=="United Kingdom") %>%
  summarise(mean=mean(PV1SCIE, na.rm=TRUE), 
            sd=sd(PV1SCIE, na.rm=TRUE))

PISA_2015 %>%
  select(PV1READ,CNT) %>%
  filter(CNT=="United Kingdom") %>%
  summarise(mean=mean(PV1READ, na.rm=TRUE), 
            sd=sd(PV1READ, na.rm=TRUE))

# Specify the prior distributions using set_prior from the `brms` package
prior2015 <- c(set_prior("normal(503.5, 96.6)", class = "b", coef = "PV1SCIE"),
               set_prior("normal(496, 92.7)", class = "b", coef = "PV1READ"))

# Create the data frame of UK scores in 2018 to model
UKPISA_2018 <- PISA_2018 %>%
  filter(CNT=="United Kingdom")

# Use the Bayesian regression model function 'brm' to fit the model, specifying the prior
modbay <- brm(data=UKPISA_2018, formula = PV1MATH ~ PV1SCIE + PV1READ, prior=prior2015)
summary(modbay)
bayes_R2(modbay)

# Plot the distribution of the coefficients
plot(modbay)
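As a quick sanity check on the prior specification, brms provides helpers to list the default priors before fitting and to confirm the priors actually applied afterwards; a sketch using the objects defined above:

```r
library(brms)

# Before fitting: inspect the default priors brm would otherwise use
get_prior(PV1MATH ~ PV1SCIE + PV1READ, data = UKPISA_2018)

# After fitting: confirm the priors that were applied to modbay
prior_summary(modbay)
```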

0.3.5 Task 5 Bayesian comparison of models

Use the regressionBF function of the BayesFactor package to compare different models for the performance of UK students in mathematics. What is the best model you can find?

Answer
# Create a UK subset
PISAUK <- PISA_2018 %>%
  select(PV1MATH, PV1READ, PV1SCIE, WEALTH, ESCS, CNT) %>%
  filter(CNT == "United Kingdom")%>%
  select(-CNT) %>%
  na.omit()

# (regressionBF expects numeric predictors, so no factor conversion is needed here)

# Use regressionBF to create a dataframe of the models
models <- regressionBF(PV1MATH ~ ., data = PISAUK)

# Select only the relevant part of the output
output <- as.data.frame(models@bayesFactor)

# Arrange the output
output <- output %>%
  arrange(desc(bf))
output
# The best model appears to be: PV1MATH ~ PV1READ + PV1SCIE + WEALTH + ESCS