Making graphs

The tidyverse includes the incredibly powerful ggplot2 package. This package is pretty much the industry standard for making graphs for publication. ggplot2 is built on the grammar of graphics where you build graphs by specifying underlying attributes and layering geometric objects on top of each other. In the diagram below you can see how a graph is built from geometric objects (the things that are plotted such as points and bars) a scale, and plot annotations (e.g. a key, title etc). You can then apply faceting to the graph to automatically split one graph into multiple plots, allowing you to easily compare different groupings. Publications, such as the Financial Times, make daily use of ggplot2.

Adapted from A Layered Grammar of Graphics, Wickham, 2010

1 ggplot

The basic structure of ggplot code is to combine different graphing elements through the use of the + operator. To demonstrate this, let’s look at the relationship between a poverty indicator ESCS and the performance in Maths PV1MATH, by gender ST004D01T and country CNT:

library(tidyverse)
# wrangle our data
graph_data <- PISA_2018 %>% 
  filter(CNT %in% c("France", "United Kingdom"))

# display a graph of the results
ggplot(data = graph_data, 
       aes(x = ESCS, y = PV1MATH, colour = ST004D01T)) +
  geom_point() +
  geom_smooth(method = 'lm') +
  facet_wrap(. ~ CNT) +
  ggtitle("Comparison of poverty indicator and Maths result, by gender and country") +
  theme(legend.position = "bottom")

Hopefully you can work out what lines 1-3 do from the previous chapter, let’s focus on the ggplot commands:

6-7 these lines set up the ggplot giving it the table object graph_data as its data input and setting up the aesthetics for the rest of the graph elements using columns from graph_data. The aes(<attribute>, <attribute>, ...) command allows us to specify aesthetic elements of the graph that will change dependent on the dataset we use. In the PISA_2018 data set, the variable ESCS refers to an index of economic status, and PV1Math, is the plausible value of the mathematics score. We set the x and y variables x=ESCS and y=PV1MATH , defining aes() inside ggplot() means we will pass down these values to subsequent geometric objects so we don’t have to define these x and y axis items again and again.
8 using the data and aes values defined on lines 6-7, geom_point uses the x and y values defined on line 19 to draw a point for each school in our dataset. There are lots of different parameters we could give geom_point e.g. specifying size and shape, but here we are content with using the defaults.
9 we add another geometric object on top of the points, this time we add a line of best fit geom_smooth, again this geometric object uses the values specified on lines 6-7, and we define the method as lm, to calculate a linear model line of best fit.
10 next we use facet_wrap(. ~ CNT) to create a graph for each group of CNT in the dataset, i.e. a graph for each country defined on line 3.
11 finally we customise the title of the graph, ggtitle, ready for display.

Important

Switching between the pipes and ggplot can get rather confusing. A very common mistake in using ggplot is to try and link together the geom_ elements with a pipe command %>% rather than the +.

2 geoms

There are about 40 different geometric objects in ggplot, allowing you to create almost any sort of graph. We will be exploring a few of them in detail, but if you want to explore others, please follow some of the links below:

geom_bar for creating bar charts and histograms
geom_point for plotting single points
geom_line for connecting points together and showing trends
geom_text for adding text labels to data points
geom_boxplot for representing the range of data
geom_hex for creating heat maps
geom_map for adding geographic maps
geom_smooth for adding lines of best fit
geom_hline for adding static horizontal lines to a graph
geom_vline for adding static vertical lines to a graph

2.1 geom_point

Rather unsurprisingly, geom_point allows us to plot a layer of points using x and y coordinates. The below example shows how we can specify within the ggplot function data=school_plot_data. We then define the aesthetic attributes of the graph, passing the x x=NumberOfBoys and y y=NumberOfGirls values. Once these have been declared in the ggplot(...) function, their values are passed down to any subsequent geom_, in this case geom_point will use the data and x and y values that have been specified in ggplot(...)

# create a new dataframe of maths and reading scores by country and OECD status
country_plot_data <- PISA_2018 %>% 
  group_by(OECD, CNT) %>%
  summarise(mean_maths = mean(PV1MATH),
            mean_read = mean(PV1READ),
            sz = n())

# using this new dataframe, show the relationship between maths and reading score
# using geom_points
ggplot(data=country_plot_data, 
       aes(x=mean_maths, y=mean_read)) +
  geom_point(aes(size=sz, colour=OECD), alpha = 0.6)

In line 2 above we pipe the large data.frame PISA_2018 to apply a number of functions.
Line 3 groups by OECD (a Yes or No indicating membership) and CNT (the name of the country).
Lines 4-5 calculate the mean mathematics and reading score, and create new variables (mean_maths, and mean_read) for their values.
In addition, in line 6, the variables sz is created which counts the number of responses per country through the n() command.
This whole pipe is then saved to the country_plot_data object, using the <- on line 2

This new dataframe is then passed to ggplot.

In line 10, we specify the data that should be plotted - the new dataframe country_plot_data we have created.
Line 11, then we pass the x and y variables, mean_maths, and mean_read, inside the aesthetic function. These values will be passed to any subsequent geom_
In line 12, we make a number of tweaks to the points: first setting the aesthetics - the size of the points is linked to the sz variable we created, the number of responses per country, and the colour is linked to OECD membership. Finally (and note this is outside the aes brackets, we set an alpha value which makes the point slightly transparent, which is more visually appealing where points overlap.

Important

Defining things inside aes mean that they will change with the dataset you use, if you define them outside aes then they will be constant values. For example

# plotting number of boys as x, number of girls as y and % disadvantged as size,
# all inside aes, so each point is a table row
ggplot(data=PISA_2018_school) +
  geom_point(aes(x = SC002Q01TA,
                 y = SC002Q02TA,
                 size=SC048Q03NA))

# there is an error if we put size outside the aes, and set it a value from the 
# dataset, it can't find value!
ggplot(data=PISA_2018_school) +
  geom_point(aes(x = SC002Q01TA,
                 y = SC002Q02TA),
             size=SC048Q03NA)

Error in eval(expr, envir, enclos): object 'SC048Q03NA' not found

# but if we set the size explicitly, outside the aes, then all points will be that size
ggplot(data=PISA_2018_school) +
  geom_point(aes(x = SC002Q01TA,
                 y = SC002Q02TA),
             size=3)

2.1.1 Questions

Spot the three errors in this graph code

ggplot(adta=diamonds, x=depth, y=price) +
  geom_point() %>% 
  ggtitle("diamond graph")

Using the PISA_2018 dataset, plot a graph for students from Norway to look at the relationship between wealth WEALTH and reading grade PV1READ. Colour each point with the gender ST004D01T of the student. Give the graph sensible x and y labels (e.g. xlab("label")).

answer dataframe

graph_data <- PISA_2018 %>% filter(CNT == "Norway")

answer graph

ggplot(graph_data,
       aes(x=WEALTH,
           y=PV1READ)) +
  geom_point(aes(colour=ST004D01T)) +
  xlab("Wealth") +
  ylab("Reading grade")

Using the PISA_2018 dataset for each country CNT, create a graph to explore how the median of the sense of school belonging BELONG relates to the median of the disciplinary climate in the school DISCLIMA, adjust the colour of each point to reflect the mean wealth of students in each country ESCS.

HINT: You’ll need create a new dataframe with summarised variables for median_belong, median_discipline and mean_wealth.

answer dataframe

graph_data <- PISA_2018 %>%
  group_by(OECD, CNT) %>%
  summarise(median_belong = median(BELONG, na.rm=TRUE),
            median_discipline = median(DISCLIMA, na.rm=TRUE),
            mean_wealth = mean(ESCS, na.rm=TRUE))

HINT: To make your colours stand out more, add + scale_color_gradientn(colours = rainbow(3)) to the end of your plot.

answer graph

# display a graph of the results
ggplot(data = graph_data, 
       aes(x = median_belong, 
           y = median_discipline)) +
  geom_point(aes(colour = mean_wealth)) + 
  scale_color_gradientn(colours = rainbow(3))

2.2 geom_bar

The geom_bar function is versatile, allowing the creation of bar, multiple bar, stacked bar charts and histograms. This first example shows how we can use bar charts to represent the number of cars in a household ST012Q02TA:

plot_cars <- PISA_2018 %>% filter(!is.na(ST012Q02TA))

ggplot(data = plot_cars, 
       aes(x=ST012Q02TA)) + 
  geom_bar()

2 gets the PISA_2018 dataset and removes all rows where ST012Q02TA is NA, storing this new dataset as plot_cars
4 passes the plot_cars to ggplot, as the dataset we are going to plot
5 specifies the x values to be the values stored in ST012Q02TA, i.e. we will have a bar for each response given in ST012Q02TA: None, One, Two, Three or more.
6 geom_bar tell ggplot to make bars, it uses the aesthetic from line 5, to plot the x axis, note we haven’t given it a y value, this is auto-calculated from the number of students in each x group.

We can choose to let ggplot split the results into different groups for us by setting a fill option, in this case on the OECD status of the country, i.e. do students in OECD countries have more cars than those not in OECD countries, to do this, we add fill=OECD to the aes on line 2 below:

ggplot(data = plot_cars, 
       aes(x=ST012Q02TA, fill=OECD)) + 
  geom_bar()

The bars are now coloured with a key, but, annoyingly, the bars are on top of each other not easily allowing us to make direct comparisons. To compare different groups we need the bars to not be stacked, we want them next to each other, or in ggplot language, we want the bars to dodge each other, to do this we add the position=position_dodge() command to line 3 below:

2.2.1 Raising the bars yourself

ggplot can do a lot of the hard work when putting together bar charts, e.g. counting the number of students in each group, but there might also be times when you want to use pipes to calculate summary values that you then want plot. That is, you want to specify the heights of the bars yourself. To do this we will specify the y axis in the aes and use stat="indentity" to tell ggplot that’s what we’re doing. Take the example where you want to find the overall percentage of students for a range of countries getting over 500 in PV1SCIE:

plot_data <- PISA_2018 %>%
  group_by(OECD, CNT) %>%
  summarise(all_students = n(),
            upper_sci_per = sum(PV1SCIE > 500) / n())

ggplot(data=plot_data, 
       aes(x=CNT, y=upper_sci_per)) +
  geom_bar(aes(fill=OECD), 
           position=position_dodge(),
           stat="identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

1 to 4 creates a dataframe plot_data that calculates the percentage of students in each country CNT, that have a science grade PV1SCIE over 500
7 as we are setting the heights of the bars ourselves, we need to give the ggplot aes command a y value, in this case y=upper_sci_per
8 the geom_bar is given a fill value of OECD, this will allow us to see how countries in and out of the OECD compare
9 we use position=position_dodge() as we want the percentage grades of each country to be next to each other so we can look for differences in heights
10 stat="identity" tells geom_bar that you have defined your own bar heights in the y attribute and not to count the number of rows.
11 this theme command helps rotate the x-axis labels 90 degrees, so they don’t overlap.

Alternatively, you can use the geom_col() function that can handle you setting the y values and not specifying stat="identity"

ggplot(data=plot_data, 
       aes(x=CNT, y=upper_sci_per)) +
  geom_col(aes(fill=OECD), 
           position=position_dodge()) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

2.2.2 Questions

Can you spot the 4 errors in this code.

ggplot(data=schools %>% filter(Phase == "Secondary"), 
       x=Region +
  geom_bar(aes(fill=Gender) position="full")

answer

ggplot(data=schools %>% filter(Phase == "Secondary"), 
       aes(x=Region)) + # 1 no aes() around the x value # 2 missing close brackets
  geom_bar(aes(fill=Gender), position="fill") # 3 missing comma # 4 position="full" rather than fill

Create a bar chart showing the total number of students for each grouping of “Think about your school, how true: Students seem to value competition” ST205Q01HA. Adjust this graph to see if there is a difference for this amongst females and males ST004D01T:

answer

ggplot(data=PISA_2018,
       aes(x=ST205Q01HA)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

answer with gender added

ggplot(data=PISA_2018 %>% filter(!is.na(ST205Q01HA)),
       aes(x=ST205Q01HA, fill=ST004D01T)) +
  geom_bar(position=position_dodge()) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

Plot bars for the number of Females and Males ST004D01T who answer each grouping for: “Use digital devices outside of school: Browsing the Internet for fun (such as watching videos, e.g. )” IC008Q08TA. Make sure that the bars position_dodge() each other so we can compare the heights.

answer

graph_data <- PISA_2018 %>% filter(!is.na(IC008Q08TA))

ggplot(data=graph_data,
       aes(x=IC008Q08TA, fill=ST004D01T)) +
  geom_bar(position=position_dodge()) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

For France and the United Kingdom, plot the total number of students who gave each answer for IC010Q09NA “Use digital devices outside of school: Doing homework on a computer”. Filter out all the NA values first !is.na(...)

answer dataframe

plot_data <- PISA_2018 %>% 
  filter(CNT %in% c("France", "United Kingdom"),
         !is.na(IC010Q09NA))

answer

ggplot(plot_data,
       aes(x=IC010Q09NA, fill=CNT)) +
  geom_bar(position=position_dodge()) +
  theme(legend.position = "bottom")

Using the PISA_2018_school dataset (available here) create a table that stores for each country CNT

the total number of schools,
the total number of teachers working full-time SC018Q01TA01,
the total number of teachers working part-time SC018Q01TA02,
and the overall total number of teachers

Add an additional column working out the % of teachers working part-time in each country, call this per_part

creating the dataframe

plot_workforce <- PISA_2018_school %>% 
  group_by(CNT) %>% 
  summarise(schools = n(),
            fulltime = SC018Q01TA01 %>% sum(na.rm=TRUE),
            parttime = SC018Q01TA02 %>% sum(na.rm=TRUE),
            teachers = fulltime + parttime,
            per_part = parttime / teachers)

Using this dataframe plot a bar graph for each country of the per_part, use the number of schools as a fill:

creating the dataframe

ggplot(data=plot_workforce,
       aes(x=CNT, y=per_part, fill=schools)) +
  geom_bar(stat="identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position="top")

[Extension] Explore other patterns in the school and student pisa datasets.

2.3 geom_text

Our bar charts look great, but finding the actual value of each bar can be a little clumsly if we have to get a ruler out and read off the y-axis. Better would be for us to have numbers listed at the top of each bar by adding a geom_text element:

plot_cars <- PISA_2018 %>% filter(!is.na(ST012Q02TA))

ggplot(data = plot_cars, 
       aes(x=ST012Q02TA, fill=OECD)) + 
  geom_bar(position=position_dodge()) +
  geom_text(stat='count', 
            aes(label=..count..), 
            position = position_dodge(width=0.9),
            vjust=-0.5)

line 6 starts the geom_text command, telling the geom to use the count statistic from ggplot, this means it will be able to fetch the number of rows in each grouping.
line 7 as we want the label to change for each bar element, we put label=..count.. inside aes. The x location of the labels is inherited from line 4 and the y location will be calculated from the height of each bar
line 8 we want the labels to align to the bars, so we tell the geom_text to also position_dodge, passing a width=0.9 to the dodge function, so the labels line up above the columns,
finally, on line 9, we vertically adjust the labels vjust, so they sit on top of the columns.

Rather than adding the count, you might want to add the percentage that each bar represents, we can do this by changing the value given to label on line 5, below:

ggplot(data = plot_cars, 
       aes(x=ST012Q02TA, fill=OECD)) + 
  geom_bar(position=position_dodge()) +
  geom_text(stat='count', 
            aes(label=100*(..count../sum(..count..)) %>% round(3), 
            group = OECD), 
            position = position_dodge(width=0.9),
            vjust=-0.5)

Additionally, when we make graphs we often want to label the dataset, for example if we were to plot all the countries and their PV1MATH and PV1READ scores, we would get:

plot_data <- PISA_2018 %>%
  group_by(OECD, CNT) %>%
  summarise(m_read  = mean(PV1READ, na.rm="TRUE"),
            m_maths = mean(PV1MATH, na.rm="TRUE"))

ggplot(data=plot_data, 
       aes(x=m_read, y=m_maths, colour=OECD)) +
  geom_point()

This looks great, but we don’t actually know which countries are which. To get this data we need to add text to the graph, using geom_text.

ggplot(data=plot_data, 
       aes(x=m_read, y=m_maths, colour=OECD)) +
  geom_point() +
  geom_text(aes(label=CNT), 
            colour="black", 
            check_overlap = TRUE)

Here we have used colour="black" outside the aes to define the colour for all the labels, and check_overlap = TRUE which removes any labels that are on top of each other. It’s still a little bit hard to understand, and maybe we want to focus on just a few of the labels for countries we are interested in. For example

# make a vector of countries you want to have labels for
focus_cnt <- c("United Kingdom", "France", "Argentina")

# add a new column to the plot_data where these countries are
plot_data <- plot_data %>% mutate(focus = CNT %in% focus_cnt)

plot_data

# A tibble: 80 × 5
# Groups:   OECD [2]
   OECD  CNT                    m_read m_maths focus
   <fct> <fct>                   <dbl>   <dbl> <lgl>
 1 No    Albania                  407.    438. FALSE
 2 No    United Arab Emirates     421.    430. FALSE
 3 No    Argentina                415.    392. TRUE 
 4 No    Bulgaria                 423.    440. FALSE
 5 No    Bosnia and Herzegovina   403.    407. FALSE
 6 No    Belarus                  476.    473. FALSE
 7 No    Brazil                   416.    384. FALSE
 8 No    Brunei Darussalam        409.    429. FALSE
 9 No    Costa Rica               426.    403. FALSE
10 No    Dominican Republic       344.    327. FALSE
# ℹ 70 more rows

Now we can adjust out geom_text to only show those countries that we want to focus on:

ggplot(data=plot_data, 
       aes(x=m_read, y=m_maths, colour=OECD)) +
  geom_point() +
  geom_text(data=plot_data %>% filter(focus == TRUE),
            aes(label=CNT), 
            colour="black", 
            check_overlap = TRUE)

line 5 changes the data that is being passed to the geom_text, it no longer uses the data defined in line 2, but has a new dataset, that is filtered on focus == TRUE, i.e. only containing the countries that we want.
Note that the x and y mappings from line 3 are inherited by line 6, it’s only the data that we have redefined

Tip

geom_text is great, but you might find that ggrepel package useful as it’ll add lines connecting the text the data points. Using the plot_data from above:

library(ggrepel)

ggplot(data=plot_data, 
       aes(x=m_read, y=m_maths, colour=OECD)) +
  geom_point() +
  geom_text_repel(data=plot_data %>% filter(focus == TRUE),
            aes(label=CNT),
            box.padding = 0.5,
            max.overlaps = Inf,
            colour="black")

3 faceting

Faceting allows you to easily create multiple graphs from one dataset and one ggplot definition by splitting the data on different factors, by defining:

facet_grid(<column_to_split> ~ .)

facet_grid(. ~ <column_to_split>)

Looking at the PISA_2018 dataset, we can plot the WEALTH and reading test outcome PV1READ variables against each other:

# create a subset of the data for plotting
plot_data <- PISA_2018 %>% 
  select(OECD, CNT, WEALTH, PV1READ) %>%
  filter(CNT %in% c("Germany", "Russian Federation", 
                    "United Kingdom"))

ggplot(data=plot_data, aes(x=WEALTH, y=PV1READ)) + 
  geom_point() + 
  geom_smooth() +
  theme(legend.position = "bottom")

But it isn’t clear how this graph varies between countries. We could try plotting separate graphs for each country, but there is a faster way. Using:

+ facet_grid(CNT ~ .)

ggplot will automatically create graphs for subsets of plot_data, split on each different country in CNT

ggplot(data=plot_data, aes(x=WEALTH, y=PV1READ)) + 
  geom_point() + 
  geom_smooth() +
  theme(legend.position = "bottom") +
  facet_grid(CNT ~ .)

If the column you want to split on it on the left hand side of facet_grid(CNT ~ .), then the graphs will be piled on top of each other, if it’s on the right hand side facet_grid(. ~ CNT), then they’ll be side by side.

Take a subset of the overall dataset, by filtering on a few countries, take another one of your plots and use facet_grid(CNT ~ .) to explore the graphs for different countries.

3.1 Exporting plots

ggplot can export data in a variety of formats suitable for printing, publication and the web. Once you have created a graph and stored it in an object my_graph <- ggplot(..., the command to save the graph to your hard drive is:

ggsave(<file_location_name_and_extension>, <object_name>, width = 5, height = 4, device=<file_ext>)

ggsave("my_graph.pdf", my_graph, width = 5, height = 4, device="pdf")

If you want to change the output format, just change the extension of the file you are saving:

“poverty_size.pdf” perfect for publication and printing, potentially large file size
“poverty_size.svg” the same as pdf, also suitable for putting on webpages
“poverty_size.png” smaller file size, suitable for websites and presentations
“poverty_size.jpg” similar output to png

4 Additional support on graphing

4.1 Using R to do descriptive statistics and plot graphs

You can find the code used in the video below:

Show the code

# Introduction to plotting graphs
#
# Download data from /Users/k1765032/Library/CloudStorage/GoogleDrive-richardandrewbrock@gmail.com/.shortcut-targets-by-id/1c3CkaEBOICzepArDfjQUP34W2BYhFjM4/PISR/Data/PISA/subset/Students_2018_RBDP_none_levels.rds
# You want the file: Students_2018_RBDP_none_levels.rds
# and place in your own file system
# change loc to load the data directly. Loading into R might take a few minutes
library(tidyverse)
loc <- "/Users/k1765032/Library/CloudStorage/GoogleDrive-richardandrewbrock@gmail.com/.shortcut-targets-by-id/1c3CkaEBOICzepArDfjQUP34W2BYhFjM4/PISR/Data/PISA/subset/Students_2018_RBDP_none_levels.rds"
PISA_2018 <- read_rds(loc)

# Calculating means of groups
# The PISA_2018 dataframe is piped to a new dataframe MeanPISA
# The data are grouped by the country variable (CNT)
# The countries of interest are chosen (UK, France, Germany and the US)
# The summarise function is used to output the mean and standard deviation score for each country
# on the Science Plausible Value (PV1SCIE) and NAs are ignored na.rm=TRUE

MeanPISA <- PISA_2018 %>%
  group_by(CNT)  %>%
  filter(CNT=="United Kingdom" | CNT== "France" | CNT== "Germany" | CNT=="United States")  %>%
  summarise(mean_sci = mean(PV1SCIE, na.rm=TRUE), sd_sci= sd(PV1SCIE, na.rm=TRUE)) 
print(MeanPISA)


# To plot data we can use the ggplot function. 
# We will start by plotting a column graph use geom_col
# We specify the data set for ggplot to use (MeanPisa) and then 
# define the x and y variables:
# ggplot(MeanPISA,
#       aes(x=CNT, y=mean_sci))
# geom_col() (Note the plus is on the line before) plots the graph and the fill colour is set to red
# The next three lines set the formatting of the axis text and add x and y axis labels

ggplot(MeanPISA,
       aes(x=CNT, y=mean_sci))+
geom_col(fill="red") +
  theme(axis.text.x = element_text(angle = 90, hjust=0.95, vjust=0.2, size=10)) +
  xlab("Country") +
  ylab("Science Score")

# For plotting a scatter plot or PISA reading scores against science scores
#, first we make a managable data set
# I will filter the data set to include only the UK data
# and remove any NAs

UKData <- PISA_2018 %>%
  filter(CNT=="United Kingdom") %>%
  drop_na(PV1SCIE)

# This time I will use ggplot to plot a scatter graph
# I feed UKDATA to ggplot, specify the x (PISA Reading score)
# And y (PISA science score). This time, I have linked the colour
# to a variable (ST004D01T) which is the gender value, giving
# plot points of different colours for boys and girls
# To produce a scatter plot, I use geom_point to plot points,
# Giving the size of point and the transparency (alpha=0.5) -
# some transparency of points is helpful when plots become dense
# The x and y lables are added
# Finally, a line is plotted - geom_smooth(method='lm')
# sets the line to a linear ('lm') line

  ggplot(UKData,
       aes(x=PV1READ, y=PV1SCIE, colour=ST004D01T)) +
  geom_point(size=0.1, alpha=0.5) +
  ylab("Science Score") +
  xlab("Reading Score") +
  geom_smooth(method='lm')
  
# Where R becomes very powerful is being able to produce multiple charts rapidly
# In the code below, I plot reading against science scores as above, but this time
# Use the entire data set - for the whole world!
# All the steps are the same, except, I use the facet_wrap, a way to create multiple
# graph panels - the instruction creates a set of graphs for each country  
  
  WorldData <- PISA_2018 %>%
    drop_na(PV1SCIE)
  
  ggplot(WorldData,
         aes(x=PV1READ, y=PV1SCIE, colour=ST004D01T)) +
    geom_point(size=0.1, alpha=0.5) +
    ylab("Science Score") +
    xlab("Reading Score") +
    geom_smooth(method='lm') +
    facet_wrap(CNT~.)