Other datasets

1 What other datasets are there?

There exist thousands of datasets that you can freely analyse. This section covers a few of the major ones for education, or that might fruitfully be combined with educational datasets such as PISA.

2 Gender equality indices

There are several international datasets for studying gender equality. Researchers have used these to look at the impact of gender equality on student attitudes and outcomes. For example, Stoet and Geary (2018) use the Global Gender Gap Index (GGGI) to look at PISA science self-efficacy and science outcomes for females.

2.1 Global Gender Gap Index (GGGI)

The World Economic Forum produces the Global Gender Gap Index (GGGI). This index combines female and male outcomes on economic participation and opportunity, educational attainment, health and survival, and political empowerment.

Reports for: 2022, 2021, 2020, 2018, 2017, 2016, 2015

It has proven difficult to find the 2015 dataset used by Stoet and Geary (2018); the 2013 dataset is here

2.2 United Nations

The UN reports on two gender-specific indices:

Gender Inequality Index (GII)

The Gender Inequality Index is an index incorporating data on reproductive health, empowerment and the labour market. Values range from 0 (full equality for men and women) to 1 (full inequality).

Gender Development Index (GDI)

The Gender Development Index measures inequalities in human development, combining data on female and male life expectancy, years of schooling and earned income. Values of 1 indicate equality, with values of less than 1 showing males performing better, and values over 1 showing females doing better.

Downloads for the GII and GDI are here
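As a rough sketch of how the GDI values described above might be read once loaded into R, the snippet below labels each value according to whether it falls below, at, or above 1. The country names and values are hypothetical placeholders rather than real UN figures, and `classify_gdi` is an illustrative helper, not part of any package.

```r
# Hypothetical GDI values, for illustration only (not real UN figures)
gdi <- data.frame(
  country = c("Country A", "Country B", "Country C"),
  gdi = c(0.92, 1.00, 1.03),
  stringsAsFactors = FALSE
)

# Label each value following the interpretation given in the text:
# below 1 = males doing better, above 1 = females doing better
classify_gdi <- function(x) {
  ifelse(x < 1, "males ahead",
         ifelse(x > 1, "females ahead", "equality"))
}

gdi$interpretation <- classify_gdi(gdi$gdi)
gdi
```

With real data you would probably want a tolerance band around 1 rather than an exact comparison.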

3 UNESCO

UNESCO produces a number of open data sets of relevance to education:

The UNESCO Institute for Statistics data browser allows you to create data sets including:

  • Demographics of teachers
  • Mean years of schooling
  • Graduation ratios
  • Out of school children and adolescents
  • Educational expenditure
  • Illiteracy rates
  • Survival (in schooling) rate by grade

4 World Bank

Lower-middle-income country mapping from the World Bank, including historical mappings

5 UNICEF

Multiple Indicator Cluster Surveys (MICS): household surveys looking at the education and social characteristics of children and adults around the world. To access the data you need to register with UNICEF, and approval for data access can take several days: https://mics.unicef.org/surveys

6 Office for National Statistics (in the UK)

The UK ONS site contains a number of general data sets related to the UK, including census data.

The site also includes a number of education-related data sets.

7 OECD

The OECD offers over 1,6000 datasets on the countries that it covers, including data on patents, education, mortality and the environment. You can combine these datasets with PISA and other datasets to explore national trends. To access the OECD database collection use https://stats.oecd.org/. There is also a catalogue of datasets
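As a minimal sketch of what combining a country-level dataset with PISA might look like, the example below joins two small data frames on a shared country code and then correlates the two measures. All values are hypothetical, and the column names (`CNT`, `mean_science`, `index_value`) are assumptions for illustration; a real analysis would first aggregate PISA to country level and check that both sources use the same country codes.

```r
# Hypothetical country-level PISA means, for illustration only
pisa_means <- data.frame(
  CNT = c("AAA", "BBB", "CCC", "DDD"),
  mean_science = c(480, 505, 520, 495),
  stringsAsFactors = FALSE
)

# Hypothetical country-level indicator from another source
indicator <- data.frame(
  CNT = c("AAA", "BBB", "CCC", "EEE"),
  index_value = c(0.70, 0.75, 0.82, 0.68),
  stringsAsFactors = FALSE
)

# Inner join on the country code: only countries present in both remain
combined <- merge(pisa_means, indicator, by = "CNT")

# Correlation between the two measures across the matched countries
cor(combined$mean_science, combined$index_value)
```

Countries that appear in only one source (here `DDD` and `EEE`) are dropped by the default inner join; passing `all = TRUE` to `merge` would keep them with missing values instead.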

Whilst it is possible to access the datasets directly, you might find it easier to use the OECD package for R, which converts the SDMX files into R data frames:

# installing and loading OECD

# install OECD library
install.packages("OECD")
library("OECD")
get_dataset("EPL_OV")

# list datasets and load one
# list all the datasets available
datasets <- get_datasets()

datasets

# download a dataset
df <- get_dataset("EPL_OV")
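With so many datasets in the catalogue, it can help to filter the list by keyword. The sketch below uses a small stand-in data frame instead of a live call to `get_datasets()`, so it runs offline; the `id` and `title` column names are an assumption about what the catalogue returns, and the rows are taken from the database names listed later in this section.

```r
# Stand-in for the catalogue returned by get_datasets();
# the `id` and `title` column names are an assumption
datasets_demo <- data.frame(
  id = c("CWB", "NCC", "RPERS", "EAG_FIN_NATURE", "EAG_FIN_SOURCE"),
  title = c(
    "Children wellbeing",
    "Net childcare costs for parents using childcare facilities",
    "Educational personnel",
    "Educational expenditure by Nature",
    "Educational expenditure by Source and destination"
  ),
  stringsAsFactors = FALSE
)

# Keep the rows whose title mentions "education", ignoring case
is_education <- grepl("education", datasets_demo$title, ignore.case = TRUE)
education_sets <- datasets_demo[is_education, ]

education_sets$id
```

The same `grepl` filter should work on the real catalogue if it has a `title` column.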

The OECD R package currently has a bug in get_data_structure. To work around it, overwrite the function by running all of the code below:

# fix the get_data_structure function
install.packages("rsdmx")
library(rsdmx) # loads the sdmx reader
library(OECD)

# get the fields in a dataset
# (replaces the package's current implementation, which is broken)
get_data_structure <- function(dataset="DUR_D"){
    url <- paste0("https://stats.oecd.org/restsdmx/sdmx.ashx/GetDataStructure/", 
        dataset)
    # data_structure <- readsdmx::read_sdmx(url)
    data_structure <- rsdmx::readSDMX(url)
    variable_desc <- data.frame(data_structure@concepts)
    variable_desc[] <- lapply(variable_desc, as.character)
    variable_desc$en[is.na(variable_desc$en)] <- variable_desc$Name.en[!is.na(variable_desc$Name.en)]
    names(variable_desc)[length(names(variable_desc))] <- "description"
    variable_desc <- variable_desc[, c("id", "description")]
    code_names <- data_structure@codelists@codelists
    code_names <- vapply(code_names, function(x) x@id, "character")
    code_list <- lapply(code_names, function(x) {
        df <- as.data.frame(data_structure@codelists, codelistId = x)
        try({
            df <- df[, c("id", "label.en")]
            names(df)[2] <- "label"
        }, silent = TRUE)
        df
    })
    names(code_list) <- gsub(paste0("CL_", dataset, "_"), "", 
        code_names)
    full_df_list <- c(VAR_DESC = list(variable_desc), code_list)
    full_df_list
}

You can then list the field descriptors for each dataset by running:

get_data_structure("EPL_OV")
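The function above returns a list holding a `VAR_DESC` data frame (`id`, `description`) plus one data frame per code list (`id`, `label`). One possible use of that list is to attach readable labels to the coded columns of a downloaded dataset. The sketch below shows the idea with hand-made stand-ins; the `SEX` dimension, its codes and the `obsValue` column are hypothetical and not taken from any particular OECD dataset.

```r
# Hypothetical stand-in with the same shape as the list returned above
structure_demo <- list(
  VAR_DESC = data.frame(
    id = c("COUNTRY", "SEX"),
    description = c("Country", "Sex"),
    stringsAsFactors = FALSE
  ),
  SEX = data.frame(
    id = c("MW", "MEN", "WOMEN"),
    label = c("All persons", "Men", "Women"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical downloaded data using the coded values
data_demo <- data.frame(
  SEX = c("MEN", "WOMEN", "MW"),
  obsValue = c(1.2, 3.4, 2.3),
  stringsAsFactors = FALSE
)

# Look up each code in the code list and attach its label
code_list <- structure_demo$SEX
data_demo$SEX_label <- code_list$label[match(data_demo$SEX, code_list$id)]

data_demo
```

Codes with no match in the code list come back as `NA`, which makes unexpected values easy to spot.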

# load OECD datasets
# SDMX-JSON URL pattern:
# http://stats.oecd.org/SDMX-JSON/data/<dataset identifier>/<filter expression>/<agency name>[ ?<additional parameters>]

# load the OECD and rsdmx libraries
library("OECD")
library(rsdmx)

# list all the datasets available
datasets <- get_datasets()

# download a dataset
df <- get_dataset("EPL_OV")

# get the fields in a dataset, using the get_data_structure() override defined above
df_struct <- get_data_structure("DUR_D")

#############################
#############################

# Database names
# CWB - children wellbeing
### B2_1 - Children (0- to 2-year-olds) participating in early childhood education and care (%)
# NCC - Net childcare costs for parents using childcare facilities
# RPERS - Educational personnel
# EAG_FIN_NATURE - Educational expenditure by Nature
# EAG_FIN_SOURCE - Educational expenditure by Source and destination
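The SDMX-JSON URL pattern quoted in the comment above can also be assembled by hand, which may help when checking exactly what is being requested. The helper below is a hypothetical sketch, not part of the OECD package: it only builds the string and does not check that the URL resolves. The `"all"` defaults for the filter and agency, and the `startTime` and `endTime` parameters, are assumptions about the API.

```r
# Hypothetical helper that fills in the URL pattern
# <base>/<dataset identifier>/<filter expression>/<agency name>[?<parameters>]
build_oecd_url <- function(dataset, filter = "all", agency = "all",
                           params = list()) {
  url <- paste("http://stats.oecd.org/SDMX-JSON/data",
               dataset, filter, agency, sep = "/")
  if (length(params) > 0) {
    # Turn list(name = value) into "name=value&name=value"
    query <- paste(names(params), unlist(params), sep = "=", collapse = "&")
    url <- paste0(url, "?", query)
  }
  url
}

# Whole dataset, no extra parameters
url_all <- build_oecd_url("EPL_OV")

# Restricting the time range (parameter names are an assumption)
url_range <- build_oecd_url("EPL_OV",
                            params = list(startTime = 2010, endTime = 2015))

url_all
url_range
```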

8 The Office for Standards in Education, Children’s Services and Skills (Ofsted)

Ofsted is the English body responsible for inspecting state schools, some independent schools, childcare, adoption and fostering agencies and initial teacher training. It has some interesting open data sets.

9 UK Data Service

https://ukdataservice.ac.uk/find-data/browse/

10 data.gov.uk

https://www.data.gov.uk/

References

Stoet, Gijsbert, and David C Geary. 2018. “The Gender-Equality Paradox in Science, Technology, Engineering, and Mathematics Education.” Psychological Science 29 (4): 581–93. https://eprints.leedsbeckett.ac.uk/id/eprint/4753/6/symplectic-version.pdf.