Type: | Package |
Title: | Retrieve, Harmonise and Map Open Data Regarding the Italian School System |
Version: | 0.2.7 |
Author: | Leonardo Cefalo |
Maintainer: | Leonardo Cefalo <leonardo.cefalo@uniba.it> |
Description: | Compiles and displays the available data sets regarding the Italian school system, with a focus on the infrastructural aspects. Input datasets are downloaded from the web, with the aim of keeping everything as up to date as possible. The functions are divided into four main modules, namely 'Get', to scrape raw data from the web; 'Util', various utilities needed to process raw data; 'Group', to aggregate data at the municipality or province level; and 'Map', to visualize the output datasets. |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
URL: | https://github.com/lcef97/SchoolDataIT |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Imports: | curl, dplyr, ggplot2, grDevices, httr, leafpop, magrittr, mapview, readr, rlang, rvest, sf, stringr, tidyr, utils, xml2 |
Suggests: | knitr, readxl, rmarkdown, testthat (≥ 3.0.0), tidyverse |
Config/testthat/edition: | 3 |
Depends: | R (≥ 2.10) |
NeedsCompilation: | no |
Packaged: | 2025-07-11 10:11:54 UTC; Leonardo |
Repository: | CRAN |
Date/Publication: | 2025-07-11 10:40:07 UTC |
SchoolDataIT: Retrieve, Harmonise and Map Open Data Regarding the Italian School System
Description
Compiles and displays the available data sets regarding the Italian school system, with a focus on the infrastructural aspects. Input datasets are downloaded from the web, with the aim of keeping everything as up to date as possible. The functions are divided into four main modules, namely 'Get', to scrape raw data from the web; 'Util', various utilities needed to process raw data; 'Group', to aggregate data at the municipality or province level; and 'Map', to visualize the output datasets.
Author(s)
Maintainer: Leonardo Cefalo leonardo.cefalo@uniba.it (ORCID)
Other contributors:
Alessio Pollice alessio.pollice@uniba.it (ORCID) [contributor, thesis advisor]
Paolo Maranzano pmaranzano.ricercastatistica@gmail.com (ORCID) [contributor]
See Also
Useful links:
Download the names and codes of Italian LAU and NUTS-3 administrative units
Description
This function downloads a file provided by the Italian National Institute of Statistics including all the codes of administrative units in Italy. As of today, it is the easiest way to map cadastral codes directly to municipality codes.
Usage
Get_AdmUnNames(Date = Sys.Date(), autoAbort = FALSE)
Arguments
Date |
Character. The date for which administrative unit codes are sought. Important: must be in the format "yyyy-mm-dd". Current date by default. |
autoAbort |
Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
Value
An object of class tbl_df, tbl and data.frame, including: NUTS-3 code, NUTS-3 abbreviation, LAU code, LAU name (description) and cadastral code. All variables are characters except for the NUTS-3 code.
Source
<https://situas.istat.it/web/#/territorio>
Examples
Get_AdmUnNames("2025-01-01", autoAbort = TRUE)
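Since the Date argument must follow the "yyyy-mm-dd" layout, it may help to validate a string before passing it on. The following is a minimal sketch in base R; the helper is_iso_date is hypothetical and not part of SchoolDataIT.

```r
# Hypothetical helper (not part of SchoolDataIT): check that a string is a
# valid calendar date written as "yyyy-mm-dd".
is_iso_date <- function(x) {
  # The regular expression enforces the layout; as.Date() checks that the
  # date actually exists in the calendar
  if (!grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)) return(FALSE)
  !is.na(as.Date(x, format = "%Y-%m-%d"))
}

is_iso_date("2025-01-01")  # well-formed date
is_iso_date("01/01/2025")  # wrong layout
is_iso_date("2025-02-30")  # right layout, impossible date
```

A string that passes this check can be supplied as the first argument of Get_AdmUnNames.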
Download the data regarding the broadband connection activation in Italian schools
Description
Retrieves the data regarding the activation date of the ultra-broadband connection in schools and indicates whether the connection was activated or not at a certain date.
Usage
Get_BroadBand(
Date = as.Date(format(as.Date(format(Sys.Date(), "%Y-%m-01")) - 1, "%Y-%m-01")),
include_municipality_code = TRUE,
input_School2mun = NULL,
input_Registry = NULL,
input_AdmUnNames = NULL,
verbose = TRUE,
autoAbort = FALSE
)
Arguments
Date |
Object of class |
include_municipality_code |
Logical. Whether to include municipality codes.
|
input_School2mun |
Object of class |
input_Registry |
If |
input_AdmUnNames |
If |
verbose |
Logical. If |
autoAbort |
Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
Details
Ultra-broadband is defined as a permanent internet connection with a maximum speed of 1 gigabit per second and a minimum guaranteed speed of 100 megabits per second for both upload and download, up to the peering point, as declared on the data provider's website: <https://bandaultralarga.italia.it/scuole-voucher/progetto-scuole/>. The example shows the broadband availability at the beginning of school year 2022/23 (1 September 2022).
Value
An object of class tbl_df, tbl and data.frame.
The variables BB_Activation_date and BB_Activation_staus indicate the activation date and activation status of the broadband connection at the selected date.
Source
Broadband dashboard: <https://bandaultralarga.italia.it/scuole-voucher/dashboard-scuole/>
Examples
Broadband_220901 <- Get_BroadBand(Date = as.Date("2022-09-01"), autoAbort = TRUE)
Broadband_220901
Broadband_220901[, c(9,6,13,14)]
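The default value of Date in the Usage section is a nested expression that evaluates to the first day of the month preceding the current one. The sketch below unpacks it step by step in base R; the wrapper first_day_prev_month and its today argument are hypothetical, introduced only so the result can be checked on fixed dates.

```r
# Same logic as the default of `Date`, with Sys.Date() replaced by an
# argument (hypothetical wrapper, not part of SchoolDataIT).
first_day_prev_month <- function(today = Sys.Date()) {
  # Step 1: first day of the current month
  first_of_this_month <- as.Date(format(today, "%Y-%m-01"))
  # Step 2: one day earlier is the last day of the previous month
  last_of_prev_month <- first_of_this_month - 1
  # Step 3: first day of that previous month
  as.Date(format(last_of_prev_month, "%Y-%m-01"))
}

first_day_prev_month(as.Date("2025-07-11"))  # "2025-06-01"
first_day_prev_month(as.Date("2025-01-15"))  # "2024-12-01"
```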
Download the database of Italian public school buildings
Description
This function downloads the School Buildings Open Database provided by the Italian Ministry of Education, University and Research.
It is one of the main sources of information regarding the infrastructure system of public schools in Italy. For a given year, all available data are downloaded (except for the structural units section, which has a different level of detail) and gathered into a single dataframe.
Usage
Get_DB_MIUR(
Year = 2023,
verbose = TRUE,
input_Registry = NULL,
input_AdmUnNames = NULL,
show_col_types = FALSE,
certifications = FALSE,
autoAbort = FALSE
)
Arguments
Year |
Numeric or character value. Reference school year (last available is 2023).
Available in the formats: |
verbose |
Logical. If |
input_Registry |
Object of class |
input_AdmUnNames |
Object of class |
show_col_types |
Logical. If |
certifications |
Logical. From year 2021/22 onwards, whether to include some safety certifications in the database.
Given the particular level of definition of this file, it requires extra computational time (other than the downloading time). |
autoAbort |
Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
Details
This function downloads the raw data; missing observations are not edited; all variables are characters.
Since certifications are defined at the level of the structural units of single buildings, here the corresponding fields read as the percentage of structural units in a building having a given certificate.
To edit the output of this function and convert the relevant variables to numeric or Boolean, please see Util_DB_MIUR_num.
Schools other than primary, middle or high schools are classified as "NR". In the example, the data for school year 2022/23 are retrieved.
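To illustrate how a certificate defined for structural units can be read as a building-level percentage, here is a minimal sketch on toy data. The column names building_id and fire_certificate are hypothetical, and this is not necessarily how the package computes the fields internally.

```r
# Toy data: one row per structural unit (identifiers and values are made up)
units <- data.frame(
  building_id      = c("B1", "B1", "B1", "B2", "B2"),
  fire_certificate = c(TRUE, TRUE, FALSE, FALSE, FALSE)
)

# Share of structural units holding the certificate, per building, in percent
pct <- tapply(units$fire_certificate, units$building_id, mean) * 100
pct
```

In this toy case two of the three units of building B1 hold the certificate, while none of the units of B2 does.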
Value
An object of class tbl_df, tbl and data.frame.
Source
Examples
input_DB23_MIUR <- Get_DB_MIUR(2023, autoAbort = TRUE)
input_DB23_MIUR[-c(1,4,6,9)]
Download the classification of peripheral municipalities
Description
Retrieves the classification of Italian municipalities into six categories; classes D, E, and F are the so-called internal/inner areas; classes A, B and C are the central areas.
Usage
Get_InnerAreas(verbose = TRUE, autoAbort = FALSE)
Arguments
verbose |
Logical. Whether to keep track of computational time. |
autoAbort |
Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
Details
Classes are defined according to these criteria; see the methodological note (in Italian) for more detail:
A - Standalone pole municipalities, the highest degree of centrality; they are characterised by a thorough and self-sufficient combined endowment of school, health and transport infrastructure, i.e. at least a lyceum and a technical high school, a medium-sized railway station, and a hospital with an emergency ward.
B - Intermunicipality poles; the endowment of such infrastructure is complete if a small set of contiguous municipalities is considered.
The remaining classes are defined in terms of the national distribution of the road travel times from a municipality to the closest pole:
C - Belt municipalities, travel time below the median (< 27'42").
D - Intermediate municipalities, travel time between the median and the third quartile (27'42" - 40'54").
E - Peripheral municipalities, travel time between the third quartile and the 97.5th percentile (40'54" - 1h 6'54").
F - Ultra-peripheral municipalities, travel time over the 97.5th percentile (> 1h 6'54").
For more information regarding the dataset, it is possible to check the ISTAT methodological note (in Italian) available at <https://www.istat.it/it/files//2022/07/FOCUS-AREE-INTERNE-2021.pdf>
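The travel-time thresholds for classes C to F can be sketched as a simple binning rule in base R. The thresholds are those quoted in the criteria, converted to minutes; the function name, the toy travel times and the treatment of values falling exactly on a threshold are assumptions made for illustration.

```r
# Thresholds in minutes, from the class definitions:
# 27'42" = 27.7, 40'54" = 40.9, 1h 6'54" = 66.9
breaks <- c(-Inf, 27.7, 40.9, 66.9, Inf)

# Hypothetical helper: assign classes C to F from the travel time to the
# closest pole (poles themselves, classes A and B, are not covered here)
classify_travel_time <- function(minutes) {
  # right = FALSE makes intervals closed on the left, e.g. [27.7, 40.9);
  # this boundary convention is an assumption
  as.character(cut(minutes, breaks = breaks,
                   labels = c("C", "D", "E", "F"), right = FALSE))
}

travel_time <- c(12, 30, 55, 80)  # made-up travel times, in minutes
classes <- classify_travel_time(travel_time)
classes

# Classes D, E and F are the inner areas
classes %in% c("D", "E", "F")
```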
Value
An object of class tbl_df, tbl and data.frame.
Source
<https://www.istat.it/notizia/la-geografia-delle-aree-interne-nel-2020-vasti-territori-tra-potenzialita-e-debolezze/>
Examples
InnerAreas <- Get_InnerAreas(autoAbort = TRUE)
InnerAreas[, c(1,9,13)]
Download the Invalsi census survey data
Description
Downloads the full database of the Invalsi scores, detailed either at the municipality or province level.
Usage
Get_Invalsi_IS(
level = "LAU",
verbose = TRUE,
show_col_types = FALSE,
multiple_out = TRUE,
autoAbort = FALSE,
category = FALSE
)
Arguments
level |
Character. The level of aggregation of Invalsi census data. Either |
verbose |
Logical. If |
show_col_types |
Logical. If |
multiple_out |
Logical. Whether to keep
multiple dataframes as outputs (thus overriding the |
autoAbort |
Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
category |
Logical. Whether to focus on a specific category of students participating in the census survey. Warning: experimental. |
Details
Numeric variables provided are:
- Average_percentage_score: Average direct score (percentage of sufficient tests)
- Std_dev_percentage_score: Standard deviation of the direct score
- WLE_average_score: Average WLE score. The WLE score is calculated through the Rasch psychometric model and is suitable for middle and high schools in that it is corrected for the effect of cheating (which would affect both the average score and the score variability). By construction it has a mean around 200 points.
- Std_dev_WLE_score: Standard deviation of the WLE score. By construction it ranges around 40 points at the school level.
- Students_coverage: Students coverage percentage
Additional numeric variables, not always available for all observational units, are:
- Mean and SD of ESCS indicator
- First-Fifth_Level: Distribution of the proficiency level of students
- Targets_percentage: Percentage of students reaching targets
Numeric codes 888 and 999 denote not applicable and not available fields respectively.
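Because 888 and 999 are placeholders rather than real scores, they would distort any summary statistic if left in place. A minimal sketch of recoding them to NA before analysis follows; the data are made up, and whether to distinguish the two codes is left to the user.

```r
# Toy scores; the column name follows the variable list, the values are made up
scores <- data.frame(
  Average_percentage_score = c(61.2, 888, 57.9, 999)
)

# Replace both special codes with NA so that they do not enter the summaries
special_codes <- c(888, 999)
x <- scores$Average_percentage_score
x[x %in% special_codes] <- NA
scores$Average_percentage_score <- x

mean(scores$Average_percentage_score, na.rm = TRUE)
```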
If multiple_out == TRUE, provides the following datasets:
- Municipality_data: LAU-level data
- Province_data: NUTS-3-level data
- Region_data: NUTS-2-level data
- LLS_data: data at the level of local labour systems (Sistemi Locali del Lavoro; see ISTAT webpage for details)
- Inner_Areas_2021_data: aggregated data for inner areas according to the 2020 taxonomy
- Inner_Areas_2014_data: aggregated data for inner areas according to the former 2014 taxonomy
- Macroarea_data: data aggregated for North-West, North-East, Center, South and Islands
Value
Unless multiple_out == TRUE, an object of class tbl_df, tbl and data.frame.
Otherwise, a list including objects of the aforementioned classes.
Source
<https://serviziostatistico.invalsi.it/en/archivio-dati/?_sft_invalsi_ss_data_collective=open-data>
Examples
Get_Invalsi_IS(level = "NUTS-3", autoAbort = TRUE, verbose = FALSE)
Download the registry of Italian public schools from the school registry section
Description
This function returns two main pieces of information regarding Italian schools, namely:
The denomination of the region, province and municipality to which the school belongs.
The mechanographical code of the reference institute of each school.
It is possible to access schools across the whole national territory, including the autonomous provinces of Aosta, Trento and Bozen.
Usage
Get_Registry(
Year = 2023,
filename = c("SCUANAGRAFESTAT", "SCUANAAUTSTAT"),
show_col_types = FALSE,
autoAbort = FALSE
)
Arguments
Year |
Numeric or character. Reference school year (last available is 2024).
Available in the formats: |
filename |
Character. A string included in the name of the file to download, identifying the schools included.
By default it is For the registry of private schools, either in all the national territory except for the aforementioned provinces, and for these provinces, please use |
show_col_types |
Logical. If |
autoAbort |
Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
Details
Schools other than primary, middle or high schools are classified as "NR".
Value
An object of class tbl_df, tbl and data.frame.
Source
Examples
Get_Registry(2024, filename = "SCUANAGRAFESTAT", autoAbort = TRUE)
Associate a Municipality (LAU) code to each school
Description
This function associates the relevant municipality codes to all the schools listed in the two main registries provided by the Italian Ministry of Education, University and Research, namely:
The registry of school buildings, here referred to as Registry_from_buildings (see Get_DB_MIUR)
The official schools registry, here referred to as Registry_from_registry (see Get_Registry)
Usage
Get_School2mun(
Year = 2023,
show_col_types = FALSE,
verbose = TRUE,
input_AdmUnNames = NULL,
input_Registry = NULL,
autoAbort = FALSE
)
Arguments
Year |
Numeric or character value (last available is 2023).
Available in the formats: |
show_col_types |
Logical. If |
verbose |
Logical. If |
input_AdmUnNames |
Object of class |
input_Registry |
Object of class |
autoAbort |
Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
Value
An object of class list, including 4 elements:
- $Registry_from_buildings: Object of class tbl_df, tbl and data.frame: the schools listed in the buildings registry
- $Registry_from_registry: Object of class tbl_df, tbl and data.frame: the schools listed in the schools registry
- $Any: Object of class tbl_df, tbl and data.frame: schools listed anywhere
- $Both: Object of class tbl_df, tbl and data.frame: schools listed in both the sections
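The relationship between the four elements can be pictured with set operations: schools "listed anywhere" correspond to a union of the two registries, and schools "listed in both" to their intersection. The sketch below uses made-up school codes and plain character vectors; it only illustrates the idea and is not the package's implementation.

```r
# Toy school codes (hypothetical) from the two registries
from_buildings <- c("S001", "S002", "S003")
from_registry  <- c("S002", "S003", "S004")

any_registry  <- union(from_buildings, from_registry)      # listed anywhere
both_registry <- intersect(from_buildings, from_registry)  # listed in both

any_registry
both_registry
```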
Source
Buildings registry (2021 onwards); Buildings registry (until 2019); Schools registry
Examples
Get_School2mun(Year = 2023, autoAbort = TRUE)
Download the shapefiles of Italian NUTS-3 and LAU administrative units
Description
Downloads either the boundaries or the centroids of the relevant administrative units, either provinces or municipalities, from the ISTAT website. Geometries are expressed in meters.
Usage
Get_Shapefile(
Year,
level = "LAU",
lightShp = TRUE,
autoAbort = FALSE,
centroids = FALSE
)
Arguments
Year |
Numeric. Reference year for the administrative units. |
level |
Character. Either |
lightShp |
Logical. If |
autoAbort |
Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
centroids |
Logical. Whether to switch from polygon geometry to point geometry. In the latter case, the point is located at the centroid of the relevant area. |
Value
A spatial data frame of class data.frame
and sf
.
Source
<https://www.istat.it/it/archivio/222527>
Examples
library(magrittr)
Prov23_shp <- Get_Shapefile(2023, lightShp = TRUE, level = "NUTS-3", autoAbort = TRUE)
ggplot2::ggplot() + ggplot2::geom_sf(data = Prov23_shp) +
ggplot2::ggtitle("Italian provinces in 2023/01/01")
Download students' number data
Description
This function downloads the data regarding the number of students from the open data website of the Italian Ministry of Education, University and Research.
Usage
Get_nstud(
Year = 2023,
filename = c("ALUCORSOETASTA", "ALUCORSOINDCLASTA"),
verbose = TRUE,
show_col_types = FALSE,
autoAbort = FALSE
)
Arguments
Year |
Numeric or character. Reference school year (last available is 2023).
Available in the formats: |
filename |
Character. A string included in the name of the file to download.
By default it is Other file names are the following. The output is not currently supported by the remainder of the functions involving the number of students.
|
verbose |
Logical. If |
show_col_types |
Logical. If |
autoAbort |
Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
Value
By default, a list of two tbl_df, tbl and data.frame objects:
- $ALUCORSOETASTA: The number of students by school, school grade and age. It covers a higher number of schools than the other element
- $ALUCORSOINDCLASTA: The number of students and classes by school and school grade. This is a long-format dataframe.
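For readers unfamiliar with the term, a long-format dataframe has one row per combination of school and grade, rather than one column per grade. The following sketch pivots toy long-format data into a school-by-grade table in base R; the column names School_code, Grade and Students are hypothetical and may differ from those of the actual output.

```r
# Toy long-format data: one row per school and grade (names are hypothetical)
nstud_long <- data.frame(
  School_code = c("S001", "S001", "S002", "S002"),
  Grade       = c(1, 2, 1, 2),
  Students    = c(40, 38, 25, 27)
)

# Wide layout: one row per school, one column per grade
nstud_wide <- tapply(nstud_long$Students,
                     list(nstud_long$School_code, nstud_long$Grade), sum)
nstud_wide
```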
Source
Examples
Get_nstud(2023, filename = "ALUCORSOINDCLASTA", autoAbort = TRUE)
Download the number of teachers in Italian schools by province
Description
This function downloads the number of teachers by province from the open data website of the Italian Ministry of Education, University and Research.
Usage
Get_nteachers_prov(
Year = 2023,
verbose = TRUE,
show_col_types = FALSE,
filename = c("DOCTIT", "DOCSUP"),
autoAbort = FALSE
)
Arguments
Year |
Numeric or character value. Reference school year for the school registry data (last available is 2023).
Available in the formats: |
verbose |
Logical. If |
show_col_types |
Logical. If |
filename |
Character. Which data to retrieve among the province counts of teachers/school personnel.
By default it is
|
autoAbort |
Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
Details
Please note that, by default, the function returns the count of tenured and temporary teachers.
If either the count of non-teaching personnel or the count of a single category of teaching personnel is needed, please adapt the filename argument accordingly.
Value
An object of class tbl_df, tbl and data.frame.
Source
Examples
nteachers23 <- Get_nteachers_prov(2023, filename = "DOCTIT", autoAbort = TRUE)
nteachers23[, c(3,4,5)]
Aggregate the database of Italian public school buildings at the municipality and province level
Description
This function aggregates the output of the Util_DB_MIUR_num function (which is detailed at the level of single school buildings) at the municipality/LAU and province/NUTS-3 level.
It also allows the user to classify the degree of centrality of municipalities through the variable Inner_area.
Usage
Group_DB_MIUR(
data = NULL,
Year = 2023,
count_units = TRUE,
countname = "nbuildings",
count_missing = TRUE,
verbose = TRUE,
track_deleted = TRUE,
InnerAreas = TRUE,
ord_InnerAreas = FALSE,
input_InnerAreas = NULL,
autoAbort = FALSE,
...
)
Arguments
data |
Object of class |
Year |
Numeric or Character. The reference school year, if either |
count_units |
Logical. Whether the rows to aggregate at each level must be counted or not. True by default. |
countname |
Character. The name of the variable indicating the number of schools included in each municipality or province,
if the argument 'count' is |
count_missing |
Logical. Whether the function should return two dataframes including the percentage of NAs in the |
verbose |
Logical. If |
track_deleted |
Logical. If |
InnerAreas |
Logical. Whether an indicator of the percentage of schools belonging to peripheral (Inner) areas must be included or not. |
ord_InnerAreas |
Logical. Whether the Inner areas classification should be treated as an ordinal variable rather than as a binary one (see |
input_InnerAreas |
Object of class |
autoAbort |
Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
... |
Additional arguments to the function |
Details
Numerical variables are summarised by the mean; Boolean variables are summarised by the mean as well, thus they become frequency indicators. Qualitative values, if included, are summarised by the mode. Summary measures do not include NAs. The output dataframes are also detailed at the school order level (i.e. Primary, Middle, High school, or different orders). This means that rows are unique combinations of territorial units and school order.
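The summary rules described here (mean for numeric and Boolean variables, mode for qualitative ones, NAs excluded) can be sketched on toy data in base R. All column names and values below are hypothetical, the helper stat_mode is not part of the package, and the sketch groups by municipality only, without the school order dimension.

```r
# Toy building-level data; all column names and values are hypothetical
buildings <- data.frame(
  municipality = c("A", "A", "A", "B", "B"),
  surface      = c(100, 200, NA, 300, 500),             # numeric
  school_bus   = c(TRUE, FALSE, TRUE, TRUE, TRUE),      # Boolean
  heating      = c("gas", "gas", "oil", "oil", "oil"),  # qualitative
  stringsAsFactors = FALSE
)

# Mode of a vector, ignoring NAs (ties resolved by first appearance)
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

grp <- buildings$municipality
by_mun <- data.frame(
  municipality = sort(unique(grp)),
  # Mean of a numeric variable, NAs excluded
  surface    = as.vector(tapply(buildings$surface, grp, mean, na.rm = TRUE)),
  # Mean of a Boolean variable, i.e. the frequency of TRUE
  school_bus = as.vector(tapply(buildings$school_bus, grp, mean, na.rm = TRUE)),
  # Mode of a qualitative variable
  heating    = as.vector(tapply(buildings$heating, grp, stat_mode)),
  stringsAsFactors = FALSE
)
by_mun
```

In the toy output, school_bus for municipality A equals 2/3, which reads as the share of buildings served by a school bus.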
Value
An object of class list including:
- $Municipality_data: object of class tbl_df, tbl and data.frame, the output dataframe detailed at the municipality level; all variables besides the first 5 (which identify the record) are numeric
- $Province_data: object of class tbl_df, tbl and data.frame, the output dataframe detailed at the province level; all variables besides the first 3 (which identify the record) are numeric
- $Municipality_missing (only if count_missing == TRUE): object of class tbl_df, tbl and data.frame, the percentage of NAs in each variable at the municipality level.
- $Province_missing (only if count_missing == TRUE): object of class tbl_df, tbl and data.frame, the percentage of NAs in each variable at the province level.
- $deleted: character vector. The schools removed from the original dataframe for data quality reasons. This object is returned only if track_deleted == TRUE
Examples
library(magrittr)
DB23_MIUR <- example_input_DB23_MIUR %>% Util_DB_MIUR_num(verbose = FALSE) %>%
Group_DB_MIUR(InnerAreas = FALSE)
DB23_MIUR$Municipality_data[, -c(1,2,4)]
summary(DB23_MIUR$Municipality_data)
DB23_MIUR$Province_data[, -c(1,3)]
summary(DB23_MIUR$Province_data)
Aggregate the students number data by class at the municipality and province level
Description
This function creates two dataframes with the number of students, the number of classes and the number of students per class, aggregated at the province and municipality level.
Usage
Group_nstud(
data = NULL,
Year = 2023,
check = TRUE,
verbose = TRUE,
check_registry = "Any",
InnerAreas = TRUE,
ord_InnerAreas = FALSE,
check_ggplot = FALSE,
missing_to_1 = FALSE,
input_Registry = NULL,
input_InnerAreas = NULL,
input_Prov_shp = NULL,
input_School2mun = NULL,
input_AdmUnNames = NULL,
autoAbort = FALSE,
...
)
Arguments
data |
Either an object of class |
Year |
Numeric or character value. The reference school year, if either of the |
check |
Logical. If |
verbose |
Logical. If |
check_registry |
Character. If |
InnerAreas |
Logical. If |
ord_InnerAreas |
Logical. If |
check_ggplot |
Logical. If |
missing_to_1 |
Logical. Only needed if |
input_Registry |
Object of class |
input_InnerAreas |
Object of class |
input_Prov_shp |
Object of class |
input_School2mun |
Object of class |
input_AdmUnNames |
Object of class |
autoAbort |
Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
... |
Additional arguments to the function |
Details
Numerical variables are summarised by the mean; Boolean variables are summarised by the mean as well, thus they become frequency indicators. Qualitative values, if included, are summarised by the mode. Summary measures do not include NAs.
Value
An object of class list including:
- $Municipality_data: object of class tbl_df, tbl and data.frame, the output dataframe detailed at the municipality level
- $Province_data: object of class tbl_df, tbl and data.frame, the output dataframe detailed at the province level
Examples
Year <- 2023
nstud23_aggr <- Group_nstud(data = example_input_nstud23, Year = Year,
input_Registry = example_input_Registry23,
InnerAreas = FALSE,
input_School2mun = example_School2mun23)
summary(nstud23_aggr$Municipality_data[,c(46,47,48)])
summary(nstud23_aggr$Province_data[,c(44,45,46)])
Arrange the number of teachers per student in public Italian schools at the province level
Description
This function provides the average number of teachers per student in Italian public schools at the province level.
Usage
Group_teachers4stud(
Year = 2023,
input_nteachers = NULL,
nteachers_filename = c("DOCTIT", "DOCSUP"),
verbose = TRUE,
input_nstud_raw = NULL,
input_nstud_aggr = NULL,
autoAbort = FALSE,
...
)
Arguments
Year |
Numeric or character value. Reference school year for the school registry data (last available is 2022).
Available in the formats: |
input_nteachers |
Object of class |
nteachers_filename |
Character. If |
verbose |
Logical. If |
input_nstud_raw |
Object of class 'list', including two objects of class |
input_nstud_aggr |
Object of class |
autoAbort |
Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
... |
Arguments to |
Value
An object of class tbl_df, tbl and data.frame.
Examples
input_nstud23 <- Get_nstud(2023, filename ="ALUCORSOINDCLASTA", autoAbort = TRUE)
Registry23 <- Get_Registry(2023, autoAbort = TRUE)
School2mun23 <- Get_School2mun(2023, input_Registry = Registry23, autoAbort = TRUE)
nstud23.aggr <- Group_nstud(Year = 2023, data = input_nstud23,
input_Registry = Registry23, input_School2mun = School2mun23,
autoAbort = TRUE)
input_nteachers23 <- Get_nteachers_prov(2023, autoAbort = TRUE)
teachers4stud <- Group_teachers4stud(Year = 2023,
input_nteachers = input_nteachers23,
input_nstud_aggr = nstud23.aggr, autoAbort = TRUE)
teachers4stud[, -c(1, 2, 10, 11)]
summary(teachers4stud)
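Conceptually, a teachers-per-student indicator at the province level combines two province-level counts. The sketch below joins toy teacher and student counts and takes their ratio; the column names and figures are hypothetical, and the package's actual computation may differ (for instance by school order).

```r
# Toy province-level counts; codes, column names and figures are hypothetical
teachers <- data.frame(Province_code = c(1, 2), n_teachers = c(500, 300))
students <- data.frame(Province_code = c(1, 2), n_students = c(5000, 3600))

# Join the two tables on the province code and compute the ratio
ratio <- merge(teachers, students, by = "Province_code")
ratio$teachers_per_student <- ratio$n_teachers / ratio$n_students
ratio
```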
Map school data
Description
This function displays a map of the data arranged through the function Set_DB.
It supports two kinds of map:
Interactive map (default option), which allows the user to visualize all the data in scope through the interactive popup, and
Static map (ggplot), which can be easily exported as .pdf files.
The user must select a variable to display.
It is possible to insert either a readily-downloaded database obtained through the function Set_DB or the basic inputs to plug into that function, in addition to an input shapefile. Relevant arguments not provided by the user will be downloaded automatically, but not saved into the global environment. However, we suggest providing at least some inputs, as otherwise the running time may be long.
This function generalises the functionalities of the more data-specific functions Map_School_Buildings and Map_Invalsi.
Usage
Map_DB(
data = NULL,
Year = 2023,
field,
level = "LAU",
plot = "mapview",
popup_height = 200,
col_rev = FALSE,
pal = "viridis",
input_shp = NULL,
region_code = c(1:20),
main_pos = "top",
main = "",
order = NULL,
autoAbort = FALSE,
only_observed = FALSE,
...
)
Arguments
data |
Object of class |
Year |
Numeric or Character. The reference school year, needed if either |
field |
Character. The variable to display in the map. |
level |
Character. The administrative level of detail at which the target variable must be displayed. Either |
plot |
Character. The type of map to display; either |
popup_height |
Numeric. The height of the popup table in terms of pixels if the |
col_rev |
Logical. Whether the scale of the colour palette should be reversed or not. |
pal |
Character. The palette to use if the |
input_shp |
Object of class |
region_code |
Numeric. The NUTS-2 codes of the units that must be displayed.
If the level is set to |
main_pos |
Character. Where the header should be placed if the |
main |
Character. The title to display in the |
order |
Character. The educational level. Either |
autoAbort |
Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
only_observed |
Logical. Whether to remove unobserved areas from the plot. |
... |
Additional arguments for the input database, if not provided; see |
Value
If plot == "mapview", an object of class mapview. Otherwise, if plot == "ggplot", an object of class gg and ggplot.
Examples
DB23 <- Set_DB(Year = 2023, level = "NUTS-3",
Invalsi_grade = c(10,13), NA_autoRM = TRUE,
input_Invalsi_IS = example_Invalsi23_prov, input_nstud = example_input_nstud23,
input_InnerAreas = example_InnerAreas,
input_School2mun = example_School2mun23,
input_AdmUnNames = example_AdmUnNames20220630,
nteachers = FALSE, BroadBand = FALSE, SchoolBuildings = FALSE)
Map_DB(DB23, field = "Students_per_class_13", input_shp = example_Prov22_shp, level = "NUTS-3",
col_rev = TRUE, plot = "ggplot")
Map_DB(DB23, field = "Inner_area", input_shp = example_Prov22_shp, order = "High",
level = "NUTS-3",col_rev = TRUE, plot = "ggplot")
Map_DB(DB23, field = "M_Mathematics_10", input_shp = example_Prov22_shp, level = "NUTS-3",
plot = "ggplot")
Display a map of Invalsi scores
Description
This function displays either a static or interactive map of the Invalsi scores, either at the municipality or province level. It supports two kinds of map:
Interactive map (default option), which allows the user to visualize all the data in scope through the interactive popup, and
Static map (ggplot), which can be easily exported as .pdf files.
Usage
Map_Invalsi(
data = NULL,
Year = 2023,
subj_toplot = "ITA",
grade = 8,
level = "LAU",
main = "",
main_pos = "top",
region_code = c(1:20),
plot = "mapview",
pal = "viridis",
WLE = FALSE,
col_rev = FALSE,
popup_height = 200,
only_observed = FALSE,
verbose = TRUE,
input_shp = NULL,
autoAbort = FALSE
)
Arguments
data |
Object of class |
Year |
Numeric or character value. Reference school year for the data (last available is 2022/23).
Available in the formats: |
subj_toplot |
Character. The school subject to display in the map, one among:
|
grade |
Numeric. The school grade to choose. Either |
level |
Character. The level of aggregation of Invalsi census data. Either |
main |
Character. A custom title for the map. If |
main_pos |
Character. Where the header should be placed if the |
region_code |
Numeric. The NUTS-2 codes of the units that must be displayed.
If the level is set to |
plot |
Character. The type of map to display; either |
pal |
Character. The palette to use if the |
WLE |
Logical. Whether the variable to choose should be the average WLE score rather than the percentage of sufficient tests, if both are available. |
col_rev |
Logical. Whether the scale of the colour palette should be reversed or not, if the |
popup_height |
Numeric. The height of the popup table in terms of pixels if the |
only_observed |
Logical. Whether to remove unobserved areas from the plot. |
verbose |
Logical. If |
input_shp |
Object of class |
autoAbort |
Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
Value
If plot == "mapview", an object of class mapview. Otherwise, if plot == "ggplot", an object of class gg and ggplot.
Examples
Map_Invalsi(subj = "Italian", grade = 13, level = "NUTS-3", Year = 2023, WLE = FALSE,
data = example_Invalsi23_prov, input_shp = example_Prov22_shp, plot = "ggplot")
Map_Invalsi(subj = "Italian", grade = 5, level = "NUTS-3", Year = 2023, WLE = TRUE,
data = example_Invalsi23_prov, input_shp = example_Prov22_shp, plot = "ggplot")
Display data from the school buildings database
Description
This function displays a map of the data downloaded through the Get_DB_MIUR function.
It supports two kinds of map:
Interactive map (default option), which allows the user to visualize all the data in scope through the interactive popup, and
Static map (ggplot), which can be easily exported as .pdf files.
Usage
Map_School_Buildings(
data = NULL,
field,
order = NULL,
level = "LAU",
region_code = c(1:20),
plot = "mapview",
pal = "viridis",
col_rev = FALSE,
popup_height = 200,
main_pos = "top",
main = "",
only_observed = FALSE,
verbose = TRUE,
input_shp = NULL,
autoAbort = FALSE,
...
)
Arguments
data |
Object of class |
field |
Character. The variable to display in the map. |
order |
Character. The school order. Either |
level |
Character. The administrative level of detail at which the target variable must be displayed.
Either |
region_code |
Numeric. The NUTS-2 codes of the units that must be displayed.
If the level is set to |
plot |
Character. The type of map to display; either |
pal |
Character. The palette to use if the |
col_rev |
Logical. Whether the scale of the colour palette should be reversed or not, if the |
popup_height |
Numeric. The height of the popup table in terms of pixels if the |
main_pos |
Character. Where the header should be placed if the |
main |
Character. The custom title to display in the |
only_observed |
Logical. Whether to remove unobserved areas from the plot. |
verbose |
Logical. If |
input_shp |
Object of class |
autoAbort |
Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
... |
If |
Value
If plot == "mapview", an object of class mapview. Otherwise, if plot == "ggplot", an object of class gg and ggplot.
Examples
library(magrittr)
DB23_MIUR <- example_input_DB23_MIUR %>%
  Util_DB_MIUR_num(track_deleted = FALSE) %>%
  Group_DB_MIUR(InnerAreas = FALSE, count_missing = FALSE)
DB23_MIUR %>% Map_School_Buildings(field = "School_bus",
  order = "Primary", level = "NUTS-3", plot = "ggplot",
  input_shp = example_Prov22_shp)
DB23_MIUR %>% Map_School_Buildings(field = "Railway_transport",
  order = "High", level = "NUTS-3", plot = "ggplot",
  input_shp = example_Prov22_shp)
DB23_MIUR %>% Map_School_Buildings(field = "Context_without_disturbances",
  order = "Middle", level = "NUTS-3", plot = "ggplot",
  input_shp = example_Prov22_shp, col_rev = TRUE)
Build up a comprehensive database regarding the school system
Description
This function generates a single dataframe of school system data from a user-defined selection of the available datasets. The datasets that can be aggregated, when available, are:
Invalsi census survey
School buildings
Number of students and school classes
Number of teachers
Broadband connection availability
To save time, ready-made input data can be plugged in; otherwise they are downloaded automatically, but not saved in the global environment. When a new dataset is joined to the existing ones, some observations may be missing from it. In this case, by default, the choice between keeping as many observational units as possible and removing the units with missing variables is left to the user.
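The trade-off between keeping as many units as possible and dropping units with missing variables can be illustrated with a minimal base R sketch. The toy tables and their values are invented for illustration, and `merge()` merely stands in for whatever joining logic `Set_DB` applies internally.

```r
# Toy province-level tables (invented values, not package data)
invalsi <- data.frame(Province_code = c(61, 62, 63),
                      M_ITA_8 = c(190.5, 188.2, 185.9))
buildings <- data.frame(Province_code = c(62, 63, 64),
                        School_bus = c(0.41, 0.35, 0.52))

# Keep every unit observed in at least one dataset (NA where data are missing)
keep_all <- merge(invalsi, buildings, by = "Province_code", all = TRUE)

# Keep only the units observed in every dataset
complete_only <- keep_all[complete.cases(keep_all), ]

nrow(keep_all)       # units retained when missing values are tolerated
nrow(complete_only)  # units retained when missing values are removed
```

The first option preserves coverage at the cost of missing values; the second yields a complete table with fewer units.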
Usage
Set_DB(
Year = 2023,
level = "LAU",
conservative = TRUE,
Invalsi = TRUE,
SchoolBuildings = TRUE,
nstud = TRUE,
nteachers = TRUE,
BroadBand = TRUE,
verbose = TRUE,
show_col_types = FALSE,
Invalsi_subj = c("ELI", "ERE", "ITA", "MAT"),
Invalsi_grade = c(2, 5, 8, 10, 13),
Invalsi_WLE = FALSE,
SchoolBuildings_certifications = FALSE,
SchoolBuildings_include_numerics = TRUE,
SchoolBuildings_include_qualitatives = FALSE,
SchoolBuildings_row_cutout = FALSE,
SchoolBuildings_col_cut_thresh = 20000,
SchoolBuildings_flag_outliers = TRUE,
SchoolBuildings_count_missing = FALSE,
nstud_imputation_thresh = 19,
nstud_missing_to_1 = FALSE,
UB_nstud_byclass = 99,
LB_nstud_byclass = 1,
UB_nstud_byclass_grade = NULL,
LB_nstud_byclass_grade = NULL,
nstud_filter_by_grade = FALSE,
InnerAreas = TRUE,
ord_InnerAreas = FALSE,
nstud_check = TRUE,
nstud_check_registry = "Any",
BroadBand_impute_missing = TRUE,
Date = as.Date(paste0(substr(year.patternA(Year), 1, 4), "-09-01")),
NA_autoRM = NULL,
input_Invalsi_IS = NULL,
input_Registry = NULL,
input_SchoolBuildings = NULL,
input_nstud = NULL,
input_School2mun = NULL,
input_AdmUnNames = NULL,
input_InnerAreas = NULL,
input_teachers4student = NULL,
input_nteachers = NULL,
input_BroadBand = NULL,
autoAbort = FALSE
)
Arguments
Year |
Numeric or Character. The relevant school year. Available in the formats: |
level |
Character. The administrative level of detail at which data must be aggregated.
Either |
conservative |
Logical. If |
Invalsi |
Logical. Whether the Invalsi census data must be included (see |
SchoolBuildings |
Logical. Whether the school buildings dataset must be included (see |
nstud |
Logical. Whether the students number per class must be included (see |
nteachers |
Logical. Whether the number of teachers by province must be included (see |
BroadBand |
Logical. Whether the broadband availability in schools must be included (see |
verbose |
Logical. If |
show_col_types |
Logical. If |
Invalsi_subj |
Character. If |
Invalsi_grade |
Numeric. If |
Invalsi_WLE |
Logical. Whether to express Invalsi scores as average WLE score rather than the percentage of sufficient tests, if both are Invalsi_grade is either or |
SchoolBuildings_certifications |
Logical. If the school buildings database has to be downloaded, whether to include safety certifications. Only relevant from school year 2020/21 onwards (see |
SchoolBuildings_include_numerics |
Logical. Whether to include strictly numeric variables alongside Boolean ones in the school buildings database (see |
SchoolBuildings_include_qualitatives |
Logical. Whether to include qualitative variables alongside Boolean ones in the school buildings database (see |
SchoolBuildings_row_cutout |
Logical. Whether to filter out rows including missing fields in the school buildings database (see |
SchoolBuildings_col_cut_thresh |
Numeric. The threshold of missing values allowed for each variable in the school buildings database (see |
SchoolBuildings_flag_outliers |
Logical. Whether to assign NA to outliers in numeric variables; see |
SchoolBuildings_count_missing |
Logical. Whether the function should return the percentage of NAs in the input school buildings database (see also |
nstud_imputation_thresh |
Numeric. If |
nstud_missing_to_1 |
Logical. If |
UB_nstud_byclass |
Numeric. Either a unique value for all school orders, or a vector of three order-specific values in the order: primary, middle, high.
If focus is on class size, the upper limit of the acceptable school-level (if |
LB_nstud_byclass |
Numeric. Either a unique value for all school orders, or a vector of three order-specific values in the order: primary, middle, high.
If focus is on class size, the lower limit of the acceptable school-level (if |
UB_nstud_byclass_grade |
Numeric. If |
LB_nstud_byclass_grade |
Numeric. If |
nstud_filter_by_grade |
Logical. If focus is on class size, whether to remove all school grades with average class size outside of the acceptance boundaries. |
InnerAreas |
Logical. Whether the percentage of schools belonging to inner/internal areas must be included (see |
ord_InnerAreas |
Logical. If |
nstud_check |
Logical. If |
nstud_check_registry |
Character. If |
BroadBand_impute_missing |
Logical. Whether the schools not included in the Broadband dataset must be considered in the total of schools (i.e. the denominator of the Broadband availability indicator). |
Date |
Character or Date. The threshold date for broadband activation, i.e. the date by which the broadband activation works must be finished for a school to be considered as provided with broadband. By default, September 1st at the beginning of the school year. |
NA_autoRM |
Logical. Either |
input_Invalsi_IS |
Object of class |
input_Registry |
Object of class |
input_SchoolBuildings |
Object of class |
input_nstud |
Object of class |
input_School2mun |
Object of class |
input_AdmUnNames |
Object of class |
input_InnerAreas |
Object of class |
input_teachers4student |
Object of class |
input_nteachers |
Object of class |
input_BroadBand |
Object of class |
autoAbort |
Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
Value
An object of class tbl_df, tbl and data.frame
See Also
Util_DB_MIUR_num, Group_DB_MIUR, Group_nstud, Util_Check_nstud_availability, Get_School2mun for similar arguments.
Examples
DB23_prov <- Set_DB(Year = 2023, level = "NUTS-3", Invalsi_grade = c(5, 8, 13),
  Invalsi_subj = "Italian", nteachers = FALSE, BroadBand = FALSE,
  SchoolBuildings_count_missing = FALSE, NA_autoRM = TRUE,
  input_SchoolBuildings = example_input_DB23_MIUR[, -c(11:18, 20:27)],
  input_Invalsi_IS = example_Invalsi23_prov,
  input_nstud = example_input_nstud23,
  input_InnerAreas = example_InnerAreas,
  input_School2mun = example_School2mun23,
  input_AdmUnNames = example_AdmUnNames20220630)
DB23_prov
summary(DB23_prov[, -c(22:62)])
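The roles of the Date and BroadBand_impute_missing arguments can be sketched in base R as follows. This is only an illustration under stated assumptions: Year = 2023 is taken to denote school year 2022/23 (as in the example datasets), the column Activation_date and the total of five schools are hypothetical, and the internal helper year.patternA is not reproduced.

```r
# Assumption: Year = 2023 denotes school year 2022/23, starting in calendar year 2022
Year <- 2023
threshold <- as.Date(paste0(Year - 1, "-09-01"))

# Toy broadband records (invented); Activation_date is a hypothetical column name
broadband <- data.frame(
  School_code = c("S1", "S2", "S3"),
  Activation_date = as.Date(c("2022-03-15", "2022-11-02", NA))
)

# A school counts as served only if the works ended on or before the threshold
served <- !is.na(broadband$Activation_date) &
  broadband$Activation_date <= threshold

# Hypothetical total of schools in the area, two of which are absent
# from the broadband dataset
n_schools_total <- 5

share_listed_only <- sum(served) / nrow(broadband)  # denominator: listed schools only
share_all_schools <- sum(served) / n_schools_total  # denominator: all schools
```

Including the schools absent from the broadband dataset in the denominator lowers the indicator, since those schools are implicitly treated as not served.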
Map schools included in the ultra-broadband plan to their LAU codes.
Description
Helper function to provide the ultra-broadband dataset obtained with Get_BroadBand with the statistical codes of the relevant municipalities, obtained with Get_School2mun, in case the ultra-broadband dataset has been downloaded with argument include_municipality_code = FALSE.
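Conceptually, this amounts to a join on the school ID. A minimal base R sketch, assuming a School_code key and the Municipality_code column documented for example_School2mun23; the records and the Status column are invented, and `merge()` is a stand-in for the package's own logic:

```r
# Toy broadband table without municipality codes (invented records;
# Status is a hypothetical column)
broadband <- data.frame(School_code = c("AAAA000001", "BBBB000002", "CCCC000003"),
                        Status = c("Active", "Planned", "Active"),
                        stringsAsFactors = FALSE)

# Toy school-to-municipality mapping
school2mun <- data.frame(School_code = c("AAAA000001", "BBBB000002"),
                         Municipality_code = c("070006", "072006"),
                         stringsAsFactors = FALSE)

# Left join: every broadband record is kept; unmatched schools get NA
with_lau <- merge(broadband, school2mun, by = "School_code", all.x = TRUE)
with_lau
```

Keeping the municipality code as character preserves leading zeros, in line with the documented format of Municipality_code.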
Usage
Util_BroadBand2mun(
data,
input_School2mun = NULL,
input_Registry = NULL,
input_AdmUnNames = NULL,
verbose = FALSE,
autoAbort = FALSE
)
Arguments
data |
Object of class |
input_School2mun |
Object of class |
input_Registry |
If |
input_AdmUnNames |
If |
verbose |
Logical. If |
autoAbort |
Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
Details
See Get_BroadBand.
Value
An object of class tbl_df, tbl and data.frame, identical to the output of Get_BroadBand with an additional column for LAU codes.
Source
Broadband dashboard: <https://bandaultralarga.italia.it/scuole-voucher/dashboard-scuole/> . ISTAT LAU codes: <https://situas.istat.it/web/#/territorio>
Check how many schools in the school registries are included in the students count dataframe
Description
This function checks for which schools listed in the two registries (the buildings registry and the schools registry proper) the count of students is available. The first registry is referred to as Registry_from_buildings and the second one as Registry_from_registry.
Usage
Util_Check_nstud_availability(
data,
Year,
cutout = c("IC", "IS", "NR"),
verbose = TRUE,
ggplot = TRUE,
toplot_registry = "Any",
InnerAreas = TRUE,
ord_InnerAreas = FALSE,
input_Registry = NULL,
input_InnerAreas = NULL,
input_Prov_shp = NULL,
input_AdmUnNames = NULL,
input_School2mun = NULL,
autoAbort = FALSE
)
Arguments
data |
Object of class |
Year |
Numeric or character value. Reference school year.
Available in the formats: |
cutout |
Character. The types of schools not to be taken into account (because not relevant or because they are out of scope in the students number section). By default |
verbose |
Logical. If |
ggplot |
Logical. If |
toplot_registry |
Character. If the |
InnerAreas |
Logical. Whether it must be checked if municipalities belong to inner areas or not. |
ord_InnerAreas |
Logical. Whether the inner areas classification should be treated as an ordinal variable rather than as a categorical one (see |
input_Registry |
Object of class |
input_InnerAreas |
Object of class |
input_Prov_shp |
Object of class |
input_AdmUnNames |
Object of class |
input_School2mun |
Object of class |
autoAbort |
Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
Value
An object of class list including two elements:
- $Municipality_data
- $Province_data
Both elements are objects of class list including four elements:
- $Registry_from_buildings: object of class tbl_df, tbl and data.frame: the availability of the number of students in the schools listed in the buildings section.
- $Registry_from_registry: object of class tbl_df, tbl and data.frame: the availability of the number of students in the schools listed in the registry section.
- $Any: object of class tbl_df, tbl and data.frame: the availability of the number of students in the schools listed anywhere.
- $Both: object of class tbl_df, tbl and data.frame: the availability of the number of students in the schools listed in both sections.
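The four registry views and the availability check can be illustrated with a small base R sketch. The school codes are invented, and set operations are used here only to convey the idea, not to reproduce the function's actual output tables.

```r
# Toy school codes (invented)
registry_from_buildings <- c("S1", "S2", "S3", "S4")
registry_from_registry  <- c("S3", "S4", "S5")
nstud_codes             <- c("S1", "S3", "S5")  # schools with a students count

registries <- list(
  Registry_from_buildings = registry_from_buildings,
  Registry_from_registry  = registry_from_registry,
  Any  = union(registry_from_buildings, registry_from_registry),
  Both = intersect(registry_from_buildings, registry_from_registry)
)

# Share of schools in each registry for which the students count is available
availability <- sapply(registries, function(codes) mean(codes %in% nstud_codes))
availability
```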
Source
Buildings Registry; Schools Registry
Examples
nstud23 <- Util_nstud_wide(example_input_nstud23, verbose = FALSE)
Util_Check_nstud_availability(nstud23, Year = 2023,
input_Registry = example_input_Registry23, InnerAreas = FALSE,
input_School2mun = example_School2mun23, input_Prov_shp = example_Prov22_shp)
Clean and convert the raw school buildings data to Boolean variables
Description
This function cleans the output of the Get_DB_MIUR
function from missing values in two steps:
First, it deletes both the columns exceeding a threshold of missing values (1000 by default) and the columns that cannot be converted into Boolean variables
Then, it deletes the rows in which missing values remain
Finally, the remaining data are converted into Boolean variables. It is possible to keep track of the deleted rows.
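The steps above can be sketched in base R on toy data. The raw coding of the answers as "SI"/"NO", the tiny threshold and the column selection are assumptions made for illustration; this is not the package's implementation.

```r
# Toy raw data (invented): one ID column and three answer columns
raw <- data.frame(
  School_code  = c("S1", "S2", "S3", "S4"),
  School_bus   = c("SI", "NO", NA, "SI"),
  Bicycle_lane = c(NA, NA, NA, "NO"),
  Notes        = c("near park", "n/a", "none", "none"),
  stringsAsFactors = FALSE
)
col_cut_thresh <- 2  # stand-in for the documented default of 10^3
answers <- setdiff(names(raw), "School_code")

# Step 1a: drop the columns with more missing values than the threshold
n_missing <- colSums(is.na(raw[answers]))
answers <- answers[n_missing <= col_cut_thresh]
# Step 1b: drop the columns that cannot be read as yes/no (assumed "SI"/"NO")
is_yes_no <- vapply(raw[answers],
                    function(x) all(x %in% c("SI", "NO", NA)), logical(1))
answers <- answers[is_yes_no]

# Step 2: drop the rows that still contain missing values, keeping track of them
clean <- raw[c("School_code", answers)]
incomplete <- !complete.cases(clean)
deleted <- clean$School_code[incomplete]
clean <- clean[!incomplete, ]

# Step 3: encode the remaining answers as 1/0
clean[answers] <- lapply(clean[answers], function(x) as.numeric(x == "SI"))
clean
deleted
```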
Usage
Util_DB_MIUR_bool(
data = NULL,
cutout = NULL,
col_cut_thresh = 10^3,
verbose = TRUE,
track_deleted = TRUE,
autoAbort = FALSE,
...
)
Arguments
data |
Object of class |
cutout |
Character. The columns to cut out. If |
col_cut_thresh |
Numeric. The threshold of missing values allowed for each variable.
If a variable has a higher number of missing observations, then it is cut out. |
verbose |
Logical. If |
track_deleted |
Logical. If |
autoAbort |
Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
Value
If track_deleted == TRUE, an object of class list including two objects:
- $data: object of class tbl_df, tbl and data.frame, the output dataframe. All variables besides the first 8 ones (which identify the record) are numeric.
- $deleted: character. The school codes corresponding to deleted rows.
If track_deleted == FALSE, the output is only the first element of the list.
Convert the raw school buildings data to numeric or Boolean variables
Description
This function transforms the output variables of the Get_DB_MIUR function into Boolean or numeric variables.
Additionally, it removes the columns with an excessive number of missing observations (20,000 by default) and, if required, it may also delete the rows including missing fields.
In this case, it is possible to keep track of the deleted rows.
Usage
Util_DB_MIUR_num(
data = NULL,
include_numerics = TRUE,
include_qualitatives = FALSE,
row_cutout = FALSE,
track_deleted = TRUE,
verbose = TRUE,
col_cut_thresh = 20000,
flag_outliers = TRUE,
autoAbort = FALSE,
...
)
Arguments
data |
Object of class |
include_numerics |
Logical. Whether to include strictly numeric variables alongside with Boolean ones. |
include_qualitatives |
Logical. Whether to include qualitative variables alongside with Boolean ones. |
row_cutout |
Logical. Whether to filter out rows including missing fields. |
track_deleted |
Logical. If |
verbose |
Logical. If |
col_cut_thresh |
Numeric. The threshold of missing values allowed for each variable.
If a variable has a higher number of missing observations, then it is cut out. |
flag_outliers |
Logical. Whether to assign NA to outliers in numeric variables. |
autoAbort |
Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
... |
Additional arguments to the function |
Details
The outliers to be set to NA if flag_outliers is active are defined as follows: school area or free area surface of less than 50 square meters, building volume of less than 150 cubic meters, 0 floors in the building.
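The outlier rule can be written down as a short base R sketch. The column names below are hypothetical placeholders and the records are invented; only the three thresholds come from the description above.

```r
# Toy building records (invented); the column names are hypothetical
buildings <- data.frame(
  School_area     = c(1200, 30, 800),
  Free_area       = c(400, 45, 10),
  Building_volume = c(9000, 120, 5000),
  Floors_number   = c(2, 0, 3)
)

flagged <- buildings
flagged$School_area[flagged$School_area < 50] <- NA            # under 50 square meters
flagged$Free_area[flagged$Free_area < 50] <- NA                # under 50 square meters
flagged$Building_volume[flagged$Building_volume < 150] <- NA   # under 150 cubic meters
flagged$Floors_number[flagged$Floors_number == 0] <- NA        # zero floors
flagged
```

Setting implausible values to NA, rather than deleting the record, keeps the other variables of the same building usable.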
Value
If track_deleted == TRUE, an object of class list including two objects:
- $data: object of class tbl_df, tbl and data.frame, the output dataframe.
- $deleted: object of class tbl_df, tbl and data.frame. The school IDs of the deleted units.
If track_deleted == FALSE, the output is only the first element of the list.
Examples
library(magrittr)
DB23_MIUR_num <- example_input_DB23_MIUR %>% Util_DB_MIUR_num(track_deleted = FALSE)
DB23_MIUR_num[, -c(1,4,6,8,9,10)]
summary(DB23_MIUR_num)
Filter the Invalsi data by subject, school grade and year.
Description
This function filters the database of Invalsi scores (see Get_Invalsi_IS) by school year, education grade and subject and returns a dataframe in wide format.
Each row corresponds to one territorial unit (either municipality or province); the numerical variables are three (the mean score, the score's standard deviation and the students coverage percentage) for each selected subject.
Usage
Util_Invalsi_filter(
data = NULL,
subj = c("ELI", "ERE", "ITA", "MAT"),
grade = 8,
level = "LAU",
WLE = FALSE,
Year = 2023,
verbose = TRUE,
autoAbort = FALSE
)
Arguments
data |
Object of class |
subj |
Character. The school subject(s) to include, among |
grade |
Numeric. The school grade to choose. Either |
level |
Character. The level of aggregation of Invalsi census data. Either |
WLE |
Logical. Whether the variable to choose should be the average WLE score rather than the percentage of sufficient tests, if both are available. |
Year |
Numeric or character value. Reference school year for the data (last available is 2022/23).
Available in the formats: |
verbose |
Logical. If |
autoAbort |
Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
Value
An object of class tbl_df, tbl and data.frame. For all subjects and school grades, the variables indicate:
- M: The mean score, either WLE or percentage of sufficient tests
- S: The standard deviation of the score
- C: The students coverage percentage (expressed in the scale 1 - 100)
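The filter-then-widen idea can be sketched in base R. The input column names follow the documented format of example_Invalsi23_prov, while the scores are invented and the output naming scheme (an M/S/C prefix plus the subject name) is a hypothetical convention, not necessarily the one used by the function.

```r
# Toy long-format scores (invented values)
long <- data.frame(
  Grade = c(5, 5, 5, 5, 8),
  Subject = c("Italian", "Mathematics", "Italian", "Mathematics", "Italian"),
  Province_code = c(70, 70, 94, 94, 70),
  WLE_average_score = c(198, 201, 203, 199, 195),
  Std_dev_WLE_score = c(38, 40, 37, 39, 36),
  Students_coverage = c(96, 95, 97, 96, 93),
  stringsAsFactors = FALSE
)

# Filter by grade, then build one block of three columns per subject
sel <- long[long$Grade == 5, ]
wide <- NULL
for (s in unique(sel$Subject)) {
  part <- sel[sel$Subject == s,
              c("Province_code", "WLE_average_score",
                "Std_dev_WLE_score", "Students_coverage")]
  # Hypothetical naming scheme: M/S/C prefix plus the subject name
  names(part)[-1] <- paste(c("M", "S", "C"), s, sep = "_")
  wide <- if (is.null(wide)) part else merge(wide, part, by = "Province_code")
}
wide
```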
Examples
Util_Invalsi_filter(subj = c("Italian", "Mathematics"), grade = 5, level = "NUTS-3", Year = 2023,
WLE = FALSE, data = example_Invalsi23_prov)
Util_Invalsi_filter(subj = c("Italian", "Mathematics"), grade = 5, level = "NUTS-3", Year = 2023,
WLE = TRUE, data = example_Invalsi23_prov)
Invalsi23_high <- Util_Invalsi_filter(subj = "Italian", grade = c(10,13), level = "NUTS-3",
Year = 2023, data = example_Invalsi23_prov)
summary(Invalsi23_high)
Clean the raw dataframe of the number of students and arrange it in a wide format
Description
This function rearranges the output of the Get_nstud function so as to represent the counts of students and, if required, either the number of students by class and number of classes, or the counts of students per school timetable (running time) in a single observation per school.
If the focus is on class size, this function first cleans the data from the outliers in terms of average number of students by class at the school level and imputes the number of classes to 1 when missing.
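The class-size cleaning can be illustrated with a minimal base R sketch. The toy counts and column names are invented; treating the boundaries as inclusive and dropping schools whose number of classes stays missing are assumptions, not documented behaviour.

```r
# Toy school-level counts (invented)
nstud <- data.frame(
  School_code = c("S1", "S2", "S3", "S4"),
  Students = c(120, 15, 400, 60),
  Classes  = c(6, NA, 2, NA),
  stringsAsFactors = FALSE
)
nstud_imputation_thresh <- 19
UB_nstud_byclass <- 35
LB_nstud_byclass <- 5

# Impute one class where the count is missing and the school is small enough
small_missing <- is.na(nstud$Classes) & nstud$Students <= nstud_imputation_thresh
nstud$Classes[small_missing] <- 1

# Average class size, then keep the schools within the acceptance boundaries
nstud$Students_per_class <- nstud$Students / nstud$Classes
keep <- !is.na(nstud$Students_per_class) &
  nstud$Students_per_class >= LB_nstud_byclass &
  nstud$Students_per_class <= UB_nstud_byclass
nstud[keep, ]
```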
Usage
Util_nstud_wide(
data = NULL,
missing_to_1 = FALSE,
nstud_imputation_thresh = 19,
UB_nstud_byclass = 99,
LB_nstud_byclass = 1,
filter_by_grade = FALSE,
UB_nstud_byclass_grade = NULL,
LB_nstud_byclass_grade = NULL,
verbose = TRUE,
autoAbort = FALSE,
...
)
Arguments
data |
Object of class |
missing_to_1 |
Logical. If focus is on class size, whether the number of classes should be imputed to 1 when it is missing and the number of students is below a threshold (argument |
nstud_imputation_thresh |
Numeric. If focus is on class size, the minimum threshold below which the number of classes is imputed to 1 if missing, if |
UB_nstud_byclass |
Numeric. Either a unique value for all school orders, or a vector of three order-specific values in the order: primary, middle, high.
If focus is on class size, the upper limit of the acceptable school-level (if |
LB_nstud_byclass |
Numeric. Either a unique value for all school orders, or a vector of three order-specific values in the order: primary, middle, high.
If focus is on class size, the lower limit of the acceptable school-level (if |
filter_by_grade |
Logical. If focus is on class size, whether to remove all school grades with average class size outside of the acceptance boundaries. |
UB_nstud_byclass_grade |
Numeric. If |
LB_nstud_byclass_grade |
Numeric. If |
verbose |
Logical. If |
autoAbort |
Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. |
... |
Arguments to |
Details
In the example, we compare the dataframe obtained with the default settings and the one obtained by imposing narrower inclusion criteria.
Value
An object of class tbl_df, tbl and data.frame
Examples
nstud.default <- Util_nstud_wide(example_input_nstud23)
nstud.narrow <- Util_nstud_wide(example_input_nstud23,
UB_nstud_byclass = 35, LB_nstud_byclass = 5 )
nrow(nstud.default)
nrow(nstud.narrow)
nstud.default
summary(nstud.default)
Subset of the administrative codes of municipalities
Description
This table includes the administrative codes of the municipalities from four regions: Molise, Campania, Apulia and Basilicata, as of June 30th 2022; some strings including accents in field Municipality_description have been forced to ASCII.
The whole dataset can be retrieved with the command Get_AdmUnNames(Date = "2022-06-30")
Usage
example_AdmUnNames20220630
Format
## 'example_AdmUnNames20220630' A data frame with 1,074 rows and 5 columns:
- Province_code: Numeric; the NUTS-3 administrative code.
- Province_initials: Character; abbreviated NUTS-3 denomination.
- Municipality_code: Character; the ISTAT LAU (municipality) ID.
- Municipality_description: Character; the municipality name.
- Cadastral_code: Character; a LAU-level ID code, different from the official ISTAT municipality code. It is used in the school registry (see example_input_Registry23).
Source
<https://www.istat.it/it/archivio/6789>
Subset of the inner areas classification of municipalities
Description
This dataframe includes the classification of municipalities from four regions: Molise, Campania, Apulia and Basilicata.
Only the first 10 columns are included; some strings including accents in field Municipality_description have been forced to ASCII.
The whole dataset can be retrieved with the command Get_InnerAreas().
For the definition of ISTAT inner areas class, see Get_InnerAreas
Usage
example_InnerAreas
Format
## 'example_InnerAreas' A data frame with 1074 rows and 10 columns:
- Municipality_code: Character; the ISTAT LAU (municipality) ID.
- Municipality_code_numeric: Numeric; the ISTAT LAU (municipality) ID in numeric format.
- Cadastral_code: Character; a LAU-level ID code, different from the official ISTAT municipality code.
- Region_code: Numeric; the region (NUTS-2 administrative level) ID.
- Region_description: Character; the region (NUTS-2 administrative level) name.
- Province_code: Numeric; the NUTS-3 administrative code.
- Province_initials: Character; abbreviated NUTS-3 denomination.
- Province_description: Character; the province (NUTS-3 administrative level) denomination.
- Municipality_description: Character; the municipality name.
- Inner_area_code_2014_2020: Character; the ISTAT inner areas classification between 2014 and 2020.
- Inner_area_description_2014_2020: Character; the description of the classes identified in the previous column.
- Inner_area_code_2021_2027: Character; the ISTAT inner areas classification between 2021 and 2027.
- Inner_area_description_2021_2027: Character; the description of the classes identified in the previous column.
- Destination_municipality_code: Character; for non-central municipalities (classes C, D, E, F), the ID of the closest pole municipality according to the 2021-2027 classification.
- Destination_municipality_code: Character; the denomination of the municipalities in the previous column.
- Destination_pole_code: Character; an internal ID convention for the destination poles; it includes a letter (the class of the destination pole, either A or B), a number of two digits (the region code of the destination pole) and the progressive number of poles within a region.
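The structure described for Destination_pole_code can be unpacked with a short base R sketch. The codes below are hypothetical, and the sketch assumes that the three parts are simply concatenated without separators, which the description does not state explicitly.

```r
# Hypothetical pole codes, assuming letter + two-digit region code + progressive number
pole_codes <- c("A1601", "B1503")

parts <- data.frame(
  Pole_class  = substr(pole_codes, 1, 1),              # class of the pole, A or B
  Region_code = as.numeric(substr(pole_codes, 2, 3)),  # two-digit region code
  Progressive = as.numeric(substring(pole_codes, 4)),  # progressive number in the region
  stringsAsFactors = FALSE
)
parts
```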
Source
<https://www.istat.it/it/archivio/273176>
Subset of the Invalsi scores in school year 2022/23
Description
This dataframe includes the Invalsi scores of the schools from four regions: Molise, Campania, Apulia and Basilicata, for the school year 2022/23.
The whole dataset can be retrieved with the command Get_Invalsi_IS(level = "NUTS-3")
Usage
example_Invalsi23_prov
Format
## 'example_Invalsi23_prov' A data frame with 240 rows and 11 columns:
- Year: Character; the school year.
- Grade: Numeric; the school grade; only includes the school grades subjected to the Invalsi survey. Either 2, 5, 8, 10 or 13.
- Subject: Character; the school subject in which the test is taken; either Italian, Mathematics, English reading or English listening.
- Province_code: Numeric; the NUTS-3 administrative code.
- Province_initials: Character; abbreviated NUTS-3 denomination.
- Province_description: Character; the province (NUTS-3 administrative level) denomination.
- Average_percentage_score: Numeric; the province-level percentage of sufficient tests, only for primary schools; ranges 0-100.
- Std_dev_percentage_score: Numeric; the standard deviation of the percentage of sufficient tests, only for primary schools.
- WLE_average_score: Numeric; the province-level average WLE (Weighted Likelihood Estimator) score.
- Std_dev_WLE_score: Numeric; the standard deviation of WLE scores.
- Students_coverage: Numeric; the percentage of students for which the Invalsi tests are reported.
Source
<https://serviziostatistico.invalsi.it/en/archivio-dati/?_sft_invalsi_ss_data_collective=open-data>
Subset of Italian provinces shapefile
Description
This is the shapefile for the provinces belonging to four regions: Molise, Campania, Apulia and Basilicata,
as of January 1st 2022. These are the latest administrative units boundaries relevant at the beginning of the school year 2022/23.
The whole shapefile can be retrieved with the command Get_Shapefile(Year = 2022, level = "NUTS-3")
Usage
example_Prov22_shp
Format
## 'example_Prov22_shp' A Spatial polygon data frame with 13 rows/polygons and 15 columns:
- COD_RIP: Numeric; the code for the macroarea (1 for Northwest, 2 for Northeast, 3 for Center, 4 for South and 5 for Isles).
- COD_REG: Numeric; the region (NUTS-2 administrative level) ID.
- COD_PROV: Numeric; the NUTS-3 administrative code.
- COD_CM: Numeric; the administrative code for Metropolitan Cities (which are always at the NUTS-3 level), obtained as 200 + NUTS-3 code, if the unit is a Metropolitan City; 0 otherwise.
- COD_UTS: Numeric; the administrative code for Metropolitan Cities if the unit is a Metropolitan City; the province code otherwise.
- DEN_PROV: Character; the province (NUTS-3 administrative level) name, if the unit is not a Metropolitan City; blank otherwise.
- DEN_CM: Character; the Metropolitan City (NUTS-3 administrative level) name, if the unit is a Metropolitan City; blank otherwise.
- DEN_UTS: Character; the province or Metropolitan City (NUTS-3 administrative level) name.
- SIGLA: Character; abbreviated NUTS-3 denomination.
- TIPO_UTS: Character; the NUTS-3 type of the unit; either "Provincia" (Province) or "Citta metropolitana" (Metropolitan City).
- Shape_Leng: Numeric; the polygon perimeter.
- Shape_Area: Numeric; the polygon area.
- geometry: the polygon geometry.
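The relation between COD_PROV, COD_CM and COD_UTS described above can be reproduced with a small base R sketch. The logical column is_metro is a hypothetical helper introduced for illustration; it is not part of the shapefile.

```r
# Toy attribute table: 63 (Naples) and 72 (Bari) are Metropolitan Cities,
# 70 (Campobasso) is an ordinary province; is_metro is a hypothetical helper column
prov <- data.frame(COD_PROV = c(63, 70, 72),
                   is_metro = c(TRUE, FALSE, TRUE))

# Metropolitan City code: 200 + NUTS-3 code, 0 for ordinary provinces
prov$COD_CM <- ifelse(prov$is_metro, 200 + prov$COD_PROV, 0)
# Unified code: Metropolitan City code where present, province code otherwise
prov$COD_UTS <- ifelse(prov$COD_CM > 0, prov$COD_CM, prov$COD_PROV)
prov
```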
Source
<https://www.istat.it/it/archivio/222527>
Association of the municipality code to a subset of public schools 2022/23
Description
This list maps the IDs of the schools from four regions (Molise, Campania, Apulia and Basilicata) to the corresponding LAU codes.
The whole dataset can be retrieved with the command Get_School2mun(2023)
Usage
example_School2mun23
Format
## 'example_School2mun23' A list of four elements
- Registry_from_buildings: A data frame of 5527 rows and 5 columns, including the schools listed in the buildings registry.
- Registry_from_registry: A data frame of 5929 rows and 5 columns, including the schools listed in the schools registry.
- Any: A data frame of 5954 rows and 5 columns, including schools listed in any of the registries.
- Both: A data frame of 5510 rows and 5 columns, including schools listed in both registries.
For each element, rows correspond to school IDs; the columns are:
- School_code: Character; the school ID.
- Province_code: Numeric; the NUTS-3 administrative code.
- Province_initials: Character; abbreviated NUTS-3 denomination.
- Municipality_code: Character; the ISTAT LAU (municipality) ID.
- Municipality_description: Character; the municipality name.
Source
Buildings registry (2021 onwards); Buildings registry (until 2019); Schools registry
Subset of the school buildings database in school year 2022/23
Description
This dataframe includes the schools directly identifiable as primary, middle or high school, from four regions: Molise, Campania, Apulia and Basilicata.
Only the first 35 columns are included. Some strings including accents in fields Other_disturbances_proximity, Other_specific_criticalities and Other have been forced to ASCII.
The whole dataset can be retrieved with the command Get_DB_MIUR(2023)
Usage
example_input_DB23_MIUR
Format
## 'example_input_DB23_MIUR' A data frame with 7479 rows and 35 columns:
-
Year
Numeric; the school year. -
School_code
Character; the school ID. -
Order
Character; the school order, either primary, middle or high school. -
Reference_institute_code
Character; the ID of the reference institute. -
Building_code
Character; the building ID; the first 6 digits usually identify the municipality. -
Municipality_code
Character; the ISTAT LAU (municipality) ID. -
Municipality_description
Character; the municipality name. -
Province_initials
Character; abbreviated NUTS-3 denomination. -
Postal_code
Character; the ZIP code; slightly finer than municipality boundaries. for big municipalities. -
Context_without_disturbances
Character; whether the school belongs to an environment devoid of disturbances; otherwise, the types of disturbances are listed in columns 11 - 18. -
Dumps_proximity
Character; whether the school is close to dumps (disturbance element). -
Pollutant_industries_proximity
Character; whether the school is close to pollutant industries (disturbance element). -
Pollutant_waters_proximity
Character; whether the school is close to pollutant or stagnant streams or ponds (disturbance element). -
Air_pollution_sourcer_proximity
Character; whether the school is close to sources of air pollution (disturbance element). -
Acoustic_pollution_sourcer_proximity
Character; whether the school is close to sources of acoustic pollution (disturbance element). -
Electromagnetic_radiation_sources_proximity
Character; whether the school is close to sources of electromagnetic radiation (disturbance element). -
Graveyards_proximity
Character; whether the school is close to a graveyard (disturbance element). -
Other_disturbances_proximity
Character; other disturbance elements to which the school is close, other than those already listed. -
School_area_specific_criticalities
Character; whether any specific criticality element occurs inside the school area; specified in columns 20 - 27. -
Layby absence
Character; whether the access to the area pertaining to the school building lacks a lay-by or pitch (school area criticality element). -
Unfenced area
Character; whether the school building area lacks fences or enclosures (school area criticality element). -
Large_traffic
Character; whether the school area is close to large traffic streams (school area criticality element). -
Railway_traffic
Character; whether the school area is close to railway traffic streams (school area criticality element). -
Abandoned_industries
Character; whether the school area is located in pre-existences of abandoned industries (school area criticality element). -
Decayed_urban_area
Character; whether the school belongs or is close to a decayed area (school area criticality element). -
Risky_industries_proximity
Character; whether the school is close to hazardous industrial areas (school area criticality element). -
Other_specific_criticalities
Character; specific criticality elements regarding the school area, other than those already listed. -
School_bus
Character; whether the school is served by a school bus service. -
Urban_public_transport
Character; whether the school is served by an urban public transport station within 250 meters. -
Interurban_public_transport
Character; whether the school is served by an inter-urban public transport station within 500 meters. -
Railway_transport
Character; whether the school is within 500 meters of a train station. -
Private_transport
Character; whether the school can be reached by private transport. -
Disabled_people_transport
Character; whether the school is provided with dedicated transport for people with disabilities. -
Bicycle_lane
Character; whether the building is close to a bicycle lane. -
Other
Character; whether the building can be reached in any other specific way.
Source
Homepage; in more detail, the column blocks of the dataset are downloaded, respectively, from: cols 10-18; cols 20-27; cols 28-35
See Also
Subset of the school registry in school year 2022/23
Description
This data frame includes the schools directly identifiable as primary, middle or high schools, from four regions: Molise, Campania, Apulia and Basilicata.
Only the first 10 columns are included.
The whole dataset can be retrieved with the command Get_Registry(2023)
Usage
example_input_Registry23
Format
A data frame with 5929 rows and 10 columns:
-
Year
Numeric; the school year. -
Area
Character; the macro-area of the municipality, i.e. North, Center or South. -
Region_description
Character; the region (NUTS-2 administrative level) name. -
Province_description
Character; the province (NUTS-3 administrative level) name. -
Reference_institute_code
Character; the ID of the reference institute. -
School_code
Character; the school ID. -
Cadastral_code
Character; a LAU-level ID code, distinct from the official LAU municipality code. The Italian Ministry of Education provides this code in place of the LAU code in both the school registry and the early school buildings databases. -
Municipality_description
Character; the municipality name. -
School_address
Character; the school physical address. -
Postal_code
Character; the postal (ZIP) code; in large municipalities, postal code areas are slightly finer than municipality boundaries.
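As a minimal sketch of how the columns listed above might be used, the following base R snippet counts schools per province. The data frame `registry_toy`, its school codes and its rows are hypothetical stand-ins that mimic only three of the documented columns; they are not taken from example_input_Registry23.

```r
# Hypothetical toy rows: codes and values are invented for illustration only.
registry_toy <- data.frame(
  School_code = c("SCHOOL_A", "SCHOOL_B", "SCHOOL_C"),
  Province_description = c("Bari", "Bari", "Potenza"),
  Municipality_description = c("Bari", "Altamura", "Potenza"),
  stringsAsFactors = FALSE
)

# Count how many schools fall in each province (NUTS-3 level).
schools_per_province <- table(registry_toy$Province_description)
print(schools_per_province)
```

The same call should work on the bundled example data, provided its column names match the Format section above.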
Source
See Also
Subset of the students and classes counts in school year 2022/23
Description
This data frame includes student and class counts for the schools from four regions: Molise, Campania, Apulia and Basilicata.
The whole dataset can be retrieved with the command Get_nstud(2023, filename = "ALUCORSOINDCLASTA")
Usage
example_input_nstud23
Format
A data frame with 21208 rows and 7 columns:
-
Year
Numeric; the school year. -
School_code
Character; the school ID. -
Order
Character; the school order, either primary, middle or high school. -
Grade
Numeric; the school grade. -
Classes
Numeric; the count of classes of a given grade in each school. -
Male_students
Numeric; the count of male students in all classes of a given educational grade in each school. -
Female_students
Numeric; the count of female students in all classes of a given educational grade in each school.
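Since each row holds counts for one grade of one school, school-level totals require summing over grades. The sketch below illustrates this in base R on a hypothetical toy data frame, `nstud_toy`, whose school codes and counts are invented and only mimic the documented columns; the derived columns `Students` and `Students_per_class` are likewise illustrative names, not part of the dataset.

```r
# Hypothetical toy rows: one row per school and grade, values invented.
nstud_toy <- data.frame(
  School_code = c("SCHOOL_A", "SCHOOL_A", "SCHOOL_B", "SCHOOL_B"),
  Grade = c(1, 2, 1, 2),
  Classes = c(2, 1, 3, 2),
  Male_students = c(20, 12, 30, 25),
  Female_students = c(22, 10, 28, 24),
  stringsAsFactors = FALSE
)

# Total students per school and grade.
nstud_toy$Students <- nstud_toy$Male_students + nstud_toy$Female_students

# Sum classes and students over grades to obtain school-level totals.
per_school <- aggregate(cbind(Classes, Students) ~ School_code,
                        data = nstud_toy, FUN = sum)

# Average class size per school.
per_school$Students_per_class <- per_school$Students / per_school$Classes
print(per_school)
```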