library(grouper)
library(ompr)
library(ompr.roi)
#library(ROI.plugin.gurobi)
library(ROI.plugin.glpk)
This vignette illustrates the use of the package on simple datasets, for which the optimal solutions are apparent from inspection.
The first dataset comprises just 4 students. Here is what it looks like. The name of this dataset indicates that it is for the diversity-based-assignment (dba) model and that it consists of the group composition (gc) information.
It is intuitive that an assignment into two groups of size two, based on the diversity of majors alone, should place one student from major A (students 1 and 2) and one student from major B (students 3 and 4) in each group.
The corresponding YAML file for this exercise, dba_params_ex001.yml, consists of the following lines:
n_topics: 2
R: 1
nmin: 2
nmax: 2
rmin: 1
rmax: 1
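As a quick sanity check on these values, the sketch below tests whether four students can be placed under the stated bounds. It assumes, based on the surrounding text rather than on the package documentation, that nmin and nmax bound the group size and that rmin and rmax bound the number of repetitions (groups) per topic. The names `params` and `n_students` are illustrative and not part of grouper.

```r
# Hand-typed copy of the YAML values above (illustrative; not read from the file).
params <- list(n_topics = 2, R = 1, nmin = 2, nmax = 2, rmin = 1, rmax = 1)
n_students <- 4

# Assumed interpretation: every topic runs between rmin and rmax groups,
# and every group holds between nmin and nmax students.
min_capacity <- params$n_topics * params$rmin * params$nmin
max_capacity <- params$n_topics * params$rmax * params$nmax

# The class fits only if its size lies between the two capacities.
feasible <- n_students >= min_capacity && n_students <= max_capacity
feasible
#> [1] TRUE
```

Under that reading, the only feasible layout is exactly two groups of two students.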
To run the assignment using only the primary major (ignoring the skill), we set w1 = 1 and w2 = 0 in the following commands. Either the gurobi or the glpk solver can be used for this example; both are equally fast.
# indicate appropriate columns using integer ids.
df_ex001_list <- extract_student_info(dba_gc_ex001, "diversity",
demographic_cols = 2, skills = 3,
self_formed_groups = 4)
yaml_ex001_list <- extract_params_yaml(system.file("extdata",
"dba_params_ex001.yml",
package = "grouper"),
"diversity")
m1 <- prepare_model(df_ex001_list, yaml_ex001_list, assignment="diversity",
w1=1.0, w2=0.0)
#result3 <- solve_model(m1, with_ROI(solver="gurobi"))
result3 <- solve_model(m1, with_ROI(solver="glpk"))
assign_groups(result3, assignment = "diversity", dframe=dba_gc_ex001,
group_names="groups")
#>   topic rep group id major skill
#> 1     1   1     2  2     A     1
#> 2     1   1     3  3     B     3
#> 3     2   1     1  1     A     1
#> 4     2   1     4  4     B     3
We can see that students 2 and 3 have been assigned to topic 1, repetition 1, and students 1 and 4 to topic 2, repetition 1. Each group therefore contains one student from major A and one from major B.
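With only four students, this result can be checked by brute force without a solver. The sketch below is not part of grouper. It assumes that the diversity objective rewards within-group dissimilarity, and it uses the fact that Gower dissimilarity on a single categorical column reduces to a 0/1 mismatch. The vector `major` is a hand-typed copy of the column shown in the output.

```r
# Hand-typed copy of the majors of students 1 to 4 (illustrative only).
major <- c("A", "A", "B", "B")

# 0/1 mismatch matrix: 1 when two students have different majors.
d <- outer(major, major, "!=") * 1

# The three ways to split four students into two pairs.
splits <- list(
  list(c(1, 2), c(3, 4)),
  list(c(1, 3), c(2, 4)),
  list(c(1, 4), c(2, 3))
)

# Total within-group dissimilarity for each split.
within_div <- sapply(splits, function(s) {
  d[s[[1]][1], s[[1]][2]] + d[s[[2]][1], s[[2]][2]]
})
within_div
#> [1] 0 2 2
```

The two mixed-major splits tie at the maximum, and the solver output corresponds to the third one (students 1 and 4, students 2 and 3).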
# Repeat the assignment using only the skill levels (w1 = 0, w2 = 1).
# indicate appropriate columns using integer ids.
df_ex001_list <- extract_student_info(dba_gc_ex001, "diversity",
demographic_cols = 2, skills = 3,
self_formed_groups = 4)
yaml_ex001_list <- extract_params_yaml(system.file("extdata",
"dba_params_ex001.yml",
package = "grouper"),
"diversity")
m1a <- prepare_model(df_ex001_list, yaml_ex001_list, assignment="diversity",
w1=0.0, w2=1.0)
#result3 <- solve_model(m1a, with_ROI(solver="gurobi"))
result3 <- solve_model(m1a, with_ROI(solver="glpk"))
assign_groups(result3, assignment = "diversity", dframe=dba_gc_ex001,
group_names="groups")
#>   topic rep group id major skill
#> 1     1   1     2  2     A     1
#> 2     1   1     3  3     B     3
#> 3     2   1     1  1     A     1
#> 4     2   1     4  4     B     3
get_solution(result3, smin)
#> smin
#> 4
get_solution(result3, smax)
#> smax
#> 4
The assignment is unchanged: students 2 and 3 are in topic 1, repetition 1, and students 1 and 4 are in topic 2, repetition 1. Both smin and smax equal 4, consistent with each group having a total skill level of 1 + 3 = 4.
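The skill balance can be checked by hand in the same way. The sketch below assumes that smin and smax are the smallest and largest total skill level across groups, which is consistent with the reported values but is not confirmed here against the package documentation. The vector `skill` is a hand-typed copy of the column shown in the output.

```r
# Hand-typed copy of the skill levels of students 1 to 4 (illustrative only).
skill <- c(1, 1, 3, 3)

# The three ways to split four students into two pairs.
splits <- list(
  list(c(1, 2), c(3, 4)),
  list(c(1, 3), c(2, 4)),
  list(c(1, 4), c(2, 3))
)

# Group skill totals: one column per split, one row per group.
totals <- sapply(splits, function(s) c(sum(skill[s[[1]]]), sum(skill[s[[2]]])))

# Gap between the strongest and weakest group in each split.
spread <- apply(totals, 2, function(x) max(x) - min(x))
spread
#> [1] 4 0 0
```

Only the two splits that pair a skill-1 student with a skill-3 student give both groups a total of 4.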
This dataset demonstrates the use of a custom dissimilarity matrix instead of using the default Gower distance from the cluster package.
dba_gc_ex003
#>   year   major self_groups id
#> 1    1    math           1  1
#> 2    2 history           2  2
#> 3    3    dsds           3  3
#> 4    4    elts           4  4
Now consider a situation where we wish to treat years 1 and 2 as different from years 3 and 4, and math and dsds (STEM majors) as different from elts and history (non-STEM majors). For each difference, we assign a score of 1.
This means that students 1 and 2 would have a dissimilarity score of 1 due to their difference in majors. Students 1 and 3 would also have a score of 1, but due to their difference in years. Students 1 and 4 would have a score of 2, due to their differences in majors and in years. The overall dissimilarity matrix would be:
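One way to construct this matrix, as a minimal sketch in base R, is to score each pairwise difference with outer(). The data frame `ex003` is a hand-typed copy of the listing above, and `year_band` and `is_stem` are illustrative helper names, not part of grouper.

```r
# Hand-typed copy of the four students listed above (illustrative only).
ex003 <- data.frame(year  = 1:4,
                    major = c("math", "history", "dsds", "elts"))

# Encodings of the two rules described in the text.
year_band <- ex003$year >= 3                     # years 3-4 versus years 1-2
is_stem   <- ex003$major %in% c("math", "dsds")  # STEM versus non-STEM

# Score 1 for each attribute on which a pair of students differs.
d_mat <- outer(year_band, year_band, "!=") + outer(is_stem, is_stem, "!=")
d_mat
#>      [,1] [,2] [,3] [,4]
#> [1,]    0    1    1    2
#> [2,]    1    0    2    1
#> [3,]    1    2    0    1
#> [4,]    2    1    1    0
```

The matrix is symmetric with a zero diagonal, and the only pairs that score 2 are students 1 and 4 and students 2 and 3.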
To run the optimisation for this model, we can execute the following code:
df_ex003_list <- extract_student_info(dba_gc_ex003, "diversity",
skills = NULL,
self_formed_groups = 3,
d_mat=d_mat)
yaml_ex003_list <- extract_params_yaml(system.file("extdata",
"dba_params_ex003.yml",
package = "grouper"),
"diversity")
m3 <- prepare_model(df_ex003_list, yaml_ex003_list, w1=1.0, w2=0.0)
result <- solve_model(m3, with_ROI(solver="glpk", verbose=TRUE))
#> <SOLVER MSG> ----
#> GLPK Simplex Optimizer 5.0
#> 58 rows, 22 columns, 142 non-zeros
#> 0: obj = -0.000000000e+00 inf = 6.000e+00 (6)
#> 12: obj = -0.000000000e+00 inf = 1.110e-15 (0)
#> * 28: obj = 8.000000000e+00 inf = 7.772e-16 (0)
#> OPTIMAL LP SOLUTION FOUND
#> GLPK Integer Optimizer 5.0
#> 58 rows, 22 columns, 142 non-zeros
#> 22 integer variables, all of which are binary
#> Integer optimization begins...
#> Long-step dual simplex will be used
#> + 28: mip = not found yet <= +inf (1; 0)
#> + 43: >>>>> 4.000000000e+00 <= 7.000000000e+00 75.0% (5; 0)
#> + 48: mip = 4.000000000e+00 <= tree is empty 0.0% (0; 9)
#> INTEGER OPTIMAL SOLUTION FOUND
#> <!SOLVER MSG> ----
assign_groups(result, "diversity", dba_gc_ex003, group_names="self_groups")
#>   topic rep group year   major id
#> 1     1   1     1    1    math  1
#> 2     1   1     4    4    elts  4
#> 3     2   1     2    2 history  2
#> 4     2   1     3    3    dsds  3
As you can see, the two members within each group are maximally different from each other: they differ both in year (years 1 and 2 versus years 3 and 4) and in major (STEM versus non-STEM).
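The same conclusion can be reached by scoring every possible split directly. The sketch below assumes that the quantity of interest is the sum of within-group dissimilarities. The helper `pair_score` is a hypothetical name, and the matrix is typed in by hand from the scoring rule described earlier.

```r
# Dissimilarity matrix implied by the year and major rules (typed by hand).
d_mat <- matrix(c(0, 1, 1, 2,
                  1, 0, 2, 1,
                  1, 2, 0, 1,
                  2, 1, 1, 0), nrow = 4, byrow = TRUE)

# Sum of within-group dissimilarities for a split into two pairs.
pair_score <- function(g1, g2) d_mat[g1[1], g1[2]] + d_mat[g2[1], g2[2]]

# Score each of the three possible splits, labelled by group membership.
scores <- c(
  "12|34" = pair_score(c(1, 2), c(3, 4)),
  "13|24" = pair_score(c(1, 3), c(2, 4)),
  "14|23" = pair_score(c(1, 4), c(2, 3))
)
names(which.max(scores))
#> [1] "14|23"
```

The best split scores 4, which coincides with the final `mip = 4.000000000e+00` line in the solver log, although the exact form of the package's objective is not confirmed here.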
In this example, we demonstrate that grouper provides the flexibility to constrain group sizes for individual topics. This could be useful in a situation where a particular project topic may require a larger group.
The dataset we use contains only skill levels (Python skills, with higher values corresponding to more skill).
Suppose we wish to assign the students to two topics, but the second topic requires 3 members, and the first requires only 2. In this example, we only utilise the skill levels; no demographic variables are included in the objective function.
df_ex004_list <- extract_student_info(dba_gc_ex004,
skills = 2,
self_formed_groups = 3,
d_mat=matrix(0, 5, 5))
yaml_ex004_list <- extract_params_yaml(system.file("extdata",
"dba_params_ex004.yml",
package = "grouper"),
"diversity")
m4 <- prepare_model(df_ex004_list, yaml_ex004_list, w1=0.0, w2=1.0)
result <- solve_model(m4, with_ROI(solver="glpk", verbose=TRUE))
#> <SOLVER MSG> ----
#> GLPK Simplex Optimizer 5.0
#> 89 rows, 34 columns, 234 non-zeros
#> 0: obj = -0.000000000e+00 inf = 7.000e+00 (7)
#> 28: obj = -4.000000000e+00 inf = 4.441e-16 (0)
#> * 29: obj = -4.440892099e-16 inf = 4.441e-16 (0)
#> OPTIMAL LP SOLUTION FOUND
#> GLPK Integer Optimizer 5.0
#> 89 rows, 34 columns, 234 non-zeros
#> 32 integer variables, all of which are binary
#> Integer optimization begins...
#> Long-step dual simplex will be used
#> + 29: mip = not found yet <= +inf (1; 0)
#> + 40: >>>>> 0.000000000e+00 <= 0.000000000e+00 0.0% (3; 0)
#> + 40: mip = 0.000000000e+00 <= tree is empty 0.0% (0; 5)
#> INTEGER OPTIMAL SOLUTION FOUND
#> <!SOLVER MSG> ----
assign_groups(result, "diversity", dba_gc_ex004, group_names="self_groups")
#>   topic rep group id python
#> 1     1   1     2  2      1
#> 2     1   1     5  5      3
#> 3     2   1     1  1      1
#> 4     2   1     3  3      1
#> 5     2   1     4  4      2
Due to the constraints, topic 2 was assigned 3 members and topic 1 was assigned 2, while the total skill level remains equal across the two groups (4 in each).
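To see which splits achieve this balance, the following sketch enumerates every possible pair for topic 1 with combn() and keeps those whose total skill equals that of the remaining three students. The vector `python` is a hand-typed copy of the skill column shown in the output, and the approach is an illustration rather than what the solver does internally.

```r
# Hand-typed copy of the Python skill levels of students 1 to 5 (illustrative only).
python <- c(1, 1, 1, 2, 3)

# Each column is a candidate pair of students for topic 1.
pairs <- combn(5, 2)

# Skill totals for topic 1 (the pair) and topic 2 (the other three students).
topic1_total <- apply(pairs, 2, function(p) sum(python[p]))
topic2_total <- sum(python) - topic1_total

# Keep the pairs that give both topics the same total.
balanced <- pairs[, topic1_total == topic2_total, drop = FALSE]
balanced
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    5    5    5
```

Three splits are perfectly balanced; the solver returned the one that places students 2 and 5 in topic 1.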
The second dataset comprises 8 students. Here is a listing of the dataset:
Each student is in a self-formed group of size 2, indicated via the grouping column. Suppose that, for this set of students, the instructor wishes to assign students into two topics, with each topic having two subtopics. This requires the preference matrix to have 4 columns, one for each topic-subtopic combination. Remember that the ordering of the topic-subtopic combinations should be:
T1S1, T2S1, T1S2, T2S2
There should be 4 rows in the preference matrix - one for each self-formed group.
pba_prefmat_ex002
#>      col1 col2 col3 col4
#> [1,]    4    3    2    1
#> [2,]    3    4    2    1
#> [3,]    1    2    4    3
#> [4,]    1    2    3    4
It is possible to assign each self-formed group to its optimal choice of topic-subtopic combination. In our solution, we should see that group 1 is assigned to subtopic 1 of topic 1, group 2 is assigned to subtopic 1 of topic 2, and so on.
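The claim that every group can receive its top choice can be verified directly from the matrix. The sketch below assumes that larger entries mean stronger preference, and it labels the columns in the stated ordering using expand.grid(), whose first factor varies fastest. The names `pref`, `slots` and `top_choice` are illustrative.

```r
# Hand-typed copy of the preference matrix shown above (illustrative only).
pref <- matrix(c(4, 3, 2, 1,
                 3, 4, 2, 1,
                 1, 2, 4, 3,
                 1, 2, 3, 4), nrow = 4, byrow = TRUE)

# Column labels in the stated ordering: the topic index varies fastest.
slots <- expand.grid(topic = 1:2, subtopic = 1:2)
colnames(pref) <- paste0("T", slots$topic, "S", slots$subtopic)

# Most preferred topic-subtopic combination for each self-formed group.
top_choice <- colnames(pref)[apply(pref, 1, which.max)]
top_choice
#> [1] "T1S1" "T2S1" "T1S2" "T2S2"
```

Since no two groups share the same top choice, all four can be satisfied at once.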
df_ex002_list <- extract_student_info(pba_gc_ex002, "preference",
self_formed_groups = 2,
pref_mat = pba_prefmat_ex002)
yaml_ex002_list <- extract_params_yaml(system.file("extdata",
"pba_params_ex002.yml",
package = "grouper"),
"preference")
m2 <- prepare_model(df_ex002_list, yaml_ex002_list, "preference")
#result2 <- solve_model(m2, with_ROI(solver="gurobi"))
result2 <- solve_model(m2, with_ROI(solver="glpk"))
assign_groups(result2, assignment = "preference",
dframe=pba_gc_ex002, yaml_ex002_list,
group_names="grouping")
#>   topic2 subtopic rep group size
#> 1      1        1   1     1    2
#> 2      2        1   1     2    2
#> 3      1        2   1     3    2
#> 4      2        2   1     4    2