Type: | Package |
Title: | The Hyperdirichlet Distribution, Mark 2 |
Version: | 3.1-0 |
Maintainer: | Robin K. S. Hankin <hankin.robin@gmail.com> |
Description: | A suite of routines for the hyperdirichlet distribution and reified Bradley-Terry; supersedes the 'hyperdirichlet' package; uses 'disordR' discipline <doi:10.48550/ARXIV.2210.03856>. To cite in publications please use Hankin 2017 <doi:10.32614/rj-2017-061>, and for Generalized Plackett-Luce likelihoods use Hankin 2024 <doi:10.18637/jss.v109.i08>. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
LazyData: | yes |
Depends: | methods, cubature, R (≥ 3.5.0) |
Suggests: | knitr, markdown, rmarkdown, testthat, bookdown, rticles, covr |
VignetteBuilder: | knitr |
Imports: | Rcpp (≥ 1.0-7), partitions, disordR (≥ 0.0-9), alabama, calibrator, Rdpack, magrittr, frab |
LinkingTo: | Rcpp |
URL: | https://github.com/RobinHankin/hyper2 |
BugReports: | https://github.com/RobinHankin/hyper2/issues |
RoxygenNote: | 7.1.1 |
RdMacros: | Rdpack |
NeedsCompilation: | yes |
Packaged: | 2024-05-31 13:26:53 UTC; rhankin |
Author: | Robin K. S. Hankin
|
Repository: | CRAN |
Date/Publication: | 2024-05-31 15:22:53 UTC |
The Hyperdirichlet Distribution, Mark 2
Description
A suite of routines for the hyperdirichlet distribution and reified Bradley-Terry; supersedes the 'hyperdirichlet' package; uses 'disordR' discipline <doi:10.48550/ARXIV.2210.03856>. To cite in publications please use Hankin 2017 <doi:10.32614/rj-2017-061>, and for Generalized Plackett-Luce likelihoods use Hankin 2024 <doi:10.18637/jss.v109.i08>.
Details
The DESCRIPTION file:
Package: | hyper2 |
Type: | Package |
Title: | The Hyperdirichlet Distribution, Mark 2 |
Version: | 3.1-0 |
Authors@R: | person(given=c("Robin", "K. S."), family="Hankin", role = c("aut","cre"), email="hankin.robin@gmail.com", comment = c(ORCID = "0000-0001-5982-0415")) |
Maintainer: | Robin K. S. Hankin <hankin.robin@gmail.com> |
Description: | A suite of routines for the hyperdirichlet distribution and reified Bradley-Terry; supersedes the 'hyperdirichlet' package; uses 'disordR' discipline <doi:10.48550/ARXIV.2210.03856>. To cite in publications please use Hankin 2017 <doi:10.32614/rj-2017-061>, and for Generalized Plackett-Luce likelihoods use Hankin 2024 <doi:10.18637/jss.v109.i08>. |
License: | GPL (>= 2) |
LazyData: | yes |
Depends: | methods, cubature, R (>= 3.5.0) |
Suggests: | knitr, markdown, rmarkdown, testthat, bookdown, rticles, covr |
VignetteBuilder: | knitr |
Imports: | Rcpp (>= 1.0-7), partitions, disordR (>= 0.0-9), alabama, calibrator, Rdpack, magrittr, frab |
LinkingTo: | Rcpp |
URL: | https://github.com/RobinHankin/hyper2 |
BugReports: | https://github.com/RobinHankin/hyper2/issues |
RoxygenNote: | 7.1.1 |
RdMacros: | Rdpack |
Author: | Robin K. S. Hankin [aut, cre] (<https://orcid.org/0000-0001-5982-0415>) |
Index of help topics:
B Normalizing constant for the hyperdirichlet distribution Extract.hyper2 Extract or replace parts of a hyper2 object NBA Basketball dataset Ops.hyper2 Arithmetic Ops Group Methods for hyper2 objects Ops.hyper3 Arithmetic Ops Group Methods for hyper3 objects Print Print methods T20 Indian Premier League T20 cricket as.ordertable Convert an order table with DNS entries to a nice order table attemptstable2supp3 Translate attempt tables to hyper3 support functions balance Enforce the zero power sum condition baseball Baseball results, following Agresti carcinoma Carcinoma dataset discussed by Agresti character_to_number Convert a character vector to a numeric vector chess Chess playing dataset consistency Consistency check for hyper2 objects constructor Formula 1 dataset: the constructors' championship counterstrike Counterstrike cplusplus Wrappers to c calls curling Curling at the Winter Olympics, 1998-2018 dirichlet Dirichlet distribution and generalizations equalp.test Hypothesis testing eurodance Eurovision Dance contest dataset eurovision Eurovision Song contest dataset fillup Fillup function formula1 Formula 1 dataset ggol Order statistics gradient Differential calculus handover Dataset on communication breakdown in handover between physicians head.hyper2 First few terms of a distribution hepatitis Hepatitis dataset discussed by Agresti hyper2 Basic functions in the hyper2 package hyper2-package The Hyperdirichlet Distribution, Mark 2 hyper3 Weighted probability vectors: 'hyper3' objects icons Dataset on climate change due to O'Neill increment Increment and decrement operators interzonal 1963 World Chess Championships javelin Javelin dataset jester Jester dataset karate Karate dataset karpov_kasparov_anand Karpov, Kasparov, Anand keep Keep or discard players length.hyper2 Length method for hyper2 objects loglik Log likelihood functions masterchef Masterchef series 6 matrix2supp Convert a matrix to a likelihood function maxp Maximum likelihood estimation moto MotoGP dataset mult_grid Kronecker matrix product functionality ordertable Order tables ordertable2points Calculate points from an order table ordertable2supp Translate order tables to support functions ordertrans Order transformation ordervec2supp3 Various functionality for races and hyper3 likelihood functions pairwise Pairwise comparisons pentathlon Pentathlon powerboat Powerboat dataset profile Profile likelihood and support psubs Substitute players of a hyper2 object pwa Player with advantage ranktable Convert rank tables to and from order tables rhyper2 Random 'hyper2' objects rhyper3 Random hyper3 objects rowing Rowing dataset, sculling rp Random samples from the prior of a 'hyper2' object rrace A random race with given BT strengths rrank Random ranks skating Figure skating at the 2002 Winter Olympics soling Sailing at the 2000 Summer Olympics - soling summary.hyper2 Summary method for hyper2 objects suplist Methods for suplist objects surfing Surfing dataset table_tennis Match outcomes from repeated table tennis matches tennis Match outcomes from repeated doubles tennis matches tidy Tidy up a hyper2 object universities New Zealand University ranking data volleyball Results from the NOCS volleyball league volvo Race results from the 2014-2015 Volvo Ocean Race zapweak Zap weak competitors zipf Zipf's law
A generalization of the Dirichlet distribution, using a more computationally efficient method than the hyperdirichlet package. The software is designed for the analysis of order statistics and team games.
Author(s)
Robin K. S. Hankin [aut, cre] (<https://orcid.org/0000-0001-5982-0415>)
Maintainer: Robin K. S. Hankin <hankin.robin@gmail.com>
References
-
R. K. S. Hankin (2010). “A Generalization of the Dirichlet Distribution”, Journal of Statistical Software, 33(11), 1-18, doi:10.18637/jss.v033.i11
R. K. S. Hankin 2017. “Partial Rank Data with the hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models”. The R Journal 9:2, pages 429-439, doi:10.32614/rj-2017-061
R. K. S. Hankin 2024. “Generalized Plackett-Luce Likelihoods”, Journal of Statistical Software, 109(8), 1-17, doi:10.18637/jss.v109.i08
Examples
icons
maxp(icons)
Normalizing constant for the hyperdirichlet distribution
Description
Numerical techniques for calculating the normalizing constant for the hyperdirichlet distribution
Usage
B(H, disallowed=NULL, give=FALSE, ...)
probability(H, disallowed=NULL, ...)
mgf(H, powers, ...)
dhyper2(ip,H,...)
dhyper2_e(e,H,include.Jacobian=TRUE)
mean_hyper2(H, normalize=TRUE, ...)
Jacobian(e)
e_to_p(e)
p_to_e(p)
Arguments
H |
Object of class hyper2 |
powers |
Vector of length |
disallowed |
Function specifying a subset of the simplex
over which to integrate; default |
e , p |
A vector; see details |
ip |
A vector of probabilities corresponding to |
include.Jacobian |
Boolean, with default |
give |
Boolean, with default |
normalize |
Boolean, indicates whether return value of
|
... |
Further arguments passed to |
Details
Function
B()
returns the normalizing constant of a hyperdirichlet likelihood function. Internally,p
is converted toe
(bye_to_p()
) and the integral proceeds over a hypercube. This function can be very slow, especially ifdisallowed
is used.Function
dhyper2(ip,H)
is a probability density function on the independent components of a unit-sum vector, that is,ip=indep(p)
. This function callsB()
each time so might be a performance bottleneck.Function
probability()
gives the probability of an observation from a hyperdirichlet distribution satisfying!disallowed(p)
.Function
mgf()
is the moment generating function, taking an argument that specifies the powers ofp
needed: the expectation of\prod_{i=1}^n {p_i}^{{\rm powers}[i]}
is returned.Function
mean_hyper2()
returns the mean value of the hyperdirichlet distribution. This is computationally slow (considermaxp()
for a measure of central tendency). The function takes anormalize
argument, not passed toadaptIntegrate()
: this is Boolean withFALSE
meaning to return the value found by integration directly, and defaultTRUE
meaning to normalize so the sum is exactly 1
Value
Function
B()
returns a scalar: the normalization constantFunction
dhyper2()
is a probability density function overindep(p)
Function
mean()
returns ak
-tuple with unit sumFunction
mgf()
returns a scalar equal to the expectation ofp^power
Functions
is.proper()
andvalidated()
return a BooleanFunction
probability()
returns a scalar, a (Bayesian) probability
Note
The adapt package is no longer available on CRAN; from 1.4-3, the
package uses adaptIntegrate
of the cubature package.
Author(s)
Robin K. S. Hankin
See Also
Examples
# Two different measures of central tendency:
# mean_hyper2(chess,tol=0.1) # takes ~10s to run
maxp(chess) # faster
# Using the 'disallowed' argument typically results in slow run times;
# use high tol for speed:
# probability(chess,disallowed=function(p){p[1]>p[2]},tol=0.5)
# probability(chess,disallowed=function(p){p[1]<p[2]},tol=0.5)
# Above should sum to 1 [they are exclusive and exhaustive events]
Extract or replace parts of a hyper2 object
Description
Extract or replace parts of a hyper2 object
Usage
## S3 method for class 'hyper2'
x[...]
## S3 replacement method for class 'hyper2'
x[index, ...] <- value
assign_lowlevel(x,index,value)
overwrite_lowlevel(x,value)
Arguments
x |
An object of class |
... |
Further arguments, currently ignored |
index |
A list with integer vector elements corresponding to the brackets whose power is to be replaced |
value |
Numeric vector of powers |
Details
These methods should work as expected, although the off-by-one issue might be a gotcha.
For the extract method, H[L]
, a hyper2
object is
returned. The replace method, H[L] <- value
, the index
specifies the brackets whose powers are to be overwritten; standard
disordR
protocol is used.
If the index argument is missing, viz H1[] <- H2
, this is a
special case. Argument H1
must be a hyper2
object, and
the idiom effectively executes H1[brackets(H2)] <- powers(H2)
,
but more efficiently (note that this operation is well-defined even
though the order of the brackets is arbitrary). This special case is
included in the package because it has a very natural C++
expression [function overwrite()
in the src/
directory]
that was too neat to omit.
Altering (incrementing or decrementing) the power of a single bracket
is possible using idiom like H[x] <- H[x] + 1
; this is
documented at Ops.hyper2
, specifically
hyper2_sum_numeric()
and a discussion is given at
increment.Rd
.
Functions assign_lowlevel()
and overwrite_lowlevel()
are
low-level helper functions and not really intended for the end-user.
Value
The extractor method returns a hyper2
object, restricted to the
elements specified
Note
Use powers()
and brackets()
to extract a numeric vector of
powers or a list of integer vectors respectively.
Replacement idiom H[x] <- val
cannot use non-trivial recycling.
This is because the elements of H
are stored in an arbitrary
order, but the elements of val
are stored in a particular order.
Also see function hyper2_sum_numeric()
.
Author(s)
Robin K. S. Hankin
See Also
Examples
data(chess)
chess["Topalov"]
chess[c("Topalov","Anand")]
chess[c("Anand","Topalov")]
# Topalov plays Anand and wins:
chess["Topalov"] <- chess["Topalov"]+1
chess[c("Topalov","Anand")] <- chess[c("Topalov","Anand")]-1
# Topalov plays *Kasparov* and wins:
chess["Topalov"] <- chess["Topalov"] + 1
chess[c("Topalov","Kasparov")] <- chess[c("Topalov","Kasparov")] -1
# magrittr idiom:
# chess["Topalov"] %<>% inc
# chess[c("Topalov","Kasparov")] %<>% dec
# overwriting idiom:
H <- hyper2(list("Topalov","X"),6)
chess[] <- H
H <- icons
Basketball dataset
Description
A point-by-point analysis of a basketball game
Usage
data(NBA)
Details
Dataset NBA_table
is a dataframe contains a point-by-point
analysis of a basketball match. Each row corresponds to a point scored.
The first column is the time of the score, the second is the number of
points scored, the third shows which team had possession at the start of
play, and the fourth shows which team scored. The other columns show
the players. Table entries show whether or not that particular player
was on the pitch when the point was scored.
Likelihood function NBA
is a hyper2
object that gives the
log-likelihood function for this dataset. There is a player named
“possession
” that is a reified entity representing the
effect of possession.
Object NBA_maxp
is not the result of running maxp(NBA)
; it
was obtained by repeatedly running maxp_simplex()
on a
fault-tolerant system [it triggers a known R bug, bugzilla id 17703,
giving a “wmmin not finite
” error]. It is not clear to me
that likelihood function NBA
has a well-defined global maximum.
Object NBA
poses difficulty for the numerical optimization
routines for some reason.
Note that function volley()
is not applicable because we need to
include possession.
These objects can be generated by running script
inst/NBA.Rmd
, which includes some further discussion and
technical documentation and creates file NBA.rda
which
resides in the data/
directory.
References
https://www.espn.com/nba/playbyplay?gameId=400954514
See Also
Examples
data(NBA)
dotchart(NBA_maxp)
Arithmetic Ops Group Methods for hyper2 objects
Description
Allows arithmetic operators “+
”, “*
” and
comparison operators “==
” and “!=
”, to be
used for hyper2 objects.
Specifically, H1 + H2
implements addition of two log-likelihood
functions, corresponding to incorporation of additional independent
observational data; and n*H1
implements H1+H1+...+H1
,
corresponding to repeated independent observations of the same data.
There are no unary operations for this class.
Usage
## S3 method for class 'hyper2'
Ops(e1, e2 = NULL)
## S3 method for class 'hyper2'
sum(x,...,na.rm=FALSE)
hyper2_add(e1,e2)
hyper2_sum_numeric(H,r)
Arguments
e1 , e2 |
Objects of class |
x , ... , na.rm |
In the |
H , r |
In function |
Details
If two independent datasets have hyper2
objects H1
and
H2
, then package idiom for combining these would be H1+H2
;
the additive notation “+
” corresponds to addition of the
support (or multiplication of the likelihood). So hyper2
objects are better thought of as support functions than likelihood
functions; this is reflected in the print method which explicitly
wraps the likelihood function in a “log()
”.
Idiom H1-H1
returns H1 + (-1)*H2
, useful for investigating
the difference between likelihood functions arising from two different
observations, or different probability models. An example is given in
inst/soling.Rmd
.
Testing for equality is not straightforward for two implementation
reasons. Firstly, the object itself is stored internally as a
stl
map
, which does not store keys in any particular
order; and secondly, the stl
set
class is used for the
brackets. A set does not include information about the order of its
elements; neither does it admit repeated elements. See examples.
Function hyper2_sum_numeric()
is defined so that idiom like
icons["L"] + 5
works as expected. This means that
icons["L"] <- icons["L"] + 3
and icons["L"] %<>%inc(3)
work (without this, one has to type icons["L"] <-
powers(icons["L"]) + 3
, which sucks).
Raising a hyper2
object to a power returns an error.
Value
Returns a hyper2
object or a Boolean.
Author(s)
Robin K. S. Hankin
Examples
chess2 <- hyper2(list("Kasparov","Karpov",c("Kasparov","Karpov")),c(2,3,-5))
chess + chess2
maxp(chess+chess2)
Arithmetic Ops Group Methods for hyper3 objects
Description
Allows arithmetic operators “+
”, “*
” and
comparison operators “==
” and “!=
”, to be
used for hyper3
objects.
Specifically, H1 + H2
implements addition of two log-likelihood
functions, corresponding to incorporation of additional independent
observational data; and n*H1
implements H1+H1+...+H1
,
corresponding to repeated independent observations of the same data.
Usage
## S3 method for class 'hyper3'
Ops(e1, e2 = NULL)
hyper3_add(e1,e2)
hyper3_prod(e1,n)
Arguments
e1 , e2 |
Objects of class |
n |
Numeric vector of length 1 |
Details
Pretty much everything documented here is a straightforward translation
of the corresponding hyper2
functionality.
Value
Returns a hyper3
object or a Boolean.
Author(s)
Robin K. S. Hankin
Examples
H1 <- hyper3(list(c(a=1.2),c(b=1),c(a=1.2,b=1)),powers=c(3,4,-7))
H2 <- hyper3(list(c(a=1.2),c(b=1.2),c(a=2.2,b=1.2)),powers=c(2,3,-5))
H1
H2
Print methods
Description
Print methods for hyper2
and hyper3
objects
Usage
## S3 method for class 'hyper2'
print(x, ...)
## S3 method for class 'hyper3'
print(x, ...)
Arguments
x |
An object of class |
... |
Further arguments, currently ignored |
Details
Used mainly for their side-effect of printing the log-likelihood
function. In the print method, a natural logarithm is indicated with
“log()
”—not “ln()
”—consistent with R
builtin terminology base::log()
.
The hyper2
print method is sensitive to option
give_warning_on_nonzero_power_sum
. If TRUE
, a warning is
issued if the powers have nonzero sum. This is usually what you want
because observations are typically multinomial; a warning indicates
nonzero sum of powers, which should prompt us to check the coding.
Vignette zeropower
gives a discussion of this issue.
Value
Returns the hyper2
or hyper3
object it was sent,
invisibly. Function pnv()
converts a named vector to a
character string that is used in the hyper3
print method.
Author(s)
Robin K. S. Hankin
Examples
data(chess)
chess
Indian Premier League T20 cricket
Description
Cricket dataset, T20 Indian Premier League 2008-2017
Usage
data(T20)
Details
Dataframe T20_table
has one row for each T20 IPL match in the
period 2008-2017 with the exception of seven drawn matches and three
no-result matches which were removed. Object T20
is a
likelihood function for the strengths of the 13 teams, and
T20_toss
is a likelihood function that also includes a toss
strength term.
These objects can be generated by running script inst/T20.Rmd
,
which is based on Chandel and Hankin 2019. This includes some further
discussion and technical documentation and creates file T20.rda
which resides in the data/
directory.
References
T. Chandel and R. K. S. Hankin 2019. “Analysing the impact of winning a coin toss in the Indian Premier League”. Auckland University of Technology.
Examples
summary(T20)
dotchart(T20_maxp)
Convert an order table with DNS entries to a nice order table
Description
Given an ordertable such as F1_table_2017
which is a
“wikitable” object, function as.ordertable()
returns a
nicified version in which entries such as DNS are replaced with zeros.
Finishing competitors are assigned numbers 1-n
with no gaps; the
function can be used to extract a subset of competitors.
Function ordertable2supp()
offers similar functionality but
returns a hyper2
object directly.
Usage
as.ordertable(w)
Arguments
w |
A generalized ordertable, a wikitable |
Details
Operates columnwise, and treats any entry not coercible to numeric as DNF.
Value
Returns an ordertable suitable for coercion to a hyper2
object.
Author(s)
Robin K. S. Hankin
See Also
Examples
as.ordertable(F1_table_2017)
ordertable2supp(as.ordertable(F1_table_2017[1:9,]))
Translate attempt tables to hyper3 support functions
Description
description here
Usage
attemptstable2supp3(a, decreasing, give.supp=TRUE, dnf.last=TRUE)
Arguments
a |
Data frame, see details |
decreasing |
Boolean, with |
give.supp |
Boolean, return the support function or the order statistic |
dnf.last |
Boolean, should NA entries count as coming last (TRUE) or be ignored (FALSE) |
Details
Function attemptstable2supp3()
is intended for use on attempts
tables like javelin
.
These objects can be generated by running script
inst/javelin.Rmd
, which includes some further discussion and
technical documentation, and creates file javelin.rda
which
resides in the data/
directory.
Value
Returns a hyper3
object
Author(s)
Robin K. S. Hankin
See Also
Examples
jj <- javelin_table[1:3,]
jj
attemptstable2supp3(jj)
Enforce the zero power sum condition
Description
Sometimes a hyper2
object is unbalanced in the sense that its
powers do not sum to zero. This is rectified by balance()
, which
modifies the power of the bracket corresponding to the sum of all pnames
accordingly.
Usage
balance(H)
Arguments
H |
object of class |
Details
This is just a convenience function, all it does is
H[pnames(H)] <- 0 H[pnames(H)] <- -sum(pnames(H)) H
Package vignette zeropower
discusses the zero power sum
condition.
Value
Returns a balanced hyper2
object
Author(s)
Robin K. S. Hankin
See Also
Examples
H <- hyper2()
H["a"] <- 6
H["b"] <- 3
H[c("a","c")] <- 7
H <- balance(H)
maxp(H)
Baseball results, following Agresti
Description
Results from repeated games among seven baseball teams, following Agresti
Usage
data(baseball)
Format
A hyper2
object that gives a likelihood function
Details
Agresti discusses results from seven baseball teams in the 1987 season of the Eastern Division of the American League.
A results table and likelihood function is given in the package as
baseball_table
and baseball
respectively. The maximum
likelihood estimate is given as baseball_maxp
, but can be
reproduced by maxp(baseball)
.
These objects can be generated by running script
inst/home_advantage.Rmd
, which includes some further discussion
and technical documentation, and creates file baseball.rda
which
resides in the data/
directory.
References
A. Agresti 2002. “Categorical data analysis”. John Wiley and Sons; p437
See Also
Examples
baseball_table
baseball_table[1:3,1:3]
home_away3(baseball_table[1:3,1:3],1.3)
Carcinoma dataset discussed by Agresti
Description
A dataset considered by Agresti. Seven clinicians are asked whether they see evidence for carcinoma on different patients.
Usage
data(carcinoma)
Format
A hyper2
object that gives a likelihood function
Details
Object carcinoma_table
is drawn from Agresti. The first seven
columns correspond to the seven clinicians A-G, the next is the count
of observations, and the remaining columns are fitted values according
to different models discussed by Agresti.
Object carcinoma
is a likelihood function (of class lsl
)
on the Bradley-Terry strengths of the seven clinicians. The
clinicians diagnosed the presence or absence of carcinoma on a total
of 118 patients in a blind rating scheme. The maximum likelihood
estimator for the clinicicans' Bradley-Terry strengths is given as
carcinoma_maxp
, which is computationally expensive to find.
The package also includes carcinoma_count
, which is a different
estimator for the Clinicians' BT strengths.
These objects can be generated by running script inst/carcinoma.Rmd
,
which includes some further discussion and technical documentation, and
creates file carcinoma.rda
which resides in the data/
directory.
References
A. Agresti, 2002. "Categorical data analysis". John Wiley and Sons. Table 13.1, p542.
See Also
Examples
pie(carcinoma_maxp)
Convert a character vector to a numeric vector
Description
Convert string descriptions of competitors into their number
Usage
character_to_number(char, pnames)
char2num(char, pnames)
Arguments
char |
Character vector to be converted |
pnames |
Names vector (usually |
Details
In earlier versions of this package, the internal mechanism of functions
such as ggrl()
, and all the C++
code, operated with the
competitors labelled with a non-negative integer; it is then natural to
refer to the competitors as p1
, p2
, etc.
However, sometimes the competitors have names (as in, for example, the
rowing
dataset). If so, it is more natural to refer to the
competitors using their names rather than an arbitrary integer.
Function character_to_number()
converts the names to numbers. If
an element of char
is not present in pnames
, an error is
returned (function char2num()
is an easy-to-type synonym). The
function is here because it is used in ggrl()
.
Author(s)
Robin K. S. Hankin
See Also
Examples
x <- sample(9)
names(x) <- sample(letters[1:9])
H <- rank_likelihood(x)
character_to_number(letters[1:3],pnames(H))
char2num(c("PB","L"),pnames(icons))
Chess playing dataset
Description
A tally of wins and losses for games between three chess players: Topalov, Anand, Karpov.
Usage
data(chess)
Details
(there are three chess datasets in the package, documented at
interzonal.Rd
[the 1963 World championship], kka.Rd
[Karpov-Kasparov-Anand dataset], and chess.Rd
[rock-paper-scissors using Topalov-Anand-Karpov])
This is a very simple dataset that can be used for illustration of
hyper2
idiom.
The players are:
Grandmaster Veselin Topalov. FIDE world champion 2005-2006; peak rating 2813
Grandmaster Viswanathan Anand. FIDE world champion 2000-2002, 2008; peak rating 2799
Grandmaster Anatoly Karpov. FIDE world champion 1993-1999; peak rating 2780
Observe that Topalov beats Anand, Anand beats Karpov, and Karpov beats Topalov (where “beats” means “wins more games than”).
The games thus resemble a noisy version of “rock paper scissors”.
The likelihood function does not record who played white; see
karpov_kasparov_anand
for such a dataset.
These objects can be generated by running script
inst/rock_paper_scissors.Rmd
, which includes some further
discussion and technical documentation and creates file chess.rda
which resides in the data/
directory.
File inst/ternaryplot_hyper2.Rmd
gives an example showing the
chess
likelihood function that uses
Ternary::ternaryPlot()
.
References
See Also
Examples
data(chess)
maxp(chess)
mgf(chess,c(Anand=2),tol = 0.1) # tolerance for speed
Consistency check for hyper2 objects
Description
Given a hyper2
object, calculate the maximum likelihood point in
two ways and plot one against the other to check for consistency.
Usage
consistency(H, plot=TRUE, ...)
Arguments
H |
A |
plot |
If |
... |
Further arguments, passed to |
Details
Given a hyper2
object, calculate the maximum likelihood
estimate of the players' strengths using maxp()
; then reverse
the pnames
attribute and calculate the players' strengths
again. These two estimates should be identical but small differences
highlight numerical problems. Typically, the differences are small if
there are fewer than about 25 players.
Reversing the pnames()
is cosmetic in theory but is a
non-trivial operation: for example, it changes the identity of the
fillup from the last player to the first.
Value
Returns a named three-row matrix with first row being the direct evaluate, second row being the reverse of the reversed evaluate, and the third being the difference
Author(s)
Robin K. S. Hankin
See Also
Examples
# consistency(icons)
x <- icons
y <- icons
pnames(y) <- rev(pnames(y))
gradient(x,indep(equalp(x)))
gradient(y,indep(equalp(y)))
Formula 1 dataset: the constructors' championship
Description
Race results from 2017 Formula One constructors' Championship
Usage
data(constructor)
Format
A hyper3
object that gives a likelihood function
Details
The Constructors championship runs parallel to the Formula 1 drivers' championship. The package currently includes data from 2020 and 2021; the following text applies to both years. I will add more years eventually.
Object constructor_2021_table
is a dataframe, taken from
Wikipedia, with rows corresponding to performance of a constructor.
Each constructor fields two cars in each race; the identity of the
driver is not important in this context (and indeed may change as the
season progresses). The first column is the name of the constructor,
the next 22 columns show the ranks of the constructors' cars, and the
final one is the points awarded. At each venue, the constructor's best
performance is listed first. The constructors' names change quite
frequently (e.g. “Red Bull Racing-TAG Heuer” raced 2016,2017, and
2018; “Red Bull Racing-Honda” raced 2019, 2020, and 2021. I am
not sure whether to treat these as separate entities or not; file
inst/constructor_names.txt
gives a dataframe of team names and
years they competed (not currently part of the package). The row names
of the dataframe cannot be the constructors because these are not
unique.
Object constructor_2021_maxp
gives the maximum likelihood
estimate for the constructors' strengths.
The corresponding hyper3
likelihood function
constructor_2021
is produced by ordertable2supp3()
.
These objects can be generated by running script inst/race3.Rmd
,
which includes some further discussion and technical documentation, and
creates file constructor.rda
which resides in the data/
directory.
References
Wikipedia contributors. (2022, April 14). 2021 Formula One World Championship. In _Wikipedia, The Free Encyclopedia_. Retrieved 05:16, April 17, 2022, from https://en.wikipedia.org/w/index.php?title=2021_Formula_One_World_Championship&oldid=1082745216
See Also
Examples
dotchart(constructor_2021_maxp)
Counterstrike
Description
A kill-by-kill analysis of a counterstrike game.
Usage
data(counterstrike)
Details
E-sports are a form of competition using video games. E-sports are becoming increasingly popular, with high-profile tournaments attracting over 400 million viewers, and prize pools exceeding US$20m.
Counter Strike: Global Offensive (CS-GO) is a multiplayer first-person shooter game in which two teams of five compete in an immersive virtual reality combat environment. CS-GO is distinguished by the ability to download detailed gamefiles in which every aspect of an entire match is recorded, it being possible to replay the match at will.
Statistical analysis of such gamefiles is extremely difficult, primarily due to complex gameplay features such as cooperative teamwork, within-team communication, and real-time strategic fluidity.
It is the task of the statistician to make robust inferences from such complex datasets, and here I discuss data from an influential match between “FaZe Clan” and “Cloud9”, two of the most successful E-sports syndicates of all time, when they competed at Boston 2018.
Dataset counterstrike
is a loglikelihood function for the
strengths of ten counterstrike players; counterstrike_maxp
is a
precomputed evaluate, and zacslist
the observations used to
calculate the loglikelihood function.
The probability model is similar to that of NBA
: when a player
kills (scores), this is taken to be a success of the whole team rather
than the shooter.
File inst/counterstrike.R
and inst/counterstrike_random.R
include some further randomisation tests and discussion.
The objects documented here can be generated by running script
inst/counterstrike.Rmd
, which includes some further discussion
and technical documentation and creates file counterstrike.rda
which resides in the data/
directory.
Counterstrike dataset kindly supplied by Zachary Hankin.
References
Examples
dotchart(counterstrike_maxp)
Wrappers to c calls
Description
Various low-level wrappers to C functions, courtesy of Rcpp
Usage
overwrite(L1, powers1, L2, powers2)
accessor(L,powers,Lwanted)
assigner(L,p,L2,value)
addL(L1,p1,L2,p2)
identityL(L,p)
evaluate(L, powers, probs, pnames)
differentiate(L, powers, probs, pnames, n)
differentiate_n(L, powers, probs, pnames, n)
Arguments
L , L1 , L2 , Lwanted |
Lists with character vector elements, used to specify the brackets of the hyperdirichlet distribution |
p , p1 , p2 , powers , powers1 , powers2 |
A numeric vector specifying the powers to which the brackets are raised |
value |
RHS in assignment, a numeric vector |
probs |
Vector of probabilities for evaluation of log-likelihood |
pnames |
Character vector of names |
n |
Integer specifying component to differentiate with respect to |
Details
These functions are not really intended for the end-user, as out-of-scope calls may cause crashes.
Value
These functions return a named List
Author(s)
Robin K. S. Hankin
Curling at the Winter Olympics, 1998-2018
Description
Data for women's Olympic Curling at the 2002 Winter Olympics.
Usage
data(curling)
Details
There are five datasets loaded by data("curling")
:
-
curling_table
, an order table for Winter Olympics years 1998,2002,2006,2010,2014, and 2018 for 13 countries. -
curling1
, a log likelihood function on the assumption that not attending (indicated byNA
) is equivalent to aDNS
in Formula 1 -
curling2
, a log likelihood function on the assumption that not attending is noninformative -
curling1_maxp
andcurling2_maxp
, corresponding evaluates
These objects can be generated by running script
inst/curling.Rmd
, which includes some further discussion and
technical documentation and creates file curling.rda
which
resides in the data/
directory.
Author(s)
Robin K. S. Hankin
References
-
Wikipedia contributors. Curling at the Winter Olympics [Internet]. Wikipedia, The Free Encyclopedia; 2021 Jan 7, 14:23 UTC [cited 2021 Jan 21]. Available from: https://en.wikipedia.org/w/index.php?title=Curling_at_the_Winter_Olympics&oldid=998891075
Examples
data(curling)
dotchart(curling1_maxp)
Dirichlet distribution and generalizations
Description
The Dirichlet distribution in likelihood (for p) form, including the generalized Dirichlet distribution due to Connor and Mosimann
Usage
dirichlet(powers, alpha)
GD(alpha, beta, beta0=0)
GD_wong(alpha, beta)
rdirichlet(n,H)
is.dirichlet(H)
rp_unif(n,H)
Arguments
powers |
In function |
alpha , beta |
A vector of parameters for the Dirichlet or generalized Dirichlet distribution |
beta0 |
In function |
H |
Object of class |
n |
Number of observations |
Details
These functions are really convenience functions.
Function rdirichlet()
returns random samples drawn from a
Dirichlet distribution. If second argument H
is a
hyper2
object, it is tested [with is.dirichlet()
] for
being a Dirichlet distribution. If so, samples from it are returned.
If not, (e.g. icons
), an error is given. If H
is not a
hyper2
object, it is interpreted as a vector of parameters
\alpha
[not a vector of powers].
Function rp_unif()
returns uniformly distributed vectors,
effectively using H*0
; but note that this uses Dirichlet
sampling which is much faster and better than the Metropolis-Hastings
functionality documented at rp.Rd
.
Functions GD()
and GD_wong()
return a likelihood
function corresponding to the Generalized Dirichlet distribution as
presented by Connor and Mosimann, and Wong, respectively. In
GD_wong()
, alpha
and beta
must be named vectors;
the names of alpha
give the names of
x_1,\ldots,x_k
and the last element of beta
gives the name of x_{k+1}
.
Note
A dirichlet distribution can have a term with zero power. But
this poses problems for hyper2
objects as zero power brackets
are dropped.
Author(s)
Robin K. S. Hankin
References
R. J. Connor and J. E. Mosimann 1969. “Concepts of independence for proportions with a generalization of the Dirichlet distribution”. Journal of the American Statistical Association, 64:194–206
T.-T. Wong 1998. “Generalized Dirichlet distribution in Bayesian Analysis”. Applied Mathematics and Computation, 97:165–181
See Also
Examples
x1 <- dirichlet(c(a=1,b=2,c=3))
x2 <- dirichlet(c(c=3,d=4))
x1+x2
H <- dirichlet(c(a=1,b=2,c=3,d=4))
rdirichlet(10,H)
colMeans(rdirichlet(1e4,H))
Eurovision Dance contest dataset
Description
Voting patterns from Eurovision Dance Contest 2008
Usage
data(eurovision)
Format
A hyper2
object that gives a likelihood
function.
Details
Object eurodance
is a hyper2
object that gives a
likelihood function for the skills of the 14 competitor countries in
2008 Eurovision Dance contest. Object eurodance_table
gives the
original dataset and eurodance_maxp
the evaluate of the
competitors' Plackett-Luce strengths.
The dataset is interesting because, in addition to the regular votes by each nation, there is an Expert jury vote as well. We may use Plackett-Luce likelihoods to compare the performance of the Expert jury with the national votes.
These objects can be generated by running script
inst/eurodance.Rmd
, which includes some further discussion and
technical documentation and creates file eurodance.rda
which
resides in the data/
directory.
References
-
Wikipedia contributors, “Eurovision Song Contest 2009—Wikipedia, The Free Encyclopedia”, 2018, https://en.wikipedia.org/w/index.php?title=Eurovision_Song_Contest_2009&oldid=838723921 [Online; accessed 13-May-2018].
P. M. E. Altham, personal communication
See Also
Examples
data(eurodance)
dotchart(eurodance_maxp)
Eurovision Song contest dataset
Description
Voting patterns from Eurovision 2009
Usage
data(eurovision)
Format
A hyper2
object that gives a likelihood
function.
Details
Object eurovision
is a hyper2
object that gives a
likelihood function for the skills of the 18 competitor countries in
semi-final 1 of the 2009 Eurovision Song contest. Object
eurovision_table
gives the original dataset and
eurovision_maxp
the evaluate of the competitors' Plackett-Luce
strengths.
The motivation for choosing this particular dataset is that Pat Altham
(Statistical Laboratory, Cambridge) considered it with a view to
discover similarities between voters. In the current analysis, the
likelihood function eurovision
assumes their independence.
These objects can be generated by running script
inst/eurovision.Rmd
, which includes some further discussion and
technical documentation and creates file eurovision.rda
which
resides in the data/
directory.
References
-
Wikipedia contributors, “Eurovision Song Contest 2009—Wikipedia, The Free Encyclopedia”, 2018, https://en.wikipedia.org/w/index.php?title=Eurovision_Song_Contest_2009&oldid=838723921 [Online; accessed 13-May-2018].
P. M. E. Altham, personal communication
See Also
Examples
data(eurovision)
dotchart(eurovision_maxp)
Fillup function
Description
Function fillup()
concatenates a vector with a ‘fillup’
value to ensure a unit sum; if given a matrix, attaches a column so
the rowsums are 1.
Function indep()
is the inverse: it removes the final element
of a vector, leaving only an independent set.
Usage
fillup(x,H=NULL,total=1)
indep(x)
Arguments
x |
Numeric vector |
H |
Object with |
total |
Total value for probability |
Details
Usually you want the total to be one, to enforce the unit sum
constraint. Passing total=0
constrains the sum to be
zero. This is useful when considering \delta p
; see the
example at gradient.Rd
.
Author(s)
Robin K. S. Hankin
See Also
Examples
fillup(c(1/2,1/3))
indep(c(1/2,1/3,1/6))
fillup(indep(icons_maxp))
fillup(indep(icons_maxp),icons)
Formula 1 dataset
Description
Race results from 2017 Formula One World Championship
Usage
data(formula1)
formula1_points_systems(top=11)
Arguments
top |
Number of drivers to retain in
|
Format
A hyper2
object that gives a likelihood function
Details
Object formula1
is a hyper2
object that gives a likelihood
function for the strengths of the competitors of the 2017 Formula One
(Drivers') World Championship. Object F1_table_2017
is an order table: a
data frame with rows being drivers, columns being venues, and entries
being places. Thus looking at the first row, first column we see that
Hamilton placed second in Austria.
The package uses files like inst/formula1_2017.txt
as primary
sources. These are generally copied from wikipedia, converted into
tab-separated clean seven bit ascii, and tidied up a little. I have
removed diacritics from names, so we see “Raikkonen
”,
“Perez
”, etc. Also where distinct drivers with the same
surname compete, I have indicated this, e.g. schumacher_R
is Ralf
Schumacher, schumacher_M
is Michael, and schumacher_Mick
is Mick; the underscore device means that quoting should not be needed
in R idiom. I have not been entirely consistent here, with Bruno Senna
appearing as “Senna_B
” and Nelson Piquet Junior appearing
as “PiquetJ
” [on the grounds that in these cases the
fathers, being more eminent, should be the primary eponym] although this might
change in the future.
Object F1_table_2017
is simply the first 20 columns of
read.table(inst/formula1_2017.txt)
and object
F1_points_2017
is column 21. The likelihood function
formula1
is ordertable2supp(F1_table_2017)
. The datasets
in the package are derived from text files in the inst/
directory
(e.g. formula1_2017.txt
) by script file
inst/f1points_Omaker.R
. Executing this script creates files like
formula1_results_2017.rda
.
The text files can be converted directly into ranktable
objects
and support functions as follows:
a <- read.table("formula1_2022.txt",header=TRUE) a <- a[,seq_len(ncol(a)-1)] # strips out the points column wikitable_to_ranktable(a) ordertable2supp(a)
Function formula1_points_system()
gives various possible points
systems for the winner, second, third, etc, placing drivers.
The constructors' championship is discussed at constructor.Rd
.
There is a large amount of documentation in the inst/
directory
in the form of Rmd files.
References
“Wikipedia contributors”, 2017 Formula One World Championship—Wikipedia, The Free Encyclopedia, 2018. https://en.wikipedia.org/w/index.php?title=2017_Formula_One_World_Championship&oldid=839923210 [Online; accessed 14-May-2018]
See Also
Examples
summary(formula1)
## Not run: #Takes too long
dotchart(maxp(formula1))
## End(Not run)
Order statistics
Description
Various functions for calculating the likelihood function for order statistics
Usage
ggrl(H, ...)
general_grouped_rank_likelihood(H, ...)
goodbad(winners,losers)
elimination(all_players)
rank_likelihood(M,times=1)
rankvec_likelihood(v)
race(v)
Arguments
H |
Object of class |
... |
Numeric or character vectors specifying groups of players with equal rank, with higher-ranking groups coming earlier in the argument list |
all_players , winners , losers |
Numeric or character vectors specifying competitors. See details |
M |
In function |
times |
Vector specifying the number of times each row is observed |
v |
A character vector specifying ranks. Thus
|
Details
These functions are designed to return likelihood functions, in the
form of lists of hyper2()
objects, for typical order statistics
such as the results of rowing heats or MasterChef tournaments.
Function ggrl()
is an easily-typed alias for
general_grouped_rank_likelihood()
.
Function goodbad()
is a convenience function for ggrl()
in which a bunch of contestants is judged. It returns a likelihood
function for the observation that the members of one subset were
better than those of another. Thus
goodbad(letters[1:3],letters[4:5])
corresponds to the
observation that d
and e
were put into an elimination
trial (and abc
were not).
Function elimination()
gives a likelihood function for situations
where the weakest player is identified at each stage and
subsequently eliminated from the competition. It is intended for
situations like the Great British Bake-off and Masterchef in which the
observation is which player was chosen to leave the show. In this
function, argument all_players
is sensitive to order, unlike
choose_winners()
and choose_losers()
(an integer
n
is interpreted as letters[seq_len(n)]
). Element
i
of all_players
is the i^\mathrm{th}
player
to be eliminated. Thus the first element of all_players
is the
first player to be eliminated (and would be expected to have the
lowest strength). The final element of all_players
is the last
player to be eliminated (or alternatively the only player not to be
eliminated).
Function rank_likelihood()
takes a matrix M
with rows
corresponding to a judge (or race); column names are interpreted as
competitor names. A named vector is coerced to a one-row matrix.
Each row of M
is an order statistic: thus c(3,4,2,1)
means that person 3 came first, person 4 came second, person 2 came
third and person 1 came last. Note that in data frames like
F1_table_2017
, each column is a race.
Function rankvec_likelihood()
takes a character vector of
competitors with the order of elements corresponding to the finishing
order; a Plackett-Luce likelihood function is returned. Thus
v=c("d","b","c","a")
corresponds to d
coming first,
b
second, c
third, and a
fourth. Function
race()
is an arguably more memorable synonym.
An example of race()
is given in inst/rowing.Rmd
, and
examples of ggrl()
are given in inst/loser.Rmd
and
inst/masterchef.Rmd
.
Author(s)
Robin K. S. Hankin
See Also
Examples
W <- hyper2(pnames=letters[1:5])
W1 <- ggrl(W, 'a', letters[2:4],'e') # 6-element list
W2 <- ggrl(W, 'b', letters[3:5],'a') # 6-element list
like_single_list(equalp(W1),W1)
like_series(equalp(W1),list(W1,W2))
if(FALSE){ # takes too long
# run 10 races:
r1 <- rrank(10,p=(7:1)/28)
colnames(r1) <- letters[1:7]
# Likelihood function for r1:
W <- rank_likelihood(r1)
# convert a rank table to a support function:
rank_likelihood(wikitable_to_ranktable(volvo_table))
H <- hyper2()
for(i in 1:20){
H <- H + race(sample(letters[1:5],sample(3,1),replace=FALSE))
}
equalp.test(H) # should not be significant (null is true)
H1 <- hyper2(pnames=letters[1:5])
H2 <- choose_losers(H1,letters[1:4],letters[1:2]) # {a,b} vs {c,d}; {a,b} lost
maxplist(H2,control=list(maxit=1)) # control set to save time
}
Differential calculus
Description
Given a hyper2
object and a point in probability space,
function gradient()
returns the gradient of the log-likelihood;
function hessian()
returns the bordered Hessian matrix. By
default, both functions are evaluated at the maximum likelihood estimate
for p
, as given by maxp()
.
Usage
gradient(H, probs=indep(maxp(H)))
hessian(H,probs=indep(maxp(H)),border=TRUE)
hessian_lowlevel(L, powers, probs, pnames,n)
is_ok_hessian(M, give=TRUE)
Arguments
H |
A |
L , powers , n |
Components of a |
probs |
A vector of probabilities |
pnames |
Character vector of names |
border |
Boolean, with default |
M |
A bordered Hessian matrix, understood to have a single
constraint (the unit sum) at the last row and column; the output of
|
give |
Boolean with default |
Details
Function gradient()
returns the gradient of the log-likelihood
function. If the hyper2
object is of size n
, then argument
probs
may be a vector of length n-1
or n
; in the
former case it is interpreted as indep(p)
. In both cases, the
returned gradient is a vector of length n-1
.
The function returns the derivative of the loglikelihood with respect to
the n-1
independent components of
\left(p_1,\ldots,p_n\right)
, namely
\left(p_1,\ldots,p_{n-1}\right)
. The fillup
value p_n
is calculated as
1-\left(p_1+\cdots + p_{n-1}\right)
.
Function gradientn()
returns the gradient of the loglikelihood
function but ignores the unit sum constraint. If the hyper2
object is of size n
, then argument probs
must be a vector
of length n
, and the function returns a named vector of length
n
. The last element of the vector is not treated differently from
the others; all n
elements are treated as independent. The sum
need not equal one.
Function hessian()
returns the bordered Hessian, a matrix
of size n+1\times n+1
, which is useful when using
Lagrange's method of undetermined multipliers. The first row and column
correspond to the unit sum constraint, \sum p_1=1
.
Row and column names of the matrix are the pnames()
of the
hyper2
object, plus “usc
” for “Unit Sum
Constraint”.
The unit sum constraint borders could have been added with idiom
magic::adiag(0,pad=1,hess)
, which might be preferable.
Function is_ok_hessian()
returns the result of the second
derivative test for the maximum likelihood estimate being a local
maximum on the constraint hypersurface. This is a generalization of the
usual unconstrained problem, for which the test is the Hessian's being
negative-definite.
Function hessian_lowlevel()
is a low-level helper function that
calls the C++ routine.
Further examples and discussion is given in file
inst/gradient.Rmd
. See also the discussion at maxp on the
different optimization routines available.
Value
Function gradient()
returns a vector of length n-1
with
entries being the gradient of the log-likelihood with respect to the
n-1
independent components of
\left(p_1,\ldots,p_n\right)
, namely
\left(p_1,\ldots,p_{n-1}\right)
. The fillup
value p_n
is calculated as
1-\left(p_1,\ldots,p_{n-1}\right)
.
If argument border
is TRUE
, function hessian()
returns an n
-by-n
matrix of second derivatives; the borders
are as returned by gradient()
. If border
is FALSE
,
ignore the fillup value and return an n-1
-by-n-1
matrix.
Calling hessian()
at the evaluate will not return exact zeros for
the constraint on the fillup value; gradient()
is used and this
does not return exactly zeros at the evaluate.
Author(s)
Robin K. S. Hankin
Examples
data(chess)
p <- c(1/2,1/3)
delta <- rnorm(2)/1e5 # delta needs to be quite small
deltaL <- loglik(p+delta,chess) - loglik(p,chess)
deltaLn <- sum(delta*gradient(chess,p + delta/2)) # numeric
deltaL - deltaLn # should be small [zero to first order]
H <- hessian(icons)
is_ok_hessian(H)
Dataset on communication breakdown in handover between physicians
Description
Object handover
is a likelihood function corresponding to a
dataset arising from 69 medical malpractice claims and concerns
handover (or hand-off) between physicians. This dataset was analysed
by Lin et al. (2009), and further analysed by Altham and Hankin
(2010). The computational methods are presented in the (unmaintained)
hyperdirichlet and aylmer packages and a further
discussion is given in the “integration” vignette of the
hyper2 package. The original dataset is handover_table
,
a three-by-three matrix of counts.
Usage
data(handover)
Details
These objects can be generated by running script
inst/handover.Rmd
, which includes some further discussion and
technical documentation, and creates file handover.rda
which
resides in the data/
directory.
References
Y. Lin and S. Lipsitz and D. Sinha and A. A. Gawande and S. E. Regenbogen and C. C. Greenberg, 2009. “Using Bayesian
p
-values in a2\times 2
table of matched pairs with incompletely classified data”. Journal of the Royal Statistical Society, Series C, 58:2P. M. E. Altham and R. K. S. Hankin, 2010. “Using recently developed software on a
2\times 2
table of matched pairs with incompletely classified data”. Journal of the Royal Statistical Society, series C, 59(2): 377-379R. K. S. Hankin 2010. “A generalization of the Dirichlet distribution”. Journal of Statistical software, 33:11
L. J. West and R. K. S. Hankin 2008. “Exact tests for two-way contingency tables with structural zeros”. Journal of Statistical software, 28:11
Examples
data(handover)
maxp(handover)
First few terms of a distribution
Description
First few terms in a hyperdirichlet distribution
Usage
## S3 method for class 'hyper2'
head(x, ...)
Arguments
x |
Object of class |
... |
Further arguments, passed to |
Details
Function is x[head(brackets(x), ...)]
Value
Returns a hyper2
object
Author(s)
Robin K. S. Hankin
Examples
p <- zipf(5)
names(p) <- letters[1:5]
H <- rank_likelihood(rrank(20,p))
head(H)
Hepatitis dataset discussed by Agresti
Description
A dataset considered by Agresti
Usage
data(hepatitis)
Format
A hyper2
object that gives a likelihood function
Details
Object hepatitis_table
is drawn from Agresti, table 12.16, page
533. Object hepatitis
is a likelihood function of class
lsl
and hepatitis_maxp
a pre-calculated evaluate.
These objects can be generated by running script inst/hepatitis.Rmd
,
which includes some further discussion and technical documentation, and
creates file hepatitis.rda
which resides in the data/
directory.
References
A. Agresti, 2002. "Categorical data analysis". John Wiley and Sons. Table 13.1, p542.
See Also
Examples
pie(hepatitis_maxp)
Basic functions in the hyper2 package
Description
Basic functions in the hyper2 package
Usage
hyper2(L=list(), d=0, pnames)
## S3 method for class 'hyper2'
brackets(H)
## S3 method for class 'hyper2'
powers(H)
## S3 method for class 'hyper2'
pnames(H)
## S3 method for class 'suplist'
pnames(H)
size(H)
as.hyper2(L,d,pnames)
is.hyper2(H)
is_valid_hyper2(L,d,pnames)
is_constant(H)
Arguments
H |
A hyper2 object |
L |
A list of character vectors whose elements specify the brackets of a hyper2 object |
d |
A vector of powers; |
pnames |
A character vector specifying the names of |
Details
These are the basic functions of the hyper2 package. Function
hyper()
is the low-level creator function; as.hyper2()
is
a bit more user-friendly and attempts to coerce its arguments into a
suitable form; for example, a matrix is interpreted as rows of brackets.
Functions pnames()
and pnames<-()
are the accessor and
setter methods for the player names. Length-zero character strings are
acceptable player names. The setter method pnames<-()
can be
confusing. Idiom such as pnames(H) <- value
does not change the
likelihood function of H
(except possibly its domain). When
called, it changes the pnames
internal vector, and will throw an
error if any element of c(brackets(H))
is not present in
value
. It has two uses: firstly, to add players who do not
appear in the brackets; and secondly to rearrange the pnames
vector (the canonical use-case is pnames(H) <- rev(pnames(H))
).
If you want to change the player names, use psubs()
to substitute
players for other players.
Function is_valid_hyper2()
tests for valid input, returning a
Boolean. This function returns an error if a bracket contains a
repeated element, as in hyper2(list(c("a","a")),1)
.
Note that it is perfectly acceptable to have an element of
pnames
that is not present in the likelihood function (this would
correspond to having no information about that particular player).
Function size()
returns the (nominal) length n
of
nonnegative vector p=\left(p_1,\ldots,p_n\right)
where p_1+\cdots+p_n=1
.
Author(s)
Robin K. S. Hankin
See Also
Ops.hyper2
,
Extract.hyper2
,
loglik
,
hyper2-package
psubs
Examples
o <- hyper2(list("a","b","c",c("a","b"),letters[1:3]),1:5)
# Verify that the MLE is invariant under reordering
pnames(icons) <- rev(pnames(icons))
maxp(icons) - icons_maxp # should be small
Weighted probability vectors: hyper3
objects
Description
Objects of class hyper3
are a generalization of hyper2
objects
that allow the brackets to contain weighted probabilities.
As a motivating example, suppose two players with Bradley-Terry
strengths p_1,p_2
play chess where we quantify the first-mover
advantage with a term \lambda
. If p_1
plays
white a+b
times with a
wins and b
losses, and
plays black c+d
times with c
wins and d
losses, then a sensible likelihood function might be
\left(\frac{\lambda p_1}{\lambda p_1 + p_2}\right)^{a}
\left(\frac{p_2 }{\lambda p_1 + p_2}\right)^{b}
\left(\frac{p_1 }{p_1 + \lambda p_2}\right)^{c}
\left(\frac{\lambda p_2}{p_1 + \lambda p_2}\right)^{d}
If a=1,b=2,c=3,d=4
and \lambda=1.3
appropriate
package idiom might be:
H <- hyper3() H[c(p1=1.3)] %<>% inc(1) # a=1 H[c(p2=1)] %<>% inc(2) # b=2 H[c(p1=1.3,p2=1)] %<>% dec(3) # a+b=1+2=3 H[c(p1=1)] %<>% inc(3) # c=3 H[c(p2=1.3)] %<>% inc(4) # d=4 H[c(p1=1,p2=1.3)] %<>% dec(7) # c+d=3+4=7 H > log( (p1=1)^3 * (p1=1, p2=1.3)^-7 * (p1=1.3)^1 * (p1=1.3, p2=1)^-3 * (p2=1)^2 * (p2=1.3)^4)
The general form of terms of a hyper3
object would be
\left(w_1p_1+\cdots+w_rp_r\right)^{\alpha}
; the
complete object would be
\mathcal{L}\left(p_1,\ldots,p_n\right)=
\prod_{j=1}^N\left(\sum_{i=1}^n
w_{ij}p_i\right)^{\alpha_i}
where we understand that p_n=1-\sum_{i=1}^{n-1}p_i
;
many of the weights might be zero. We see that the weights
w_{ij}
may be arranged as a matrix and this form is taken
by function hyper3_m()
.
Usage
hyper3(B = list(), W = list(), powers = 0, pnames)
hyper3_bw(B = list(), W = list(), powers = 0, pnames)
hyper3_nv(L=list(),powers=0,pnames)
hyper3_m(M,p,stripzeros=TRUE)
Arguments
B |
A list of brackets |
W |
A list of weights |
L |
A list of named vectors |
powers |
Numeric vector of powers |
pnames |
Character vector of player names |
M |
Matrix of weights, column names being player names |
p |
Vector of powers, length equal to |
stripzeros |
Boolean with default |
Details
Function
hyper3()
is the user-friendly creation method, which dispatches to a helper function depending on its arguments.Function
hyper3_bw()
takes a list of brackets (character vectors) and a list of weights (numeric vectors) and returns ahyper3
object.Function
hyper3_nv()
takes a list of named vectors and returns ahyper3
object.Function
hyper3_m()
takes a matrix with rows being the brackets (entries are weights) and a numeric vector of powers.Function
evaluate3()
is a low-level helper function that evaluates a log-likelihood at a point in probability space. Don't use this: use the user-friendlyloglik()
instead, which dispatches toevaluate3()
.Function
maxp3()
is a placeholder (it is not yet written). But the intention is that it will maximize the log-likelihood of ahyper3
object over the Bradley Terry strengths and any weights given. This might not be possible as envisaged right now; I present some thoughts ininst/kka.Rmd
.Function
list2nv()
converts a list of character vectors into a named vector suitable for use as argumente
of functioncheering3()
. It is used ininst/global_liveability_ranking.Rmd
.Function
as.namedvectorlist()
takes ahyper3
object and returns a disoRdered list of named vectors corresponding to the brackets and their weights.Function
setweight()
alters the weight of every occurrence of a set of players. It is vectorised, sosetweight(H,c("a","b"),88:89)
sets the weight ofa
to 88 andb
to 89. Replacement methods are defined, so “H["a"] <- as.weight(3)
” will set the weight of every occurrence of playera
to 3. IfH
is ahyper2
object, it will be coerced tohyper3
.
Value
Generally return or deal with hyper3
objects
Note
Functionality for hyper3
objects is generally indicated by adding
a “3
” to function names, eg gradient()
goes to
gradient3()
.
Author(s)
Robin K. S. Hankin
See Also
Examples
hyper3(B=list("a",c("a","b"),"b"),W=list(1.2,c(1.2,1),1),powers=c(3,4,-7))
hyper3(list(c(a=1.2),c(b=1),c(a=1.2,b=1)),powers=c(3,4,-7))
## Above two objects should be identical.
## Third method, send a matrix:
M <- matrix(rpois(15,3),5,3)
colnames(M) <- letters[1:3]
hyper3(M,c(2,3,-1,-5,1)) # second argument interpreted as powers
## Standard way to generate a hyper3 object is to create an empty object
## and populate it using the replacement methods:
a <- hyper3() # default creation method [empty object]
a[c(p1=1.3)] <- 5
a[c(p2=1 )] <- 2
a[c(p1=1.3,p2=1)] <- -7
a
chess3 # representative simple hyper3 object
H1 <- rankvec_likelihood(letters[sample(6)])
H2 <- rankvec_likelihood(letters[sample(6)])
H1["a"] <- as.weight(1.2) # "a" has some disadvantage in H1
H1[c("b","c")] <- as.weight(2:3) # "b" and "c" have some advantage in H1
H2[c("c","d")] <- as.weight(1.5) # "c" and "d" have some advantage in H2
H1+H2
Dataset on climate change due to O'Neill
Description
Object icons_matrix
is a matrix of nine rows and six columns,
one column for each of six icons relevant to climate change. The
matrix entries show the number of respondents who indicated which icon
they found most concerning. The nine rows show different classes of
respondents who were exposed to different subsets (of size four) of
the six icons.
The columns correspond to the different stimulus icons used, detailed
below. An extensive discussion is given in West and Hankin 2008, and
Hankin 2010; an updated analysis is given in the icons
vignette.
Object icons
is the corresponding likelihood function, which
can be created with saffy(icons_matrix)
.
The object is used in inst/ternaryplot_hyper2.Rmd
which shows a
ternary plot of random samples.
Usage
data(icons)
Details
The six icons were used in this study were:
- PB
polar bears, which face extinction through loss of ice floe hunting grounds
- NB
The Norfolk Broads, which flood due to intense rainfall events
- L
London flooding, as a result of sea level rise
- THC
The Thermo-haline circulation, which may slow or stop as a result of anthropogenic modification of the hydrological cycle
- OA
Oceanic acidification as a result of anthropogenic emissions of carbon dioxide
- WAIS
The West Antarctic Ice Sheet, which is calving into the sea as a result of climate change
Author(s)
Robin K. S. Hankin
Source
Data kindly supplied by Saffron O'Neill of the University of East Anglia
References
S. J. O'Neill and M. Hulme 2009. An iconic approach for representing climate change. Global Environmental Change, 19:402-410
I. Lorenzoni and N. Pidgeon 2005. Defining Dangers of Climate Change and Individual Behaviour: Closing the Gap. In Avoiding Dangerous Climate Change (conference proceedings), UK Met Office, Exeter, 1-3 February
R. K. S. Hankin 2010. “A generalization of the Dirichlet distribution”. Journal of Statistical software, 33:11
See Also
Examples
data(icons)
pie(icons_maxp)
equalp.test(icons)
Increment and decrement operators
Description
Syntactic sugar for incrementing and decrementing likelihood functions
Usage
inc(H, val = 1)
dec(H, val = 1)
trial(winners,players,val=1)
Arguments
H |
A hyper2 object |
winners , players |
Numeric or character vectors specifying the winning team and the losing team |
val |
Numeric |
Details
A very frequent operation is to increment a single term in a hyper2 object. If
> H <- hyper2(list("b",c("a","b"),"c",c("b","c")),c(2,4,3,5)) > H a * (a + b)^4 * b^2 * (b + c)^5 * c^3
Suppose we wish to increment the power of a+b
. We could do:
H[c("a","b")] <- H[c("a","b")] + 1
(see the discussion of hyper2_sum_numeric
at
Ops.hyper2.Rd
). Alternatively we could use magrittr
pipes:
H[c("a","b")] %<>% `+`(1)
But inc
and dec
furnish convenient idiom to accomplish the
same thing:
H[c("a","b")] %<>% inc
Functions inc
and dec
default to adding or subtracting 1,
but other values can be supplied:
H[c("a","b")] %<>% inc(3)
Or even
H[c("a","b")] %<>% inc(H["a"])
The convenience function trial()
takes this one step further and
increments the ‘winning team’ and decrements the bracket
containing all players. The winners are expected to be players.
> trial(c("a","b"),c("a","b","c")) > (a + b) * (a + b + c)^-1
Using trial()
in this way ensures that the powers sum to zero.
The inc
and dec
operators and the trial()
function are used in inst/kka.Rmd
.
Author(s)
Robin K. S. Hankin
Examples
data(chess)
## Now suppose we observe an additional match, in which Topalov beats
## Anand. To incorporate this observation into the LF:
trial("a",c("a","b"))
chess <- chess + trial("Topalov",c("Topalov","Anand"))
1963 World Chess Championships
Description
Likelihood functions for players' strengths in the fifth Interzonal tournament which occurred as part of the 1963 Chess world Championships in Stockholm, 1962.
Details
(there are three chess datasets in the package, documented at
interzonal.Rd
[the 1963 World championship], kka.Rd
[Karpov-Kasparov-Anand dataset], and chess.Rd
[rock-paper-scissors using Topalov-Anand-Karpov])
The 1963 World Chess Championship was notable for allegations of Soviet collusion. Specifically, Fischer publicly alleged that certain Soviet players had agreed in advance to draw all their games. The championship included an “interzonal” tournament in which 23 players competed in Stockholm; and a “Candidates” tournament in which 8 players competed in Curacao.
Likelihood functions interzonal
and interzonal_collusion
are created by files ‘inst/interzonal.Rmd’, which is heavily
documented and include some analysis. Object interzonal
includes
a term for drawing, (“draw
”), assumed to be the same for
all players; object interzonal_collusion
includes in addition to
draw
, a term for the drawing in Soviet-Soviet matches,
“coll
”.
Some other analysis is given in files
inst/curacao11962_threeplayers.R
and
inst/curacao1962_threeplayers_rest_monster.Rmd
.
See Also
Examples
pie(interzonal_maxp)
# samep.test(interzonal,c("Fischer","Geller")) # takes too long
Javelin dataset
Description
Results from the men's javelin, 2020 Summer Olympics.
-
javelin_table
, a dataframe in the form of an “attempts table”, detailing the throw distances of eight competitors (diacritics have been removed) for each of six throws -
javelin1
andjavelin2
Support functions corresponding to the weighted Plackett-Luce likelihood. The suffix “1” means that no-throws are counted as losing attempts; suffix “2” means that no-throws are ignored. -
javelin1_maxp
andjavelin2_maxp
are the corresponding maximum likelihood estimates for the players' strengths javelin_vector is a named vector with elements being the throw distances and names being the thrower
Usage
data(javelin)
Format
As detailed above
Details
These objects can be generated by running script
inst/javelin.Rmd
, which includes some further discussion and
technical documentation, and creates file javelin.rda
which
resides in the data/
directory.
See Also
Examples
pie(javelin1_maxp)
Jester dataset
Description
A likelihood function for the Jester datasets
Usage
data(jester)
Details
Object jester
is a likelihood function for the 91 jokes rated by
the first 150 respondents in file ‘jester_dataset_1_3.zip’, taken
from Goldberg et al. Object jester_maxp
is the result of running
maxp(jester)
. The results table of (nearly) all jokes and
respondents is given as jester_table
in which each row is a joke
and each column a respondent.
The dataset is interesting because it has been analysed by many workers,
including Goldberg, for patterns; here I assume that all the respondents
behave identically (but randomly). It is included here because it is a
very severe numerical challenge in the context of the hyper2
package. I am not convinced that maxjest
is even close to the
true evaluate.
Objects jester
, jester_table
, and jester_maxp
can
be generated by running script ‘inst/jester.Rmd’, which includes
some further technical documentation. This file takes about 10 minutes
to run.
References
Eigentaste: A Constant Time Collaborative Filtering Algorithm. Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. Information Retrieval, 4(2), 133-151. July 2001.
Examples
# maxp(jester) # takes too long
# Note that the possibly poor identification of the evaluate
# nevertheless allows us to reject the null of equality:
(LAM <- -2*(loglik(equalp(jester),jester)-loglik(jester_maxp,jester)))
pval <- pchisq(LAM,df=size(jester),lower.tail=FALSE)
Karate dataset
Description
Dataset from the 2018 World Karate Championships, men's 67kg. It is an example of a dataset with too many degrees of freedom to be analysed easily by the package.
Usage
data(karate)
Details
Object karate_table
is a dataframe of results showing results
from the 2018 World Karate Championships, men's 67kg; karate
is
the associated likelihood function. There are two maximum likelihood
estimates given; karate_maxp
, the evaluate as returned by
maxp()
, and karate_maxp
, returned by zermelo()
[the
value given by maxp()
itself is less likely].
These objects can be generated by running script inst/karate.Rmd
,
which includes some further discussion and technical documentation and
creates file karate.rda
which resides in the data/
directory.
Note
Table karate_table
misses uninformative matches, that is,
competitions with 0-0 results.
References
https://en.wikipedia.org/wiki/2018_World_Karate_Championships
See Also
Examples
summary(karate)
Karpov, Kasparov, Anand
Description
Data of three chess players: Karpov, Kasparov, and Anand. Includes two likelihood functions for the strengths of the players, and an array of game results
Details
(there are three chess datasets in the package, documented at
interzonal.Rd
[the 1963 World championship], kka.Rd
[Karpov-Kasparov-Anand dataset], and chess.Rd
[rock-paper-scissors using Topalov-Anand-Karpov])
The strengths of chess players may be assessed using the generalized
Bradley-Terry model. The karpov_kasparov_anand
hyper2
likelihood function allows one to estimate the players' strengths,
propensity to draw, and also the additional strength conferred by
playing white as personified by a draw monster and a white monster
draw
and white
respectively.
Object karpov_kasparov_anand
assumes that the draw potential
is the same for all three players; likelihood function
kka_3draws
allows the propensity to draw to differ between
the three players.
The reason that the players are different from those in the
chess
dataset is that the original data does not seem to be
available any more.
Dataset kka
refers to scorelines of matches between three
chess players (Kasparov, Karpov, Anand). It is a named numeric
vector with names such as
‘karpov_plays_white_beats_kasparov
’ which has value
18: we have a total of 18 games between Karpov and Kasparov in which
Karpov played white and beat Kasparov.
Object chess3
is a simple hyper3
object corresponding
to pairwise comparison with draws; chess3_maxp
is the
evaluate, conditional on the estimated white-player advantage and
draw proclivity. This object is created and discussed in
inst/kka.Rmd
. Array kka_array
presents the same
information in a 3D array.
All data drawn from https://www.chessgames.com
(search for
“Kasparov vs Karpov”, etc). Note that the database allows one
to sort by white wins or black wins (there is a ‘refine search’
tab at the bottom). Some searches have more than one page of results.
Numbers here downloaded 17 February 2019. Note that only
‘classical games’ are considered here (rapid and exhibition
games being ignored).
These objects can be generated by running script
inst/kka.Rmd
, which includes some further discussion and
technical documentation and creates file kka.rda
which
resides in the data/
directory.
See Also
Examples
karpov_kasparov_anand
# pie(maxp(karpov_kasparov_anand)) # takes ~10s
M <- kka_array[,,1] + 1i*kka_array[,,3]
home_away(M)
home_away3(M,lambda=1.2)
Keep or discard players
Description
Flawed functionality to keep or discard subsets of the players in
a hyper2
object or order table.
Usage
discard_flawed2(x, unwanted,...)
keep_flawed(H, wanted)
discard_flawed(H, unwanted)
Arguments
H |
A |
x |
An order table |
wanted , unwanted |
Players to keep or discard. May be character or integer or logical |
... |
Further arguments passed to
|
Details
Do not use these functions. They are here as object lessons in poor thinking. To work with a subset of competitors, see the example at as.ordertable.
Functions keep_flawed2()
and discard_flawed2()
take an
order table and keep or discard specified rows, returning a reduced
order table. This is not a trivial operation.
Functions keep_flawed()
and discard_flawed()
will either
keep or discard players specified in the second argument. It is not
clear to me that these functions have any reasonable probabilistic
interpretation and file inst/retain.Rmd
gives a discussion.
Given a wikitable or ordertable, it is possible to create a likelihood
function based on a subset of rows using the incomplete=TRUE
argument; see the example at ?ordertable2supp
. But this method
is flawed too because it treats non-finishers as if they finished in
the order of their rows.
Function as.ordertable()
is the correct way to consider a
subset of players in a wikitable.
Author(s)
Robin K. S. Hankin
See Also
Examples
maxp(icons)
discard_flawed(icons,c("OA","WAIS"))
## Not run: # (takes too long)
data("skating")
maxp(skating)[1:4] # numbers work, keep the first four skaters
maxp(keep_flawed(skating,pnames(skating)[1:4])) # differs!
## End(Not run)
Length method for hyper2 objects
Description
Length method for hyper2 objects, being the number of different brackets in the expression
Usage
## S3 method for class 'hyper2'
length(x)
Arguments
x |
hyper2 object |
Author(s)
Robin K. S. Hankin
Examples
data("oneill")
length(icons)
seq_along(icons)
Log likelihood functions
Description
Returns a log-likelihood for a given hyper2
or hyper3
object at a specific point in probability space
Usage
loglik(p, H, log = TRUE)
loglik_single(p,H,log=TRUE)
like_single_list(p,Lsub)
like_series(p,L,log=TRUE)
Arguments
H |
An object of class |
p |
A probability point. See details |
log |
Boolean with default |
L , Lsub |
A list of |
Details
Function loglik()
is a straightforward likelihood function. It
can take a vector of length n=size(H)
or size(H)-1
; if
given the vector
p=\left(p_1,\ldots,p_{n-1}\right)
it
appends the fillup value, and then returns returns the (log)
likelihood.
If p
is a matrix, the rows are interpreted as probability
points.
Function loglik_single()
is a helper function that takes a
single point in probability space. Functions
like_single_list()
and like_series()
are intended for
use with ggrl()
.
Note
Likelihood is defined up to an arbitrary multiplicative constant. Log-likelihood (also known as support) is defined up to an arbitrary additive constant.
If function loglik()
is given a probability vector of length
n
, the vector must satisfy the unit sum constraint (up to a
small tolerance). Also, it must be a named vector with names
(collectively) equal to the pnames
of argument H
.
> pnames(chess) [1] "Topalov" "Anand" "Karpov" > loglik(c(Topalov=0.7,Anand=0.2,Karpov=0.1),chess) [1] -69.45364 > loglik(c(Karpov=0.1,Topalov=0.7,Anand=0.2),chess) # identical, just a different order [1] -69.45364
But if given a vector of length n-1
[e.g. the value of
indep()
], then the names are ignored and the entries are
interpreted as the BT strengths of pnames(H)[seq_len(n-1)]
:
> loglik(c(0.7,0.2),chess) [1] -69.45364 > loglik(c(foo=0.7,bar=0.2),chess) # names are ignored [1] -69.45364
(the above applies for H
a hyper2
or hyper3
object).
Empty brackets are interpreted consistently: that is, zero whatever the probability vector (although the print method is not perfect).
Author(s)
Robin K. S. Hankin
See Also
Examples
data(chess)
loglik(c(1/3,1/3),chess)
loglik(rp(14,icons),icons)
## Not run: # takes too long
like_series(masterchef_maxp,masterchef)
like_series(indep(equalp(masterchef)),masterchef)
## End(Not run)
W <- hyper2(pnames=letters[1:6])
W1 <- ggrl(W, 'a', letters[2:5],'f') # 24-element list
W2 <- ggrl(W, c('a','b'), c('c','d'),c('e','f')) # 2^3=8 element list
like_single_list(rep(1/6,5),W1) # information from first observation
like_series(rep(1/6,5),list(W1,W2)) # information from both observations
# hyper3 objects:
H3 <- ordervec2supp3(letters[c(1,2,3,3,2,1,2)])
loglik(c(a=1,b=2,c=3)/6,H3)
loglik(c(a=1,c=3,b=2)/6,H3) # identical
Masterchef series 6
Description
Data from Australian Masterchef Series 6
Usage
data(masterchef)
Format
Object masterchef
is a list of hyper2
objects;
masterchef_pmax
and masterchef_constrained_pmax
are named
vectors with unit sum.
Details
The object is created using the code in inst/masterchef.Rmd
,
which is heavily documented. Not all the information available is
included in the likelihood function as some of the early rounds result
in an unmanageably large list. Inclusion is controlled by Boolean
vector doo
.
The definitive source is the coloured table on the wiki page.
References
Wikipedia contributors, “MasterChef Australia (series 6),” Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=MasterChef_Australia_(series_6)&oldid=758432561 (accessed January 5, 2017).
See Also
Examples
a1 <- indep(equalp(masterchef[[1]])) # equal strengths
a2 <- indep(masterchef_maxp) # MLE
a3 <- indep(masterchef_constrained_maxp) # constrained MLE
## Not run: # takes too long
like_series(a1, masterchef)
like_series(a2, masterchef)
like_series(a3, masterchef)
## End(Not run)
Convert a matrix to a likelihood function
Description
Functions to convert matrix observations to likelihood functions. Each row is an observation of some kind, and each column a player.
Function ordertable2supp()
is documented separately at
ordertable2supp
.
Usage
saffy(M)
volley(M)
Arguments
M |
A matrix of observations |
Details
Two functions are documented here:
-
saffy()
, which converts a matrix of restricted choices into a likelihood function; it is named for Saffron O'Neill. The canonical example would be Saffron's climate change dataset, documented aticons
. Functionsaffy()
returns the appropriate likelihood function for the dataset. -
volley()
, which converts a matrix of winning and losing team members to a likelihood function. The canonical example is the volleyball dataset. Each row is a volleyball game; each column is a player. An entry of 0 means “on the losing side”, an entry of 1 means “on the winning side”, and an entry ofNA
means “did not play”.
Author(s)
Robin K. S. Hankin
See Also
Examples
icons == saffy(icons_table) # should be TRUE
volley(volleyball_table) == volleyball # also should be TRUE
Maximum likelihood estimation
Description
Find the maximum likelihood estimate for p, also equal probabilities
Usage
maxp(H, startp=NULL, give=FALSE, fcm=NULL, fcv=NULL, SMALL=1e-6, n=1,
show=FALSE, justlikes=FALSE, ...)
maxplist(Hlist, startp=NULL, give=FALSE, fcm=NULL, fcv=NULL, SMALL=1e-6, ...)
maxp_single(H, startp=NULL, give=FALSE, fcm=NULL, fcv=NULL, SMALL=1e-6,
maxtry=100, ...)
maxp_single2(H, startp=NULL, give=FALSE, fcm=NULL, fcv=NULL, SMALL=1e-6,
maxtry=100, ...)
maxp_simplex(H, n=100, show=FALSE, give=FALSE, ...)
maxp_lsl(HLSL, startp = NULL, give = FALSE, fcm = NULL, fcv = NULL, SMALL=1e-6, ...)
equalp(H)
Arguments
H |
A |
Hlist |
A list with elements all |
HLSL |
An |
startp |
A vector of probabilities specifying the start-point for
optimization; if a full unit-sum vector, then the fill-up value will
be removed by |
give |
Boolean, with default |
fcm , fcv |
Further problem-specific constraints |
n |
Number of start points to use |
show |
Boolean, with |
justlikes |
Boolean, with |
SMALL |
Numerical minimum for probabilities |
maxtry |
Integer specifying maximum number of times to try
|
... |
Further arguments which |
Details
Function maxp()
returns the maximum likelihood estimate for
p
, which has the unit sum constraint implemented.
Function maxplist()
does the same but takes a list of
hyper2
objects (for example, the output of ggrl()
).
Note that maxplist()
does not have access to the gradient of
the objective function, which makes it slow.
If function maxp()
is given a suplist
object it
dispatches to maxplist()
.
Functions maxp_single()
and maxp_single2()
are helper
functions which perform a single constrained optimization using
base::constrOptim()
or alabama::constrOptim.nl()
respectively. The functions should produce identical (or at least
very similar) results. They are used by maxp()
and
maxp_simplex()
which dispatch to either maxp_single()
or
maxp_single2()
depending on the value of option
use_alabama
. If TRUE
, they will use (experimental)
maxp_single2()
, otherwise (default) maxp_single()
.
Function maxp_single()
is prone to the “wmmin not
finite” bug [bugzilla id 17703] but on the other hand is a bit
slower. I am not sure which one is better at this time.
Function maxp_simplex()
is intended for complicated or flat
likelihood functions where finding local maxima might be a problem.
It repeatedly calls maxp_single()
, starting from a different
randomly chosen point in the simplex each time. This function does
not take fcm
or fcv
arguments, it operates over the
whole simplex (hence the name). Further arguments, ...
, are
passed to maxp_single()
.
The functions do not work for the masterchef_series6
likelihood
function. These require a bespoke optimization as shown in the
vignette.
Function equalp()
returns the value of p
for which all
elements are the same.
In functions maxp()
etc, arguments fcm
and fcv
implement linear constraints to be passed to constrOptim()
.
These constraints are in addition to the usual nonnegativity
constraints and unit-sum constraint, and are added to the ui
and ci
arguments of constrOptim()
with rbind()
and c()
respectively. The operative lines are in
maxp_single()
:
UI <- rbind(diag(nrow = n - 1), -1, fcm) CI <- c(rep(SMALL, n - 1), -1 + SMALL, fcv)
where in UI
, the first n-1
rows enforce nonnegativity of
p_i
, 1\leq p < n
; row n
enforces
nonnegativity of the fillup value p_n
; and the remaining
(optional) rows enforce additional linear constraints. Argument
CI
is a vector with corresponding elements.
Examples of their use are given in the “icons” vignette.
Note
In manpages elsewhere, n=2
is sometimes used. Previous
advice was to use n=10
or greater in production work, but I
now think this is overly cautious and n=1
is perfectly
adequate unless the dimension of the problem is large.
The (bordered) Hessian is given by function hessian()
,
documented at gradient.Rd
; use this to assess the
“sharpness” of the maximum.
Function maxp()
takes hyper2
or hyper3
objects
but it does not currently work with lsl
objects; use
maxp_lsl()
.
The built-in datasets generally include a pre-calculated result of
running maxp()
; thus hyper2 object icons
and
icons_maxp
are included in the same .rda
file.
Function maxp()
can trigger a known R bug (bugzilla id 17703)
which reports “wmmin is not finite
”. Setting option
use_alabama
to TRUE
makes the package use a different
optimization routine.
Author(s)
Robin K. S. Hankin
See Also
Examples
maxp(icons)
W <- hyper2(pnames=letters[1:5])
W1 <- ggrl(W, 'a', letters[2:3],'d') # W1 is a suplist object
## Not run: maxp(W1) # takes a long time to maximize a suplist
MotoGP dataset
Description
Race results from the 2019 Grand Prix motorcycling season
Usage
data(moto)
Details
Object moto_table
is a dataframe of results showing ranks of 28
drivers (riders?) in the 2019 FIM MotoGP World Championship. The format
is standard, that is, can be interpreted by function
ordertable2supp()
if the final points column is removed. The
corresponding support function is motoGP_2019
.
These objects can be generated by running script inst/moto.Rmd
,
which includes some further discussion and technical documentation and
creates file moto.rda
which resides in the data/
directory.
Note
Many drivers have names with diacritics, which have been removed from the dataframe.
References
Wikipedia contributors. (2020, February 8). 2019 MotoGP season. In Wikipedia, The Free Encyclopedia. Retrieved 08:16, February 20, 2020, from https://en.wikipedia.org/w/index.php?title=2019_MotoGP_season&oldid=939711064
See Also
Examples
pie(moto_maxp)
Kronecker matrix product functionality
Description
Peculiar version of expand.grid()
for matrices
Usage
mult_grid(L)
pair_grid(a,b)
Arguments
L |
List of matrices |
a , b |
Matrices |
Details
Function pair_grid(a,b)
returns a matrix with each column of
a
cbind()
-ed to each column of b
.
Function mult_grid()
takes a list of matrices; it is designed for
use by ggrl()
.
Author(s)
Robin K. S. Hankin
See Also
Examples
pair_grid(diag(2),diag(3))
mult_grid(lapply(1:4,diag))
Order tables
Description
Order tables
Details
The package makes extensive use of order tables and these are discussed
here together with a list of order tables available in the package as
data. See also ranktable.Rd
.
Consider pentathlon_table
:
> pentathlon_table shooting fencing swimming riding running Moiseev 5 1 1 6 5 Zadneprovskis 6 2 5 5 1 Capalini 4 6 2 3 4 Cerkovskis 3 3 7 7 2 Meliakh 1 7 4 1 6 Michalik 2 4 6 2 7 Walther 7 5 3 4 3
Although pentathlon_table
is a dataset in the package, the source
dataset is also included in the inst/
directory as file
pentathlon.txt
; use idiom like
read.table("inst/pentathlon.txt")
to load the order table.
Object pentathlon_table
is a representative example of an
ordertable. Each row is a competitor, each column an event (venue,
judge, ...). The first row shows Moiseev's ranking in shooting
(5th), fencing (1st), and so on. The first column shows the ranks of
the competitors in shooting. Thus Moiseev came fifth, Zadneprovskis
came 6th, and so on.
However, to create a likelihood function we need ranks, not orders. We
need to know, for a given event, who came first, who came second, and so
on (an extended discussion on the difference between rank and order is
given at rrank). We can convert from an order table to a rank
table using ordertable_to_ranktable()
(see also
ranktable.Rd
):
> ordertable_to_ranktable(pentathlon_table) c1 c2 c3 c4 c5 shooting Meliakh Michalik Cerkovskis Capalini Moiseev fencing Moiseev Zadneprovskis Cerkovskis Michalik Walther swimming Moiseev Capalini Walther Meliakh Zadneprovskis riding Meliakh Michalik Capalini Walther Zadneprovskis running Zadneprovskis Cerkovskis Walther Capalini Moiseev c6 c7 shooting Zadneprovskis Walther fencing Capalini Meliakh swimming Michalik Cerkovskis riding Moiseev Cerkovskis running Meliakh Michalik
Above, we see the same data in a different format (an extended discussion on the difference between rank and order is given in rrank).
Many of the order tables in the package include entries that correspond to some variation on “did not finish”. Consider the volvo dataset:
> volvo_table_2014 leg1 leg2 leg3 leg4 leg5 leg6 leg7 leg8 leg9 AbuDhabi 1 3 2 2 1 2 5 3 5 Brunel 3 1 5 5 4 3 1 5 2 Dongfeng 2 2 1 3 DNF 1 4 7 4 MAPFRE 7 4 4 1 2 4 2 4 3 Alvimedica 5 5 3 4 3 5 3 6 1 SCA 6 6 6 6 5 6 6 1 7 Vestas 4 DNF DNS DNS DNS DNS DNS 2 6
In the above order table, we have DNF
for “did not finish”
and DNS
for “did not start”. The formula1
order
table has other similar entries such as DSQ
for
“disqualified” and a discussion is given at
ordertable2supp.Rd
.
Links are given below to all the order tables in the package. Note that
the table in inst/eurovision.Rmd
(wiki_matrix
) is not an
order table because no country is allowed to vote for itself.
To coerce a table like the Volvo dataset shown above into an order table
[that is, replace DNS
with zeros, and also force nonzero entries
to be contiguous], use as.ordertable()
.
Author(s)
Robin K. S Hankin
See Also
ordertable2supp
,rrank
,
ranktable
,as.ordertable
Examples
ordertable_to_ranktable(soling_table)
ordertable2supp(soling_table) == soling # should be TRUE
Calculate points from an order table
Description
Given an order table and a schedule of points, calculate the points awarded to each competitor.
Usage
ordertable2points(o, points,totals=TRUE)
Arguments
o |
Order table |
points |
A numeric vector indicating number of points awarded for first, second, third, etc placing |
totals |
Boolean, with default |
Value
Returns either an order table or a named numeric vector
Author(s)
Robin K. S. Hankin
See Also
Examples
points <- c(25, 18, 15, 12, 10, 8, 6, 4, 2, 1, 0, 0)
o <- as.ordertable(F1_table_2017)
ordertable2points(o,points)
ordertable2points(ranktable_to_ordertable(rrank(9,volvo_maxp)),1)
Translate order tables to support functions
Description
Wikipedia gives a nice summary in table form of Formula 1 racing results
on pages like
https://en.wikipedia.org/wiki/2017_Formula_One_World_Championship
(at World Drivers' Championship standings) but the data format is
commonly used for many sports [see ordertable.Rd
] and function
ordertable2supp()
translates such tables into a hyper2
support function and also a order table.
Both functions interpret zero to mean “Did not finish” (wikipedia usually signifies DNF as a blank).
Usage
ordertable2supp(x, noscore, incomplete=TRUE)
ordervec2supp(d)
Arguments
x |
Data frame, see details |
d |
A named numeric vector giving order; zero entries are interpreted as that competitor coming last (due to, e.g., not finishing) |
incomplete |
Boolean, with |
noscore |
Character vector giving the abbreviations
for a non-finishing status such as “did not finish”
or “disqualified”. A missing argument is interpreted as
|
Details
Function ordertable2supp()
is intended for use on order tables
such as found at https://en.wikipedia.org/wiki/2019_Moto3_season.
This is a common format, used for Formula 1, motoGP, and other racing
sports. Prepared text versions are available in the package in the
inst/
directory, for example inst/motoGP_2019.txt
. Use
read.table()
to create a data frame which can be interpreted by
ordertable2supp()
.
Function ordervec2supp()
takes an order vector d
and
returns the corresponding Plackett-Luce loglikelihood function as a
hyper2
object. It requires a named vector; names of the elements
are interpreted as names of the players. Use argument pnames
to
supply the players' names (see the examples).
> x <- c(b=2,c=3,a=1,d=4,e=5) # a: 1st, b: 2nd, c: 3rd etc > ordervec2supp(x) log( a * (a + b + c + d + e)^-1 * (a + b + d + e)^-1 * b * (b + d + e)^-1 * c * (d + e)^-1 * e)
\frac{a}{a+b+c+d+e}\cdot
\frac{b}{b+c+d+e}\cdot
\frac{c}{c+d+e}\cdot
\frac{d}{d+e}\cdot
\frac{e}{e}
Zero entries mean “did not finish”:
> ordervec2supp(c(b=1,a=0,c=2)) # b: 1st, a: DNF, c: second log((a + b + c)^-1 * (a + c)^-1 * b * c)
\frac{b}{a+b+c}\cdot
\frac{c}{a+c}
Note carefully the difference between ordervec2supp()
and
rankvec_likelihood()
, which takes a character vector:
> names(sort(x)) [1] "a" "b" "c" "d" "e" > rankvec_likelihood(names(sort(x))) log( a * (a + b + c + d + e)^-1 * b * (b + c + d + e)^-1 * c * (c + d + e)^-1 * d * (d + e)^-1) > rankvec_likelihood(names(sort(x))) == ordervec2supp(x) [1] TRUE >
Function order_obs()
was used in the integer-indexed paradigm but
is obsolete in the name paradigm. A short vignette applying
ordervec2supp()
and ordertable2supp()
to the salad
dataset of the prefmod package [and further analysed in the
PlackettLuce package] is presented at inst/salad.Rmd
.
Value
Returns a hyper2
object
Author(s)
Robin K. S. Hankin
See Also
Examples
ordertable2supp(soling_table)
# competitors a-f, racing at two venues:
x <- data.frame(
venue1=c(1:5,"Ret"),venue2=c("Ret",4,"Ret",1,3,2),
row.names=letters[1:6])
## First consider all competitors; incomplete=FALSE checks that all
## finishing competitors have ranks 1-n in some order for some n:
ordertable2supp(x,incomplete=FALSE)
## Now consider just a-d; must use default incomplete=TRUE as at venue2
## the second and third ranked competitors are not present in x[1:4,]:
ordertable2supp(x[1:4,])
## Function ordervec2supp() is lower-level, used for order vectors:
a1 <- c(a=2,b=3,c=1,d=5,e=4) # a: 2nd, b: 3rd, c: 1st, d: 5th, e: 4th
a2 <- c(a=1,b=0,c=0,d=2,e=3) # a: 2nd, b: DNF, c: DNF, d: 2nd, e: 3rd
a3 <- c(a=1,b=3,c=2) # a: 1st, b: 3rd, c: 2nd. NB only a,b,c competed
a4 <- c(a=1,b=3,c=2,d=0,e=0) # a: 1st, b: 3rd, c: 2nd, d,e: DNF
## results of ordervec2supp() may be added with "+" [if the observations
## are independent]:
H1 <- ordervec2supp(a1) + ordervec2supp(a2) + ordervec2supp(a3)
H2 <- ordervec2supp(a1) + ordervec2supp(a2) + ordervec2supp(a4)
## Thus H1 and H2 are identical except for the third race. In H1, 'd'
## and 'e' did not compete, but in H2, 'd' and 'e' did not finish (and
## notionally came last):
pmax(H1)
pmax(H2) # d,e not finishing affects their estimated strength
Order transformation
Description
Given an order vector, shuffle so that the players appear in a specified order.
Usage
ordertrans(x,players)
ordertransplot(ox,oy,plotlims, ...)
Arguments
x |
A (generalized) order vector |
players |
A character vector specifying the order in which the
players will be listed; if missing, use |
ox , oy |
Rank vectors |
plotlims |
Length two numeric vector giving x and y plot limits. If missing, use sensible default |
... |
Further arguments, passed to |
Details
The best way to describe this function is with an example:
> x <- c(d=2,a=3,b=1,c=4) > x d a b c 2 3 1 4
In the above, we see x
is an order vector showing that d
came second, a
came third, b
came first, and c
came
fourth. This is difficult to deal with because one has to search
through the vector to find a particular competitor, or a particular
rank. This would be harder if the vector was longer.
If we wish to answer the question “where did competitor a
come? where did b
come?” we would want an order vector
in which the competitors are in alphabetical order. This is
accomplished by ordertrans()
:
> o <- ordertrans(x) > o a b c d 3 1 4 2
(this is equivalent to o <- x[order(names(x))]
). Object o
contains the same information as x
, but presented differently.
This says that a
came third, b
came first, c
came
fourth, and d
came second. In particular, the Plackett-Luce
order statistic is identical:
> ordervec2supp(x) == ordervec2supp(o) > [1] TRUE
There is a nice example of ordertrans()
in
inst/eurovision.Rmd
, and package vignette ordertrans
provides further discussion and examples.
Function ordertrans()
takes a second argument which allows the
user to arrange an order vector into the order specified.
Function ordertrans()
also works in the context of hyper3
objects:
x <- c(d=2,a=3,b=1,a=4) x d a b a 2 3 1 4 ordertrans(x) a a b d 3 4 1 2
Object x
shows that d
came second, a
came third and
fourth, and b
came first. We can see that ordertrans()
gives the same information in a more intelligible format. This
functionality is useful in the context of hyper3
likelihood
functions.
Value
Returns a named vector
Note
The argument to ordertrans()
is technically an order vector
because it answers the question “where did the first-named
competitor come?” (see the discussion at rrank). But it is
not a helpful order vector because you have to go searching through
the names—which can appear in any order—for the competitor you are
interested in. I guess “generalised order vector” might be a
better description of the argument.
Author(s)
Robin K. S. Hankin
See Also
Examples
x <- c(e=4L,a=7L,c=6L,b=1L,f=2L,g=3L,h=5L,i=8L,d=9L)
x
ordertrans(x,letters[1:9])
o <- skating_table[,1]
names(o) <- rownames(skating_table)
o
ordertrans(o)
ordertrans(sample(icons_maxp),icons)
rL <- volvo_maxp # rL is "ranks Likelihood"
rL[] <- rank(-volvo_maxp)
r1 <- volvo_table[,1] # ranks race 1
names(r1) <- rownames(volvo_table)
ordertransplot(rL,r1,xlab="likelihood rank, all races",ylab="rank, race 1")
Various functionality for races and hyper3 likelihood functions
Description
Various functions for calculating the likelihood function for order
statistics in the context of hyper3 likelihood functions. Compare
ggol()
for hyper2
objects. Used in the
constructor()
suite of analysis.
Usage
num3(v,helped=NULL,lambda=1)
den3(v,helped=NULL,lambda=1)
char2nv(x)
ordervec2supp3(v,nonfinishers=NULL)
ordervec2supp3a(v,nonfinishers=NULL,helped=NULL,lambda=1)
rankvec_likelihood3(v,nonfinishers=NULL)
ordertable2supp3(a)
cheering3(v,e,help,nonfinishers=NULL)
args2ordervec(...)
Arguments
v |
Ranks in the form of a character vector. Element |
nonfinishers |
Character vector (a set) showing players that did not finish. See details section and examples |
a |
An ordertable |
helped |
vector of entities being helped |
e , help , lambda |
Parameters controlling non-independence with
|
x |
A character vector of competitors |
... |
Arguments passed to |
Details
Function args2ordervec()
takes arguments with names
corresponding to players, and entries corresponding to performances
(e.g. distances thrown by a javelin, or times for completing a
race). It returns a character vector indicating the rank statistic.
See examples, and also the javelin vignette.
Function ordervec2supp3()
takes character vector showing the
order of finishing [i.e. a rank statistic], and returns a generalized
Plackett-Luce support function in the form of a hyper3
object.
It can take the output of args2ordervec()
or rrace3()
.
For example:
ordervec2supp3(c("a","b"),nonfinishers=c("a","b"))
corresponds to a race between two twins of strength a
and two
twins of strength b
, with only one of each pair finishing;
a
comes first and b
comes second; symbolically
a\succ b\succ\left\lbrace a,b\right\rbrace\longrightarrow
\mathcal{L}(a,b\left|a+b=1\right.)=\frac{a}{2a+2b}\cdot\frac{b}{a+2b}
Further,
ordervec2supp3(c("a","b"),c("a","b","c"))
corresponds to adding a singleton competitor of strength c
who
did not finish:
a\succ b\succ\left\lbrace a,b,c\right\rbrace\longrightarrow
\mathcal{L}(a,b,c\left|a+b+c=1\right.)=\frac{a}{2a+2b+c}\cdot\frac{b}{a+2b+c}
(observe that this likelihood function is informative about
c
). See the examples section below. Experimental function
ordervec2supp3a()
is a generalized version of
ordervec2supp3()
that allows for cheering effects.
Functions num3()
and den3()
are low-level helper
functions that calculate the numerator and denominator for
Plackett-Luce likelihood functions with clones; used in
ordervec2supp3()
and ordervec2supp3a()
.
Function ordertable2supp3()
takes an order table (the canonical
example is the constructors' formula 1 grand prix results, see
constructor.Rd
and returns a generalized Plackett-Luce support
function in the form of a hyper3
object.
Function char2nv()
takes a character vector and returns a named
vector with entries corresponding to their names' counts. It is used
in the extraction and replacement methods for hyper3
objects.
Function cheering3()
is a generalization of
ordervec2supp3()
. Competitors who are not mentioned in
argument e
are assumed to be in an equivalence class of size 1,
that is, they are not supported (or indeed suppressed) by anyone else:
they are singletons in the terminology of Hankin (2006). Extensive
discussions are presented at inst/plackett_luce_monster.Rmd
and
inst/eurovision.Rmd
.
File inst/javelin.Rmd
and inst/race3.Rmd
show some
use-cases for these functions.
Note
Function ordervec2supp3()
is mis-named [it takes a rank
vector, not an order vector]; it will be renamed
rankvec_likelihood3()
, eventually.
Author(s)
Robin K. S. Hankin
See Also
Examples
ordervec2supp3(c("a","a","b","c","a","b","c"))
ordervec2supp3(rrace3())
ordervec2supp3(c("a","b"),nonfinishers=c("a","b")) # a > b >> {a,b}
(o <- args2ordervec(a=c(1,6,9), b=c(2,3,4), c=c(1.1,11.1)))
H <- ordervec2supp3(o)
H
# equalp.test(H) # takes too long for here
## Race: six competitors a-f finishing in alphabetical order. Mutually
## supporting groups: (acd), (bf), (e). Competitor "e" is not
## suppported by anyone else (he is a singleton) so does not need to be
## mentioned in argument 'e' and there are only two helpfulnesses to be
## considered: that of (acd) and that of (bf), which we will take to be
## 1.88 and 1.1111 respectively:
cheering3(v=letters[1:6],e=c(a=1,c=1,b=2,d=1,e=2),help=c(1.88,1.1111))
## Another race: four competitors, including two clones of "a", and two
## singletons "b" and "c". Here "a" helps his clone at 1.88; and "b"
## and "c" help one another at 1.111:
cheering3(v=c("a","b","a","c"),e=c(a=1,b=2,c=2),help=c(1.8,1.111))
## Same race as above but this time there are two clones of "b", one of
## whom did not finish:
cheering3(v=c("a","b","a","c"),e=c(a=1,b=2,c=2),help=c(1.8,1.111),"b")
## Most common case would be that the clones help each other but noone
## else:
cheering3(v=c("a","b","a","c"),e=c(a=1,b=2,c=3),help=c(1.8,1.111,1),"b")
Pairwise comparisons
Description
Function pairwise()
takes a matrix of pairwise comparisons and
returns a hyper2
likelihood function. Function
zermelo()
gives a standard iterative procedure for likelihood
maximization of pairwise Bradley-Terry likelihoods (such as those
produced by function pairwise()
).
Function home_away()
takes two matrices, one for home wins and
one for away wins. It returns a hyper2
support function that
includes a home advantage ghost. Function home_away3()
is the
same, but returns a hyper3
object. A complex matrix is
interpreted as real parts being the home wins and imaginary parts away
wins.
Function white_draw3()
returns a hyper3
likelihood
function for pairwise comparisons, one of whom has a home team-type
advantage (white player in the case of chess). It is designed to
work with an array of dimensions n\times n\times 3
,
where n
is the number of players. It is used in
inst/kka.Rmd
to create chess3
likelihood function.
Usage
pairwise(M)
zermelo(M, maxit = 100, start, tol = 1e-10, give = FALSE)
home_away(home_games_won, away_games_won)
home_away3(home_games_won, away_games_won,lambda)
white_draw3(A,lambda,D)
Arguments
M |
Matrix of pairwise comparison results |
maxit |
Maximum number of iterations |
start |
Starting value for iteration; if missing, use
|
tol |
Numerical tolerance for stopping criterion |
give |
Boolean with default |
home_games_won , away_games_won |
Matrices showing home games won and away games won |
lambda |
The home ground advantage (or white advantage in chess) |
D |
Weight of draw |
A |
Array of dimension |
Details
In function zermelo()
, the diagonal is disregarded.
If home_games_won
is complex, then the real parts of the
entries are interpreted as home games won, and the imaginary parts as
away games won.
Note
An extended discussion of pairwise()
is given in
inst/zermelo.Rmd
and also inst/karate.Rmd
. Functions
home_away()
and home_away3()
are described and used in
inst/home_advantage.Rmd
; see Davidson and Beaver 1977.
Author(s)
Robin K. S. Hankin
References
D. R. Hunter 2004. “MM algorithms for generalized Bradley-Terry models”. The Annals of Statistics, volume 32, number 1, pages 384–406
S. Borozki and others 2016. “An application of incomplete pairwise comparison matrices for ranking top tennis players”.
arXiv:1611.00538v1
10.1016/j.ejor.2015.06.069
R. R. Davidson and R. J. Beaver 1977. “On extending the Bradley-Terry model to incorporate within-pair order effects”. Biometrics, 33:693–702
See Also
Examples
#Data is the top 5 players from Borozki's table 1
M <- matrix(c(
0,10,0, 2,5,
4, 0,0, 6,6,
0, 0,0,15,0,
0, 8,0, 0,7,
1 ,0,3, 0,0
),5,5,byrow=TRUE)
players <- c("Agassi","Becker","Borg","Connors","Courier")
dimnames(M) <- list(winner=players,loser=players)
M
# e.g. Agassi beats Becker 10 times and loses 4 times
pairwise(M)
zermelo(M)
# maxp(pairwise(M)) # should be identical (takes ~10s to run)
M2 <- matrix(c(NA,19+2i,17,11+2i,16+5i,NA,12+4i,12+6i,12+2i,19+10i,
NA,12+4i,11+2i,16+2i,11+7i,NA),4,4)
teams <- LETTERS[1:4]
dimnames(M2) <- list("@home" = teams,"@away"=teams)
home_away(M2)
# home_away3(M2,lambda=1.2) # works but takes too long (~3s)
home_away3(M2[1:3,1:3],lambda=1.2)
M <- kka_array[,,1] + 1i*kka_array[,,3] # ignore draws
home_away(M)
# home_away3(M,lambda=1.3) # works but takes too long (~3s)
white_draw3(kka_array,1.88,1.11)
Pentathlon
Description
Results from the Men's pentathlon at the 2004 Summer Olympics
Usage
data(pentathlon)
Format
A hyper2
object that gives a likelihood function
Details
Object pentathlon
is a hyper2
object that gives a
likelihood function for the strengths of the top seven competitors at
the Modern Men's Pentathlon, 2004 Summer Olympics.
Object pentathlon_table
is an order table: a data frame with rows
being competitors, columns being disciplines, and entries being places.
Thus looking at the first row, first column we see that Moiseev placed
fifth at shooting.
These objects can be generated by running script
inst/pentathlon.Rmd
, which includes some further discussion and
technical documentation and creates file pentathlon.rda
which
resides in the data/
directory.
Note
Many of the competitors' names have diacritics, which I have removed.
References
“Wikipedia contributors”, Modern pentathlon at the 2004 Summer Olympics - Men's. Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Modern_pentathlon_at_the_2004_Summer_Olympics_%E2%80%93_Men%27s&oldid=833081611, [Online; accessed 5-March-2020]
See Also
Examples
data(pentathlon)
pie(pentathlon_maxp)
Powerboat dataset
Description
Race results from the 2018 F1 Powerboat World Championship
Usage
data(powerboat)
Details
Object powerboat_table
is a dataframe of results showing ranks
of 21 drivers in the 2018 F1 Powerboat World Championship. The format
is standard, that is, can be interpreted by function
ordertable2supp()
and indeed
ordertable2supp(powerboat_table[,1:7])
gives the corresponding
support function, powerboat
.
File inst/powerboat.txt
is the source text file; to create
powerboat_table
use
read.table(system.file("powerboat.txt",package="hyper2"))
The dataset used here corrects an apparent typo in the wikipedia table (see github issue 37).
These objects can be generated by running script
inst/powerboat.Rmd
, which includes some further discussion and
technical documentation and creates file powerboat.rda
which
resides in the data/
directory.
Note
Many drivers have names with diacritics, which have been removed from the dataframe.
References
Wikipedia contributors. (2019, October 9). 2018 F1 Powerboat World Championship. In Wikipedia, The Free Encyclopedia. Retrieved 00:45, February 21, 2020, from https://en.wikipedia.org/w/index.php?title=2018_F1_Powerboat_World_Championship&oldid=920386507
See Also
Examples
pie(powerboat_maxp)
Profile likelihood and support
Description
Given a support function, return a profile likelihood curve
Usage
profsupp(H, i, p, relative=TRUE, ...)
profile_support_single(H, i, p, evaluate=FALSE, ...)
Arguments
H |
hyper2 object |
i |
Name of player for which profile support is to be calculated |
p |
Strength of element |
evaluate |
Boolean, with default |
relative |
Boolean; if |
... |
Arguments passed to |
Value
Returns the support at a particular value of p_i
, or the
evaluate conditional on p_i
.
Author(s)
Robin K. S. Hankin
See Also
Examples
## Not run: # takes too long
p <- seq(from=0.5,to=0.4,len=10)
u <- profsupp(icons,"NB",p)
plot(p,u-max(u))
abline(h=c(0,-2))
## End(Not run)
Substitute players of a hyper2 object
Description
Given a hyper2 object, substitute some players
Usage
psubs(H, from, to)
psubs_single(H, from, to)
Arguments
H |
hyper2 object |
from , to |
Character vector of players to substitute and their substitutes |
Details
Function psubs()
substitutes one or more player names, replacing
player from[i]
with to[i]
. If argument to
is
missing, all players are substituted, the second argument taken to be
the replacement: interpret psubs(H,vec)
as
psubs(H,from=pnames(H),to=vec)
.
Compare pnames<-()
, which can only add players, or reorder
existing players.
Function psubs_single()
is a low-level helper function that takes
a single player and its substitute; it is not intended for direct use.
Value
Returns a hyper2 object
Author(s)
Robin K. S. Hankin
Examples
psubs(icons,c("L","NB"),c("London","Norfolk Broads"))
rhyper2() |> psubs(letters,LETTERS) # ignore i,j,k,...,z
psubs(icons,tolower(pnames(icons)))
Player with advantage
Description
Commonly, when considering competitive situations we suspect that one
player has an advantage of some type which we would like to quantify in
terms of an additional strength. Examples might include racing at pole
position, playing white in chess, or playing soccer at one's home
ground. Function pwa()
(“player with advantage”) returns
a modified hyper2
object with the additional strength represented
as a reified entity.
Usage
pwa(H, pwa, chameleon = "S")
Arguments
H |
A hyper2 object |
pwa |
A list of the players with the supposed advantage; may be character in the case of a named hyper2 object, or an integer vector |
chameleon |
String representing the advantage |
Details
Given an object of class hyper2
and a competitor a
, we
replace every occurrence of a
with a+S
, with S
representing the extra strength conferred.
However, the function also takes a vector of competitors. If there is more than one competitor, the resulting likelihood function does not seem to instantiate any simple situation.
Nice examples of pwa()
are given in ‘inst/cook.Rmd’ and
‘inst/universities.Rmd’.
Value
Returns an object of class hyper2
.
Note
Earlier versions of this package gave a contrived sequence of
observations, presented as an example of pwa()
with multiple
advantaged competitors. I removed it because the logic was flawed, but
it featured a chameleon who could impersonate (and indeed eat) certain
competitors, which is why the third argument is so named.
The aliases commemorate some uses of the function in the vignettes and markdown files in the ‘inst/’ directory.
Author(s)
Robin K. S. Hankin
See Also
Examples
summary(formula1 |> pwa("Hamilton","pole"))
H <- ordervec2supp(c(a = 2, b = 3, c = 1, d = 5, e = 4))
pwa(H,'a')
## Four races between a,b,c,d:
H1 <- ordervec2supp(c(a = 1, b = 3, c = 4, d = 2))
H2 <- ordervec2supp(c(a = 0, b = 1, c = 3, d = 2))
H3 <- ordervec2supp(c(a = 4, b = 2, c = 1, d = 3))
H4 <- ordervec2supp(c(a = 3, b = 4, c = 1, d = 2))
## Now it is revealed that a,b,c had some advantage in races 1,2,3
## respectively. Is there evidence that this advantage exists?
## Not run: # takes ~10 seconds, too long for here
specificp.test(pwa(H1,'a') + pwa(H2,'b') + pwa(H3,'c') + H4,"S")
## End(Not run)
Convert rank tables to and from order tables
Description
Convert rank tables (as generated by rrank()
, for example) to
order tables like the formula 1 tables; and convert back. Print and
summary methods for rank tables are documented here. See also
ordertable.Rd
.
Usage
ranktable_to_ordertable(xrank)
ordertable_to_ranktable(xorder)
wikitable_to_ranktable(wikitable, strict=FALSE)
## S3 method for class 'ranktable'
summary(object, ...)
ranktable_to_printable_object(x)
## S3 method for class 'ranktablesummary'
print(x,...)
Arguments
x , xrank , object |
A rank table, an object with class
|
xorder , wikitable |
Order tables. Argument |
strict |
Controls for |
... |
Further arguments (currently ignored) |
Details
Function ranktable_to_ordertable()
is trivial;
ordertable_to_ranktable()
less so. The prototype for order
tables would be skating_table
.
Function ordertable_to_ranktable(x)
checks for each column being
a permutation of seq_len(nrow(x))
and, if not, it stops. In
particular, DNF entries are out of scope. To convert order
tables such as F1_table_2017
, which include DNF
entries, use wikitable_to_ranktable()
or ordertable2supp()
to produce a likelihood function.
Function ranktable_to_printable_object()
is a helper function
that coerces a ranktable
object to a matrix that prints nicely.
The print method is discussed in
inst/ordertable_to_ranktable.Rmd
.
Value
An order table or rank table
Author(s)
Robin K. S. Hankin
See Also
Examples
p <- (5:1)/15
names(p) <- letters[1:5]
xrank <- rrank(12,p,rnames=month.abb)
xorder <- ranktable_to_ordertable(xrank)
## Can convert back and forth:
identical(xrank,ordertable_to_ranktable(ranktable_to_ordertable(xrank)))
# maxp(ordertable2supp(xorder)) # should be close to p
ordertable_to_ranktable(skating_table)
# convert a rank table to a support function:
rank_likelihood(wikitable_to_ranktable(volvo_table))
Random hyper2
objects
Description
Random hyper2
loglikelihood functions, intended as quick
“get you going” examples
Usage
rhyper2(n = 8, s = 5, pairs = TRUE, teams = TRUE, race = TRUE, pnames)
Arguments
n |
Number of competitors, treated as even |
s |
Integer, Measure of the complexity of the log likelihood function |
pairs , teams , race |
Boolean, indicating whether or not to include different observations |
pnames |
Character vector of names, if missing interpret as
|
Note
Function rhyper2()
returns a likelihood function based on
random observations. To return a random probability vector drawn from
a from a given (normalized) likelihood function, use rp()
.
Author(s)
Robin K. S. Hankin
See Also
Examples
rhyper2()
rp(2,icons)
Random hyper3 objects
Description
Various random hyper3 objects, in the context of the race metaphor.
They return “get you going” examples of hyper3
objects.
The defaults correspond to simple but non-trivial with straightforward
interpretations.
The defaults are
pn: c(a=2, b=4, c=2, d=1 ) # numbers (two "a"s, four "b"s etc) ps: c(a=0.3, b=0.1, c=0.2, d=0.4) # strengths
Usage
rwinner3(pn=c(a=2,b=4,c=2,d=1),ps=c(a=0.3, b=0.1,c=0.2,d=0.4))
rpair3(n=5,s=3,lambda=1.3)
rrace3(pn=c(a=2,b=4,c=2,d=1),ps=c(a=0.3, b=0.1,c=0.2,d=0.4))
rracehyper3(n=4,size=9,ps=NULL,races=3)
rhyper3(n=5,s=4,type='race',...)
Arguments
pn |
A named integer vector showing numbers of each type of player |
ps |
A named vector showing strengths of each type of player |
n , size , races , s , type |
Arguments specifying the complexity
of the random |
lambda |
Parameter |
... |
Further arguments passed to |
Details
These functions return hyper3
objects, as indicated by the
3
in their names.
Function
rwinner3()
is a low-level helper function that takes a player number argumentpn
, and a player strength argumentps
. It performs an in silico race, and returns the (name of) the winner, chosen randomly from a field of runners with appropriate strengths. It is used repeatedly byrrace3()
to select a winner from the diminishing pool of still-running players.Function
rpair3()
returns ahyper3
object corresponding to repeated pairwise comparisons including a white-player advantage represented bylambda
.Function
rrace3()
returns a rank statistic corresponding to finishing order for a Plackett-Luce race. The output can be passed toordervec2supp3()
.Function
rracehyper3()
returns a more complicatedhyper3
object corresponding to repeated races.Function
rhyper3()
returns an even more complicatedhyper3
object corresponding to repeated races and pairwise comparisons.
Argument n
generally specifies the number of distinct types of
players. Files inst/mann_whitney_wilcoxon.Rmd
and
inst/javelin.Rmd
show some use-cases for these functions.
Note
In function rracehyper3()
[and by extension rhyper3()
],
if argument n
exceeds 26 and argument pn
takes its
default value of NULL
, then an error will be returned because
there are only 26 players, one for each letter a
-z
.
Author(s)
Robin K. S. Hankin
See Also
rrank
,ordertable2supp
,ordertrans
Examples
rracehyper3()
rrace3()
rwinner3()
rhyper3()
rpair3()
ordervec2supp3(rrace3())
table(replicate(100,which(rrace3(pn=c(a=1,b=10),ps=c(a=0.9,b=0.1))=='a')))
Rowing dataset, sculling
Description
Data from Men's single sculls, 2016 Summer Olympics
Usage
data(rowing)
Format
Object rowing
is a hyper2
object that gives a likelihood
function for the 2016 men's sculls.
Details
Object rowing
is created by the code in inst/rowing.Rmd
.
This reads file inst/rowing.txt
, each line of which is a heat
showing the finishing order. Object rowing_table
is the
corresponding R list.
File inst/rowing_minimal.txt
has the same data but with dominated
players (that is, any group of players none of whom have beaten any
player not in the group) have been removed. This is because dominated
players have a ML strength of zero.
References
Wikipedia contributors, “Rowing at the 2016 Summer Olympics—Men's single sculls”, Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Rowing_at_the_2016_Summer_Olympics_%E2%80%93_Men%27s_single_sculls&oldid=753517240 (accessed December 7, 2016).
See Also
Examples
dotchart(rowing_maxp)
Random samples from the prior of a hyper2
object
Description
Uses Metropolis-Hastings to return random samples from the prior of a
hyper2
object
Usage
rp(n, H, startp = NULL, fcm = NULL, fcv = NULL, SMALL = 1e-06, l=loglik, fillup=TRUE, ...)
Arguments
H |
Object of class |
n |
Number of samples |
startp |
Starting value for the Markov chain, with default
|
fcm , fcv |
Constraints as for |
SMALL |
Notional small value for numerical stability |
l |
Log-likelihood function with default |
fillup |
Boolean, with default |
... |
Further arguments, currently ignored |
Details
Uses the implementation of Metropolis-Hastings from the MCE
package to sample from the posterior PDF of a hyper2
object.
If the distribution is Dirichlet, use rdirichlet()
to generate
random observations: it is much faster, and produces serially
independent samples. To return uniform samples, use
rp_unif()
(documented at dirichlet.Rd
).
Value
Returns a matrix, each row being a unit-sum observation.
Note
Function rp()
a random sample from a given normalized
likelihood function. To return a random likelihood function, use
rhyper2()
.
File inst/ternaryplot_hyper2.Rmd
shows how to use
Ternary::ternaryPlot()
with rp()
.
Author(s)
Robin K. S. Hankin
See Also
Examples
rp(10,icons)
plot(loglik(rp(30,icons),icons),type='b')
A random race with given BT strengths
Description
Returns a rank vector suitable for interpretation with race()
.
Usage
rrace(strengths)
Arguments
strengths |
Named vector with names being players and values being their Bradley-Terry strengths |
Details
Uses a simple recursive system to generate the ranks.
Value
Returns a character vector with entries corresponding to the competitor. The first element is the winner, the second the runner-up, and so on, until the final element is the last to cross the finishing line.
Author(s)
Robin K. S. Hankin
See Also
Examples
o <- c(a=0.4, b=0.3, c=0.2, d=0.1)
rrace(o)
rankvec_likelihood(rrace(o))
D <- t(replicate(10,rrace(o))) # 10 races
H <- hyper2()
for(i in seq_len(nrow(D))){H <- H+rankvec_likelihood(D[i,])}
maxp(H) # should be close to o
Random ranks
Description
A function for producing ranks randomly, consistent with a specified strength vector
Usage
rrank(n = 1, p, pnames=NULL, fill = FALSE, rnames=NULL)
## S3 method for class 'ranktable'
print(x, ...)
rrank_single(p)
rorder_single(p)
Arguments
n |
Number of observations |
p |
Strength vector |
pnames |
Character vector (“player names”) specifying names of the columns |
rnames |
Character vector (“row names” or “race names”) specifying names of the rows |
fill |
Boolean, with default |
x , ... |
Arguments passed to the print method |
Value
If n=1
, rrank()
returns a vector; if n>1
it returns
a matrix with n
rows, each corresponding to a ranking. The
canonical example is a race in which the probability of competitor
i
coming first is p_i/\sum p_j
, where the
summation is over the competitors who have not already finished.
If, say, the first row of rrank()
is c(2,5,1,3,4)
, then
competitor 2 came first, competitor 5 came second, competitor 1 came
third, and so on.
Note that function rrank()
returns an object of class
ranktable
, which has its own special print method. The column
names appear as “c1, c2, ...
” which is intended to be read
“came first”, “came second”, and so on. The difference
between rank and order can be confusing.
> x <- c(a=3.01, b=1.04, c=1.99, d=4.1) > x a b c d 3.01 1.04 1.99 4.10 > rank(x) a b c d 3 1 2 4 > order(x) [1] 2 3 1 4
In the above, rank()
shows us that element a
of x
(viz 3.01) is the third largest, element b
(viz 1.04) is the
smallest, and so on; order(x)
shows us that the smallest element
x
is x[2]
, the next smallest is x[3]
, and so on.
Thus x[order(x)] == sort(x)
, and rank(x)[order(x)] ==
seq_along(x)
. In the current context we want ranks not orders; we want
to know who came first, who came second, and so on:
R> rrank(2,(4:1)/10) c1 c2 c3 c4 [1,] 2 3 1 4 [2,] 1 3 2 4 R>
In the above, each row is a race; we have four runners and two races. In the first race (the top row), runner number 2 came first, runner 3 came second, runner 1 came third, and so on. In the second race (bottom row), runner 1 came first, etc. Taking the first race as an example:
Rank: who came first? runner 2. Who came second? runner 3.
Who came third? runner 1. Who came fourth? runner 4. Recall that the
Placket-Luce likelihood for a race in which the rank statistic was
2314
(the first race) would be
\frac{p_2}{p_2+p_3+p_1+p_4}\cdot
\frac{p_3}{p_3+p_1+p_4}\cdot
\frac{p_1}{p_1+p_4}\cdot
\frac{p_4}{p_4}
.
Order: where did runner 1 come? third. Where did runner 2
come? first. Where did runner 3 come? second. Where did runner 4
come? fourth. Thus the order statistic would be 3124
.
Function rrank()
is designed for rank_likelihood()
, which
needs rank data, not order data. Vignette
“skating_analysis
” gives another discussion.
Note that function rrank()
returns an object of class
“rrank
”, which has its own print method. This can be
confusing. Further details are given at ranktable.Rd
.
Function rrank_single()
is a low-level helper function:
> p <- c(0.02,0.02,0.9,0.02,0.02,0.02) # competitor 3 the strongest > rank_single(p) [1] 3 2 4 6 4 1
Above, we see from p
that competitor 3 is the strongest, coming
first with 90% probability. And indeed the resulting rank statistic
given by rorder_single()
shows competitor 3 coming first, 2
coming second, and so on. Compare rrank_single()
:
> rorder_single(p) [1] 6 3 1 4 5 2 >
Above we see see from rrank_single(p)
that competitor 1 came
sixth, competitor 2 came third, and competitor 3 came first (as you
might expect, as competitor 3 is the strongest). Note that the R idiom
for rorder_single()
is the same as that used in the
permutations package for inverting a permutation: o[o] <-
seq_along(o)
.
Note
Similar functionality is given by rrace()
, documented at
rhyper3.
Author(s)
Robin K. S. Hankin
See Also
ordertrans
,rank_likelihood
,skating
,rhyper3
Examples
rrank_single(zipf(9))
ptrue <- (4:1)/10
names(ptrue) <- letters[1:4]
rrank(10,p=ptrue)
H <- rank_likelihood(rrank(40,p=ptrue))
## Following code commented out because they take too long:
# mH <- maxp(H) # should be close to ptrue
# H <- H + rank_likelihood(rrank(30,mH)) # run some more races
# maxp(H) # revised estimate with additional data
Figure skating at the 2002 Winter Olympics
Description
A likelihood function for the competitors at the Ladies' Free Skate at the 2002 Winter Olympics
Usage
skating
Details
Three objects
skating
, a log-likelihood function for the competitors'
strengths, skating_table
, an order table for each of the 9
judges, and skating_maxp
, the result of maxp(skating)
,
which is included to save time in the examples.
These objects can be generated by running script
inst/skating.Rmd
, which includes some further discussion and
technical documentation. The dataset is interesting because it has
been analysed by many workers, including Lock and Lock, for
consistency between the judges.
Note that file is structured so that each competitor is a row, and
each judge is a column. Function rank_likelihood()
requires a
transpose of this to operate.
Object skating_table
is an order table, taken from Lock and
Lock. It corrects what appears to be an error in which judge 5 ranked
both Butyrskaya and Kettunen 12; there is no 13. Using EM, I reckon
that Butyrskaya should be ranked twelfth and Kettunen thirteenth.
Note
There is an (Rbuildignore
-d) discussion of a
skeleton
dataset in the inst/
directory of the repo,
it's easy to confuse this with skating
.
Author(s)
Robin K. S. Hankin
References
-
https://en.wikipedia.org/wiki/Figure_skating_at_the_2002_Winter_Olympics#Full_results_2
Robin Lock and Kari Frazer Lock, Winter 2003. “Judging Figure Skating Judges”. STATS 36, ASA
Examples
data(skating)
dotchart(skating_maxp)
ordertable_to_ranktable(skating_table)
rL <- sort(skating_maxp,decreasing=TRUE)
rL[] <- seq_along(rL)
rO <- seq_len(nrow(skating_table))
names(rO) <- rownames(skating_table)
ordertransplot(rO,rL,
xlab="official rank",ylab="likelihood rank",
main="Ladies free skating, 2002 Winter Olympics")
Sailing at the 2000 Summer Olympics - soling
Description
Race results from the 2000 Summer Olympics: soling
Usage
data(soling)
Format
A hyper2
object that gives a likelihood function
Details
The Soling three person keelboat event at the 2000 Summer Olympic games
furnishes a rich dataset. An order table and likelihood function is
given in the package as soling_table
and soling
respectively. Data from the round robins and the quarter final is given
in matrices soling_rr1
, soling_rr2
, soling_qf
respectively.
These objects can be generated by running script inst/soling.Rmd
,
which includes some further discussion and technical documentation, and
creates file soling.rda
which resides in the data/
directory.
References
Wikipedia contributors, “Sailing at the 2000 Summer Olympics - Soling,” Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Sailing_at_the_2000_Summer_Olympics_%E2%80%93_Soling&oldid=945362535 (accessed March 23, 2020).
See Also
Examples
data(soling)
ordertable_to_ranktable(soling_table)
pie(soling_maxp)
Summary method for hyper2 objects
Description
Give a summary of a hyper2 object, and a print method
Usage
## S3 method for class 'hyper2'
summary(object, ...)
## S3 method for class 'summary.hyper2'
print(x, ...)
Arguments
object , x |
Object of class hyper2 |
... |
Further arguments, currently ignored |
Details
Mostly self-explanatory, based on the equivalent in the untb
package.
Author(s)
Robin K. S. Hankin
See Also
Examples
summary(icons)
Methods for suplist objects
Description
Basic functionality for lists of hyper2
objects, allowing the
user to concatenate independent observations which are themselves
composite objects such as returned by ggrl()
.
Usage
## S3 method for class 'suplist'
Ops(e1, e2)
## S3 method for class 'suplist'
sum(x,...,na.rm=FALSE)
suplist_add(e1,e2)
suplist_times_scalar(e1,e2)
as.suplist(L)
Arguments
e1 , e2 |
Objects of class |
x , ... , na.rm |
In the |
L |
A list of |
Details
A suplist
object is a list of hyper2
objects. Each
element is a hyper2
object that is consistent with an
incomplete rank observation R
; the list elements are exclusive
and exhaustive for R
. If S
is a suplist
object,
and S=list(H1,H2,...,Hn)
where the Hi
are hyper2
objects, then
\mbox{Prob}(p|H_1)+\cdots+\mbox{Prob}(p|H_n)
.
This is because the elements of a suplist
object are disjoint
alternatives.
It is incorrect to say that a likelihood function
\mathcal{L}_S(p)
for p
is the sum of separate
likelihood functions. This is incorrect because the arbitrary
multiplicative constant messes up the math, for example we might have
\mathcal{L}_{H_1}(p)=C_1\mathrm{Prob}(p|H_1)
and
\mathcal{L}_{H_2}(p)=C_2\mathrm{Prob}(p|H_2)
and indeed
\mathcal{L}_{{H_1}\cup H_2}(p)=C_{12}\left(\mathrm{Prob}(p|H_1)+\mathrm{Prob}(p|H_2)\right)
but
\mathcal{L}_{H_1}(p)+\mathcal{L}_{H_2}(p) \neq
C_1\mathrm{Prob}(p|H_1)+C_2\mathrm{Prob}(p|H_2)
(the right hand side is meaningless).
Functions suplist_add()
and sum.suplist()
implement
“S1+S2
” as the support function for independent
observations S1
and S2
. The idea is that the support
functions “add” in the following sense. If
S1=list(H1,...,Hr)
and S2=list(I1,...,Is)
where
Hx,Ix
are hyper2
objects, then the likelihood function
for “S1+S2
” is the likelihood function for S1
followed by (independent) S2
. Formally
\mbox{Prob}(p|S_1+S_2) =
\left(
\mbox{Prob}(p|H_1)
+\cdots+
\mbox{Prob}(p|H_r)
\right)\cdot\left(
\mbox{Prob}(p|I_1)
+\cdots+
\mbox{Prob}(p|I_s)
\right)
\log\mbox{Prob}(p|S_1+S_2) =
\log\left(
\mbox{Prob}(p|H_1)
+\cdots+
\mbox{Prob}(p|H_r)
\right)+\log\left(
\mbox{Prob}(p|I_1)
+\cdots+
\mbox{Prob}(p|I_s)
\right)
However, S1+S2
is typically a large and unwieldy object, and
can be very slow to evaluate. These functions are here because they
provide slick package idiom.
The experimental lsl
mechanism furnishes an alternative
methodology which is more computationally efficient at the expense of
a non-explicit likelihood function. It is not clear at present (2022)
which of the two systems is better.
Value
Returns a suplist
object.
Author(s)
Robin K. S. Hankin
See Also
Examples
W <- hyper2(pnames=letters[1:5])
W1 <- ggrl(W, 'a', letters[2:3],'d') # 2-element list
W2 <- ggrl(W, 'e', letters[1:3],'d') # 6-element list
W3 <- ggrl(W, 'c', letters[4:5],'a') # 2-element list
# likelihood function for independent observations W1,W2,W3:
W1+W2+W3 # A 2*6*2=24-element list
like_single_list(equalp(W),W1+W2+W3)
## Not run: dotchart(maxplist(W1+W1+W3),pch=16) # takes a long time
a <- lsl(list(W1,W2,W3),4:6) # observe W1 four times, W2 five times and W3 six times
loglik_lsl(equalp(W),a,log=TRUE)
Surfing dataset
Description
Data from the 2019 World Surf League (WSL) tour
Usage
data(surfing)
Details
The package contains four datasets from WSL 2019:
-
surfing
, a log likelihood function for the strengths of the competitors -
surfing_maxp
, corresponding precalculated evaluate -
surfing_venuetypes
, a dataframe showing the beach types at the different venues of the tour
These objects can be generated by running script
inst/surfing.Rmd
, which includes some further discussion and
technical documentation and creates file surfing.rda
which
resides in the data/
directory.
Author(s)
Robin K. S. Hankin
Examples
dotchart(surfing_maxp)
Match outcomes from repeated table tennis matches
Description
Match outcomes from repeated singles table tennis matches
Usage
data(table_tennis)
Format
A likelihood function corresponding to the match outcomes listed below.
Details
There are four players, A, B, and C, who play singles table tennis matches with the following results:
A vs B, A serves, 5-1
A vs B, B serves, 1-3
A vs C, A serves, 4-1
A vs C, C serves, 1-2
As discussed in vignette table_tennis_serve
, we wish to assess
the importance of the serve. The vignette presents a number of analyses
including a profile likelihood plot.
See vignette table_tennis_serve
for an account of how to create
table_tennis
.
Examples
data(table_tennis)
dotchart(maxp(table_tennis))
Match outcomes from repeated doubles tennis matches
Description
Match outcomes from repeated doubles tennis matches
Usage
data(tennis)
Format
A hyper2 object corresponding to the match outcomes listed below.
Details
There are four players, p_1
to p_4
. These players
play doubles tennis matches with the following results:
match | score |
\lbrace p_1,p_2\rbrace vs \lbrace p_3,p_4\rbrace | 9-2 |
\lbrace p_1,p_3\rbrace vs \lbrace p_2,p_4\rbrace | 4-4 |
\lbrace p_1,p_4\rbrace vs \lbrace p_2,p_3\rbrace | 6-7 |
\lbrace p_1\rbrace vs \lbrace p_3\rbrace | 10-14 |
\lbrace p_2\rbrace vs \lbrace p_3\rbrace | 12-14 |
\lbrace p_1\rbrace vs \lbrace p_4\rbrace | 10-14 |
\lbrace p_2\rbrace vs \lbrace p_4\rbrace | 11-10 |
\lbrace p_3\rbrace vs \lbrace p_4\rbrace | 13-13 |
It is suspected that p_1
and p_2
have some form of
team cohesion and play better when paired than when either solo or with
other players. As the scores show, each player and, apart from p1-p2,
each doubles partnership, is of approximately the same strength.
Dataset tennis
gives the appropriate likelihood function for the
players' strengths; and dataset tennis_ghost
gives the
appropriate likelihood function if the extra strength due to team
cohesion of \lbrace p_1,p_2\rbrace
is represented by a
ghost player.
These objects can be generated by running script
inst/tennis.Rmd
, which includes some further discussion and
technical documentation and creates file tennis.rda
which
resides in the data/
directory.
Source
Doubles tennis matches at NOCS, Jan-May 2008
References
Robin K. S. Hankin (2010). “A Generalization of the Dirichlet Distribution”, Journal of Statistical Software, 33(11), 1-18
Examples
summary(tennis)
tennis |> psubs(c("Federer","Laver","Graf","Navratilova"))
## Following line commented out because it takes too long:
# specificp.gt.test(tennis_ghost,"G",0)
Hypothesis testing
Description
Tests different nulls against a free alternative
Usage
equalp.test(H, startp=NULL, ...)
knownp.test(H, p, ...)
samep.test(H, i, give=FALSE, startp=NULL, ...)
specificp.test(H, i, specificp=1/size(H),
alternative = c("two.sided","less","greater"), ...)
specificp.ne.test(H, i, specificp=1/size(H), ...)
specificp.gt.test(H, i, specificp=1/size(H), delta=1e-5, ...)
specificp.lt.test(H, i, specificp=1/size(H), ...)
## S3 method for class 'hyper2test'
print(x, ...)
Arguments
H |
A likelihood function, an object of class |
p |
In |
... |
Further arguments passed by |
startp |
Starting value for optimization |
i |
A character vector of names |
specificp |
Strength, real number between 0 and 1 |
alternative |
a character string specifying the alternative
hypothesis, must be one of |
give |
Boolean, with |
x |
Object of class |
delta |
Small value for numerical stability |
Details
Given a hyper2
likelihood function, there are a number of
natural questions to ask about the strengths of the players; see
Hankin 2010 (JSS) for examples. An extended discussion is presented
in vignette “hyper2
” and the functions documented here
cover most of the tests used in the vignette.
The tests return an object with class hyper2test
, which has its
own print method.
Function
equalp.test(H,p)
tests the null that all strengths are equal to vectorp
. Ifp
is missing, it testsH_0\colon p_1=p_2=\cdots=p_n=\frac{1}{n}
, for exampleequalp.test(icons)
Function
knownp.test()
tests the null that the strengths are equal to the elements of named vectorp
; it is a generalization ofequalp.test()
. Example:knownp.test(icons,zipf(6))
.Function
specificp.test(H,i,p)
testsH_0\colon p_i=p
, for examplespecificp.test(icons,"NB",0.1)
Function
samep.test()
testsH_0\colon p_{i_1}=p_{i_2}=\cdots=p_{i_k}
, for examplesamep.test(icons,c("NB","L"))
tests thatNB
has the same strength asL
.Functions
specificp.ne.test(H,i,p)
,specificp.gt.test(H,i,p)
, andspecificp.lt.test(H,i,p)
are low-level helper functions that implement one- or two-sided versions ofspecificp.test()
via thealternative
argument, followingt.test()
Value
The test functions return a list with class "hyper2test"
containing the following components:
statistic |
the difference in support between the null and alternative |
p.value |
the (asymptotic) p-value for the test, based on Wilks's theorem |
estimate |
the maximum likelihood estimate for |
method |
a character string indicating what type of test was performed |
data.name |
a character string giving the name(s) of the data. |
Note
Function specificp.gt.test()
includes quite a bit of messing
about to ensure that frequently-used idiom like
specificp.gt.test(icons,"NB",0)
works as expected, testing a null
of p_NB=0
(observe that specificp.ne.test(icons,"NB",0)
and specificp.gt.test(icons,"NB",0)
will (correctly) throw an
error). In the case of testing a strength's being zero, the support
function is often quite badly-behaved near the constraint [think tossing
a coin with probability p
twice, observing one head and one tail,
and testing p=0
; at the constraint, the likelihood is zero, the
support negative infinity, and the gradient of the support is infinite].
Numerically, the code tests p_NB=delta
. Note that similar
machinations are not required in specificp.lt.test()
because a
null of p_NB=1
is unrealistic.
Function samep.test()
does not have access to gradient
information so it is slow, inaccurate, and may fail completely for
high-dimensional datasets. If any(i==n)
, this constrains the
fillup value; this makes no difference mathematically but the function
idiom is involved.
In functions specificp.??.test(H,i,...)
, if i
is not
present in H
, an error is returned although technically the
result should be “not enough evidence to reject”, as H
is
uninformative about i
.
See Also
Examples
equalp.test(chess)
# samep.test(icons,c("NB","L"))
# knownp.test(icons,zipf(icons))
Tidy up a hyper2 object
Description
Tidy up a hyper2 object by removing players about which we have no information
Usage
tidy(H)
Arguments
H |
A |
Details
Function tidy(H)
returns a hyper2 object mathematically
identical to H
but with unused players (that is, players that
do not appear in any bracket) removed. Players about which H
is uninformative are removed from the pnames
attribute.
Note that idiom pnames(H) <- foo
can also be used to manipulate
the pnames
attribute.
Author(s)
Robin K. S. Hankin
Examples
H <- hyper2(pnames=letters)
H["a"] <- 1
H["b"] <- 2
H[c("a","b")] <- -3
pnames(H)
pnames(tidy(H))
H == tidy(H) # should be TRUE
New Zealand University ranking data
Description
Times Higher Education World University Rankings
Usage
data(universities)
Format
A hyper2
object that gives a likelihood function for
ranking of NZ universities
Details
The data is taken directly from the THE website, specifying “New Zealand”:
Object universities
is a hyper2
support function and
universities_table
a data frame.
These objects can be generated by running script
inst/universities.Rmd
, which includes some further discussion and
technical documentation, and creates file universities.rda
which
resides in the data/
directory.
See Also
Examples
summary(universities)
psubs(universities,c("AUT","UoA"),c("University of Auckland","Auckland University of Technology"))
pie(universities_maxp)
Results from the NOCS volleyball league
Description
Results from the NOCS volleyball league. Object
volleyball_table
is a matrix in which each column corresponds to
a player and each row corresponds to a volleyball set; volleyball
is the corresponding likelihood function in the form of a hyper2
distribution.
Usage
data(volleyball)
Details
A volleyball set is a Bernoulli trial between two disjoint subsets of
the players. The two subsets are denoted (after the game) as the
“winners” and the “losers”: these are denoted by 1
and 0
respectively.
Thus the first line reads of volleyball_results
reads:
p1 p2 p3 p4 p5 p6 p7 p8 p9 1 0 NA 1 0 0 NA 1 NA
showing that the teams were p1
, p4
and p8
against
p2
, p5
and p6
; players p3
, p7
and
p9
did not play.
These datasets illustrate the fact that such Bernoulli trials are only weakly informative.
These objects can be generated by running script
inst/volleyball.Rmd
, which includes some further discussion and
technical documentation and creates file volleyball.rda
which
resides in the data/
directory.
Source
Volleyball games at NOCS, 2006-2008
References
Robin K. S. Hankin (2010). “A Generalization of the Dirichlet Distribution”, Journal of Statistical Software, 33(11), 1-18
Examples
volleyball == volley(volleyball_table) # should be TRUE
Race results from the 2014-2015 Volvo Ocean Race
Description
Race results from the twelfth edition of the round-the-world Volvo Ocean Race.
Usage
data(volvo)
Format
A hyper2
object that gives a likelihood function
Details
Object volvo
is a hyper2
object that gives a likelihood
function for the strengths of the competitors of the 2014-2015 Volvo
Ocean Race; volvo_maxp
is a precomputed maximum likelihood
estimate of the competitors' strengths. Object volvo_table
is a
data frame with rows being teams and columns being legs.
These objects can be generated by running script inst/volvo.Rmd
,
which includes some further discussion and technical documentation and
creates file volvo.rda
which resides in the data/
directory.
References
Wikipedia contributors, 2019. “2014-2015 Volvo Ocean Race”. In Wikipedia, the free encyclopedia. Retrieved 22:21, February 28, 2020. https://en.wikipedia.org/w/index.php?title=2014%E2%80%932015_Volvo_Ocean_Race&oldid=914916131,
See Also
Examples
pie(volvo_maxp)
# equalp.test(volvo) # takes ~10 seconds to run
# convert table to a support function:
rank_likelihood(wikitable_to_ranktable(volvo_table))
Zap weak competitors
Description
Given a hyper2
object, discard competitors with a small estimated
strength.
Usage
zapweak(H, minstrength = 1e-05, maxit, ...)
Arguments
H |
Object of class |
minstrength |
Strength below which to discard competitors |
maxit |
Maximum number of iterations; if missing, use
|
... |
Further arguments, passed to |
Details
Iteratively discards the weakest player (if the estimated strength is
less than minstrength
) using discard_flawed()
.
maxp(..,n=1)
for efficiency.
Value
Returns a slimmed-down hyper2
object with weak players
removed.
Note
This function is experimental and appears to be overly aggressive.
For some likelihood functions zapweak()
removes all the
players.
I now think that there is no consistent way to remove weaker players from a likelihood function. I think the only way to do it is to look at the dataset that generates the likelihood function, somehow weed out the players with the poorest performance, and generate a new likelihood function without them.
Author(s)
Robin K. S. Hankin
See Also
Examples
zapweak(icons) # removes noone
# zapweak(rowing) # removes everyone...
Zipf's law
Description
A very short function that reproduces Zipf's law: a harmonic rank-probability distribution, formally
p(i)=\frac{i^{-1}}{\sum_{i=1}^{N} i^{-1}},\qquad i=1,\ldots,N
Usage
zipf(n)
Arguments
n |
Integer; if a hyper2 object is supplied this is
interpreted as |
Value
Returns a numeric vector summing to one
Author(s)
Robin K. S. Hankin
See Also
Examples
zipf(icons)