Recall that to determine confidence interval bounds, we solve the equation \[\begin{align*} F_Y(y_{\rm obs} \vert \theta) - q = 0 \end{align*}\] for \(\theta\). Here, \(Y\) is our chosen statistic (e.g., \(\bar{X}\)), \(F_Y(\cdot)\) is the cumulative distribution function (or cdf) for \(Y\)’s sampling distribution, \(y_{\rm obs}\) is the observed statistic value, and \(q\) is a quantile that we determine given the confidence coefficient \(1-\alpha\) and the type of interval we are constructing (one-sided lower bound, one-sided upper bound, or two-sided).
In this vignette, we examine the situation in which the sampling
distribution cdf is analytically derivable, but (a) is not
(easily) invertible by hand, and (b) is (to our knowledge!) not coded in
an existing R
package. In this situation, we would develop
our own cdf-calculating function, and then pass it to
cdfinv()
.
Let’s suppose we sample \(n = 16\) data from a \(\text{Uniform}(0,\theta)\) distribution and we wish to compute a 95% upper bound for \(\theta\). Further, let’s suppose that the \(10^{\rm th}\) datum is \(x_{(10),\rm obs} = 0.45\), and that we will use this datum to determine the confidence interval. (This is obviously an academic exercise: we would normally utilize \(x_{(16),\rm obs}\) and solve for the upper bound by hand.)
For the \(\text{Uniform}(0,\theta)\) distribution, the cdf within the domain is \[\begin{align*} F_X(x) = \int_0^x \frac{1}{\theta - 0} dy = \frac{x}{\theta} \,. \end{align*}\] We utilize a result found in Casella & Berger (2002; page 229): the cdf for the \(j^{\rm th}\) order statistic is \[\begin{align*} F_{(j)}(x) = \sum_{k=j}^n \binom{n}{k} [F_X(x)]^k [1-F_X(x)]^{n-k} \,. \end{align*}\] (Our notation slightly differs from that of Casella & Berger.) Here, that cdf would be \[\begin{align*} F_{(j)}(x) = \sum_{k=j}^n \binom{n}{k} \left(\frac{x}{\theta}\right)^k \left[1-\left(\frac{x}{\theta}\right)\right]^{n-k} \,. \end{align*}\] This is obviously not a cdf that we can easily invert by hand when \(j < n\).
Below we show how one might create one’s own code that returns the
cdf value as a function of a coordinate \(x\) and a parameter \(\theta\). Note that the first argument must
be q
…the observed statistic value passed to
cdfinv()
will be matched to this argument name.
Below we show how we can pass the function, now defined in
R
’s global environment, to cdfinv()
. The first
three arguments are standard: the distribution “name” is
unif.ord
(cdfinv()
will tack a p
on the front), the parameter of interest is theta
, and the
observed statistic value is 0.45. The next argument specifies that we
wish to compute a (95% by default) upper bound on \(\theta\), instead of the default two-sided
interval. As we know that \(\theta \geq
x_{(10),\rm obs}\), we specify a lower parameter bound
(lpb
) of 0.45. The last two arguments are extra ones
specifically required for punif.ord()
.
## $DISTR
## [1] "unif.ord"
##
## $PARAM
## [1] "theta"
##
## $STAT
## [1] 0.45
##
## $bound
## [1] 1.150861
##
## $q
## [1] 0.05
The 95% upper bound on \(\theta\) determined using the \(10^{\rm th}\) ordered datum is \(\hat{\theta}_U = 1.151\).