Type: | Package |
Title: | Generalized Ridge Trace Plots for Ridge Regression |
Version: | 0.8.0 |
Date: | 2024-11-30 |
Maintainer: | Michael Friendly <friendly@yorku.ca> |
Depends: | R (≥ 3.5.0), car |
Imports: | rgl, colorspace, splines |
Suggests: | MASS, bestglm, vcdExtra |
Description: | The genridge package introduces generalizations of the standard univariate ridge trace plot used in ridge regression and related methods. These graphical methods show both bias (actually, shrinkage) and precision, by plotting the covariance ellipsoids of the estimated coefficients, rather than just the estimates themselves. 2D and 3D plotting methods are provided, both in the space of the predictor variables and in the transformed space of the PCA/SVD of the predictors. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
LazyLoad: | yes |
LazyData: | yes |
Language: | en-US |
BugReports: | https://github.com/friendly/genridge/issues |
URL: | https://github.com/friendly/genridge, https://friendly.github.io/genridge/ |
RoxygenNote: | 7.3.2 |
Encoding: | UTF-8 |
NeedsCompilation: | no |
Packaged: | 2024-12-02 14:41:12 UTC; friendly |
Author: | Michael Friendly |
Repository: | CRAN |
Date/Publication: | 2024-12-02 15:00:02 UTC |
Generalized ridge trace plots for ridge regression
Description
The genridge package introduces generalizations of the standard univariate ridge trace plot used in ridge regression and related methods (Friendly, 2013). These graphical methods show both bias (actually, shrinkage) and precision, by plotting the covariance ellipsoids of the estimated coefficients, rather than just the estimates themselves. 2D and 3D plotting methods are provided, both in the space of the predictor variables and in the transformed space of the PCA/SVD of the predictors.
Details
This package provides computational support for the graphical methods described in Friendly (2013). Ridge regression models may be fit using the function ridge, which incorporates features of lm.ridge. In particular, the shrinkage factors in ridge regression may be specified either in terms of the constant (lambda) added to the diagonal of the X^T X matrix, or the equivalent number of degrees of freedom.
More importantly, the ridge function also calculates and returns the associated covariance matrices of each of the ridge estimates, allowing precision to be studied and displayed graphically.
This provides the support for the main plotting functions in the package:
- plot.ridge: Bivariate ridge trace plots
- pairs.ridge: All pairwise bivariate ridge trace plots
- plot3d.ridge: 3D ridge trace plots
- traceplot: Traditional univariate ridge trace plots
In addition, the function pca.ridge transforms the coefficients and covariance matrices of a ridge object from predictor space to the equivalent, but more interesting, space of the PCA of X^T X or the SVD of X. The main plotting functions also work for these objects, of class c("ridge", "pcaridge").
Finally, the functions precision and vif.ridge provide other useful measures and plots.
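As a quick orientation, here is a minimal sketch of the typical workflow, using the longley data and the lambda values that appear throughout the examples below:
library(genridge)
lambda <- c(0, 0.005, 0.01, 0.02, 0.04, 0.08)
lridge <- ridge(Employed ~ GNP + Unemployed + Armed.Forces + Population + Year + GNP.deflator,
                data=longley, lambda=lambda)
traceplot(lridge)            # traditional univariate ridge trace plot
plot(lridge, radius=0.5)     # bivariate trace plot with covariance ellipses
pairs(lridge, radius=0.5)    # all pairwise bivariate trace plots
plridge <- pca(lridge)       # transform to PCA/SVD space
traceplot(plridge)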
Author(s)
Michael Friendly
Maintainer: Michael Friendly <friendly@yorku.ca>
References
Friendly, M. (2013). The Generalized Ridge Trace Plot: Visualizing Bias and Precision. Journal of Computational and Graphical Statistics, 22(1), 50-68, doi:10.1080/10618600.2012.681237, https://www.datavis.ca/papers/genridge-jcgs.pdf
Arthur E. Hoerl and Robert W. Kennard (1970). Ridge Regression: Biased Estimation for Nonorthogonal Problems, Technometrics, 12(1), pp. 55-67.
Arthur E. Hoerl and Robert W. Kennard (1970). Ridge Regression: Applications to Nonorthogonal Problems, Technometrics, 12(1), pp. 69-82.
See Also
Examples
# see examples for ridge, etc.
Acetylene Data
Description
The data consist of measures of yield of a chemical manufacturing process for acetylene in relation to numeric parameters.
Format
A data frame with 16 observations on the following 4 variables.
yield: conversion percentage yield of acetylene
temp: reactor temperature (Celsius)
ratio: H2 to n-heptane ratio
time: contact time (sec)
Details
Marquardt and Snee (1975) used these data to illustrate ridge regression in a model containing quadratic and interaction terms, particularly the need to center and standardize variables appearing in high-order terms.
Typical models for these data include the interaction of temp:ratio, and a squared term in temp.
Source
SAS documentation example for PROC REG, Ridge Regression for Acetylene Data.
References
Marquardt, D.W., and Snee, R.D. (1975), "Ridge Regression in Practice," The American Statistician, 29, 3-20.
Marquardt, D.W. (1980), "A Critique of Some Ridge Regression Methods: Comment," Journal of the American Statistical Association, Vol. 75, No. 369 (Mar., 1980), pp. 87-91
Examples
data(Acetylene)
# naive model, not using centering
amod0 <- lm(yield ~ temp + ratio + time + I(time^2) + temp:time, data=Acetylene)
y <- Acetylene[,"yield"]
X0 <- model.matrix(amod0)[,-1]
lambda <- c(0, 0.0005, 0.001, 0.002, 0.005, 0.01)
aridge0 <- ridge(y, X0, lambda=lambda)
traceplot(aridge0)
traceplot(aridge0, X="df")
pairs(aridge0, radius=0.2)
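# A hedged sketch of the centering/standardization step discussed in Details:
# scale the predictors before forming higher-order terms. The particular
# quadratic/interaction terms chosen here are illustrative, not the exact
# Marquardt & Snee (1975) analysis.
A.scaled <- as.data.frame(scale(Acetylene[, c("temp", "ratio", "time")]))
A.scaled$yield <- Acetylene$yield
amod1 <- lm(yield ~ temp + ratio + time + I(temp^2) + temp:ratio, data=A.scaled)
X1 <- model.matrix(amod1)[,-1]
aridge1 <- ridge(A.scaled$yield, X1, lambda=lambda)
traceplot(aridge1)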
Detroit Homicide Data for 1961-1973
Description
The data set Detroit was used extensively in the book by Miller (2002) on subset regression. The data are unusual in that a subset of three predictors can be found which gives a very much better fit to the data than the subsets found from the Efroymson stepwise algorithm, or from forward selection or backward elimination. They are also unusual in that, as time series data, the assumption of independence is patently violated, and the data suffer from problems of high collinearity.
As well, ridge regression reveals somewhat paradoxical paths of shrinkage in univariate ridge trace plots, which are more comprehensible in multivariate views.
Format
A data frame with 13 observations on the following 14 variables.
Police: Full-time police per 100,000 population
Unemp: Percent unemployed in the population
MfgWrk: Number of manufacturing workers in thousands
GunLic: Number of handgun licences per 100,000 population
GunReg: Number of handgun registrations per 100,000 population
HClear: Percent of homicides cleared by arrests
WhMale: Number of white males in the population
NmfgWrk: Number of non-manufacturing workers in thousands
GovWrk: Number of government workers in thousands
HrEarn: Average hourly earnings
WkEarn: Average weekly earnings
Accident: Death rate in accidents per 100,000 population
Assaults: Number of assaults per 100,000 population
Homicide: Number of homicides per 100,000 of population
Details
The data were originally collected and discussed by Fisher (1976) but the complete dataset first appeared in Gunst and Mason (1980, Appendix A). Miller (2002) discusses this dataset throughout his book, but doesn't state clearly which variables he used as predictors and which is the dependent variable. (Homicide was the dependent variable, and the predictors were Police ... WkEarn.) The data were obtained from StatLib.
A similar version of this data set, with different variable names, appears in the bestglm package.
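As a complement to the six-variable subset used in the Examples below, a hedged sketch of a ridge fit using all of the Police ... WkEarn predictors might look like the following (the lambda values are purely illustrative):
data(Detroit)
DetAll <- as.data.frame(scale(Detroit[, 1:11]))   # Police ... WkEarn, scaled
DetAll$Homicide <- Detroit$Homicide
dridge.all <- ridge(Homicide ~ ., data=DetAll, lambda=c(0, 0.01, 0.05, 0.1, 0.5))
traceplot(dridge.all)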
Source
https://lib.stat.cmu.edu/datasets/detroit
References
Fisher, J.C. (1976). Homicide in Detroit: The Role of Firearms. Criminology, 14, 387–400.
Gunst, R.F. and Mason, R.L. (1980). Regression analysis and its application: A data-oriented approach. Marcel Dekker.
Miller, A. J. (2002). Subset Selection in Regression. 2nd Ed. Chapman & Hall/CRC. Boca Raton.
Examples
data(Detroit)
# Work with a subset of predictors, from Miller (2002, Table 3.14),
# the "best" 6 variable model
# Variables: Police, Unemp, GunLic, HClear, WhMale, WkEarn
# Scale these for comparison with other methods
Det <- as.data.frame(scale(Detroit[,c(1,2,4,6,7,11)]))
Det <- cbind(Det, Homicide=Detroit[,"Homicide"])
# use the formula interface; specify ridge constants in terms
# of equivalent degrees of freedom
dridge <- ridge(Homicide ~ ., data=Det, df=seq(6,4,-.5))
# univariate trace plots are seemingly paradoxical in that
# some coefficients "shrink" *away* from 0
traceplot(dridge, X="df")
vif(dridge)
pairs(dridge, radius=0.5)
plot3d(dridge, radius=0.5, labels=dridge$df)
# transform to PCA/SVD space
dpridge <- pca(dridge)
# not so paradoxical in PCA space
traceplot(dpridge, X="df")
biplot(dpridge, radius=0.5, labels=dpridge$df)
# show PCA vectors in variable space
biplot(dridge, radius=0.5, labels=dridge$df)
Hospital manpower data
Description
The hospital manpower data, taken from Myers (1990), table 3.8, are a well-known example of highly collinear data to which ridge regression and various shrinkage and selection methods are often applied.
The data consist of measures taken at 17 U.S. Naval Hospitals and the goal is to predict the required monthly man hours for staffing purposes.
Format
A data frame with 17 observations on the following 6 variables.
Hours: monthly man hours (response variable)
Load: average daily patient load
Xray: monthly X-ray exposures
BedDays: monthly occupied bed days
AreaPop: eligible population in the area in thousands
Stay: average length of patient's stay in days
Details
Myers (1990) indicates his source was "Procedures and Analysis for Staffing Standards Development: Data/Regression Analysis Handbook", Navy Manpower and Material Analysis Center, San Diego, 1979.
Source
Raymond H. Myers (1990). Classical and Modern Regression with Applications, 2nd ed., PWS-Kent, pp. 130-133.
References
Donald R. Jensen and Donald E. Ramirez (2012). Variations on Ridge Traces in Regression, Communications in Statistics - Simulation and Computation, 41 (2), 265-278.
See Also
manpower for the same data, and other analyses.
Examples
data(Manpower)
mmod <- lm(Hours ~ ., data=Manpower)
vif(mmod)
# ridge regression models, specified in terms of equivalent df
mridge <- ridge(Hours ~ ., data=Manpower, df=seq(5, 3.75, -.25))
vif(mridge)
# univariate ridge trace plots
traceplot(mridge)
traceplot(mridge, X="df")
# bivariate ridge trace plots
plot(mridge, radius=0.25, labels=mridge$df)
pairs(mridge, radius=0.25)
# 3D views
# ellipsoids for Load, Xray & BedDays are nearly 2D
plot3d(mridge, radius=0.2, labels=mridge$df)
# variables in model selected by AIC & BIC
plot3d(mridge, variables=c(2,3,5), radius=0.2, labels=mridge$df)
# plots in PCA/SVD space
mpridge <- pca(mridge)
traceplot(mpridge, X="df")
biplot(mpridge, radius=0.25)
Biplot of Ridge Regression Trace Plot in SVD Space
Description
biplot.pcaridge supplements the standard display of the covariance ellipsoids for a ridge regression problem in PCA/SVD space with labeled arrows showing the contributions of the original variables to the dimensions plotted.
Usage
## S3 method for class 'pcaridge'
biplot(
x,
variables = (p - 1):p,
labels = NULL,
asp = 1,
origin,
scale,
var.lab = rownames(V),
var.lwd = 1,
var.col = "black",
var.cex = 1,
xlab,
ylab,
prefix = "Dim ",
suffix = TRUE,
...
)
Arguments
x |
A "pcaridge" object computed by pca. |
variables |
The dimensions or variables to be shown in the plot. By default, the last two dimensions, corresponding to the smallest singular values, are plotted. |
labels |
A vector of character strings or expressions used as labels
for the ellipses. Use |
asp |
Aspect ratio for the plot. The default value, asp = 1, keeps the units the same on both axes, so the geometry of the variable vectors is not distorted. |
origin |
The origin for the variable vectors in this plot, a vector of length 2. If not specified, the function calculates an origin to make the variable vectors approximately centered in the plot window. |
scale |
The scale factor for variable vectors in this plot. If not specified, the function calculates a scale factor to make the variable vectors approximately fill the plot window. |
var.lab |
Labels for variable vectors. The default is the names of the predictor variables. |
var.lwd , var.col , var.cex |
Line width, color and character size used to draw and label the arrows representing the variables in this plot. |
xlab , ylab |
Labels for the plot dimensions. If not specified,
|
prefix |
Prefix for labels of the plot dimensions. |
suffix |
Suffix for labels of the plot dimensions. If
|
... |
Other arguments, passed to |
Details
The biplot view showing the dimensions corresponding to the two smallest singular values is particularly useful for understanding how the predictors contribute to shrinkage in ridge regression.
This is only a biplot in the loose sense that results are shown in two spaces simultaneously – the transformed PCA/SVD space of the original predictors, and vectors representing the predictors projected into this space.
biplot.ridge is a similar extension of plot.ridge, adding vectors showing the relation of the PCA/SVD dimensions to the plotted variables.
class("ridge")
objects use the transpose of the right singular
vectors, t(x$svd.V)
for the dimension weights plotted as vectors.
Value
None
Author(s)
Michael Friendly, with contributions by Uwe Ligges
References
Friendly, M. (2013). The Generalized Ridge Trace Plot: Visualizing Bias and Precision. Journal of Computational and Graphical Statistics, 22(1), 50-68, doi:10.1080/10618600.2012.681237, https://datavis.ca/papers/genridge-jcgs.pdf
See Also
Examples
longley.y <- longley[, "Employed"]
longley.X <- data.matrix(longley[, c(2:6,1)])
lambda <- c(0, 0.005, 0.01, 0.02, 0.04, 0.08)
lridge <- ridge(longley.y, longley.X, lambda=lambda)
plridge <- pca(lridge)
plot(plridge, radius=0.5)
# same, with variable vectors
biplot(plridge, radius=0.5)
# add some other options
biplot(plridge, radius=0.5, var.col="brown", var.lwd=2, var.cex=1.2, prefix="Dimension ")
# biplots for ridge objects, showing PCA vectors
plot(lridge, radius=0.5)
biplot(lridge, radius=0.5)
biplot(lridge, radius=0.5, asp=NA)
Enhanced Contour Plots
Description
This is an enhancement to contour, written as a wrapper for that function. It creates a contour plot, or adds contour lines to an existing plot, allowing the contours to be filled and returning the list of contour lines.
Usage
contourf(
x = seq(0, 1, length.out = nrow(z)),
y = seq(0, 1, length.out = ncol(z)),
z,
nlevels = 10,
levels = pretty(zlim, nlevels),
zlim = range(z, finite = TRUE),
col = par("fg"),
color.palette = colorRampPalette(c("white", col)),
fill.col = color.palette(nlevels + 1),
fill.alpha = 0.5,
add = FALSE,
...
)
Arguments
x , y |
locations of grid lines at which the values in |
z |
a matrix containing the values to be plotted (NAs are allowed).
Note that |
nlevels |
number of contour levels desired iff levels is not supplied |
levels |
numeric vector of levels at which to draw contour lines |
zlim |
z-limits for the plot. x-limits and y-limits can be passed through ... |
col |
color for the lines drawn |
color.palette |
a color palette function to be used to assign fill colors in the plot |
fill.col |
a call to the |
fill.alpha |
transparency value for |
add |
logical. If |
... |
additional arguments passed to |
Value
Returns invisibly the list of contour lines, with components levels, x, y. See contourLines.
Author(s)
Michael Friendly
See Also
contourplot from package lattice.
Examples
x <- 10*1:nrow(volcano)
y <- 10*1:ncol(volcano)
contourf(x,y,volcano, col="blue")
contourf(x,y,volcano, col="blue", nlevels=6)
# return value, unfilled, other graphic parameters
res <- contourf(x,y,volcano, col="blue", fill.col=NULL, lwd=2)
# levels used in the plot
sapply(res, function(x) x[[1]])
Diabetes Progression
Description
These data consist of observations on 442 patients, with the response of interest being a quantitative measure of disease progression one year after baseline.
There are ten baseline variables: age, sex, body-mass index (bmi), average blood pressure (map) and six blood serum measurements.
Usage
data("diab")
Format
A data frame with 442 observations on the following 11 variables.
prog: disease progression, a numeric vector
age: age, a numeric vector
sex: sex (integer code), a numeric vector
bmi: body mass index, a numeric vector
map: mean arterial blood pressure, a numeric vector
tc: blood serum TC, a numeric vector
ldl: blood serum low-density lipoprotein ("bad cholesterol"), a numeric vector
hdl: blood serum high-density lipoprotein ("good cholesterol"), a numeric vector
tch: blood serum TCH, a numeric vector
ltg: blood serum lamotrigine, a numeric vector
glu: blood serum glucose, a numeric vector
Details
Efron & Hastie describe their analysis using the centered predictor variables standardized to unit L2 norm.
ridge does not (yet) provide this scaling.
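A minimal sketch of how that scaling could be applied by hand before calling ridge (the choice of lambda values here is purely illustrative):
data(diab)
X  <- data.matrix(diab[, -1])                   # predictors; prog (column 1) is the response
Xc <- scale(X, center=TRUE, scale=FALSE)        # center
Xs <- sweep(Xc, 2, sqrt(colSums(Xc^2)), "/")    # rescale each column to unit L2 norm
dridge.l2 <- ridge(diab$prog, Xs, lambda=c(0, 0.01, 0.05, 0.1, 0.5, 1))
traceplot(dridge.l2)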
Source
The dataset was taken from the web site for Efron & Hastie (2021), Computer Age Statistical Inference, https://hastie.su.domains/CASI_files/DATA/diabetes.csv.
References
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least Angle Regression. The Annals of Statistics, 32(2), 407-499. doi:10.1214/009053604000000067
Efron, B., & Hastie, T. (2021). Computer Age Statistical Inference, Student Edition: Algorithms, Evidence, and Data Science, Cambridge University Press. doi:10.1017/9781108914062
Examples
data(diab)
## maybe str(diab) ; plot(diab) ...
Scatterplot Matrix of Bivariate Ridge Trace Plots
Description
Displays all possible pairs of bivariate ridge trace plots for a given set of predictors.
Usage
## S3 method for class 'ridge'
pairs(
x,
variables,
radius = 1,
lwd = 1,
lty = 1,
col = c("black", "red", "darkgreen", "blue", "darkcyan", "magenta", "brown",
"darkgray"),
center.pch = 16,
center.cex = 1.25,
digits = getOption("digits") - 3,
diag.cex = 2,
diag.panel = panel.label,
fill = FALSE,
fill.alpha = 0.3,
...
)
Arguments
x |
A |
variables |
Predictors in the model to be displayed in the plot: an integer or character vector, giving the indices or names of the variables. |
radius |
Radius of the ellipse-generating circle for the covariance ellipsoids. |
lwd , lty |
Line width and line type for the covariance ellipsoids. Recycled as necessary. |
col |
A numeric or character vector giving the colors used to plot the covariance ellipsoids. Recycled as necessary. |
center.pch |
Plotting character used to show the bivariate ridge estimates. Recycled as necessary. |
center.cex |
Size of the plotting character for the bivariate ridge estimates |
digits |
Number of digits to be displayed as the (min, max) values in the diagonal panels |
diag.cex |
Character size for predictor labels in diagonal panels |
diag.panel |
Function to draw diagonal panels. Not yet implemented: just uses the internal panel.label function. |
fill |
Logical vector: Should the covariance ellipsoids be filled? Recycled as necessary. |
fill.alpha |
Numeric vector: alpha transparency value(s) for filled ellipsoids. Recycled as necessary. |
... |
Other arguments passed down |
Value
None. Used for its side effect of plotting.
Author(s)
Michael Friendly
References
Friendly, M. (2013). The Generalized Ridge Trace Plot: Visualizing Bias and Precision. Journal of Computational and Graphical Statistics, 22(1), 50-68, doi:10.1080/10618600.2012.681237, https://www.datavis.ca/papers/genridge-jcgs.pdf
See Also
ridge for details on ridge regression as implemented here; plot.ridge, traceplot for other plotting methods.
Examples
longley.y <- longley[, "Employed"]
longley.X <- data.matrix(longley[, c(2:6,1)])
lambda <- c(0, 0.005, 0.01, 0.02, 0.04, 0.08)
lridge <- ridge(longley.y, longley.X, lambda=lambda)
pairs(lridge, radius=0.5, diag.cex=1.75)
data(prostate)
py <- prostate[, "lpsa"]
pX <- data.matrix(prostate[, 1:8])
pridge <- ridge(py, pX, df=8:1)
pairs(pridge)
Transform Ridge Estimates to PCA Space
Description
The function pca.ridge transforms a ridge object from parameter space, where the estimated coefficients are \beta_k with covariance matrices \Sigma_k, to the principal component space defined by the right singular vectors, V, of the singular value decomposition of the scaled predictor matrix, X.
In this space, the transformed coefficients are V \beta_k, with covariance matrices V \Sigma_k V^T.
This transformation provides alternative views of ridge estimates in low-rank approximations. In particular, it allows one to see where the effects of collinearity typically reside — in the smallest PCA dimensions.
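A conceptual sketch of this transformation, comparing a by-hand computation with the pca result (this assumes the coef and svd.V components documented under ridge; sign and ordering conventions could differ from the internal implementation):
lambda <- c(0, 0.005, 0.01, 0.02, 0.04, 0.08)
lridge <- ridge(Employed ~ GNP + Unemployed + Armed.Forces + Population + Year + GNP.deflator,
                data=longley, lambda=lambda)
plridge <- pca(lridge)
V <- lridge$svd.V           # right singular vectors of the scaled predictor matrix
lridge$coef[1, ] %*% V      # transformed coefficients for lambda = 0, by hand
plridge$coef[1, ]           # compare (may differ in sign or column order)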
Usage
pca(x, ...)
Arguments
x |
A "ridge" object, as returned by ridge. |
... |
Other arguments passed down. Not presently used in this implementation. |
Value
An object of class c("ridge", "pcaridge")
, with the same
components as the original ridge
object.
Author(s)
Michael Friendly
References
Friendly, M. (2013). The Generalized Ridge Trace Plot: Visualizing Bias and Precision. Journal of Computational and Graphical Statistics, 22(1), 50-68, doi:10.1080/10618600.2012.681237, https://www.datavis.ca/papers/genridge-jcgs.pdf
See Also
Examples
longley.y <- longley[, "Employed"]
longley.X <- data.matrix(longley[, c(2:6,1)])
lambda <- c(0, 0.005, 0.01, 0.02, 0.04, 0.08)
lridge <- ridge(longley.y, longley.X, lambda=lambda)
plridge <- pca(lridge)
traceplot(plridge)
pairs(plridge)
# view in space of smallest singular values
plot(plridge, variables=5:6)
Plot Shrinkage vs. Variance for Ridge Precision
Description
This function uses the results of precision to plot a measure of shrinkage of the coefficients in ridge regression against a selected measure of their estimated sampling variance, so as to provide a direct visualization of the tradeoff between bias and precision.
Usage
## S3 method for class 'precision'
plot(
x,
xvar = "norm.beta",
yvar = c("det", "trace", "max.eig"),
labels = c("lambda", "df"),
label.cex = 1.25,
label.prefix,
criteria = NULL,
pch = 16,
cex = 1.5,
col,
main = NULL,
xlab,
ylab,
...
)
Arguments
x |
A data frame of class "precision", as returned by precision. |
xvar |
The character name of the column to be used for the horizontal axis. Typically, this is the normalized sum of squares of the coefficients (norm.beta, the default). |
yvar |
The character name of the column to be used for the vertical axis. One of "det" (the default), "trace", or "max.eig". |
labels |
The character name of the column to be used for point labels. One of "lambda" (the default) or "df". |
label.cex |
Character size for point labels. |
label.prefix |
Character or expression prefix for the point labels. |
criteria |
The vector of optimal shrinkage criteria from the fitted "ridge" object (its criteria component). |
pch |
Plotting character for points |
cex |
Character size for points |
col |
Point colors |
main |
Plot title |
xlab |
Label for horizontal axis |
ylab |
Label for vertical axis |
... |
Other arguments passed to |
Value
Returns nothing. Used for the side effect of plotting.
Author(s)
Michael Friendly
See Also
ridge for details on ridge regression as implemented here; precision for definitions of the measures.
Examples
lambda <- c(0, 0.001, 0.005, 0.01, 0.02, 0.04, 0.08)
lridge <- ridge(Employed ~ GNP + Unemployed + Armed.Forces +
Population + Year + GNP.deflator,
data=longley, lambda=lambda)
criteria <- lridge$criteria |> print()
pridge <- precision(lridge) |> print()
plot(pridge)
# also show optimal criteria
plot(pridge, criteria = criteria)
# use degrees of freedom as point labels
plot(pridge, labels = "df")
plot(pridge, labels = "df", label.prefix="df:")
# show the trace measure
plot(pridge, yvar="trace")
Bivariate Ridge Trace Plots
Description
The bivariate ridge trace plot displays 2D projections of the covariance ellipsoids for a set of ridge regression estimates indexed by a ridge tuning constant.
The centers of these ellipses show the bias induced for each parameter, and also how the change in the ridge estimate for one parameter is related to changes for other parameters.
The size and shapes of the covariance ellipses show directly the effect on precision of the estimates as a function of the ridge tuning constant.
plot.pcaridge does these bivariate ridge trace plots for "pcaridge" objects, defaulting to plotting the two smallest components.
Usage
## S3 method for class 'ridge'
plot(
x,
variables = 1:2,
radius = 1,
which.lambda = 1:length(x$lambda),
labels = lambda,
pos = 3,
cex = 1.2,
lwd = 2,
lty = 1,
xlim,
ylim,
col = c("black", "red", "darkgreen", "blue", "darkcyan", "magenta", "brown",
"darkgray"),
center.pch = 16,
center.cex = 1.5,
fill = FALSE,
fill.alpha = 0.3,
ref = TRUE,
ref.col = gray(0.7),
...
)
## S3 method for class 'pcaridge'
plot(x, variables = (p - 1):p, labels = NULL, ...)
Arguments
x |
A "ridge" or "pcaridge" object, as returned by ridge or pca. |
variables |
Predictors in the model to be displayed in the plot: an integer or character vector of length 2, giving the indices or names of the variables. Defaults to the first two predictors for "ridge" objects and to the last two dimensions for "pcaridge" objects. |
radius |
Radius of the ellipse-generating circle for the covariance
ellipsoids. The default, |
which.lambda |
A vector of indices used to select the values of
|
labels |
A vector of character strings or expressions used as labels
for the ellipses. Use |
pos , cex |
Scalars or vectors of positions (relative to the ellipse centers) and character size used to label the ellipses |
lwd , lty |
Line width and line type for the covariance ellipsoids. Recycled as necessary. |
xlim , ylim |
X, Y limits for the plot, each a vector of length 2. If missing, the range of the covariance ellipsoids is used. |
col |
A numeric or character vector giving the colors used to plot the covariance ellipsoids. Recycled as necessary. |
center.pch |
Plotting character used to show the bivariate ridge estimates. Recycled as necessary. |
center.cex |
Size of the plotting character for the bivariate ridge estimates |
fill |
Logical vector: Should the covariance ellipsoids be filled? Recycled as necessary. |
fill.alpha |
Numeric vector: alpha transparency value(s) in the range (0, 1) for filled ellipsoids. Recycled as necessary. |
ref |
Logical: whether to draw horizontal and vertical reference lines at 0. |
ref.col |
Color of reference lines. |
... |
Other arguments passed down to
|
Value
None. Used for its side effect of plotting.
Author(s)
Michael Friendly
References
Friendly, M. (2013). The Generalized Ridge Trace Plot: Visualizing Bias and Precision. Journal of Computational and Graphical Statistics, 22(1), 50-68, doi:10.1080/10618600.2012.681237, https://www.datavis.ca/papers/genridge-jcgs.pdf
See Also
ridge for details on ridge regression as implemented here; pairs.ridge, traceplot for basic plots. pca.ridge for transformation of ridge regression estimates to PCA space. biplot.pcaridge and plot3d.ridge for other plotting methods.
Examples
longley.y <- longley[, "Employed"]
longley.X <- data.matrix(longley[, c(2:6,1)])
lambda <- c(0, 0.005, 0.01, 0.02, 0.04, 0.08)
lambdaf <- c("", ".005", ".01", ".02", ".04", ".08")
lridge <- ridge(longley.y, longley.X, lambda=lambda)
op <- par(mfrow=c(2,2), mar=c(4, 4, 1, 1)+ 0.1)
for (i in 2:5) {
plot(lridge, variables=c(1,i), radius=0.5, cex.lab=1.5)
text(lridge$coef[1,1], lridge$coef[1,i], expression(~widehat(beta)^OLS),
cex=1.5, pos=4, offset=.1)
if (i==2) text(lridge$coef[-1,1:2], lambdaf[-1], pos=3, cex=1.25)
}
par(op)
data(prostate)
py <- prostate[, "lpsa"]
pX <- data.matrix(prostate[, 1:8])
pridge <- ridge(py, pX, df=8:1)
plot(pridge)
plot(pridge, fill=c(TRUE, rep(FALSE,7)))
3D Ridge Trace Plots
Description
The 3D ridge trace plot displays 3D projections of the covariance ellipsoids for a set of ridge regression estimates indexed by a ridge tuning constant.
The centers of these ellipses show the bias induced for each parameter, and also how the change in the ridge estimate for one parameter is related to changes for other parameters.
The size and shapes of the covariance ellipsoids show directly the effect on precision of the estimates as a function of the ridge tuning constant.
plot3d.ridge and plot3d.pcaridge differ only in the defaults for the variables plotted.
Usage
plot3d(x, ...)
## S3 method for class 'pcaridge'
plot3d(x, variables = (p - 2):p, ...)
## S3 method for class 'ridge'
plot3d(
x,
variables = 1:3,
radius = 1,
which.lambda = 1:length(x$lambda),
lwd = 1,
lty = 1,
xlim,
ylim,
zlim,
xlab,
ylab,
zlab,
col = c("black", "red", "darkgreen", "blue", "darkcyan", "magenta", "brown",
"darkgray"),
labels = lambda,
ref = TRUE,
ref.col = gray(0.7),
segments = 40,
shade = TRUE,
shade.alpha = 0.1,
wire = FALSE,
aspect = 1,
add = FALSE,
...
)
Arguments
x |
A "ridge" or "pcaridge" object, as returned by ridge or pca. |
... |
Other arguments passed down |
variables |
Predictors in the model to be displayed in the plot: an integer or character vector of length 3, giving the indices or names of the variables. Defaults to the first three predictors for "ridge" objects and to the last three dimensions for "pcaridge" objects. |
radius |
Radius of the ellipse-generating circle for the covariance
ellipsoids. The default, |
which.lambda |
A vector of indices used to select the values of
|
lwd , lty |
Line width and line type for the covariance ellipsoids. Recycled as necessary. |
xlim , ylim , zlim |
X, Y, Z limits for the plot, each a vector of length 2. If missing, the range of the covariance ellipsoids is used. |
xlab , ylab , zlab |
Labels for the X, Y, Z variables in the plot. If missing, the names of the predictors given in variables are used. |
col |
A numeric or character vector giving the colors used to plot the covariance ellipsoids. Recycled as necessary. |
labels |
A numeric or character vector giving the labels to be drawn at the centers of the covariance ellipsoids. |
ref |
Logical: whether to draw horizontal and vertical reference lines at 0. This is not yet implemented. |
ref.col |
Color of reference lines. |
segments |
Number of line segments used in drawing each dimension of a covariance ellipsoid. |
shade |
a logical scalar or vector, indicating whether the ellipsoids
should be rendered with |
shade.alpha |
a numeric value in the range [0,1], or a vector of such
values, giving the alpha transparency for ellipsoids rendered with
|
wire |
a logical scalar or vector, indicating whether the ellipsoids
should be rendered with |
aspect |
a scalar or vector of length 3, or the character string "iso",
indicating the ratios of the x, y, and z axes of the bounding box. The
default, |
add |
if |
Value
None. Used for its side-effect of plotting
Note
This is an initial implementation. The details and arguments are subject to change.
Author(s)
Michael Friendly
References
Friendly, M. (2013). The Generalized Ridge Trace Plot: Visualizing Bias and Precision. Journal of Computational and Graphical Statistics, 22(1), 50-68, doi:10.1080/10618600.2012.681237, https://www.datavis.ca/papers/genridge-jcgs.pdf
See Also
plot.ridge, pairs.ridge, pca.ridge
Examples
lmod <- lm(Employed ~ GNP + Unemployed + Armed.Forces + Population +
Year + GNP.deflator, data=longley)
longley.y <- longley[, "Employed"]
longley.X <- model.matrix(lmod)[,-1]
lambda <- c(0, 0.005, 0.01, 0.02, 0.04, 0.08)
lambdaf <- c("0", ".005", ".01", ".02", ".04", ".08")
lridge <- ridge(longley.y, longley.X, lambda=lambda)
plot3d(lridge, var=c(1,4,5), radius=0.5)
# view in SVD/PCA space
plridge <- pca(lridge)
plot3d(plridge, radius=0.5)
Measures of Precision and Shrinkage for Ridge Regression
Description
The goal of precision is to allow you to study the relationship between shrinkage of ridge regression coefficients and their precision directly by calculating measures of each.
Three measures of (inverse) precision based on the “size” of the covariance matrix of the parameters are calculated. Let V_k \equiv \text{Var}(\mathbf{\beta}_k) be the covariance matrix for a given ridge constant, and let \lambda_i, i = 1, \dots, p be its eigenvalues. Then the variance (= 1/precision) measures are:
- "det": \log |V_k| = \log \prod_i \lambda_i (with det.fun = "log", the default) or |V_k|^{1/p} = (\prod_i \lambda_i)^{1/p} (with det.fun = "root") measures the linearized volume of the covariance ellipsoid and corresponds conceptually to Wilks' Lambda criterion
- "trace": \text{trace}(V_k) = \sum_i \lambda_i corresponds conceptually to Pillai's trace criterion
- "max.eig": \lambda_1 = \max_i(\lambda_i) corresponds to Roy's largest root criterion
Two measures of shrinkage are also calculated:
- norm.beta: the root mean square of the coefficient vector \lVert \mathbf{\beta}_k \rVert, normalized to a maximum of 1.0 if normalize == TRUE (the default).
- norm.diff: the root mean square of the difference from the OLS estimate, \lVert \mathbf{\beta}_{\text{OLS}} - \mathbf{\beta}_k \rVert. This measure is inversely related to norm.beta.
facilitates making graphs of these quantities.
Usage
precision(object, det.fun, normalize, ...)
Arguments
object |
An object of class "ridge" or "lm". |
det.fun |
Function to be applied to the determinants of the covariance matrices, one of "log" (the default) or "root". |
normalize |
If TRUE (the default), the norm.beta measure is normalized to a maximum of 1.0. |
... |
Other arguments (currently unused) |
Value
An object of class c("precision", "data.frame")
with the following columns:
lambda |
The ridge constant |
df |
The equivalent effective degrees of freedom |
det |
The |
trace |
The trace of the covariance matrix |
max.eig |
Maximum eigenvalue of the covariance matrix |
norm.beta |
The root mean square of the estimated coefficients, possibly normalized |
norm.diff |
The root mean square of the difference between the OLS solution
( |
Note
Models fit by lm and ridge use a different scaling for the predictors, so the results of precision for an lm model will not correspond to those for ridge with ridge constant = 0.
Author(s)
Michael Friendly
See Also
Examples
longley.y <- longley[, "Employed"]
longley.X <- data.matrix(longley[, c(2:6,1)])
lambda <- c(0, 0.005, 0.01, 0.02, 0.04, 0.08)
lridge <- ridge(longley.y, longley.X, lambda=lambda)
# same, using formula interface
lridge <- ridge(Employed ~ GNP + Unemployed + Armed.Forces + Population + Year + GNP.deflator,
data=longley, lambda=lambda)
clr <- c("black", rainbow(length(lambda)-1, start=.6, end=.1))
coef(lridge)
(pdat <- precision(lridge))
# plot log |Var(b)| vs. length(beta)
with(pdat, {
plot(norm.beta, det, type="b",
cex.lab=1.25, pch=16, cex=1.5, col=clr, lwd=2,
xlab='shrinkage: ||b|| / max(||b||)',
ylab='variance: log |Var(b)|')
text(norm.beta, det, lambda, cex=1.25, pos=c(rep(2,length(lambda)-1),4))
text(min(norm.beta), max(det), "Variance vs. Shrinkage", cex=1.5, pos=4)
})
# plot trace[Var(b)] vs. length(beta)
with(pdat, {
plot(norm.beta, trace, type="b",
cex.lab=1.25, pch=16, cex=1.5, col=clr, lwd=2,
xlab='shrinkage: ||b|| / max(||b||)',
ylab='variance: trace [Var(b)]')
text(norm.beta, trace, lambda, cex=1.25, pos=c(2, rep(4,length(lambda)-1)))
# text(min(norm.beta), max(det), "Variance vs. Shrinkage", cex=1.5, pos=4)
})
Prostate Cancer Data
Description
Data to examine the correlation between the level of prostate-specific antigen and a number of clinical measures in men who were about to receive a radical prostatectomy.
Format
A data frame with 97 observations on the following 10 variables.
- lcavol: log cancer volume
- lweight: log prostate weight
- age: in years
- lbph: log of the amount of benign prostatic hyperplasia
- svi: seminal vesicle invasion
- lcp: log of capsular penetration
- gleason: a numeric vector
- pgg45: percent of Gleason score 4 or 5
- lpsa: response
- train: a logical vector
Details
This data set came originally from the (now defunct) ElemStatLearn package.
The last column indicates which 67 observations were used as the "training set" and which 30 as the test set, as described on page 48 in the book.
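For instance, the training/test split can be recovered directly from that column:
data(prostate)
table(prostate$train)          # 67 TRUE (training), 30 FALSE (test)
ptrain <- subset(prostate, train, select = -train)
ptest  <- subset(prostate, !train, select = -train)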
Note
There was an error in this dataset in earlier versions of the package, as indicated in a footnote on page 3 of the second edition of the book. As of version 2012.04-0 this was corrected.
Source
Stamey, T., Kabalin, J., McNeal, J., Johnstone, I., Freiha, F., Redwine, E. and Yang, N (1989) Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate II. Radical prostatectomy treated patients, Journal of Urology, 16: 1076–1083.
Examples
data(prostate)
str( prostate )
cor( prostate[,1:8] )
prostate <- prostate[, -10]
prostate.mod <- lm(lpsa ~ ., data=prostate)
vif(prostate.mod)
py <- prostate[, "lpsa"]
pX <- data.matrix(prostate[, 1:8])
pridge <- ridge(py, pX, df=8:1)
pridge
# univariate ridge trace plots
traceplot(pridge)
traceplot(pridge, X="df")
# bivariate ridge trace plots
plot(pridge)
pairs(pridge)
Ridge Regression Estimates
Description
The function ridge fits linear models by ridge regression, returning an object of class ridge designed to be used with the plotting methods in this package.
It is also designed to facilitate an alternative representation of the effects of shrinkage in the space of uncorrelated (PCA/SVD) components of the predictors.
The standard formulation of ridge regression is that it regularizes the estimates of coefficients by adding small positive constants \lambda to the diagonal elements of \mathbf{X}^\top\mathbf{X} in the least squares solution, to achieve a more favorable tradeoff between bias and variance (inverse of precision) of the coefficients:
\widehat{\mathbf{\beta}}^{\text{RR}}_k = (\mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^\top \mathbf{y}
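A minimal numerical sketch of this formula for a single value of lambda (the centering here is deliberately simplified; ridge applies its own internal scaling, so these values need not match coef(ridge(...)) exactly):
X <- scale(data.matrix(longley[, c(2:6, 1)]), scale=FALSE)   # centered predictors
y <- longley$Employed - mean(longley$Employed)
lambda <- 0.01
b.rr <- solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
b.rr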
Ridge regression shrinkage can be parameterized in several ways. If a vector of lambda values is supplied, these are used directly in the ridge regression computations. Otherwise, a vector of df values can be supplied, giving the equivalent effective degrees of freedom corresponding to shrinkage, going down from the number of predictors in the model. In either case, both lambda and df are returned in the ridge object, but the rownames of the coefficients are given in terms of lambda.
coef extracts the estimated coefficients for each value of the shrinkage factor.
vcov extracts the estimated p \times p covariance matrices of the coefficients for each value of the shrinkage factor.
best extracts the optimal shrinkage values according to several criteria: HKB: Hoerl et al. (1975); LW: Lawless & Wang (1976); GCV: Golub et al. (1979).
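For example (using the same longley fit that appears in the Examples below):
lambda <- c(0, 0.005, 0.01, 0.02, 0.04, 0.08)
lridge <- ridge(Employed ~ GNP + Unemployed + Armed.Forces + Population + Year + GNP.deflator,
                data=longley, lambda=lambda)
best(lridge)          # one row per criterion (HKB, LW, GCV)
lridge$criteria       # optimal shrinkage values stored in the fitted object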
Usage
ridge(y, ...)
## S3 method for class 'formula'
ridge(formula, data, lambda = 0, df, svd = TRUE, contrasts = NULL, ...)
## Default S3 method:
ridge(y, X, lambda = 0, df, svd = TRUE, ...)
## S3 method for class 'ridge'
coef(object, ...)
## S3 method for class 'ridge'
print(x, digits = max(5, getOption("digits") - 5), ...)
## S3 method for class 'ridge'
vcov(object, ...)
best(object, ...)
## S3 method for class 'ridge'
best(object, ...)
Arguments
y |
A numeric vector containing the response variable. NAs not allowed. |
... |
Other arguments, passed down to methods |
formula |
For the formula method, a model formula for the regression, with the response on the left and the predictors on the right. |
data |
For the formula method, a data frame containing the variables in the model. |
lambda |
A scalar or vector of ridge constants. A value of 0 corresponds to ordinary least squares. |
df |
A scalar or vector of effective degrees of freedom corresponding to lambda. |
svd |
If TRUE (the default), the singular value decomposition of the predictor matrix is computed and its components (svd.D, svd.U, svd.V) are returned in the result. |
contrasts |
a list of contrasts to be used for some or all of factor terms in the formula. See the contrasts.arg argument of model.matrix. |
X |
A matrix of predictor variables. NA's not allowed. Should not include a column of 1's for the intercept. |
x , object |
An object of class "ridge", as returned by ridge. |
digits |
For the print method, the number of digits to display. |
Details
If an intercept is present in the model, its coefficient is not penalized. (If you want to penalize an intercept, put in your own constant term and remove the intercept.)
The predictors are centered, but not (yet) scaled in this implementation.
A number of the methods in the package assume that lambda is a vector of shrinkage constants increasing from lambda[1] = 0, or equivalently, a vector of df decreasing from p.
Value
A list with the following components:
lambda |
The vector of ridge constants |
df |
The vector of effective degrees of freedom corresponding to lambda |
coef |
The matrix of estimated ridge regression coefficients |
scales |
scalings used on the X matrix |
kHKB |
HKB estimate of the ridge constant |
kLW |
L-W estimate of the ridge constant |
GCV |
vector of GCV values |
kGCV |
value of lambda giving the minimum GCV |
criteria |
Collects the criteria |
If svd==TRUE (the default), the following are also included:
svd.D |
Singular values of the |
svd.U |
Left singular vectors of the |
svd.V |
Right singular vectors of the |
best returns a data.frame with one row for each of the HKB, LW, and GCV criteria.
Author(s)
Michael Friendly
References
Hoerl, A. E., Kennard, R. W., and Baldwin, K. F. (1975), "Ridge Regression: Some Simulations," Communications in Statistics, 4, 105-123.
Lawless, J.F., and Wang, P. (1976), "A Simulation Study of Ridge and Other Regression Estimators," Communications in Statistics, 5, 307-323.
Golub G.H., Heath M., Wahba G. (1979) Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21:215–223. doi:10.2307/1268518
See Also
lm.ridge for other implementations of ridge regression.
traceplot, plot.ridge, pairs.ridge, plot3d.ridge for 1D, 2D, and 3D plotting methods.
pca.ridge, biplot.ridge, biplot.pcaridge for views in PCA/SVD space.
precision.ridge for measures of shrinkage and precision.
Examples
# Longley data, using number Employed as response
longley.y <- longley[, "Employed"]
longley.X <- data.matrix(longley[, c(2:6,1)])
lambda <- c(0, 0.005, 0.01, 0.02, 0.04, 0.08)
lridge <- ridge(longley.y, longley.X, lambda=lambda)
# same, using formula interface
lridge <- ridge(Employed ~ GNP + Unemployed + Armed.Forces + Population + Year + GNP.deflator,
data=longley, lambda=lambda)
coef(lridge)
# standard trace plot
traceplot(lridge)
# plot vs. equivalent df
traceplot(lridge, X="df")
pairs(lridge, radius=0.5)
data(prostate)
py <- prostate[, "lpsa"]
pX <- data.matrix(prostate[, 1:8])
pridge <- ridge(py, pX, df=8:1)
pridge
plot(pridge)
pairs(pridge)
traceplot(pridge)
traceplot(pridge, X="df")
# Hospital manpower data from Table 3.8 of Myers (1990)
data(Manpower)
str(Manpower)
mmod <- lm(Hours ~ ., data=Manpower)
vif(mmod)
# ridge regression models, specified in terms of equivalent df
mridge <- ridge(Hours ~ ., data=Manpower, df=seq(5, 3.75, -.25))
vif(mridge)
# univariate ridge trace plots
traceplot(mridge)
traceplot(mridge, X="df")
# bivariate ridge trace plots
plot(mridge, radius=0.25, labels=mridge$df)
pairs(mridge, radius=0.25)
# 3D views
# ellipsoids for Load, Xray & BedDays are nearly 2D
plot3d(mridge, radius=0.2, labels=mridge$df)
# variables in model selected by AIC & BIC
plot3d(mridge, variables=c(2,3,5), radius=0.2, labels=mridge$df)
# plots in PCA/SVD space
mpridge <- pca(mridge)
traceplot(mpridge, X="df")
biplot(mpridge, radius=0.25)
Univariate Ridge Trace Plots
Description
The traceplot function extends and simplifies the univariate ridge trace plots for ridge regression provided in the plot method for lm.ridge.
Usage
traceplot(
x,
X = c("lambda", "df"),
col = c("black", "red", "darkgreen", "blue", "darkcyan", "magenta", "brown",
"darkgray"),
pch = c(15:18, 7, 9, 12, 13),
xlab,
ylab = "Coefficient",
xlim,
ylim,
...
)
Arguments
x |
A "ridge" object, as returned by ridge. |
X |
What to plot as the horizontal coordinate, one of "lambda" (the default) or "df", the equivalent effective degrees of freedom. |
col |
A numeric or character vector giving the colors used to plot the ridge trace curves. Recycled as necessary. |
pch |
Vector of plotting characters used to plot the ridge trace curves. Recycled as necessary. |
xlab |
Label for horizontal axis |
ylab |
Label for vertical axis |
xlim , ylim |
x, y limits for the plot. You may need to adjust these to allow for the variable labels. |
... |
Other arguments passed to |
Details
For ease of interpretation, the variables are labeled at the side of the plot (left, right) where the coefficient estimates are expected to be most widely spread. If xlim is not specified, the range of the X variable is extended slightly to accommodate the variable names.
Value
None. Used for its side effect of plotting.
Author(s)
Michael Friendly
References
Friendly, M. (2013). The Generalized Ridge Trace Plot: Visualizing Bias and Precision. Journal of Computational and Graphical Statistics, 22(1), 50-68, doi:10.1080/10618600.2012.681237, https://www.datavis.ca/papers/genridge-jcgs.pdf
Hoerl, A. E. and Kennard R. W. (1970). "Ridge Regression: Applications to Nonorthogonal Problems", Technometrics, 12(1), 69-82.
See Also
ridge
for details on ridge regression as implemented here
plot.ridge
, pairs.ridge
for other plotting
methods
Examples
longley.y <- longley[, "Employed"]
longley.X <- data.matrix(longley[, c(2:6,1)])
lambda <- c(0, 0.005, 0.01, 0.02, 0.04, 0.08)
lridge <- ridge(longley.y, longley.X, lambda=lambda)
traceplot(lridge)
#abline(v=lridge$kLW, lty=3)
#abline(v=lridge$kHKB, lty=3)
#text(lridge$kLW, -3, "LW")
#text(lridge$kHKB, -3, "HKB")
traceplot(lridge, X="df")
Make Colors Transparent
Description
Takes a vector of colors (as color names or rgb hex values) and adds a specified alpha transparency to each.
Usage
trans.colors(col, alpha = 0.5, names = NULL)
Arguments
col |
A character vector of colors, either as color names or rgb hex values |
alpha |
alpha transparency value(s) to apply to each color (0 means fully transparent and 1 means opaque) |
names |
optional character vector of names for the colors |
Details
Colors (col) and alpha need not be of the same length. The shorter one is replicated to make them of the same length.
Value
A vector of color values of the form "#rrggbbaa"
Author(s)
Michael Friendly
See Also
Examples
trans.colors(palette(), alpha=0.5)
# alpha can be vectorized
trans.colors(palette(), alpha=seq(0, 1, length=length(palette())))
# lengths need not match: shorter one is repeated as necessary
trans.colors(palette(), alpha=c(.1, .2))
trans.colors(colors()[1:20])
# single color, with various alphas
trans.colors("red", alpha=seq(0,1, length=5))
# assign names
trans.colors("red", alpha=seq(0,1, length=5), names=paste("red", 1:5, sep=""))
Variance Inflation Factors for Ridge Regression
Description
The function vif.ridge calculates variance inflation factors for the predictors in a set of ridge regression models indexed by the tuning/shrinkage factor, returning one row for each value of the \lambda parameter.
Variance inflation factors are calculated using the simplified formulation in Fox & Monette (1992).
The plot.vif.ridge method plots variance inflation factors for a "vif.ridge" object in a similar style to what is provided by traceplot. That is, it plots the VIF for each coefficient in the model against either the ridge \lambda tuning constant or its equivalent effective degrees of freedom.
Usage
## S3 method for class 'ridge'
vif(mod, ...)
## S3 method for class 'vif.ridge'
print(x, digits = max(4, getOption("digits") - 5), ...)
## S3 method for class 'vif.ridge'
plot(
x,
X = c("lambda", "df"),
Y = c("vif", "sqrt"),
col = c("black", "red", "darkgreen", "blue", "darkcyan", "magenta", "brown",
"darkgray"),
pch = c(15:18, 7, 9, 12, 13),
xlab,
ylab,
xlim,
ylim,
...
)
Arguments
mod |
A "ridge" object computed by ridge. |
... |
Other arguments passed to methods |
x |
A "vif.ridge" object, as returned by vif. |
digits |
Number of digits to display in the |
X |
What to plot as the horizontal coordinate, one of "lambda" (the default) or "df". |
Y |
What to plot as the vertical coordinate, one of "vif" (the default) or "sqrt", to plot the square roots of the VIFs. |
col |
A numeric or character vector giving the colors used to plot the ridge trace curves. Recycled as necessary. |
pch |
Vector of plotting characters used to plot the ridge trace curves. Recycled as necessary. |
xlab |
Label for horizontal axis |
ylab |
Label for vertical axis |
xlim , ylim |
x, y limits for the plot. You may need to adjust these to allow for the variable labels. |
Value
vif returns a "vif.ridge" object, which is a list of four components:
vif |
a data frame of the same size and
shape as |
lambda |
the vector of ridge constants from the original call to ridge |
df |
the vector of effective degrees of freedom corresponding to lambda |
criteria |
the optimal values of |
Author(s)
Michael Friendly
References
Fox, J. and Monette, G. (1992). Generalized collinearity diagnostics. JASA, 87, 178-183, doi:10.1080/01621459.1992.10475190.
See Also
Examples
data(longley)
lmod <- lm(Employed ~ GNP + Unemployed + Armed.Forces + Population +
Year + GNP.deflator, data=longley)
vif(lmod)
lambda <- c(0, 0.005, 0.01, 0.02, 0.04, 0.08)
lridge <- ridge(Employed ~ GNP + Unemployed + Armed.Forces +
Population + Year + GNP.deflator,
data=longley, lambda=lambda)
coef(lridge)
# get VIFs for the shrunk estimates
vridge <- vif(lridge)
vridge
names(vridge)
# plot VIFs
pch <- c(15:18, 7, 9)
clr <- c("black", rainbow(5, start=.6, end=.1))
plot(vridge,
col=clr, pch=pch, cex = 1.2,
xlim = c(-0.02, 0.08))
plot(vridge, X = "df",
col=clr, pch=pch, cex = 1.2,
xlim = c(4, 6.5))
# Better to plot sqrt(VIF). Plot against degrees of freedom
plot(vridge, X = "df", Y="sqrt",
col=clr, pch=pch, cex = 1.2,
xlim = c(4, 6.5))