Help for package BayesPostEst

Type:

Package

Title:

Generate Postestimation Quantities for Bayesian MCMC Estimation

Version:

0.3.2

Date:

2021-11-10

Description:

An implementation of functions to generate and plot postestimation quantities after estimating Bayesian regression models using Markov chain Monte Carlo (MCMC). Functionality includes the estimation of the Precision-Recall curves (see Beger, 2016 <doi:10.2139/ssrn.2765419>), the implementation of the observed values method of calculating predicted probabilities by Hanmer and Kalkan (2013) <doi:10.1111/j.1540-5907.2012.00602.x>, the implementation of the average value method of calculating predicted probabilities (see King, Tomz, and Wittenberg, 2000 <doi:10.2307/2669316>), and the generation and plotting of first differences to summarize typical effects across covariates (see Long 1997, ISBN:9780803973749; King, Tomz, and Wittenberg, 2000 <doi:10.2307/2669316>). This package can be used with MCMC output generated by any Bayesian estimation tool including 'JAGS', 'BUGS', 'MCMCpack', and 'Stan'.

URL:

https://github.com/ShanaScogin/BayesPostEst

BugReports:

https://github.com/ShanaScogin/BayesPostEst/issues

License:

GPL-3

Imports:

carData, caTools, coda (≥ 0.13), dplyr (≥ 0.5.0), ggplot2, ggridges, reshape2, rlang, stats, texreg, tidyr (≥ 0.5.1), HDInterval, ROCR, graphics, grDevices, R2jags, runjags, rstanarm, rjags, MCMCpack, R2WinBUGS, brms

Depends:

R (≥ 3.5.0)

Encoding:

UTF-8

LazyData:

TRUE

LazyLoad:

TRUE

Suggests:

datasets, knitr, rmarkdown, rstan (≥ 2.10.1), testthat, covr

VignetteBuilder:

knitr

RoxygenNote:

7.1.2

SystemRequirements:

JAGS (http://mcmc-jags.sourceforge.io)

NeedsCompilation:

Packaged:

2021-11-10 20:56:50 UTC; shanascogin

Author:

Johannes Karreth

[aut], Shana Scogin

[aut, cre], Rob Williams

[aut], Andreas Beger

[aut], Myunghee Lee [ctb], Neil Williams [ctb]

Maintainer:

Shana Scogin <shanarscogin@gmail.com>

Repository:

CRAN

Date/Publication:

2021-11-11 08:10:05 UTC

BayesPostEst Overview

Description

This package currently has nine main functions that can be used to generate and plot postestimation quantities after estimating Bayesian regression models using MCMC. The package combines functions written originally for Johannes Karreth's workshop on Bayesian modeling at the ICPSR Summer program. Currently BayesPostEst focuses mostly on generalized linear regression models for binary outcomes (logistic and probit regression). The vignette for this package has a walk-through of each function in action. Please refer to that to get an overview of all the functions, or visit the documentation for a specific function of your choice. Johannes Karreth's website (http://www.jkarreth.net) also has resources for getting started with Bayesian analysis, fitting models, and presenting results.

Main Functions

mcmcAveProb()
mcmcObsProb()
mcmcFD()
mcmcMargEff()
mcmcRocPrc()
mcmcRocPrcGen()
mcmcTab()
mcmcReg()
plot.mcmcFD()

Deprecated functions in package BayesPostEst.

Description

The functions listed below are deprecated and will be defunct in the near future. When possible, alternative functions with similar functionality are also mentioned. Help pages for deprecated functions are available at help("-deprecated").

R function to plot first differences generated from MCMC output. For more on this method, see the documentation for mcmcFD(), Long (1997, Sage Publications), and King, Tomz, and Wittenberg (2000, American Journal of Political Science 44(2): 347-361). For a description of this type of plot, see Figure 1 in Karreth (2018, International Interactions 44(3): 463-90).

Usage

mcmcFDplot(fdfull, ROPE = NULL)

Arguments

fdfull

Output generated from mcmcFD(..., full_sims = TRUE).

ROPE

defaults to NULL. If not NULL, a numeric vector of length two, defining the Region of Practical Equivalence around 0. See Kruschke (2013, Journal of Experimental Psychology 143(2): 573-603) for more on the ROPE.

Value

a density plot of the differences in probabilities. The plot is made with ggplot2 and can be passed on as an object to customize. Annotated numbers show the percent of posterior draws with the same sign as the median estimate (if ROPE = NULL) or on the same side of the ROPE as the median estimate (if ROPE is specified).

`mcmcFDplot`

For mcmcFDplot, use plot.mcmcFD.

References

Karreth, Johannes. 2018. “The Economic Leverage of International Organizations in Interstate Disputes.” International Interactions 44 (3): 463-90. https://doi.org/10.1080/03050629.2018.1389728.
King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 347–61. http://www.jstor.org/stable/2669316.
Kruschke, John K. 2013. “Bayesian Estimation Supersedes the T-Test.” Journal of Experimental Psychology: General 142 (2): 573–603. https://doi.org/10.1037/a0029146.
Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks: Sage Publications.

Examples



if (interactive()) {
## simulating data
set.seed(1234)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z <- b0 + b1 * X1 + b2 * X2
pr <- 1 / (1 + exp(-Z)) # inv logit function
Y <- rbinom(n, 1, pr) 
df <- data.frame(cbind(X1, X2, Y))

## formatting the data for jags
datjags <- as.list(df)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
    logit(p[i]) <- mu[i]    ## Logit link function
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X2[i]
  }
  
  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
fit <- R2jags::jags(data = datjags, inits = inits, 
                    parameters.to.save = params, n.chains = 2, n.iter = 2000, 
                    n.burnin = 1000, model.file = model)

## preparing data for mcmcFD()
xmat <- model.matrix(Y ~ X1 + X2, data = df)
mcmc <- coda::as.mcmc(fit)
mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xmat)]

## plotting with mcmcFDplot()
full <- mcmcFD(modelmatrix = xmat,
               mcmcout = mcmc_mat,
               fullsims = TRUE)
# suppress deprecated warning for R check
suppressWarnings(mcmcFDplot(full))

}

Compute ROC and PR curve points

Description

Faster replacements for calculating ROC and PR curve data than with ROCR::prediction() and ROCR::performance()

Usage

compute_roc(yvec, pvec)

compute_pr(yvec, pvec)

Details

Replacements to use instead of a combination of ROCR::prediction() and ROCR::performance() to calculate ROC and PR curves. These functions are about 10 to 20 times faster when using mcmcRocPrc() with curves = TRUE and/or fullsims = TRUE.

See this issue on GH (ShanaScogin/BayesPostEst#25) for more general details.

And here is a note with specific performance benchmarks, compared to the old approach relying on ROCR.

Try to identify link function

Description

Try to identify link function

Usage

identify_link_function(obj)

Arguments

obj

stanfit object

Value

Either "logit" or "probit"; if neither can be identified the function will return 'NA_character_'.

Try to identify if a stanfit model is a binary choice model

Description

Try to identify if a stanfit model is a binary choice model

Usage

is_binary_model(obj)

Arguments

obj

stanfit object

Fitted JAGS interactive linear model

Description

A fitted JAGS linear model with interaction term generated with [R2jags::jags()]. See the example code below for how it was created. Used in examples and for testing.

Usage

jags_interactive

Format

A class "rjags" object created by [R2jags::jags()]

Examples


if (interactive()) {
data("sim_data_interactive")

## formatting the data for jags
datjags <- as.list(sim_data_interactive)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dnorm(mu[i], sigma)  ## Bernoulli distribution of y_i
    
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X2[i] +
      b[4] * X1[i] * X2[i]
    
  }
  
  for(j in 1:4){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
  sigma ~ dexp(1)
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 4))
inits2 <- list("b" = rep(0, 4))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
jags_interactive <- R2jags::jags(data = datjags, inits = inits, 
                                 parameters.to.save = params, n.chains = 2,
                                 n.iter = 2000, n.burnin = 1000,
                                 model.file = model)
                                 
}

Fitted JAGS interactive linear model with categorical moderator

Description

A fitted JAGS linear model with interaction term generated with [R2jags::jags()]. See the example code below for how it was created. Used in examples and for testing.

Usage

jags_interactive_cat

Format

A class "rjags" object created by [R2jags::jags()]

Examples


if (interactive()) {
data("sim_data_interactive_cat")

## formatting the data for jags
datjags <- as.list(sim_data_interactive_cat)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dnorm(mu[i], sigma)  ## Bernoulli distribution of y_i
    
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X3[i] +
      b[4] * X1[i] * X3[i]
    
  }
  
  for(j in 1:4){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
  sigma ~ dexp(1)
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 4))
inits2 <- list("b" = rep(0, 4))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
jags_interactive_cat <- R2jags::jags(data = datjags, inits = inits,
                                     parameters.to.save = params, n.chains = 2,
                                     n.iter = 2000, n.burnin = 1000,
                                     model.file = model)
                                 
}

Fitted JAGS logit model

Description

A fitted JAGS logit model generated with [R2jags::jags()]. See the example code below for how it was created. Used in examples and for testing.

Usage

jags_logit

Format

A class "rjags" object created by [R2jags::jags()]

Examples


if (interactive()) {
data("sim_data")
  
## formatting the data for jags
datjags <- as.list(sim_data)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {

  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
      logit(p[i]) <- mu[i]    ## Logit link function
      mu[i] <- b[1] +
        b[2] * X1[i] +
        b[3] * X2[i]
  }

  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }

}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
jags_logit <- R2jags::jags(data = datjags, inits = inits,
                         parameters.to.save = params, n.chains = 2, 
                         n.iter = 2000, n.burnin = 1000, model.file = model)

}

Fitted JAGS probit model

Description

A fitted JAGS probit model generated with [R2jags::jags()]. See the example code below for how it was created. Used in examples and for testing.

Usage

jags_probit

Format

A class "rjags" object created by [R2jags::jags()]

Examples


if (interactive()) {
data("sim_data")
  
## formatting the data for jags
datjags <- as.list(sim_data)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {

  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
      probit(p[i]) <- mu[i]    ## Update with probit link function
      mu[i] <- b[1] +
        b[2] * X1[i] +
        b[3] * X2[i]
  }

  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }

}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
jags_probit <- R2jags::jags(data = datjags, inits = inits,
                         parameters.to.save = params, n.chains = 2, 
                         n.iter = 2000, n.burnin = 1000, model.file = model)

}

Predicted Probabilities using Bayesian MCMC estimates for the "Average" Case

Description

This function calculates predicted probabilities for "average" cases after a Bayesian logit or probit model. As "average" cases, this function calculates the median value of each predictor. For an explanation of predicted probabilities for "average" cases, see e.g. King, Tomz & Wittenberg (2000, American Journal of Political Science 44(2): 347-361).

Usage

mcmcAveProb(
  modelmatrix,
  mcmcout,
  xcol,
  xrange,
  xinterest,
  link = "logit",
  ci = c(0.025, 0.975),
  fullsims = FALSE
)

Arguments

modelmatrix

model matrix, including intercept (if the intercept is among the parameters estimated in the model). Create with model.matrix(formula, data). Note: the order of columns in the model matrix must correspond to the order of columns in the matrix of posterior draws in the mcmcout argument. See the mcmcout argument for more.

mcmcout

posterior distributions of all logit coefficients, in matrix form. This can be created from rstan, MCMCpack, R2jags, etc. and transformed into a matrix using the function as.mcmc() from the coda package for jags class objects, as.matrix() from base R for mcmc, mcmc.list, stanreg, and stanfit class objects, and object$sims.matrix for bugs class objects. Note: the order of columns in this matrix must correspond to the order of columns in the model matrix. One can do this by examining the posterior distribution matrix and sorting the variables in the order of this matrix when creating the model matrix. A useful function for sorting column names containing both characters and numbers as you create the matrix of posterior distributions is mixedsort() from the gtools package.

xcol

column number of the posterior draws (mcmcout) and model matrices that corresponds to the explanatory variable for which to calculate associated Pr(y = 1). Note that the columns in these matrices must match.

xrange

name of the vector with the range of relevant values of the explanatory variable for which to calculate associated Pr(y = 1).

xinterest

semi-optional argument. Name of the explanatory variable for which to calculate associated Pr(y = 1). If xcol is supplied, this is not needed. If both are supplied, the function defaults to xcol and this argument is ignored.

link

type of generalized linear model; a character vector set to "logit" (default) or "probit".

ci

the bounds of the credible interval. Default is c(0.025, 0.975) for the 95% credible interval.

fullsims

logical indicator of whether full object (based on all MCMC draws rather than their average) will be returned. Default is FALSE. Note: The longer xrange is, the larger the full output will be if TRUE is selected.

Details

This function calculates predicted probabilities for "average" cases after a Bayesian logit or probit model. For an explanation of predicted probabilities for "average" cases, see e.g. King, Tomz & Wittenberg (2000, American Journal of Political Science 44(2): 347-361)

Value

if fullsims = FALSE (default), a tibble with 4 columns:

x: value of variable of interest, drawn from xrange
median_pp: median predicted Pr(y = 1) when variable of interest is set to x, holding all other predictors to average (median) values
lower_pp: lower bound of credible interval of predicted probability at given x
upper_pp: upper bound of credible interval of predicted probability at given x

if fullsims = TRUE, a tibble with 3 columns:

Iteration: number of the posterior draw
x: value of variable of interest, drawn from xrange
pp: average predicted Pr(y = 1) when variable of interest is set to x, holding all other predictors to average (median) values

References

King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 347–61. http://www.jstor.org/stable/2669316

Examples



if (interactive()) {
  ## simulating data
  set.seed(123)
  b0 <- 0.2 # true value for the intercept
  b1 <- 0.5 # true value for first beta
  b2 <- 0.7 # true value for second beta
  n <- 500 # sample size
  X1 <- runif(n, -1, 1)
  X2 <- runif(n, -1, 1)
  Z <- b0 + b1 * X1 + b2 * X2
  pr <- 1 / (1 + exp(-Z)) # inv logit function
  Y <- rbinom(n, 1, pr) 
  df <- data.frame(cbind(X1, X2, Y))
  
  ## formatting the data for jags
  datjags <- as.list(df)
  datjags$N <- length(datjags$Y)
  
  ## creating jags model
  model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
    logit(p[i]) <- mu[i]    ## Logit link function
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X2[i]
  }
  
  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
library(R2jags)
set.seed(123)
fit <- jags(data = datjags, inits = inits, 
         parameters.to.save = params, n.chains = 2, n.iter = 2000, 
         n.burnin = 1000, model.file = model)

### average value approach
library(coda)
xmat <- model.matrix(Y ~ X1 + X2, data = df)
mcmc <- as.mcmc(fit)
mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xmat)]
X1_sim <- seq(from = min(datjags$X1),
              to = max(datjags$X1), 
              length.out = 10)
ave_prob <- mcmcAveProb(modelmatrix = xmat,
                        mcmcout = mcmc_mat,
                        xrange = X1_sim, 
                        xcol = 2)
}

Coefficient Plots for MCMC Output

Description

Coefficient plots for MCMC output using ggplot2

Usage

mcmcCoefPlot(
  mod,
  pars = NULL,
  pointest = "mean",
  ci = 0.95,
  hpdi = FALSE,
  sort = FALSE,
  plot = TRUE,
  regex = FALSE
)

Arguments

mod

Bayesian model object generated by R2jags, rjags, R2WinBUGS, R2OpenBUGS, MCMCpack, rstan, rstanarm, and brms.

pars

a scalar or vector of the parameters you wish to include in the table. By default, mcmcCoefPlot includes all parameters saved in a model object. If a model has lots of samples and lots of saved parameters, not explicitly specifying a limited number of parameters to include via pars may take a long time or produce an unreadable plot. pars can either be a vector with the specific parameters to be included in the table e.g. pars = c("beta[1]", "beta[2]", "beta[3]"), or they can be partial names that will be matched using regular expressions e.g. pars = "beta" if regex = TRUE. Both of these will include beta[1], beta[2], and beta[3] in the plot. If pars is left blank, mcmcCoefPlot will exclude auxiliary parameters such as deviance from JAGS or lp__ from Stan.

pointest

a character indicating whether to use the mean or median for point estimates in the table.

ci

a scalar indicating the confidence level of the uncertainty intervals.

hpdi

a logical indicating whether to use highest posterior density intervals or equal tailed credible intervals to capture uncertainty; default FALSE.

sort

logical indicating whether to sort the point estimates to produce a caterpillar or dot plot; default FALSE.

plot

logical indicating whether to return a ggplot object or the underlying tidy DataFrame; default TRUE.

regex

use regular expression matching with pars?

Value

a ggplot object or a tidy DataFrame.

Author(s)

Rob Williams, jayrobwilliams@gmail.com

Examples



if (interactive()) {
## simulating data
set.seed(123456)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z <- b0 + b1 * X1 + b2 * X2
pr <- 1 / (1 + exp(-Z)) # inv logit function
Y <- rbinom(n, 1, pr)
df <- data.frame(cbind(X1, X2, Y))

## formatting the data for jags
datjags <- as.list(df)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
    logit(p[i]) <- mu[i]    ## Logit link function
    mu[i] <- b[1] +
      b[2] * X1[i] +
      b[3] * X2[i]
  }
  
  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
fit <- R2jags::jags(data = datjags, inits = inits,
                    parameters.to.save = params, n.chains = 2, n.iter = 2000,
                    n.burnin = 1000, model.file = model)

## generating coefficient plot with all non-auxiliary parameters
mcmcCoefPlot(fit)
}

First Differences of a Bayesian Logit or Probit model

Description

R function to calculate first differences after a Bayesian logit or probit model. First differences are a method to summarize effects across covariates. This quantity represents the difference in predicted probabilities for each covariate for cases with low and high values of the respective covariate. For each of these differences, all other variables are held constant at their median. For more, see Long (1997, Sage Publications) and King, Tomz, and Wittenberg (2000, American Journal of Political Science 44(2): 347-361).

Usage

mcmcFD(
  modelmatrix,
  mcmcout,
  link = "logit",
  ci = c(0.025, 0.975),
  percentiles = c(0.25, 0.75),
  fullsims = FALSE
)

Arguments

modelmatrix

mcmcout

link

type of generalized linear model; a character vector set to "logit" (default) or "probit".

ci

the bounds of the credible interval. Default is c(0.025, 0.975) for the 95% credible interval.

percentiles

values of each predictor for which the difference in Pr(y = 1) is to be calculated. Default is c(0.25, 0.75), which will calculate the difference between Pr(y = 1) for the 25th percentile and 75th percentile of the predictor. For binary predictors, the function automatically calculates the difference between Pr(y = 1) for x = 0 and x = 1.

fullsims

logical indicator of whether full object (based on all MCMC draws rather than their average) will be returned. Default is FALSE.

Value

An object of class mcmcFD. If fullsims = FALSE (default), a data frame with five columns:

median_fd: median first difference
lower_fd: lower bound of credible interval of the first difference
upper_fd: upper bound of credible interval of the first difference
VarName: name of the variable as found in modelmatrix
VarID: identifier of the variable, based on the order of columns in modelmatrix and mcmcout. Can be adjusted for plotting

If fullsims = TRUE, a matrix with as many columns as predictors in the model. Each row is the first difference for that variable based on one set of posterior draws. Column names are taken from the column names of modelmatrix.

References

King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 347–61. http://www.jstor.org/stable/2669316
Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks: Sage Publications

Examples



if (interactive()) {
## simulating data
set.seed(1234)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z <- b0 + b1 * X1 + b2 * X2
pr <- 1 / (1 + exp(-Z)) # inv logit function
Y <- rbinom(n, 1, pr) 
df <- data.frame(cbind(X1, X2, Y))

## formatting the data for jags
datjags <- as.list(df)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
    logit(p[i]) <- mu[i]    ## Logit link function
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X2[i]
  }
  
  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
fit <- R2jags::jags(data = datjags, inits = inits, 
                    parameters.to.save = params, n.chains = 2, n.iter = 2000, 
                    n.burnin = 1000, model.file = model)

## running function with logit
xmat <- model.matrix(Y ~ X1 + X2, data = df)
mcmc <- coda::as.mcmc(fit)
mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xmat)]
object <- mcmcFD(modelmatrix = xmat,
                 mcmcout = mcmc_mat)
object
}

Marginal Effects Plots for MCMC Output

Description

Marginal effects plots for MCMC output using ggplot2

Usage

mcmcMargEff(
  mod,
  main,
  int,
  moderator,
  pointest = "mean",
  seq = 100,
  ci = 0.95,
  hpdi = FALSE,
  plot = TRUE,
  xlab = "Moderator",
  ylab = "Marginal Effect"
)

Arguments

mod

Bayesian model object generated by R2jags, rjags, R2WinBUGS, R2OpenBUGS, MCMCpack, rstan, rstanarm, and brms.

main

a character with the name of the parameter of interest in the interaction term.

int

a character with the name of the moderating parameter in the interaction term.

moderator

a vector of values that the moderating parameter takes on in the data.

pointest

a character indicating whether to use the mean or median for point estimates in the plot.

seq

a numeric giving the number of moderator values used to generate the marginal effects plot.

ci

a scalar indicating the confidence level of the uncertainty intervals.

hpdi

a logical indicating whether to use highest posterior density intervals or equal tailed credible intervals to capture uncertainty.

plot

logical indicating whether to return a ggplot object or the underlying tidy DataFrame. By default, mcmcMargEff returns a line and ribbon plot for continuous variables, and a dot and line plot for factor variables and discrete variables with fewer than 25 unique values.

xlab

character giving x axis label if plot = TRUE, default "Moderator"

ylab

character giving y axis label if plot = TRUE, default "Marginal Effect"

Value

a ggplot object or a tidy DataFrame.

Author(s)

Rob Williams, jayrobwilliams@gmail.com

Examples



if (interactive()) {
## simulating data
set.seed(123456)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z <- b0 + b1 * X1 + b2 * X2

## linear model data
Y_linear <- rnorm(n, Z, 1)
df <- data.frame(cbind(X1, X2, Y = Y_linear))

## formatting the data for jags
datjags <- as.list(df)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dnorm(mu[i], sigma)  ## Bernoulli distribution of y_i
    
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X2[i] +
      b[4] * X1[i] * X2[i]
    
  }
  
  for(j in 1:4){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
  sigma ~ dexp(1)
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 4))
inits2 <- list("b" = rep(0, 4))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
fit <- R2jags::jags(data = datjags, inits = inits, 
                    parameters.to.save = params, n.chains = 2, n.iter = 2000, 
                    n.burnin = 1000, model.file = model)

mcmcMargEff(mod = fit,
            main = 'b[2]',
            int = 'b[4]',
            moderator = sim_data_interactive$X2,
            plot = TRUE)
}

Predicted Probabilities using Bayesian MCMC estimates for the Average of Observed Cases

Description

Implements R function to calculate the predicted probabilities for "observed" cases after a Bayesian logit or probit model, following Hanmer & Kalkan (2013) (2013, American Journal of Political Science 57(1): 263-277).

Usage

mcmcObsProb(
  modelmatrix,
  mcmcout,
  xcol,
  xrange,
  xinterest,
  link = "logit",
  ci = c(0.025, 0.975),
  fullsims = FALSE
)

Arguments

modelmatrix

mcmcout

xcol

xrange

name of the vector with the range of relevant values of the explanatory variable for which to calculate associated Pr(y = 1).

xinterest

link

type of generalized linear model; a character vector set to "logit" (default) or "probit".

ci

the bounds of the credible interval. Default is c(0.025, 0.975) for the 95% credible interval.

fullsims

Details

This function calculates predicted probabilities for "observed" cases after a Bayesian logit or probit model following Hanmer and Kalkan (2013, American Journal of Political Science 57(1): 263-277)

Value

if fullsims = FALSE (default), a tibble with 4 columns:

x: value of variable of interest, drawn from xrange
median_pp: median predicted Pr(y = 1) when variable of interest is set to x
lower_pp: lower bound of credible interval of predicted probability at given x
upper_pp: upper bound of credible interval of predicted probability at given x

if fullsims = TRUE, a tibble with 3 columns:

Iteration: number of the posterior draw
x: value of variable of interest, drawn from xrange
pp: average predicted Pr(y = 1) of all observed cases when variable of interest is set to x

References

Hanmer, Michael J., & Ozan Kalkan, K. (2013). Behind the curve: Clarifying the best approach to calculating predicted probabilities and marginal effects from limited dependent variable models. American Journal of Political Science, 57(1), 263-277. https://doi.org/10.1111/j.1540-5907.2012.00602.x

Examples



if (interactive()) {
  ## simulating data
  set.seed(12345)
  b0 <- 0.2 # true value for the intercept
  b1 <- 0.5 # true value for first beta
  b2 <- 0.7 # true value for second beta
  n <- 500 # sample size
  X1 <- runif(n, -1, 1)
  X2 <- runif(n, -1, 1)
  Z <- b0 + b1 * X1 + b2 * X2
  pr <- 1 / (1 + exp(-Z)) # inv logit function
  Y <- rbinom(n, 1, pr) 
  df <- data.frame(cbind(X1, X2, Y))
  
  ## formatting the data for jags
  datjags <- as.list(df)
  datjags$N <- length(datjags$Y)
  
  ## creating jags model
  model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
    logit(p[i]) <- mu[i]    ## Logit link function
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X2[i]
  }
  
  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
library(R2jags)
set.seed(123)
fit <- jags(data = datjags, inits = inits, 
          parameters.to.save = params, n.chains = 2, n.iter = 2000, 
          n.burnin = 1000, model.file = model)

### observed value approach
library(coda)
xmat <- model.matrix(Y ~ X1 + X2, data = df)
mcmc <- as.mcmc(fit)
mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xmat)]
X1_sim <- seq(from = min(datjags$X1),
              to = max(datjags$X1), 
              length.out = 10)
obs_prob <- mcmcObsProb(modelmatrix = xmat,
                        mcmcout = mcmc_mat,
                        xrange = X1_sim,
                        xcol = 2)
}

LaTeX or HTML regression tables for MCMC Output

Description

This function creates LaTeX or HTML regression tables for MCMC Output using the texreg function from the texreg R package.

Usage

mcmcReg(
  mod,
  pars = NULL,
  pointest = "mean",
  ci = 0.95,
  hpdi = FALSE,
  sd = FALSE,
  pr = FALSE,
  coefnames = NULL,
  gof = numeric(0),
  gofnames = character(0),
  format = "latex",
  file,
  regex = FALSE,
  ...
)

Arguments

mod

Bayesian model object generated by R2jags, rjags, R2WinBUGS, R2OpenBUGS, MCMCpack, rstan, rstanarm, and brms, or a list of model objects of the same class.

pars

a scalar or vector of the parameters you wish to include in the table. By default, mcmcReg includes all parameters saved in a model object. If a model has lots of samples and lots of saved parameters, not explicitly specifying a limited number of parameters to include via pars may take a long time. pars can either be a vector with the specific parameters to be included in the table e.g. pars = c("beta[1]", "beta[2]", "beta[3]"), or they can be partial names that will be matched using regular expressions e.g. pars = "beta" if regex = TRUE. Both of these will include beta[1], beta[2], and beta[3] in the table. When combining models with different parameters in one table, this argument also accepts a list the length of the number of models.

pointest

a character indicating whether to use the mean or median for point estimates in the table.

ci

a scalar indicating the confidence level of the uncertainty intervals.

hpdi

a logical indicating whether to use highest posterior density intervals instead of equal tailed credible intervals to capture uncertainty (default FALSE).

sd

a logical indicating whether to report the standard deviation of posterior distributions instead of an uncertainty interval (default FALSE). If TRUE, overrides ci, hpdi, and pr.

pr

a logical indicating whether to report the probability that a coefficient is in the same direction as the point estimate for that coefficient (default FALSE). If TRUE, overrides ci and hpdi.

coefnames

an optional vector or list of vectors containing parameter names for each model. If there are multiple models, the list must have the same number of elements as there are models, and the vector of names in each list element must match the number of parameters. If not supplied, the function will use the parameter names in the model object(s). Note that this replaces the standard custom.coef.names argument in texreg because there is no extract method for MCMC model objects, and many MCMC model objects do not have unique parameter names.

gof

a named list of goodness of fit statistics, or a list of such lists.

gofnames

an optional vector or list of vectors containing goodness of fit statistic names for each model. Like coefnames in this function (which replaces the custom.coef.names argument in texreg), gofnames replaces the standard custom.gof.names argument in texreg. If there are multiple models, the list must have the same number of elements as there are models, and the vector of names in each list element must match the number of goodness of fit statistics.

format

a character indicating latex or html output.

file

optional file name to write table to file instead of printing to console.

regex

use regular expression matching with pars?

...

optional arguments to texreg.

Details

If using custom.coef.map with more than one model, you should rename the parameters in the model objects to ensure that different parameters with the same subscript are not conflated by texreg e.g. beta[1] could represent age in one model and income in another, and texreg would combine the two if you do not rename beta[1] to more informative names in the model objects.

If mod is a brmsfit object or list of brmsfit objects, note that the default brms names for coefficients are b_Intercept and b, so both of these should be included in par if you wish to include the intercept in the table.

Value

A formatted regression table in LaTeX or HTML format.

Author(s)

Rob Williams, jayrobwilliams@gmail.com

Examples



if (interactive()) {
## simulating data
set.seed(123456)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z <- b0 + b1 * X1 + b2 * X2
pr <- 1 / (1 + exp(-Z)) # inv logit function
Y <- rbinom(n, 1, pr)
df <- data.frame(cbind(X1, X2, Y))

## formatting the data for jags
datjags <- as.list(df)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {

  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
    logit(p[i]) <- mu[i]    ## Logit link function
    mu[i] <- b[1] +
      b[2] * X1[i] +
      b[3] * X2[i]
  }

  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }

}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
fit <- R2jags::jags(data = datjags, inits = inits,
                    parameters.to.save = params,
                    n.chains = 2,
                    n.iter = 2000, n.burnin = 1000,
                    model.file = model)

## generating regression table with all parameters
mcmcReg(fit)

## generating regression table with only betas and custom coefficent names
mcmcReg(fit, pars = c('b'), coefnames = c('Variable 1',
                                          'Variable 2',
                                          'Variable 3'),
        regex = TRUE)
## generating regression tables with all betas and custom names
mcmcReg(fit, coefnames = c('Variable 1', 'Variable 2',
                           'Variable 3', 'deviance'))
}

ROC and Precision-Recall Curves using Bayesian MCMC estimates generalized

Description

This function generates ROC and Precision-Recall curves after fitting a Bayesian logit or probit regression. For fast calculation for from an "rjags" object use mcmcRocPrc

Usage

mcmcRocPrcGen(
  modelmatrix,
  mcmcout,
  modelframe,
  curves = FALSE,
  link = "logit",
  fullsims = FALSE
)

Arguments

modelmatrix

mcmcout

modelframe

model frame in matrix form. Can be created using as.matrix(model.frame(formula, data))

curves

logical indicator of whether or not to return values to plot the ROC or Precision-Recall curves. If set to FALSE (default), results are returned as a list without the extra values.

link

type of generalized linear model; a character vector set to "logit" (default) or "probit".

fullsims

logical indicator of whether full object (based on all MCMC draws rather than their average) will be returned. Default is FALSE. Note: If TRUE is chosen, the function takes notably longer to execute.

Details

This function generates ROC and precision-recall curves after fitting a Bayesian logit or probit model.

Value

This function returns a list with 4 elements:

area_under_roc: area under ROC curve (scalar)
area_under_prc: area under precision-recall curve (scalar)
prc_dat: data to plot precision-recall curve (data frame)
roc_dat: data to plot ROC curve (data frame)

References

Beger, Andreas. 2016. “Precision-Recall Curves.” Available at SSRN: https://ssrn.com/Abstract=2765419. http://dx.doi.org/10.2139/ssrn.2765419.

Examples



if (interactive()) {
# simulating data

set.seed(123456)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z <- b0 + b1 * X1 + b2 * X2
pr <- 1 / (1 + exp(-Z)) # inv logit function
Y <- rbinom(n, 1, pr) 
df <- data.frame(cbind(X1, X2, Y))

# formatting the data for jags
datjags <- as.list(df)
datjags$N <- length(datjags$Y)

# creating jags model
model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
    logit(p[i]) <- mu[i]    ## Logit link function
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X2[i]
  }
  
  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
fit <- R2jags::jags(data = datjags, inits = inits, 
                    parameters.to.save = params, n.chains = 2, n.iter = 2000, 
                    n.burnin = 1000, model.file = model)

# processing the data
mm <- model.matrix(Y ~ X1 + X2, data = df)
xframe <- as.matrix(model.frame(Y ~ X1 + X2, data = df))
mcmc <- coda::as.mcmc(fit)
mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xframe)]

# using mcmcRocPrcGen
fit_sum <- mcmcRocPrcGen(modelmatrix = mm,
                      modelframe = xframe,
                      mcmcout = mcmc_mat,
                      curves = TRUE,
                      fullsims = FALSE)
}

Summarize Bayesian MCMC Output R function for summarizing MCMC output in a regression-style table.

Description

Summarize Bayesian MCMC Output

R function for summarizing MCMC output in a regression-style table.

Usage

mcmcTab(
  sims,
  ci = c(0.025, 0.975),
  pars = NULL,
  Pr = FALSE,
  ROPE = NULL,
  regex = FALSE
)

Arguments

sims

Bayesian model object generated by R2jags, rjags, R2WinBUGS, R2OpenBUGS, MCMCpack, rstan, and rstanarm.

ci

desired level for credible intervals; defaults to c(0.025, 0.975).

pars

character vector of parameters to be printed; defaults to NULL (all parameters are printed). If not NULL, the user can either specify the exact names of parameters to be printed (e.g. c("alpha", "beta1", "beta2")) or part of a name so that all parameters containing that name will be printed (e.g. "beta" will print beta1, beta2, etc.).

Pr

print percent of posterior draws with same sign as median; defaults to FALSE.

ROPE

defaults to NULL. If not NULL, a vector of two values defining the region of practical equivalence ("ROPE"); returns % of posterior draws to the left/right of ROPE. For this quantity to be meaningful, all parameters must be on the same scale (e.g. standardized coefficients or first differences). See Kruschke (2013, Journal of Experimental Psychology 143(2): 573-603) for more on the ROPE.

regex

use regular expression matching with pars?

Value

a data frame containing MCMC summary statistics.

References

Kruschke, John K. 2013. “Bayesian Estimation Supersedes the T-Test.” Journal of Experimental Psychology: General 142 (2): 573–603. https://doi.org/10.1037/a0029146.

Examples



if (interactive()) {
data("jags_logit")

## printing out table
object <- mcmcTab(jags_logit, 
          ci = c(0.025, 0.975), 
          pars = NULL, 
          Pr = FALSE,
          ROPE = NULL)
object
}

Constructor for mcmcRocPrc objects

Description

This function actually does the heavy lifting once we have a matrix of predicted probabilities from a model, plus the vector of observed outcomes. The reason to have it here in a single function is that we don't replicate it in each function that accomodates a JAGS, BUGS, RStan, etc. object.

Usage

new_mcmcRocPrc(pred_prob, yvec, curves, fullsims)

Arguments

pred_prob

a ⁠\[N, iter\]⁠ matrix of predicted probabilities

yvec

a numeric(N) vector of observed outcomes

curves

include curve data in output?

fullsims

collapse posterior samples into single summary?

Plot Method for First Differences from MCMC output

Description

The plot method for first differences generated from MCMC output by mcmcFD. For more on this method, see Long (1997, Sage Publications), and King, Tomz, and Wittenberg (2000, American Journal of Political Science 44(2): 347-361). For a description of this type of plot, see Figure 1 in Karreth (2018, International Interactions 44(3): 463-90).

Usage

## S3 method for class 'mcmcFD'
plot(x, ROPE = NULL, ...)

Arguments

x

Output generated from mcmcFD(..., full_sims = TRUE).

ROPE

...

optional arguments to theme from ggplot2.

Value

References

Karreth, Johannes. 2018. “The Economic Leverage of International Organizations in Interstate Disputes.” International Interactions 44 (3): 463-90. https://doi.org/10.1080/03050629.2018.1389728.
King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 347–61. http://www.jstor.org/stable/2669316.
Kruschke, John K. 2013. “Bayesian Estimation Supersedes the T-Test.” Journal of Experimental Psychology: General 142 (2): 573–603. https://doi.org/10.1037/a0029146.
Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks: Sage Publications.

Examples



if (interactive()) {
## simulating data
set.seed(1234)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z <- b0 + b1 * X1 + b2 * X2
pr <- 1 / (1 + exp(-Z)) # inv logit function
Y <- rbinom(n, 1, pr) 
df <- data.frame(cbind(X1, X2, Y))

## formatting the data for jags
datjags <- as.list(df)
datjags$N <- length(datjags$Y)

## creating jags model
model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
    logit(p[i]) <- mu[i]    ## Logit link function
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X2[i]
  }
  
  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
fit <- R2jags::jags(data = datjags, inits = inits, 
                    parameters.to.save = params, n.chains = 2, n.iter = 2000, 
                    n.burnin = 1000, model.file = model)

## preparing data for mcmcFD()
xmat <- model.matrix(Y ~ X1 + X2, data = df)
mcmc <- coda::as.mcmc(fit)
mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xmat)]

## plotting with mcmcFDplot()
full <- mcmcFD(modelmatrix = xmat,
               mcmcout = mcmc_mat,
               fullsims = TRUE)
plot(full)

}

ROC and Precision-Recall Curves using Bayesian MCMC estimates

Description

Generate ROC and Precision-Recall curves after fitting a Bayesian logit or probit regression using rstan::stan(), rstanarm::stan_glm(), R2jags::jags(), R2WinBUGS::bugs(), MCMCpack::MCMClogit(), or other functions that provide samples from a posterior density.

Usage

## S3 method for class 'mcmcRocPrc'
print(x, ...)

## S3 method for class 'mcmcRocPrc'
plot(x, n = 40, alpha = 0.5, ...)

## S3 method for class 'mcmcRocPrc'
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  what = c("auc", "roc", "prc"),
  ...
)

mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, ...)

## Default S3 method:
mcmcRocPrc(object, curves, fullsims, yvec, ...)

## S3 method for class 'jags'
mcmcRocPrc(
  object,
  curves = FALSE,
  fullsims = FALSE,
  yname,
  xnames,
  posterior_samples,
  ...
)

## S3 method for class 'rjags'
mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, yname, xnames, ...)

## S3 method for class 'runjags'
mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, yname, xnames, ...)

## S3 method for class 'stanfit'
mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, data, xnames, yname, ...)

## S3 method for class 'stanreg'
mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, ...)

## S3 method for class 'brmsfit'
mcmcRocPrc(object, curves = FALSE, fullsims = FALSE, ...)

## S3 method for class 'bugs'
mcmcRocPrc(
  object,
  curves = FALSE,
  fullsims = FALSE,
  data,
  xnames,
  yname,
  type = c("logit", "probit"),
  ...
)

## S3 method for class 'mcmc'
mcmcRocPrc(
  object,
  curves = FALSE,
  fullsims = FALSE,
  data,
  xnames,
  yname,
  type = c("logit", "probit"),
  force = FALSE,
  ...
)

Arguments

x

a mcmcRocPrc() object

...

Used by methods

n

plot method: if 'fullsims = TRUE', how many sample curves to draw?

alpha

plot method: alpha value for plotting sampled curves; between 0 and 1

row.names

see [base::as.data.frame()]

optional

see [base::as.data.frame()]

what

which information to extract and convert to a data frame?

object

A fitted binary choice model, e.g. "rjags" object (see R2jags::jags()), or a ⁠[N, iter]⁠ matrix of predicted probabilites.

curves

logical indicator of whether or not to return values to plot the ROC or Precision-Recall curves. If set to FALSE (default), results are returned as a list without the extra values.

fullsims

yvec

A numeric(N) vector of observed outcomes.

yname

(character(1))
The name of the dependent variable, should match the variable name in the JAGS data object.

xnames

(base::character())
A character vector of the independent variable names, should match the corresponding names in the JAGS data object.

posterior_samples

a "mcmc" object with the posterior samples

data

the data that was used in the 'stan(data = ?, ...)' call

type

"logit" or "probit"

force

for MCMCpack models, suppress warning if the model does not appear to be a binary choice model?

Details

If only the average AUC-ROC and PR are of interest, setting curves = FALSE and fullsims = FALSE can greatly speed up calculation time. The curve data (curves = TRUE) is needed for plotting. The plot method will always plot both the ROC and PR curves, but the underlying data can easily be extracted from the output for your own plotting; see the documentation of the value returned below.

The default method works with a matrix of predicted probabilities and the vector of observed incomes as input. Other methods accommodate some of the common Bayesian modeling packages like rstan (which returns class "stanfit"), rstanarm ("stanreg"), R2jags ("jags"), R2WinBUGS ("bugs"), and MCMCpack ("mcmc"). Even if a package-specific method is not implemented, the default method can always be used as a fallback by manually calculating the matrix of predicted probabilities for each posterior sample.

Note that MCMCpack returns generic "mcmc" output that is annotated with some additional information as attributes, including the original function call. There is no inherent way to distinguish any other kind of "mcmc" object from one generated by a proper MCMCpack modeling function, but as a basic precaution, mcmcRocPrc() will check the saved call and return an error if the function called was not MCMClogit() or MCMCprobit(). This behavior can be suppressed by setting force = TRUE.

Value

Returns a list with length 2 or 4, depending on the on the "curves" and "fullsims" argument values:

"area_under_roc": numeric(); either length 1 if fullsims = FALSE, or one value for each posterior sample otherwise
"area_under_prc": numeric(); either length 1 if fullsims = FALSE, or one value for each posterior sample otherwise
"prc_dat": only if curves = TRUE; a list with length 1 if fullsims = FALSE, longer otherwise
"roc_dat": only if curves = TRUE; a list with length 1 if fullsims = FALSE, longer otherwise

References

Beger, Andreas. 2016. “Precision-Recall Curves.” Available at doi: 10.2139/ssrn.2765419

Examples


if (interactive()) {
# load simulated data and fitted model (see ?sim_data and ?jags_logit)
data("jags_logit")

# using mcmcRocPrc
fit_sum <- mcmcRocPrc(jags_logit,
                      yname = "Y",
                      xnames = c("X1", "X2"),
                      curves = TRUE,
                      fullsims = FALSE)
fit_sum                     
plot(fit_sum)

# Equivalently, we can calculate the matrix of predicted probabilities 
# ourselves; using the example from ?jags_logit:
library(R2jags)

data("sim_data")
yvec <- sim_data$Y
xmat <- sim_data[, c("X1", "X2")]

# add intercept to the X data
xmat <- as.matrix(cbind(Intercept = 1L, xmat))

beta <- as.matrix(as.mcmc(jags_logit))[, c("b[1]", "b[2]", "b[3]")]
pred_mat <- plogis(xmat %*% t(beta)) 

# the matrix of predictions has rows matching the number of rows in the data;
# the column are the predictions for each of the 2,000 posterior samples
nrow(sim_data)
dim(pred_mat)

# now we can call mcmcRocPrc; the default method works with the matrix
# of predictions and vector of outcomes as input
mcmcRocPrc(object = pred_mat, curves = TRUE, fullsims = FALSE, yvec = yvec)
}

Simulated data for examples

Description

Simulated data to fit example models against

Usage

sim_data

Format

a data.frame

Examples

## simulating data
set.seed(123456)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z <- b0 + b1 * X1 + b2 * X2
pr <- 1 / (1 + exp(-Z)) # inv logit function
Y <- rbinom(n, 1, pr) 
sim_data <- data.frame(cbind(X1, X2, Y))

Simulated data for examples

Description

Simulated data to fit example models against

Usage

sim_data_interactive

Format

a data.frame

Examples

set.seed(123456)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
b3 <- -0.3 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z_interactive <- b0 + b1 * X1 + b2 * X2 + b3 * (X1 * X2)
Y_interactive <- rnorm(n, Z_interactive, 1)
sim_data_interactive <- data.frame(cbind(X1, X2, Y = Y_interactive))

Simulated data for examples

Description

Simulated data to fit example models against

Usage

sim_data_interactive_cat

Format

a data.frame

Examples

set.seed(123456)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
b3 <- -0.3 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X3 <- rbinom(n, 5, .23)
Z_interactive_cat <- b0 + b1 * X1 + b2 * X3 + b3 * (X1 * X3)
Y_interactive_cat <- rnorm(n, Z_interactive_cat, 1)
sim_data_interactive_cat <- data.frame(cbind(X1, X3, Y = Y_interactive_cat))

BayesPostEst Overview

Description

Main Functions

Deprecated functions in package BayesPostEst.

Description

Usage

Arguments

Value

mcmcFDplot

References

See Also

Examples

Compute ROC and PR curve points

Description

Usage

Details

Try to identify link function

Description

Usage

Arguments

Value

Try to identify if a stanfit model is a binary choice model

Description

Usage

Arguments

Fitted JAGS interactive linear model

Description

Usage

Format

Examples

Fitted JAGS interactive linear model with categorical moderator

Description

Usage

Format

Examples

Fitted JAGS logit model

Description

Usage

Format

Examples

Fitted JAGS probit model

Description

Usage

Format

Examples

Predicted Probabilities using Bayesian MCMC estimates for the "Average" Case

Description

Usage

Arguments

Details

Value

References

Examples

Coefficient Plots for MCMC Output

Description

Usage

Arguments

Value

Author(s)

Examples

First Differences of a Bayesian Logit or Probit model

Description

Usage

Arguments

Value

References

Examples

Marginal Effects Plots for MCMC Output

Description

Usage

Arguments

Value

Author(s)

Examples

Predicted Probabilities using Bayesian MCMC estimates for the Average of Observed Cases

Description

Usage

Arguments

Details

Value

`mcmcFDplot`