Help for package ProbBayes

Type: Package
Title: Probability and Bayesian Modeling
Version: 1.1
Author: Jim Albert <albert@bgsu.edu>
Maintainer: Jim Albert <albert@bgsu.edu>
Depends: LearnBayes, ggplot2, gridExtra, shiny
Suggests: knitr, rmarkdown
URL: https://github.com/bayesball/ProbBayes
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Packaged: 2020-02-27 13:44:56 UTC; jamesalbert
Description: Functions and datasets to accompany J. Albert and J. Hu, "Probability and Bayesian Modeling", CRC Press, (2019, ISBN: 1138492566).
Encoding: UTF-8
LazyData: true
NeedsCompilation:
Repository: CRAN
Date/Publication: 2020-03-06 09:40:07 UTC

Trend Estimates of Bird Populations

Description

Trend Estimates for 28 Grassland Bird Species

Usage

  BBS_survey

Format

A data frame with 28 observations on the following 4 variables.

Species_Name: name of bird species
Trend: trend estimate
SE: standard error of estimate
N_Site: number of observations at site

Source

North American Breeding Bird Survey

Expenditures of U.S. Households

Description

Expenditures of U.S. Households

Usage

  CEsample

Format

A data frame with 1000 observations on the following 3 variables.

UrbanRural: urban/rural status of the consumer unit (CU): 1 = urban, 2 = rural
TotalIncomeLastYear: amount of CU income before taxes in the last 12 months
TotalExpLastQ: CU's total expenditure in the last quarter

Source

U.S. Bureau of Labor Statistics

Shiny App to Choose a Beta Curve

Description

Interactively choose a beta curve by selecting its .5 and .9 quantiles

Usage

  ChooseBeta()

Value

None

Author(s)

Jim Albert

Personal Computer Data

Description

Variables on a sample of personal computers

Usage

  ComputerPriceSample

Format

A data frame with 500 observations on the following 5 variables.

Price: sales price
Speed: clock speed in MHz
HardDrive: size of hard drive in MB
Ram: size of RAM in MB
Premium: premium status of manufacturer

Source

Unknown

Personality and Volunteering

Description

Data from a study of the personality determinants of volunteering

Usage

  Cowles

Format

A data frame with 1421 observations on the following 5 variables.

subject: subject number
neuroticism: measurement of neuroticism
extraversion: measurement of extraversion
sex: male or female
volunteer: no or yes

Source

Unknown.

Risk-adjusted mortality outcomes for all NYC hospitals

Description

Reported deaths from heart attack for hospitals in New York City

Usage

  DeathHeartAttackDataNYCfull

Format

A data frame with 45 observations on the following 5 variables.

Hospital: name of hospital
Borough: borough in New York City
Type: type of hospital
Cases: number of heart attack cases
Deaths: number of deaths

Source

New York State Department of Health

Risk-adjusted mortality outcomes for Manhattan hospitals

Description

Reported deaths from heart attack for hospitals in Manhattan in New York City

Usage

  DeathHeartAttackManhattan

Format

A data frame with 13 observations on the following 4 variables.

Hospital: name of hospital
Type: type of hospital
Cases: number of heart attack cases
Deaths: number of deaths

Source

New York State Department of Health

Graduate School Admission

Description

Study to see what variables are helpful in determining admission to Graduate School

Usage

  GradSchoolAdmission

Format

A data frame with 400 observations on the following 3 variables.

Admission: student was admitted (1) or not admitted (0)
GRE: GRE score
GPA: grade point average

Source

Unknown.

Homework Hours for Five Schools

Description

Weekly hours spent on homework for students from five schools

Usage

  HWhours5schools

Format

A data frame with 116 observations on the following 2 variables.

school: school number of student
hours: weekly hours spent on homework

Source

Unknown.

Frequency of use of "can" in the Federalist Papers

Description

Frequency of use of "can" in the Federalist Papers written by Alexander Hamilton

Usage

  Hamilton_can

Format

A data frame with 49 observations on the following 6 variables.

Name: name of Federalist paper
Total: total number of words
word: word that is counted
N: frequency of the word
Rate: fraction of words with that word
Authorship: author of paper

Source

http://www.gutenberg.org/ebooks/18

JAGS Script for Common Models

Description

Model script for JAGS to fit a particular Bayesian model. Currently the possible models are "beta_binomial", "hier_normal", "hier_trajectory", "normal", "regression", "regression_cond_means", and "trajectory".

Usage

  JAGS_script(model)

Arguments

model

name of the model

Value

A character string containing the model script

Korean Drama Ratings

Description

Ratings of Korean dramas broadcast on different days of the week and made by different producers

Usage

  KDramaData

Format

A data frame with 101 observations on the following 5 variables.

Drama: name of drama
Schedule: indicator of what day the drama was broadcast
Producer: indicator of the producer of the drama
Rating: rating of the drama
Date: date of rating

Source

AGB Nielsen Media Research Group

U.S. Women Labor Participation

Description

U.S. women labor participation and family income

Usage

  LaborParticipation

Format

A data frame with 753 observations on the following 2 variables.

Participation: labor participation of the wife
FamilyIncome: family income exclusive of wife's income in $1000

Source

University of Michigan Panel Study of Income Dynamics

Frequency of use of "can" in the Federalist Papers

Description

Frequency of use of "can" in the Federalist Papers written by James Madison

Usage

  Madison_can

Format

A data frame with 49 observations on the following 6 variables.

Name: name of Federalist paper
Total: total number of words
word: word that is counted
N: frequency of the word
Rate: fraction of words with that word
Authorship: author of paper

Source

http://www.gutenberg.org/ebooks/18

Professor Salary Study

Description

Study of factors that affect a professor's salary

Usage

  ProfessorSalary

Format

A data frame with 397 observations on the following 7 variables.

subject: subject id
rank: professor rank
discipline: A is theoretical and B is applied
yrs.since.phd: number of years since receipt of doctorate
yrs.service: number of years of service
sex: Female or Male
salary: nine-month salary in dollars

Source

Unknown.

Scores on Achievement Exam

Description

Scores on a 20-question T/F exam

Usage

  ScoreData

Format

A data frame with 30 observations on the following 2 variables.

Person: subject id
Score: number correct in 20-question exam

Source

Data randomly generated.

Movie Ratings

Description

Ratings for a set of 2010 animation movies

Usage

  animation_ratings

Format

A data frame with 55 observations on the following 6 variables.

userId: user ID
movieId: movie ID
rating: numerical rating
timestamp: time when the rating was recorded
title: name of the movie
Group_Number: numerical ID of movie

Source

MovieLens by GroupLens Research

Arm span and height measurements

Description

Arm span and height measurements for a sample of students

Usage

  arm_height

Format

A data frame with 20 observations on the following 2 variables.

arm: length of arm span in cm
height: height in cm

Source

Sample of college students

Bar plot of numeric or character data

Description

Constructs frequency bar plot of a vector of numeric data or a vector of character data

Usage

  bar_plot(y, ...)

Arguments

y

vector of outcomes

...

title of the graph

Value

A ggplot2 object containing the bar graph.

Author(s)

Jim Albert

Examples

  s <- spinner_data(c(1, 2, 2, 1), nsim=100)
  bar_plot(s, "Spinner Data")
  y <- c(rep("a", 10), rep("b", 5),
         rep("c", 8), rep("d", 4))
  bar_plot(y)

Batting Statistics for 2018 Season

Description

Batting statistics collected for all players during the first month and remainder of 2018 baseball season

Usage

  batting_2018

Format

A data frame with 549 observations on the following 5 variables.

Name: name of player
AB.x: number of at bats in first month
H.x: number of hits in first month
AB.y: number of at bats in remainder of season
H.y: number of hits in remainder of season

Source

Data collected from Retrosheet.org.

Computes Posterior Probabilities for Discrete Models

Description

Given a data table with columns Prior and Likelihood, computes posterior probabilities

Usage

  bayesian_crank(d)

Arguments

d

data frame with columns Prior and Likelihood

Value

data frame with new columns Product and Posterior

Author(s)

Jim Albert

Examples

  df <- data.frame(p=c(.1, .3, .5, .7, .9),
                   Prior=rep(1/5, 5))
  y <- 5
  n <- 10
  df$Likelihood <- dbinom(y, prob=df$p, size=n)
  df <- bayesian_crank(df)
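
The computation in this example can be sketched in base R. This is a minimal sketch, assuming the posterior is the normalized product of the Prior and Likelihood columns, as the Value section suggests; `crank_sketch` is a hypothetical name and not a function in the package.

```r
# Hypothetical helper (not part of ProbBayes): discrete Bayes' rule on a
# data frame with columns Prior and Likelihood.
crank_sketch <- function(d) {
  d$Product <- d$Prior * d$Likelihood        # unnormalized posterior
  d$Posterior <- d$Product / sum(d$Product)  # rescale so it sums to 1
  d
}

df <- data.frame(p = c(.1, .3, .5, .7, .9),
                 Prior = rep(1/5, 5))
df$Likelihood <- dbinom(5, size = 10, prob = df$p)
out <- crank_sketch(df)
```

With a uniform prior, the posterior is proportional to the likelihood, so the value of p that best matches the data receives the largest posterior probability.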

Displays Areas Under a Beta Curve

Description

Computes and Displays Areas Under a Beta Curve

Usage

  beta_area(lo, hi, shape_par, Color = "orange")

Arguments

lo

lower bound of interval

hi

upper bound of interval

shape_par

vector of shape parameters of the beta curve

Color

color of shading in the graph

Value

ggplot2 object containing the graphical display.

Author(s)

Jim Albert

Examples

  lo <- .2
  hi <- .4
  shape_par <- c(2, 5)
  beta_area(lo, hi, shape_par)
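
The shaded area in this display is an interval probability under the beta curve. A minimal base R sketch, assuming only the standard `pbeta` function from the stats package, is shown below; it computes the number but does not draw the graph.

```r
# Area under a Beta(2, 5) curve between lo and hi, computed as a
# difference of two cumulative probabilities.
lo <- 0.2
hi <- 0.4
shape_par <- c(2, 5)
area <- pbeta(hi, shape_par[1], shape_par[2]) -
  pbeta(lo, shape_par[1], shape_par[2])
```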

Simulate random data from a beta curve

Description

Simulate random data from a beta curve

Usage

  beta_data(shape_par, nsim=1000)

Arguments

shape_par

vector of shape parameters of the beta curve

nsim

number of simulations

Value

A vector of random draws from the beta distribution

Author(s)

Jim Albert

Examples

  shape_par <- c(12, 8)
  beta_data(shape_par, 10)

Draw a Beta Curve

Description

Draw a Beta Curve

Usage

  beta_draw(shape_pars)

Arguments

shape_pars

vector of shape parameters of the beta curve

Value

ggplot2 object containing the graphical display.

Author(s)

Jim Albert

Examples

  shape_pars <- c(2, 5)
  beta_draw(shape_pars)

Probability Interval for a Beta Curve

Description

Computes Probability Interval for a Beta Curve

Usage

  beta_interval(prob, shape_par, Color = "orange")

Arguments

prob

value of coverage probability

shape_par

vector of shape parameters of the beta curve

Color

color of shading in the graph

Value

ggplot2 object containing the graphical display.

Author(s)

Jim Albert

Examples

  shape_par <- c(2, 5)
  beta_interval(.5, shape_par)

Plot of Two Beta Curves

Description

Plot of Prior and Posterior Beta Curves

Usage

  beta_prior_post(prior_shapes, post_shapes)

Arguments

prior_shapes

vector of shape parameters of the beta prior

post_shapes

vector of shape parameters of the beta posterior

Value

ggplot2 object containing the graphical display.

Author(s)

Jim Albert

Examples

 prior_shapes <- c(4, 6)
 post_shapes <- c(19, 16)
 beta_prior_post(prior_shapes, post_shapes)

Displays a Quantile of a Beta Curve

Description

Displays a Quantile of a Beta Curve

Usage

  beta_quantile(prob, shape_par, Color = "orange")

Arguments

prob

probability value of interest

shape_par

vector of shape parameters of the beta curve

Color

color of shading in the graph

Value

ggplot2 object containing the graphical display.

Author(s)

Jim Albert

Examples

  # find the .50 quantile (the median)
  prob <- 0.5
  shape_par <- c(2, 5)
  beta_quantile(prob, shape_par)
  # find the .90 quantile (90th percentile)
  prob <- 0.9
  beta_quantile(prob, shape_par)

Text Statistics for Books

Description

Text statistics for a collection of books sold at Amazon.com

Usage

  book_stats

Format

A data frame with 21 observations on the following 3 variables.

Book: name of book
Complex.Words: percentage of words in the book with three or more syllables
Fog.Index: number of years of formal education required to read and understand a passage of text

Source

Data collected from Amazon.com website.

Buffalo snowfall data

Description

Total snowfall in inches for 20 Januarys in Buffalo, New York

Usage

  buffalo_jan

Format

A data frame with 20 observations on the following 2 variables.

SEASON: Season
JAN: inches of total snowfall

Source

National Weather Service, www.weather.gov

Career Trajectory Data for Baseball Players

Description

Season on-base statistics for a collection of MLB players who were born in 1978

Usage

  career_1978

Format

A data frame with 399 observations on the following 6 variables.

nameLast: last name of player
Player: id of player
Age: age of player
AgeD: deviation of age from 30
PA: number of plate appearances
OB: number of on-base events

Source

Data collected from Lahman database.

Centers title in a ggplot2 graphic

Description

Centers and increases font size of a ggplot2 graphic title

Usage

centertitle(Color = "blue")

Arguments

Color

color of the text in the ggplot2 title

Value

ggplot2 theme code to center the title

Author(s)

Jim Albert

Examples

df <- data.frame(p=c(.1, .3, .5, .7, .9),
                 Prior=rep(1/5, 5))
ggplot(df, aes(p, Prior)) +
geom_point() +
ggtitle("My Prior") +
centertitle()

Plot of Distribution of Two Proportions

Description

Constructs a graph of the probability distribution of two proportions

Usage

  draw_two_p(prob_matrix, ...)

Arguments

prob_matrix

matrix of probabilities of two proportions with the rows and columns labeled by the values

...

other arguments such as the title of the plot

Value

ggplot2 object containing the graphical display.

Author(s)

Jim Albert

Examples

  prob_matrix <- testing_prior()
  draw_two_p(prob_matrix, title="Testing Prior")

Hypergeometric sampling density

Description

Hypergeometric sampling density

Usage

  dsampling(sample_b, pop_N, pop_B, sample_n)

Arguments

sample_b

number of black balls in sample

pop_N

number of balls in population

pop_B

number of black balls in population

sample_n

number of balls in sample

Value

Value of hypergeometric sampling probability

Author(s)

Jim Albert

Examples

  pop_N <- 10
  pop_B <- 4
  sample_n <- 3
  sample_b <- 2
  dsampling(sample_b, pop_N, pop_B, sample_n)
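
The hypergeometric probability for these inputs can be checked in base R. This is a sketch that assumes the usual hypergeometric formula; it uses `choose` and `dhyper` from base R and stats rather than the package function.

```r
pop_N <- 10     # balls in population
pop_B <- 4      # black balls in population
sample_n <- 3   # balls in sample
sample_b <- 2   # black balls in sample

# Counting argument: ways to pick the black balls times ways to pick
# the remaining balls, divided by all possible samples.
by_formula <- choose(pop_B, sample_b) *
  choose(pop_N - pop_B, sample_n - sample_b) /
  choose(pop_N, sample_n)

# Same quantity from the built-in hypergeometric density.
by_dhyper <- dhyper(sample_b, m = pop_B, n = pop_N - pop_B, k = sample_n)
```

Both expressions give 6 * 6 / 120 = 0.3 for these values.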

Computes likelihoods for spinner outcomes

Description

Computes likelihoods for spinner outcomes

Usage

  dspinner(x, Prob)

Arguments

x

vector of spinner observations

Prob

matrix of spinner probabilities where each row corresponds to a different spinner

Value

column vector consisting of the likelihoods for the different spinners

Author(s)

Jim Albert

Examples

  Prob <- matrix(c(.25, .25, .25, .25,
                   .50, .125, .125, .25,
                   .25, .5, .25, 0), 3, 4, byrow=TRUE)
  x <- c(1, 2, 1, 3, 4)
  dspinner(x, Prob)
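
The likelihood of a sequence of spins can be sketched in base R. This sketch assumes the spins are independent, so the likelihood for each spinner is the product of that spinner's probabilities for the observed outcomes; `spinner_lik_sketch` is a hypothetical name, and it returns a plain vector rather than the column vector described above.

```r
# Hypothetical helper (not part of ProbBayes).
# Prob: one row of outcome probabilities per spinner.
# x:    observed spins, coded 1, 2, ...
spinner_lik_sketch <- function(x, Prob) {
  # Pick out the columns for the observed spins, then multiply
  # across each row (one product per spinner).
  apply(Prob[, x, drop = FALSE], 1, prod)
}

Prob <- matrix(c(.25, .25, .25,  .25,
                 .50, .25, .125, .125), 2, 4, byrow = TRUE)
x <- c(1, 2, 1)
lik <- spinner_lik_sketch(x, Prob)
```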

Electricity Bills

Description

Electricity bills collected for all months for five years

Usage

  electricbills

Format

A data frame with 62 observations on the following 3 variables.

Year: year
Month: number of month
Amount: electricity bill in dollars

Source

Data collected for one household in Ohio

Frequency of use of words in the Federalist Papers

Description

Frequency of use of words in the Federalist Papers written by either Alexander Hamilton or James Madison

Usage

  federalist_word_study

Format

A data frame with 56853 observations on the following 7 variables.

Name: name of Federalist paper
Total: total number of words
word: word that is counted
N: frequency of the word
Rate: fraction of words with that word
Authorship: author of paper
Disputed: is authorship disputed?

Source

http://www.gutenberg.org/ebooks/18

Times to Serve for Roger Federer

Description

Measurements of time to serve for 20 serves of the tennis player Roger Federer

Usage

  federer_time_to_serve

Format

A data frame with 20 observations on the following single variable.

time: time to serve in seconds

Source

https://github.com/JeffSackmann

Fire Calls for Zip Code Areas

Description

The number of fire calls and building fires for ten zip codes in Montgomery County, Pennsylvania

Usage

  fire_calls

Format

A data frame with 10 observations on the following 3 variables.

Zip_Code: zip code
Fire_Calls: number of fire calls
Building_Fires: number of building fires

Source

kaggle.com

Football Field Goals Dataset

Description

Field goal attempt data for three seasons of professional football

Usage

  football_field_goals

Format

A data frame with 3025 observations on the following 5 variables.

Team: name of team
Year: football season
Kicker: last name of kicker
Distance: distance in feet of attempt
Success: attempt was successful (1) or not (0)

Source

Data collected by Michael Lopez.

Gas bill data

Description

Measurements of average temperature and natural gas bill for each month in 2017

Usage

  gas2017

Format

A data frame with 12 observations on the following 3 variables.

Month: abbreviation of month
Temp: average temperature
Bill: natural gas bill in dollars

Source

Personal data collected by a homeowner in Ohio

Gibbs sampling of the beta-binomial distribution

Description

Implements Gibbs sampling of the beta-binomial distribution

Usage

  gibbs_betabin(n, a, b, p = 0.5, iter = 1000)

Arguments

n

binomial sample size

a

first beta shape parameter

b

second beta shape parameter

p

starting value of proportion in algorithm

iter

number of iterations

Value

matrix of simulated draws from the algorithm

Author(s)

Jim Albert

Examples

# iter is named so that 100 is not matched to the starting value p
sp <- gibbs_betabin(20, 5, 5, iter = 100)
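
The sampler can be sketched in base R. This is a minimal sketch, assuming the standard beta-binomial conditionals (y given p is binomial, p given y is beta); `gibbs_betabin_sketch` is a hypothetical name, and the package's own code may differ in detail.

```r
# Hypothetical sketch (not the package implementation).
gibbs_betabin_sketch <- function(n, a, b, p = 0.5, iter = 1000) {
  draws <- matrix(0, iter, 2, dimnames = list(NULL, c("y", "p")))
  for (k in seq_len(iter)) {
    y <- rbinom(1, size = n, prob = p)   # y | p ~ Binomial(n, p)
    p <- rbeta(1, y + a, n - y + b)      # p | y ~ Beta(y + a, n - y + b)
    draws[k, ] <- c(y, p)
  }
  draws
}

set.seed(1)
sp <- gibbs_betabin_sketch(20, 5, 5, iter = 100)
```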

Gibbs sampling of a bivariate discrete distribution

Description

Implements Gibbs sampling for an arbitrary bivariate discrete distribution

Usage

  gibbs_discrete(p, i = 1, iter = 1000)

Arguments

p

matrix defining the probability distribution

i

starting row of the matrix

iter

number of cycles of algorithm

Value

matrix of simulated draws from algorithm

Author(s)

Jim Albert

Examples

p <- matrix(c(4, 3, 2, 1,
              3, 4, 3, 2,
              2, 3, 4, 3,
              1, 2, 3, 4) / 40, 4, 4, byrow = TRUE)
out <- gibbs_discrete(p, 1, 100)
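
One way to read the algorithm is as alternating draws from the two conditional distributions of the matrix. The sketch below assumes that reading; `gibbs_discrete_sketch` is a hypothetical name and the order of the two updates is an assumption.

```r
# Hypothetical sketch (not the package implementation).
gibbs_discrete_sketch <- function(p, i = 1, iter = 1000) {
  draws <- matrix(0, iter, 2, dimnames = list(NULL, c("row", "col")))
  for (k in seq_len(iter)) {
    j <- sample(ncol(p), 1, prob = p[i, ])  # column given current row
    i <- sample(nrow(p), 1, prob = p[, j])  # row given new column
    draws[k, ] <- c(i, j)
  }
  draws
}

p <- matrix(c(4, 3, 2, 1,
              3, 4, 3, 2,
              2, 3, 4, 3,
              1, 2, 3, 4) / 40, 4, 4, byrow = TRUE)
set.seed(1)
out <- gibbs_discrete_sketch(p, 1, 100)
```

`sample` rescales the `prob` weights internally, so a row or column of the joint matrix can be passed without normalizing it first.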

Gibbs sampling of the normal sampling posterior

Description

Implements Gibbs sampling for normal sampling with independent priors on the mean and precision

Usage

  gibbs_normal(s, P = 0.002, iter = 1000)

Arguments

s

a list with components y (the observed data), mu0 (the prior mean of mu), sigma0 (the prior standard deviation of mu), a (the shape parameter of the gamma prior on P), and b (the rate parameter of the gamma prior on P)

P

starting value of the precision parameter

iter

number of iterations

Value

matrix of simulated draws of (mu, P) from the algorithm

Author(s)

Jim Albert

Examples

s <- list(y = rnorm(20, 5, 2),
  mu0 = 10, sigma0 = 3, a = 1, b = 1)
out <- gibbs_normal(s, P = 0.01, iter=100)
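
The two conditional draws can be sketched in base R. This sketch assumes the usual semi-conjugate model: y is normal with mean mu and precision P, mu has a normal prior, and P has a gamma prior. `gibbs_normal_sketch` is a hypothetical name, and the package's own code may differ.

```r
# Hypothetical sketch (not the package implementation).
gibbs_normal_sketch <- function(s, P = 0.002, iter = 1000) {
  n <- length(s$y)
  P0 <- 1 / s$sigma0^2                     # prior precision of mu
  draws <- matrix(0, iter, 2, dimnames = list(NULL, c("mu", "P")))
  for (k in seq_len(iter)) {
    # mu | P, y: normal with precision-weighted mean
    prec <- P0 + n * P
    mu <- rnorm(1, (P0 * s$mu0 + P * sum(s$y)) / prec, sqrt(1 / prec))
    # P | mu, y: gamma with updated shape and rate
    P <- rgamma(1, shape = s$a + n / 2,
                rate = s$b + sum((s$y - mu)^2) / 2)
    draws[k, ] <- c(mu, P)
  }
  draws
}

set.seed(1)
s <- list(y = rnorm(20, 5, 2),
          mu0 = 10, sigma0 = 3, a = 1, b = 1)
out <- gibbs_normal_sketch(s, P = 0.01, iter = 100)
```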

House price data

Description

Measurements of house size and selling price for a collection of homes in a city in Ohio

Usage

  house_prices

Format

A data frame with 24 observations on the following 2 variables.

price: selling price in $1000
size: square footage of house

Source

Zillow.com

Increases font size of text

Description

Increases font size on all text in a ggplot2 graphic

Usage

  increasefont(Size = 18)

Arguments

Size

font size of all textual elements in a ggplot2 graphic

Value

ggplot2 theme code to increase the font size

Author(s)

Jim Albert

Examples

df <- data.frame(p=c(.1, .3, .5, .7, .9),
                 Prior=rep(1/5, 5))
ggplot(df, aes(p, Prior)) +
geom_point() + increasefont()

Graph of several normal curves

Description

Graph of several normal curves

Usage

  many_normal_plots(list_normal_par)

Arguments

list_normal_par

list of vectors, where each vector is a mean and standard deviation for a normal distribution

Value

ggplot2 object containing the graphical display.

Author(s)

Jim Albert

Examples

 list_normal_par <- list(c(100, 15),
     c(110, 15), c(120, 15))
 many_normal_plots(list_normal_par)

Graphs a collection of spinners

Description

Graphs a collection of spinners

Usage

  many_spinner_plots(list_regions)

Arguments

list_regions

list of vectors of integer areas for the spins 1, 2, ...

Value

A ggplot2 object containing the spinner displays

Author(s)

Jim Albert

Examples

  regions1 <- c(1, 1, 1)
  regions2 <- c(2, 1, 2, 1)
  many_spinner_plots(list(regions1, regions2))

Annual Marriage Counts in Italy

Description

Annual marriage counts per 1000 of the population in Italy from 1936 to 1951

Usage

  marriage_counts

Format

A data frame with 16 observations on the following 2 variables.

Year: year
Count: count of marriages per 1000 people

Source

Unknown.

Nutritional data for McDonalds Sandwiches

Description

Serving size and calories for a selection of sandwiches from McDonalds

Usage

  mcdonalds

Format

A data frame with 11 observations on the following 3 variables.

Sandwich: name of sandwich
Size: serving size in grams
Calories: calories of sandwich

Source

McDonalds restaurant

Metropolis sampling of a continuous distribution

Description

Implements Metropolis sampling for an arbitrary continuous probability distribution

Usage

  metropolis(logpost, current, C, iter, ...)

Arguments

logpost

function definition of the log probability function

current

starting value of algorithm

C

half-width of proposal interval

iter

number of iterations

...

other inputs needed in logpost function

Value

S

vector of simulated values

accept_rate

acceptance rate of algorithm

Author(s)

Jim Albert

Examples

lpost <- function(theta, s){
  dnorm(s$ybar, theta, s$se, log = TRUE) +
    dcauchy(theta, s$loc, s$scale, log = TRUE)
}
s <- list(ybar = 20,
          se = 0.4,
          loc = 10,
          scale = 2)
post <- metropolis(lpost, 10, 20, 100, s)
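
The algorithm can be sketched in base R. This sketch assumes a random-walk proposal drawn uniformly from (current - C, current + C), which is one reading of "half-width of proposal interval"; `metropolis_sketch` is a hypothetical name.

```r
# Hypothetical sketch (not the package implementation).
metropolis_sketch <- function(logpost, current, C, iter, ...) {
  S <- numeric(iter)
  n_accept <- 0
  for (k in seq_len(iter)) {
    candidate <- runif(1, current - C, current + C)
    # Compare on the log scale to avoid numerical underflow.
    log_ratio <- logpost(candidate, ...) - logpost(current, ...)
    if (log(runif(1)) < log_ratio) {
      current <- candidate
      n_accept <- n_accept + 1
    }
    S[k] <- current
  }
  list(S = S, accept_rate = n_accept / iter)
}

lpost <- function(theta, s) {
  dnorm(s$ybar, theta, s$se, log = TRUE) +
    dcauchy(theta, s$loc, s$scale, log = TRUE)
}
s <- list(ybar = 20, se = 0.4, loc = 10, scale = 2)
set.seed(1)
post <- metropolis_sketch(lpost, 10, 20, 2000, s)
```

A wide proposal such as C = 20 tends to give a low acceptance rate here because the posterior is concentrated near ybar; a smaller C usually moves more often but in shorter steps.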

Movies Sales Data

Description

Weekend and gross sales for a selection of movies released in 2017

Usage

  movies2017

Format

A data frame with 10 observations on the following 3 variables.

Movie: name of movie
Weekend: opening weekend sales in millions of dollars
Gross: gross sales in millions of dollars

Source

Internet Movie Database

Basketball Shooting Data for Point Guards

Description

Field goal and free throw shooting data for a collection of great NBA point guards

Usage

  nba_guards

Format

A data frame with 230 observations on the following 6 variables.

Player: name of player
Age: age of player
FG: field goals
FGA: field goal attempts
FT: free throws
FTA: free throw attempts

Source

Data collected from Basketball-Reference.com.

Displays Area Under a Normal Curve

Description

Computes and Displays Area Under a Normal Curve

Usage

  normal_area(lo, hi, normal_pars, Color = "orange")

Arguments

lo

lower bound of interval

hi

upper bound of interval

normal_pars

vector of mean and standard deviation of the normal curve

Color

color of shading in plot

Value

ggplot2 object containing the graphical display.

Author(s)

Jim Albert

Examples

  lo <- 10
  hi <- 20
  normal_pars <- c(25, 10)
  normal_area(lo, hi, normal_pars)

Draws a Normal Curve

Description

Draws a Normal Curve

Usage

  normal_draw(normal_pars, Color = "red")

Arguments

normal_pars

vector of mean and standard deviation of the normal curve

Color

color of line in plot

Value

ggplot2 object containing the graphical display.

Author(s)

Jim Albert

Examples

  normal_pars <- c(2, 1)
  normal_draw(normal_pars)

Probability Interval for a Normal Curve

Description

Computes "equal-tails" probability interval for a normal curve

Usage

  normal_interval(prob, normal_pars, Color = "orange")

Arguments

prob

value of coverage probability

normal_pars

vector of mean and standard deviation of the normal curve

Color

color of shading in plot

Value

ggplot2 object containing the graphical display.

Author(s)

Jim Albert

Examples

  normal_pars <- c(2, 0.5)
  prob <- 0.5
  normal_interval(prob, normal_pars)
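
The endpoints of an equal-tails interval can be computed directly with `qnorm`. This is a base R sketch of the calculation only; it does not produce the graph.

```r
normal_pars <- c(2, 0.5)   # mean and standard deviation
prob <- 0.5                # coverage probability

# Put (1 - prob) / 2 in each tail and invert the normal CDF.
tail_prob <- (1 - prob) / 2
interval <- qnorm(c(tail_prob, 1 - tail_prob),
                  mean = normal_pars[1], sd = normal_pars[2])
```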

Displays a Quantile of a Normal Curve

Description

Displays a Quantile of a Normal Curve

Usage

  normal_quantile(prob, normal_pars, Color = "orange")

Arguments

prob

probability value of interest

normal_pars

vector of mean and standard deviation of the normal curve

Color

color of shading in plot

Value

ggplot2 object containing the graphical display.

Author(s)

Jim Albert

Examples

  normal_pars <- c(100, 10)
  prob <- 0.7
  normal_quantile(prob, normal_pars)

Updates a Normal Prior with Normal Data

Description

Finds the parameters of the normal posterior with normal data and a normal prior

Usage

  normal_update(prior, data, teach=FALSE)

Arguments

prior

vector with components mean and sd of the normal prior

data

vector containing the sample mean and its standard error

teach

logical variable indicating the form of the output

Value

If teach = TRUE, returns data frame that displays the mean, precision, and standard deviation for the prior, data, and posterior. If teach = FALSE, returns a vector with mean and standard deviation of the posterior.

Author(s)

Jim Albert

Examples

  prior <- c(100, 10)
  data <- c(110, 15)
  normal_update(prior, data)
  normal_update(prior, data, teach=TRUE)
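
The update can be sketched in base R using precisions (reciprocals of variances). This sketch assumes the standard normal-normal conjugate result: precisions add, and the posterior mean is the precision-weighted average of the prior mean and the sample mean. `normal_update_sketch` is a hypothetical name.

```r
# Hypothetical sketch (not the package implementation).
# prior: c(mean, sd) of the normal prior
# data:  c(sample mean, standard error)
normal_update_sketch <- function(prior, data) {
  prior_prec <- 1 / prior[2]^2
  data_prec <- 1 / data[2]^2
  post_prec <- prior_prec + data_prec
  post_mean <- (prior_prec * prior[1] + data_prec * data[1]) / post_prec
  c(mean = post_mean, sd = sqrt(1 / post_prec))
}

post <- normal_update_sketch(c(100, 10), c(110, 15))
```

For these inputs the prior precision is 1/100 and the data precision is 1/225, so the posterior mean is 33500/325 (about 103.08) and the posterior standard deviation is sqrt(22500/325) (about 8.32).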

Winning Times in the 100 Meter Butterfly Race

Description

Winning times in seconds for the men's and women's 100m butterfly race for the Olympics from 1964 through 2016.

Usage

  olympic_butterfly

Format

A data frame with 28 observations on the following 3 variables.

Year: year of Olympics
Gender: gender
Time: winning time in seconds

Source

https://www.olympic.org/swimming/

Graphs prior and posterior probabilities

Description

Graphs prior and posterior probabilities from a discrete Bayesian model

Usage

  prior_post_plot(d, Color = "orange")

Arguments

d

data frame whose first column contains the model values, with additional columns named Prior and Posterior

Color

fill color for the bars

Value

ggplot2 object containing the graphical display.

Author(s)

Jim Albert

Examples

d <- data.frame(p=c(.1, .3, .5, .7, .9),
                 Prior=rep(1/5, 5))
y <- 5
n <- 10
d$Likelihood <- dbinom(y, prob=d$p, size=n)
d <- bayesian_crank(d)
prior_post_plot(d, "red")

Constructs a graph of a probability distribution

Description

Constructs a graph of a discrete probability distribution

Usage

  prob_plot(d, Color = "red", Size = 1.5)

Arguments

d

data frame where the first two columns are the variable and associated probabilities

Color

color of line in plot

Size

width of line in plot

Value

A ggplot2 object containing the plot display

Author(s)

Jim Albert

Examples

  d <- data.frame(x=1:5,
         Probability=c(.1, .2, .3, .3, .1))
  prob_plot(d)

Prices of One Carat Diamonds

Description

Prices of a sample of one carat diamonds

Usage

  pt100price

Format

A data frame with 25 observations on the following 2 variables.

diamond: index of diamond
price: price divided by 100

Source

Unknown.

Prices of 0.99 Carat Diamonds

Description

Prices of a sample of 0.99 carat diamonds

Usage

  pt99price

Format

A data frame with 23 observations on the following 2 variables.

diamond: index of diamond
price: price divided by 100

Source

Unknown.

Baseball Win-Loss Records

Description

Final standings of the MLB baseball teams in the 2018 season

Usage

  pythag2018

Format

A data frame with 30 observations on the following 7 variables.

Team: team abbreviation
League: league abbreviation
W: number of wins
L: number of losses
Pct: proportion of wins
R: average runs scored
RA: average runs allowed

Source

Lahman database

Metropolis sampling of a discrete distribution

Description

Implements Metropolis sampling for an arbitrary discrete probability distribution

Usage

  random_walk(pd, start, num_steps)

Arguments

pd

function containing discrete probability function on the integers 1, 2, ...

start

starting value of algorithm

num_steps

number of iterations of algorithm

Value

A vector of simulated values

Author(s)

Jim Albert

Examples

# random walk through a binomial distribution
pd <- function(x){
  dbinom(x, size = 10, prob = 0.5)
}
start <- 4
num_steps <- 50
out <- random_walk(pd, start, num_steps)
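
A discrete Metropolis random walk can be sketched in base R. This sketch assumes the candidate is one step to the left or right with equal probability and is accepted with probability pd(candidate) / pd(current); `random_walk_sketch` is a hypothetical name.

```r
# Hypothetical sketch (not the package implementation).
random_walk_sketch <- function(pd, start, num_steps) {
  y <- numeric(num_steps)
  current <- start
  for (k in seq_len(num_steps)) {
    candidate <- current + sample(c(-1, 1), 1)  # step left or right
    # Candidates with zero probability are never accepted.
    if (runif(1) < pd(candidate) / pd(current)) current <- candidate
    y[k] <- current
  }
  y
}

pd <- function(x) dbinom(x, size = 10, prob = 0.5)
set.seed(1)
out <- random_walk_sketch(pd, 4, 50)
```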

Sleeping Times

Description

Sleeping times for a single night for a sample of college students

Usage

  sleeping_times

Format

A data frame with 14 observations on the following single variable.

hours: number of hours of sleep

Source

Personal collection

Implements Bayes' rule for a spinner problem

Description

Computes and plots the posterior distribution of spinners given a sequence of spins

Usage

  spinner_bayes(list_regions,
                prior,
                data,
                plot=TRUE)

Arguments

list_regions

list of vectors of integer areas for the spins 1, 2, ...

prior

a vector containing the prior probabilities for the spinners

data

a vector containing the spin values where 1, 2, 3, ... are the possible spins

plot

if plot=TRUE, a comparative graph of the prior and posterior probabilities is displayed

Value

A data frame with variables Spinner, Prior, Likelihood, Product, and Posterior

Author(s)

Jim Albert

Examples

  regions1 <- c(1, 1, 1)
  regions2 <- c(2, 1, 2, 1)
  data <- c(1, 1, 1, 2)
  spinner_bayes(list(regions1, regions2),
                prior=c(0.5, 0.5),
                data)

Simulate random data from a spinner

Description

Simulate random data from a spinner

Usage

  spinner_data(regions, nsim=1000)

Arguments

regions

vector of integer values for the spins 1, 2, ...

nsim

number of spins

Value

A vector of random spins from the spinner

Author(s)

Jim Albert

Examples

  regions <- c(2, 1, 1, 2)
  spinner_data(regions, nsim=20)

Computes likelihood matrix for many spinners

Description

Computes likelihood matrix for many spinners

Usage

  spinner_likelihoods(regions)

Arguments

regions

list of vectors of integer areas for the spins 1, 2, ...

Value

A matrix where each row corresponds to the outcome probabilities for one spinner.

Author(s)

Jim Albert

Examples

  sp1 <- c(2, 1, 1)
  sp2 <- c(1, 1, 1, 1)
  regions <- list(sp1, sp2)
  spinner_likelihoods(regions)

Constructs a spinner

Description

Constructs a spinner with different regions

Usage

  spinner_plot(probs, ...)

Arguments

probs

vector of probabilities for the spins 1, 2, ...

...

optional vector of values and title

Value

A ggplot2 object containing the spinner display

Author(s)

Jim Albert

Examples

  probs <- rep(.2, 5)
  spinner_plot(probs,
         values=c("A", "B", "C", "D", "E"),
         title="My Spinner")
  # probs does not need to be normalized
  spinner_plot(c(1, 2, 1, 2))

Display probability distribution for a spinner

Description

Display probability distribution for a spinner

Usage

  spinner_probs(regions)

Arguments

regions

vector of positive values for the spins 1, 2, ...

Value

Data frame with variables Region and Prob

Author(s)

Jim Albert

Examples

  regions <- c(2, 1, 1, 2)
  spinner_probs(regions)
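
The conversion from region areas to probabilities can be sketched in base R. This sketch assumes each probability is the region's area divided by the total area; the column names follow the Value section.

```r
regions <- c(2, 1, 1, 2)

# Each region's probability is its share of the total area.
probs <- data.frame(Region = seq_along(regions),
                    Prob = regions / sum(regions))
```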

Taxi Fares

Description

Sample of taxi fares from a particular city

Usage

  taxi_fares

Format

A data frame with 20 observations on the following single variable.

fare: taxi cab fare

Source

Personal collection

Tennis Times to Serve

Description

Data on time to serve for six professional tennis players

Usage

  tennis_serve

Format

A data frame with 6 observations on the following 3 variables.

Player: last name of player
n: number of serves
ybar: mean time to serve

Source

https://github.com/JeffSackmann

Testing prior for two proportions

Description

Constructs a discrete distribution for two proportions under a testing or uniform hypothesis

Usage

  testing_prior(lo=.1, hi=.9, n_values=9,
        pequal=0.5, uniform=FALSE)

Arguments

lo

minimum value of each proportion

hi

maximum value of each proportion

n_values

number of values of each proportion

pequal

probability of the equality of the two proportions

uniform

indicates if a uniform prior is desired

Value

matrix of probabilities where the rows and columns are labeled by the values of the proportions

Author(s)

Jim Albert

Examples

  # testing prior where each proportion is
  # .1, .3, .5, .7, .9
  Prob <- testing_prior(.1, .9, 5)
  # uniform prior over same proportion values
  Prob <- testing_prior(.1, .9, 5, uniform=TRUE)

Mike Trout Statcast Data

Description

Launch speed and distance traveled for a sample of balls hit by the baseball player Mike Trout

Usage

  trout20

Format

A data frame with 25 observations on the following 2 variables.

launch_speed: launch speed in mph
hit_distance_sc: distance in feet

Source

Major League Baseball Advanced Media

Summaries of a probability matrix

Description

Computes posterior of difference P2 - P1 of a probability matrix of two proportions

Usage

  two_p_summarize(prob_matrix)

Arguments

prob_matrix

probability matrix where the rows and columns are labeled with the values of the proportions

Value

data frame with variables diff21 and Prob where diff21 = P2 - P1

Author(s)

Jim Albert

Examples

  # use uniform prior over values .2, .3, .4
  prob_matrix <- testing_prior(.2, .4, 3, uniform=TRUE)
  two_p_summarize(prob_matrix)

Posterior updating of two proportions

Description

Computes posterior distribution of two proportions with a discrete prior

Usage

  two_p_update(prior, s1f1, s2f2)

Arguments

prior

prior probability matrix where the rows and columns are labeled with the values of the proportions

s1f1

number of successes and number of failures from first sample

s2f2

number of successes and number of failures from second sample

Value

posterior probability matrix

Author(s)

Jim Albert

Examples

  prior <- testing_prior()
  s1f1 <- c(3, 10)
  s2f2 <- c(8, 20)
  two_p_update(prior, s1f1, s2f2)
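
The update can be sketched in base R. This sketch assumes the rows of the matrix index the first proportion, the columns index the second, and the two samples are independent binomials; `two_p_update_sketch` and the small uniform prior below are hypothetical and used only for illustration.

```r
# Hypothetical sketch (not the package implementation).
two_p_update_sketch <- function(prior, s1f1, s2f2) {
  p1 <- as.numeric(rownames(prior))   # values of first proportion
  p2 <- as.numeric(colnames(prior))   # values of second proportion
  # Joint likelihood on the grid: product of two binomial likelihoods.
  like <- outer(dbinom(s1f1[1], sum(s1f1), p1),
                dbinom(s2f2[1], sum(s2f2), p2))
  post <- prior * like
  post / sum(post)                    # normalize to sum to 1
}

vals <- as.character(c(.2, .5, .8))
prior <- matrix(1/9, 3, 3, dimnames = list(vals, vals))
post <- two_p_update_sketch(prior, c(3, 10), c(8, 20))
```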

Times to Serve for Two Tennis Players

Description

Measurements of time to serve for serves of the tennis players Roger Federer and Rafael Nadal

Usage

  two_players_time_to_serve

Format

A data frame with 100 observations on the following 2 variables.

Player: last name of player
time: time to serve in seconds

Source

https://github.com/JeffSackmann

Website tracking data

Description

Number of visits to a blog website for different weeks and days of the week

Usage

  web_visits

Format

A data frame with 28 observations on the following 3 variables.

Week: week number
Day: day of the week
Count: number of website visits

Source

Personal data collected from Wordpress.com