Type: | Package |
Title: | Functions for Simple a/B Split Test and Multi-Armed Bandit Analysis |
Version: | 0.5.1 |
Date: | 2022-06-028 |
Imports: | boot, gam (≥ 1.09) |
Author: | Thomas Lotze and Markus Loecher |
Maintainer: | Markus Loecher <markus.loecher@gmail.com> |
Description: | A set of functions for doing analysis of A/B split test data and web metrics in general. |
License: | GPL-3 |
NeedsCompilation: | no |
Repository: | CRAN |
Packaged: | 2022-06-28 16:06:35 UTC; loecherm |
Date/Publication: | 2022-06-29 11:20:12 UTC |
Functions for simple A/B split test and multi-armed bandit analysis
Description
A set of functions for doing analysis of A/B split test data and web metrics in general.
Details
Package: | bandit |
Type: | Package |
Title: | Functions for simple A/B split test and multi-armed bandit analysis |
Version: | 0.5.0 |
Date: | 2014-05-03 |
Imports: | boot, gam (>= 1.09) |
Author: | Thomas Lotze and Markus Loecher |
Maintainer: | Thomas Lotze <thomaslotze@thomaslotze.com> |
License: | GPL-3 |
Author(s)
Thomas Lotze and Markus Loecher
best_binomial_bandit
Description
Compute the Bayesian probabilities for each arm being the best binomial bandit.
Usage
best_binomial_bandit(x, n, alpha=1, beta=1)
Arguments
x |
as in prop.test, a vector of the number of successes |
n |
as in prop.test, a vector of the number of trials |
alpha |
shape parameter alpha for the prior beta distribution. |
beta |
shape parameter beta for the prior beta distribution. |
Value
a vector of probabilities for each arm being the best binomial bandit; this can be used for future randomized allocation
Author(s)
Thomas Lotze <thomaslotze@thomaslotze.com> and Markus Loecher
References
Steven L. Scott, A modern Bayesian look at the multi-armed bandit, Appl. Stochastic Models Bus. Ind. 2010; 26:639-658. (http://www.economics.uci.edu/~ivan/asmb.874.pdf)
See Also
Examples
x=c(10,20,30,50)
n=c(100,102,120,130)
arm_probabilities = best_binomial_bandit(x,n)
print(arm_probabilities)
paste("The best arm is likely ", which.max(arm_probabilities), ", with ",
round(100*max(arm_probabilities), 2), " percent probability of being the best.", sep="")
best_binomial_bandit(c(2,20),c(100,1000))
best_binomial_bandit(c(2,20),c(100,1000), alpha = 2, beta = 5)
#quick look at the various shapes of the beta distribution as we change the shape params:
AlphaBeta = cbind(alpha=c(0.5,5,1,2,2),beta=c(0.5,1,3,2,5))
M = nrow(AlphaBeta)
y= matrix(0,100,ncol=M)
x = seq(0,1,length=100)
for (i in 1:M) y[,i] = dbeta(x,AlphaBeta[i,1],AlphaBeta[i,2])
matplot(x,y,type="l", ylim = c(0,3.5), lty=1, lwd=2)
param_strings = paste("a=", AlphaBeta[,"alpha"], ", b=", AlphaBeta[,"beta"], sep="")
legend("top", legend = param_strings, col=1:M, lty=1)
best_binomial_bandit_sim
Description
Compute the Bayesian probabilities for each arm being the best binomial bandit, using simulation.
Usage
best_binomial_bandit_sim(x, n, alpha = 1, beta = 1, ndraws = 5000)
Arguments
x |
as in prop.test, a vector of the number of successes |
n |
as in prop.test, a vector of the number of trials |
alpha |
shape parameter alpha for the prior beta distribution. |
beta |
shape parameter beta for the prior beta distribution. |
ndraws |
number of random draws from the posterior |
Value
a vector of probabilities for each arm being the best binomial bandit; this can be used for future randomized allocation
Author(s)
Thomas Lotze and Markus Loecher
References
Steven L. Scott, A modern Bayesian look at the multi-armed bandit, Appl. Stochastic Models Bus. Ind. 2010; 26:639-658.
(http://www.economics.uci.edu/~ivan/asmb.874.pdf)
See Also
Examples
x=c(10,20,30,33)
n=c(100,102,120,130)
best_binomial_bandit_sim(x,n, ndraws=1000)
round(best_binomial_bandit(x,n),3)
best_binomial_bandit_sim(c(2,20),c(100,1000))
best_binomial_bandit_sim(c(2,20),c(100,1000), alpha = 2, beta = 5)
#quick look at the various shapes of the beta distribution as we change the shape params:
AlphaBeta = cbind(alpha=c(0.5,5,1,2,2),beta=c(0.5,1,3,2,5))
M = nrow(AlphaBeta)
y= matrix(0,100,ncol=M)
x = seq(0,1,length=100)
for (i in 1:M) y[,i] = dbeta(x,AlphaBeta[i,1],AlphaBeta[i,2])
matplot(x,y,type="l", ylim = c(0,3.5), lty=1, lwd=2)
param_strings = paste("a=", AlphaBeta[,"alpha"], ", b=", AlphaBeta[,"beta"], sep="")
legend("top", legend = param_strings, col=1:M, lty=1)
best_poisson_bandit
Description
Compute the Bayesian probabilities for each arm being the best poisson bandit.
Usage
best_poisson_bandit(x, n = NULL)
Arguments
x |
as in prop.test, a vector of the number of successes; it may alternatively be a list of vectors of the results of each trial, if n is not provided |
n |
as in prop.test, a vector of the number of trials; if it is not provided, x must be a list of vectors of the results of each trial |
Value
a vector of probabilities for each arm being the best poisson bandit; this can be used for future randomized allocation
Author(s)
Thomas Lotze <thomaslotze@thomaslotze.com>
References
Steven L. Scott, A modern Bayesian look at the multi-armed bandit, Appl. Stochastic Models Bus. Ind. 2010; 26:639-658. (http://www.economics.uci.edu/~ivan/asmb.874.pdf)
See Also
Examples
p1 = rpois(100, lambda=10)
p2 = rpois(100, lambda=9)
x = sapply(list(p1, p2), sum)
n = sapply(list(p1, p2), length)
best_poisson_bandit(x,n)
deseasonalized_trend
Description
A convenience function to analyze a timeseries and return an estimate (via gam, using day of week factors and smoothed timestamp) of whether, after accounting for day-of-week, there is a significant time-based influence and what that influence is.
Usage
deseasonalized_trend(df, w=NULL)
Arguments
df |
a data frame containing timestamp and value entries |
w |
number of attempts (n for binomial data) |
Value
a list with the following items:
pval |
pval given by anova on gam, to indicate whether s(timestamp) is significant |
smoothed_prediction |
a smoothed prediction over time (on Wednesdays), to give a human-understandable idea of what the change over time has been |
Author(s)
Thomas Lotze <thomaslotze@thomaslotze.com>
Examples
timestamps = as.numeric(as.POSIXct(seq(as.Date("2012-01-01"),as.Date("2012-05-03"),by=1)))
df=data.frame(timestamp = timestamps, value = rnorm(length(timestamps)))
dt = deseasonalized_trend(df)
if (dt$pval < 0.01) {
print("Significant time-based factor")
plot(df$timestamp, dt$smoothed_prediction)
} else {
print("No significant time-based factor")
}
df=data.frame(timestamp = timestamps,
value = sapply(timestamps, function(t) {rpois(1, lambda=t-min(timestamps))}))
dt = deseasonalized_trend(df)
if (dt$pval < 0.01) {
print("Significant time-based factor")
plot(df$timestamp, dt$smoothed_prediction)
} else {
print("No significant time-based factor")
}
summarize_metrics
Description
A convenience function to perform overall metric analysis: mean, median, CI.
Usage
distribution_estimate(v, successes=NULL, num_quantiles=101, observed=FALSE)
Arguments
v |
a vector of values to be analyzed (for nonbinary data), or number of trials (for binary data) |
successes |
number of successes (for binary data) |
num_quantiles |
number of quantiles to split into |
observed |
whether to generate the observed distribution (rather than the estimated distribution of the mean); default FALSE |
Value
a data frame with the following columns:
quantiles |
the estimated quantiles (0,0.01,0.02,...,1) for the mean, using a Beta-binomial estimate of p for binomial data, a bootstrapped quantile distribution for real-valued numbers |
x |
x values for plotting a lineplot of the estimated distribution |
y |
y values for plotting a lineplot of the estimated distribution |
mids |
mid values for plotting a barplot of the estimated distribution |
lefts |
left values for plotting a barplot of the estimated distribution |
rights |
right values for plotting a barplot of the estimated distribution |
widths |
width values for plotting a barplot of the estimated distribution |
heights |
height values for plotting a barplot of the estimated distribution |
probabilities |
probabilities indicating how much probability is contained in each barplot |
Author(s)
Thomas Lotze <thomaslotze@thomaslotze.com>
Examples
metric_list = list(rbinom(n=100,size=1,prob=0.5),
rbinom(n=100,size=1,prob=0.7),
rpois(n=100, lambda=5))
distribution_estimate(length(metric_list[[1]]), sum(metric_list[[1]]))
distribution_estimate(length(metric_list[[2]]), sum(metric_list[[2]]))
de = distribution_estimate(metric_list[[3]])
plot(de$x, de$y, type="l")
barplot(de$heights, de$widths)
distribution_estimate(metric_list[[3]], observed=TRUE)
prob_winner
Description
Function to compute probability that each arm is the winner, given simulated posterior results
Usage
prob_winner(post)
Arguments
post |
the simulated results from the posterior, provided by sim_post |
Author(s)
Thomas Lotze and Markus Loecher
Examples
x=c(10,20,30,50)
n=c(100,102,120,130)
betaPost = sim_post(x,n)
prob_winner(betaPost)
significance_analysis
Description
A convenience function to perform overall proportion comparison using prop.test, before doing pairwise comparisons, to see what outcomes seem to be better than others.
Usage
significance_analysis(x, n)
Arguments
x |
as in prop.test, a vector of the number of successes |
n |
as in prop.test, a vector of the number of trials |
Value
a data frame with the following columns:
successes |
x |
totals |
n |
estimated_proportion |
x/n |
lower |
0.95 confidence interval on the estimated amount by which this alternative outperforms the next-lower alternative |
upper |
0.95 confidence interval on the estimated amount by which this alternative outperforms the next-lower alternative |
significance |
p-value for the test that this alternative outperforms the next-lower alternative |
order |
order, by highest success proportion |
best |
1 if it is part of the 'highest performing group' – those groups which were not significantly different from the best group |
p_best |
Bayesian posterior probability that this alternative is the best binomial bandit |
Note
This is intended for use in A/B split testing – so sizes of n should be roughly equal. Also, note that alternatives which have the same rank are grouped together for analysis with the 'next-lower' alternative, so you may want to check to see if ranks are equal.
Author(s)
Thomas Lotze <thomaslotze@thomaslotze.com>
See Also
Examples
x = c(10,20,30,50)
n = c(100,102,120,130)
sa = significance_analysis(x,n)
sa[rev(order(sa$estimated_proportion)), ]
x = c(37,41,30,43,39,30,31,35,50,30)
n = rep(50, length(x))
sa = significance_analysis(x,n)
sa[rev(order(sa$estimated_proportion)), ]
x = c(37,41,30,43,39,30,31,37,50,30)
n = rep(50, length(x))
sa = significance_analysis(x,n)
sa[rev(order(sa$estimated_proportion)), ]
sim_post
Description
Simulate the posterior distribution the Bayesian probabilities for each arm being the best binomial bandit
Usage
sim_post(x, n, alpha = 1, beta = 1, ndraws = 5000)
Arguments
x |
as in prop.test, a vector of the number of successes |
n |
as in prop.test, a vector of the number of trials |
alpha |
shape parameter alpha for the prior beta distribution. |
beta |
shape parameter beta for the prior beta distribution. |
ndraws |
number of random draws from the posterior |
Author(s)
Thomas Lotze and Markus Loecher
Examples
x=c(10,20,30,50)
n=c(100,102,120,130)
sim_post(x,n)
summarize_metrics
Description
A convenience function to perform overall metric analysis: mean, median, CI.
Usage
summarize_metrics(v, successes=NULL)
Arguments
v |
a vector of values to be analyzed (for nonbinary data), or number of trials (for binary data) |
successes |
number of successes (for binary data) |
Value
a list with the following items:
mean |
mean |
median |
median |
lower |
0.95 confidence interval on the mean |
upper |
0.95 confidence interval on the mean |
num_obs |
number of observations of this metric |
total |
the sum of all values of this metric (mean*num_obs) |
Author(s)
Thomas Lotze <thomaslotze@thomaslotze.com>
Examples
metric_list = list(rbinom(n=100,size=1,prob=0.5),
rbinom(n=100,size=1,prob=0.7),
rpois(n=100, lambda=5))
summarize_metrics(length(metric_list[[1]]), sum(metric_list[[1]]))
summarize_metrics(length(metric_list[[2]]), sum(metric_list[[2]]))
summarize_metrics(metric_list[[3]])
value_remaining
Description
Compute the "value_remaining" in the binomial bandits
Usage
value_remaining(x, n, alpha = 1, beta = 1, ndraws = 10000)
Arguments
x |
as in prop.test, a vector of the number of successes |
n |
as in prop.test, a vector of the number of trials |
alpha |
shape parameter alpha for the prior beta distribution. |
beta |
shape parameter beta for the prior beta distribution. |
ndraws |
number of random draws from the posterior |
Value
value_remaining distribution; the distribution of improvement amounts that another arm might have over the current best arm
Author(s)
Thomas Lotze and Markus Loecher
Examples
x=c(10,20,30,80)
n=c(100,102,120,240)
vr = value_remaining(x, n)
hist(vr)
best_arm = which.max(best_binomial_bandit(x, n))
# "potential value" remaining in the experiment
potential_value = quantile(vr, 0.95)
paste("Were still unsure about the CvR for the best arm (arm ", best_arm,
"), but whatever it is, one of the other arms might beat it by as much as ",
round(potential_value*100, 4), " percent.", sep="")