Help for package NU.Learning

Version:

1.5

Date:

2023-09-15

Title:

Nonparametric and Unsupervised Learning from Cross-Sectional Observational Data

Author:

Bob Obenchain [aut, cre], Stan Young [ctb]

Maintainer:

Bob Obenchain <wizbob@att.net>

Depends:

R (≥ 3.5.0), cluster, lattice

Description:

Especially when cross-sectional data are observational, effects of treatment selection bias and confounding are best revealed by using Nonparametric and Unsupervised methods to "Design" the analysis of the given data ...rather than the collection of "designed data". Specifically, the "effect-size distribution" that best quantifies a potentially causal relationship between a numeric y-Outcome variable and either a binary t-Treatment or continuous e-Exposure variable needs to consist of BLOCKS of relatively well-matched experimental units (e.g. patients) that have the most similar X-confounder characteristics. Since our NU Learning approach will form BLOCKS by "clustering" experimental units in confounder X-space, the implicit statistical model for learning is One-Way ANOVA. Within Block measures of effect-size are then either [a] LOCAL Treatment Differences (LTDs) between Within-Cluster y-Outcome Means ("new" minus "control") when treatment choice is Binary or else [b] LOCAL Rank Correlations (LRCs) when the e-Exposure variable is numeric with (hopefully many) more than two levels. An Instrumental Variable (IV) method is also provided so that Local Average y-Outcomes (LAOs) within BLOCKS may also contribute information for effect-size inferences when X-Covariates are assumed to influence Treatment choice or Exposure level but otherwise have no direct effects on y-Outcomes. Finally, a "Most-Like-Me" function provides histograms of effect-size distributions to aid Doctor-Patient (or Researcher-Society) communications about Heterogeneous Outcomes. Obenchain and Young (2013) <doi:10.1080/15598608.2013.772821>; Obenchain, Young and Krstic (2019) <doi:10.1016/j.yrtph.2019.104418>.

License:

GPL-2

URL:

https://www.r-project.org, http://localcontrolstatistics.org

NeedsCompilation:

Packaged:

2023-09-11 23:49:36 UTC; bobo

Repository:

CRAN

Date/Publication:

2023-09-30 22:52:43 UTC

NU.Learning: Nonparametric and Unsupervised Adjustment for Bias and Confounding

Description

NU.Learning forms Local Treatment Differences (LTDs) or Local Rank Correlations (LRCs) within Clusters of experimental units (patients, etc.) who have been relatively well-matched on their baseline X-confounder characteristics. The resulting distribution of LTD/LRC effect-size estimates can be interpreted much like a Bayesian posterior. Yet these distributions have been formed, via Nonparametric and Unsupervised Preprocessing, in purely Objective Ways.

Details

Package:	NU.Learning
Type:	Package
Version:	1.5
Date:	2023-09-15
License:	GPL-2

UNSUPERVISED LOCAL TREATMENT DIFFERENCES or LOCAL RANK CORRELATIONS:

Multiple calls to ltdagg(K) or lrcagg(K) for varying numbers of clusters, K, are typically made after first invoking NUcluster() to hierarchically cluster patients in X-space and invoking NUsetup() to specify a numeric y-Outcome variable and a numeric treatment choice or exposure level measure, trex.

UNSUPERVISED INSTRUMENTAL VARIABLES = LOCAL AVERAGE y-OUTCOME EFFECTS:

An OBSERVED Propensity Score (PS) is defined here to be either (i) the local (within-cluster) fraction of experimental units (patients) receiving trex==1 (new) rather than trex==0 (control) or else (ii) a measure of "relative exposure" when the numeric trex measure has (many) more than 2 observed levels. Multiple calls to ivadj(K) for varying numbers of clusters, K, then yield alternative scatters of Local Average Outcomes (LAOs) for Clusters when plotted against their PS estimates and, thus, different possible linear fits or smooth.splines() yielding potentially different inferences about across-cluster Treatment or Exposure Effects.

CONFIRMATION and SENSITIVITY ANALYSES of LOCAL EFFECT-SIZE DISTRIBUTIONS:

For a given value of K = Number of Clusters requested, the output object from ltdagg(K) or lrcagg(K) can be input to confirm() to use (nonparametric) permutation theory to display visual evidence (empirical CDF comparisons) concerning the Question: "Does x-matching Truly Matter?" The NULL hypothesis here is that the x-Covariates used in Clustering / Matching of Experimental Units are actually IGNORABLE. Evidence against this hypothesis is provided when the observed LOCAL Effect-Size Distribution clearly deviates from the purely RANDOM, NULL distribution computed (to any desired precision) by randomly PERMUTING cluster ID labels across experimental units. Furthermore, the statistical significance of differences between the observed and random NULL distributions can be estimated using KSperm(), which simulates the random permutation distribution of the Kolmogorov-Smirnov D-statistic when many tied values occur in both distributions being compared. Finally, the NUcompare() function helps users of NU.Learning decide which Number of Clusters, K, optimizes Variance-Bias trade-offs. Larger values of K tend to yield smaller clusters with better matches and, thus, potentially reduced BIAS. On the other hand, smaller values of K usually yield local effect-size estimates with much lower Variability (higher Precision).

"Most-Like-Me" HISTOGRAMS for DOCTOR-PATIENT discussions of PERSONALIZED MEDICINE:

For a specified vector, xvec, of numerical values of the X-confounder variables used in the current CLUSTERING of eUnits, display histograms of observed LTD or LRC effect-sizes for (i) all available patients and (ii) for the specified number, NN, of "Nearest-Neighbors" in X-confounder space of the TARGET eUnit ...i.e. xvec defines "Me".

Author(s)

Bob Obenchain <wizbob@att.net>

References

McClellan M, McNeil BJ, Newhouse JP. (1994) Does More Intensive Treatment of Myocardial Infarction in the Elderly Reduce Mortality?: Analysis Using Instrumental Variables. JAMA 272: 859-866.

Obenchain RL. (2010) The Local Control Approach using JMP. Chapter 7 of Analysis of Observational Health Care Data using SAS, Cary, NC:SAS Press, pages 151-192.

Obenchain RL, Young SS. (2013) Advancing Statistical Thinking in Observational Health Care Research. J. Stat. Theory and Practice, 7: 456-469, doi:10.1080/15598608.2013.772821.

Lopiano KK, Obenchain RL, Young SS. (2014) Fair treatment comparisons in observational research. Statistical Analysis and Data Mining, 7: 376-384, doi:10.1002/sam.11235.

Obenchain RL. NU.Learning-vignette. (2023) NU.Learning_in_R.pdf http://localcontrolstatistics.org

Rosenbaum PR, Rubin RB. (1983) The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41-55.

Rosenbaum PR, Rubin RB. (1984) Reducing Bias in Observational Studies Using Subclassification on a Propensity Score. JASA 79: 516-524.

Rubin DB. (1980) Bias reduction using Mahalanobis metric matching. Biometrics 36: 293-298.

Stuart EA. (2010) Matching Methods for Causal Inference: A Review and a Look Forward. Statistical Science 25: 1-21.

Simulate a p-value for the significance of the Kolmogorov-Smirnov D-statistic from confirm().

Description

For a given confirm() output object, KSperm() simulates the NULL distribution of LTDs or LRCs resulting from Purely Random Clusterings of experimental units within the parent data.frame. This NULL distribution is discrete because Local Effect-Size estimates are TIED within-clusters. The observed D-Statistic from confirm() is compared with new NULL order statistics computed by KSperm(), again using stats::ks.test. When KSperm() is called immediately after confirm() and the seed value used in confirm() is known, then both the simulated p-value and the additional NULL KS-D order statistics generated by KSperm() will all be reproducible.

Usage

KSperm(x, reps=100)

Arguments

x

An output object from confirm().

reps

This is the number of new NULL KS-D statistics to generated. Each experimental unit is used at most once within each full replication. No clusters will be empty, but some may be "uninformative".

Details

The observed value of the Kolmogorov-Smirnov D-statistic from confirm() is used here, but its "p.value" from ks.test() is not because it is badly biased downwards. This bias results because the distribution of LTDs or LRCs across clusters is always discrete, due to TIED values within clusters that typically also vary in size. Thus, KSperm() generates "reps" additional, independent, NULL values of KS-D and saves their order statistics. Finally, KSperm() compares the Observed KS-D from confirm() with its simulated NULL order statistics to estimate an appropriately "adjusted" p-value, pv.adj. Note that the simulated pv.adj value estimate cannot be less than 1/(reps).

Value

An output list object of class KSperm:

hiclus

Hierarchical clustering object created by the designated method.

dframe

Name of data.frame containing X, t & Y variables.

trtm

Name of numerical treatment/exposure variable.

yvar

Name of numerical y-Outcome variable.

Type

1 ==> LTDs, otherwise LRCs.

reps

Number of overall Replications, each with the same number, K, of requested clusters.

nclus

Number of clusters requested.

units

Number of experimental units or patients.

obsD

Observed numerical value of KS D-statistic from confirm()

Dvec

Vector of order statistics for simulated NULL KS D-statistics.

pv.adj

Simulated p-value adjusted for TIES within discrete LTD/LRC distributions.

Author(s)

Bob Obenchain <wizbob@att.net>

References

Obenchain RL. (2010) Local Control Approach using JMP. Chapter 7 of Analysis of Observational Health Care Data using SAS, Cary, NC:SAS Press, pages 151-192.

Obenchain RL. (2019) NU.Learning_in_R.pdf http://localcontrolstatistics.org

Internal LocalControlStrategy functions.

Description

Internal functions for the NU.Learning-package. Plot and print functions require the output object from a NU.Learning function as their first argument.

Author(s)

Bob Obenchain <wizbob@att.net>

Hierarchical Clustering of experimental units (such as patients) in X-covariate Space

Description

Form the full, hierarchical clustering tree (dendrogram) for all units (regardless of Treatment/Exposure status) using Mahalonobis distances computed from specified baseline X-covariate characteristics.

Usage

  NUcluster(dframe, xvars, method="ward.D")

Arguments

dframe

Name of data.frame containing baseline X covariates.

xvars

List of names of X variable(s).

method

Hierarchical Clustering Method of "diana", "ward.D", "ward.D2", "complete", "average", "mcquitty", "median" or "centroid".

Details

The first step in applying NU.Learning to data is to hierarchically cluster experimental units in baseline X-covariate space ...thereby creating "Blocks" of relatively well-matched units. NUcluster first calls stats::prcomp() to calculate Mahalanobis distances using standardized and orthogonal Principal Coordinates. NUcluster then uses either the divisive cluster::diana() method or one of seven agglomerative methods from stats::hclust() to compute a dendrogram tree. The hclust function is based on Fortran code contributed to STATLIB by F. Murtagh.

Value

An output list object of class NUcluster, derived from cluster::diana or stats::hclust.

dframe

Name of data.frame containing all baseline X-covariates.

xvars

List of 1 or more X-variable names.

method

Hierarchical Clustering Method: "diana", "ward.D", "ward.D2", "complete", "average", "mcquitty", "median" or "centroid".

hclobj

Hierarchical clustering object created by the designated method.

Author(s)

Bob Obenchain <wizbob@att.net>

References

Kaufman L, Rousseeuw PJ. (1990) Finding Groups in Data. An Introduction to Cluster Analysis. New York: John Wiley and Sons.

Kereiakes DJ, Obenchain RL, Barber BL, et al. (2000) Abciximab provides cost effective survival advantage in high volume interventional practice. Am Heart J 140: 603-610.

Murtagh F. (1985) Multidimensional Clustering Algorithms. COMPSTAT Lectures 4.

Obenchain RL. (2010) Local Control Approach using JMP. Chapter 7 of Analysis of Observational Health Care Data using SAS, Cary, NC:SAS Press, pages 151-192.

Rubin DB. (1980) Bias reduction using Mahalanobis metric matching. Biometrics 36: 293-298.

Examples

data(radon)
xvars  = c("obesity", "over65", "cursmoke")
hclobj = NUcluster(radon, xvars)  # ...using default method = "ward.D"
plot(hclobj)

Display NU Sensitivity Graphic for help in choice of K = Number of Clusters

Description

This function displays Box-Whisker diagrams that compare Treatment Effect-Size distributions for different values of K = Number of Clusters requested in X-covariate space. After an initial call to NUsetup(), the analyst typically makes multiple calls to either ltdagg() or lrcagg() for different values of K. The analyst then invokes NUcompare() to see how choice of K changes the location, spread and/or skewness of the distribution of Treatment Effect-Size estimates across Clusters. Variance-Bias trade-offs occur as K increases; large values of K may reduce Bias, but they definitely inflate the Variance of LTD and LRC distributions.

Usage

  NUcompare(envir)

Arguments

envir

R environment output by an earlier call to NUsetup().

Details

The third phase of NU.Learning is called EXPLORE and uses graphical Sensitivity Analyses to show how Treatment Effect-Size distributions change with choice of NU parameter settings. Choice of K = Number of Clusters requested is guided, primarily, by NUcompare() graphics. Equally important are the analyst's choices of (i) which [and how many] of the available baseline X-covariates to "adjust for" and (ii) which clustering algorithm and dissimilarity metric to use. Unfortunately, changing these latter choices requires the analyst to essentially "start over" ...i.e. invoking NUcluster() with changed arguments, followed by an invocation of NUsetup() with a different 1st argument. To change only one's choice of y-Outcome variable and/or the Treatment/Exposure variable, a new NUsetup() invocation is all that is needed.

Value

NULL

Author(s)

Bob Obenchain <wizbob@att.net>

References

Obenchain RL. (2010) Local Control Approach using JMP. Chapter 7 of Analysis of Observational Health Care Data using SAS, Cary, NC:SAS Press, pages 151-192.

Obenchain RL. (2015) NU_Confirm_Guidelines.pdf http://localcontrolstatistics.org

Obenchain RL. (2023) NU.Learning_in_R.pdf http://localcontrolstatistics.org

Rubin DB. (1980) Bias reduction using Mahalanobis metric matching. Biometrics 36: 293-298.

Tukey JW. (1977) Exploratory Data Analysis, New York: Addison-Wesley, Section 2C.

Examples

  
  # Running takes more than 7 seconds...
  data(pci15k)
  xvars   = c("stent", "height", "female", "diabetic", "acutemi", "ejfract", "ves1proc")
  hclobj  = NUcluster(pci15k, xvars)
  NU.env  = NUsetup(hclobj, pci15k, thin, surv6mo)
  surv050 = ltdagg( 50, NU.env)
  surv100 = ltdagg(100, NU.env)
  surv200 = ltdagg(200, NU.env)
  NUcompare(NU.env)

Specify KEY parameters used in NU.Learning to "design" analyses of Observational Data.

Description

Invoke NUsetup() to specify the name of the Hierarchical Clustering object output by NUcluster() and the name of the data.frame containing all desired X-covariates, the Treatment/Exposure variable and the Y-Outcome variable. It is ESSENTIAL to save the Environment output by NUsetup() as a named object within the user's .GlobalEnv space.

Usage

NUsetup(hclobj, dframe, trex, yvar)

Arguments

hclobj

Name of a NUcluster() output object created using a cluster::diana or stats::hclust method.

dframe

Name of the data.frame containing all X-covariates, the Treatment/Exposure variable and the Y-Outcome variable.

trex

Name of the numerical Treatment/Exposure variable.

yvar

Name of the numerical Y-Outcome variable.

Value

The environment output by NUsetup() must be saved to the user's .GlobalEnv space. It's contents will be automatically updated by calls to other NU.Learning functions:

aggdf

data.frame with 4 columns and 1 row for each call to ltdagg() or lrcagg().

aggdf$Label

Factor value of "LTD" or "LRC".

aggdf$Blocks

K = integer Number of Clusters requested.

aggdf$LTDmean or aggdf$LRCmean

numerical value of cluster mean of LTD or LRC estimates.

aggdf$LTDstde or aggdf$LRCstde

numerical value of the within-cluster standard deviation.

boxdf

data.frame of 2 variables ...for input to boxplot() by NUcompare().

boxdf$NUstat

LTD or LRC estimate for a single experimental unit from ltdagg() or lrcagg().

boxdf$K

Number of Cluters used in forming the LTD or LRC estimate for each Experimental Unit.

Kmax

Maximum Number of Clusters so that Average Size will be >= 12 experimental units.

LTDmax or LRCmax

Maximum Treatment Effect-Size estimate across Clusters.

LTDmin or LRCmin

Minimum Treatment Effect-Size estimate across Clusters.

NumLevels

Integer number of distinct Levels of the Treatment/Exposure variable: trex.

pars

Character data.frame with 4 columns and 1 row.

pars[1, 1]

Name of the diana or hclust object created by NUcluster().

pars[1, 2]

Name of data.frame containing the X, Treatment/Exposure and Y variables.

pars[1, 3]

Name of Treatment/Exposure variable within data.frame pars[1,2].

pars[1, 4]

Name of Y-outcome variable within data.frame pars[1,2].

Author(s)

Bob Obenchain <wizbob@att.net>

References

Obenchain RL. (2010) Local Control Approach using JMP. Chapter 7 of Analysis of Observational Health Care Data using SAS, Cary, NC:SAS Press, pages 151-192.

Obenchain RL. (2023) NU.Learning_in_R.pdf http://localcontrolstatistics.org

Examples

  
  # Running takes about 7 seconds...
  data(pci15k)
  xvars  = c("stent", "height", "female", "diabetic", "acutemi", "ejfract", "ves1proc")
  hclobj = NUcluster(pci15k, xvars)
  NUe    = NUsetup(hclobj, pci15k, thin, surv6mo)
  ls.str(NUe)

Confirm that Clustering in Covariate X-space yields an "adjusted" LTD/LRC effect-size Distribution

Description

For a given Number of Clusters, K, confirm() compares the observed distribution of LTDs or LRCs from relatively well-matched experimental units with the corresponding distribution from Purely Random Clusterings of experimental units. The larger are differences between the (blue) observed empirical CDF of effect-sizes and the (red) Purely Random CDF, the more potentially IMPORTANT are the "adjustments" resulting from focussing upon clustering (matching) of experimental units in X-space.

Usage

confirm(x, reps=100, seed=12345)

Arguments

x

An output object from ltdagg() or lrcagg() for a specified number of clusters, K.

reps

Number of simulation Replications, each with the same number, K, and sizes, N1, N2, ..., NK of Purely Random clusters.

seed

This (arbitrary) integer argument will be passed to the R set.seed() function. Knowing the value of this seed makes the output from confirm() reproducible.

Details

Making calls to confirm() for ltdagg() or lrcagg() objects resulting from different choices of K = Numbers of Clusters help the analyst decide which observed LTD or LRC effect-size distributions are (or are not) meaningfully different from Purely Random. When the X-covariates used in NUcluster() are truly "ignorable," then [i] all X-based clusters will be Purely Random, and [ii] both the number (K) and the sizes (N1, N2, ...,NK) of clusters formed will be meaningless and arbitrary. Thus the NU Strategy confirm() function simulates the empirical CDF for LTDs or LRCs resulting from purely random permutations of the Cluster ID numbers (1, 2, ...,K) assigned by ltdagg() or lrcagg(). Each permutation yields K artificial "clusters" of sizes N1, N2, ..., NK. Simulation results are accumulated for the total number of random permutations specified in the "reps=" argument of confirm(). Calls to print.confirm() and plot.confirm() provide information on comparisons of empirical CDFs for the Observed and Purely Random LTD/LRC distributions, including calculation of an observed two-sample Kolmogorov-Smirnov D-statistic using stats::ks.test. This is a non-standard use of ks.test() because the distributions being compared are DISCRETE; both contain many within-cluster TIED effect-size estimates. The p-value computed by ks.test() is not reported or saved because it is badly biased downwards due to TIED estimates. Researchers wishing to simulate a p-value for the observed KS D-statistic that is adjusted for TIES can invoke KSperm(confirm()).

Value

An output list object of class confirm:

hiclus

Hierarchical clustering object created by the designated method.

dframe

Name of data.frame containing X, trex & Y variables.

trtm

Name of numerical trex variable.

yvar

Name of numerical Y-outcome variable.

reps

Number of overall Replications, each with the same numbers of requested clusters.

seed

Integer argument passed to set.seed(). Knowing which seed value was used in the call to confirm() makes not only the NULL distribution of observed LTDs or LRCs reproducible but also makes the NULL distribution of D-statistics (adjusted for ties) from a subsequent call to KSperm() reproducible.

nclus

Number of clusters requested.

units

Number of experimental units or patients.

Type

1 ==> LTDs, otherwise LRCs.

NUmean

Weighted Local Mean across Clusters.

NUstde

Weighted Std. Error across Clusters.

RPmean

Weighted Random Permutation Mean across Clusters.

RPstde

Weighted Random Permutation Std. Error across Clusters.

KSobsD

Output from print(ks.test()).

NUdist

data.frame of 5 key variables for all experimental units.

dfconf

data.frame of lstat = LTD or LRC values of max(length) = reps*units.

Author(s)

Bob Obenchain <wizbob@att.net>

References

Obenchain RL. (2010) The Local Control Approach using JMP. Chapter 7 of Analysis of Observational Health Care Data using SAS, Cary, NC:SAS Press, pages 151-192.

Obenchain RL. (2023) NU.Learning_in_R.pdf http://localcontrolstatistics.org

Instrumental Variable LAO Fitting and Smoothing

Description

For a given number of patient clusters in baseline X-covariate space and a specified Y-outcome variable, smooth the distribution of Local Average Outcomes (LAOs) plotted versus Within-Cluster Propensity-like Scores: the Treatment Selection Fraction or the Relative Exposure Level.

Usage

  ivadj(x)

Arguments

x

An output object from ltdagg() or lrcagg() using K Clusters in X-covariate space.

Details

Multiple invocations of ivadj(ltdagg()) or ivadj(lrgagg()) using varying numbers of clusters, K, can be made. Each invocation of ivadj() displays a linear lm() fit and a smooth.spline() fit to the scatter of LAO estimates plotted versus their within-cluster propensity-like score estimates.

Value

An output list object of class ivadj:

hclobj

Name of clustering object output by NUcluster().

dframe

Name of data.frame containing X, trtm & Y variables.

trtm

Name of the numeric treatment variable.

yvar

Name of the numeric outcome Y variable.

K

Number of Clusters Requested.

actclust

Number of Clusters actually produced.

Author(s)

Bob Obenchain <wizbob@att.net>

References

McClellan M, McNeil BJ, Newhouse JP. (1994) Does More Intensive Treatment of Myocardial Infarction in the Elderly Reduce Mortality?: Analysis Using Instrumental Variables. JAMA 272: 859-866.

Obenchain RL. (2010) Local Control Approach using JMP. Chapter 7 of Analysis of Observational Health Care Data using SAS, Cary, NC:SAS Press, pages 151-192.

Obenchain RL. (2023) NU.Learning_in_R.pdf http://localcontrolstatistics.org

Rosenbaum PR, Rubin RB. (1983) The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41-55.

Examples

  
  # Running takes about 7 seconds...
  data(pci15k)
  xvars = c("stent", "height", "female", "diabetic", "acutemi", "ejfract", "ves1proc")
  hclobj = NUcluster(pci15k, xvars)
  NU.env = NUsetup(hclobj, pci15k, thin, surv6mo)
  surv050 = ltdagg(50, NU.env)
  iv050 = ivadj(surv050)
  iv050
  plot(iv050)

Calculate the observed Distribution of LRCs in NU.Learning

Description

For a given number, K, of Clusters of Experimental Units in baseline X-covariate space, lrcagg() calculates the observed distribution of "Local Rank Correlations" (LRCs) across Clusters ...where each LRC = cor(trex, Y, method = "spearman") within a Cluster, trex is a numeric measure of Exposure, and Y is a numeric measure of Outcome.

Usage

  lrcagg(K, envir)

Arguments

K

Number of Clusters in baseline X-covariate space.

envir

R environment output by a previous call to NUsetup().

Details

Multiple calls to lrcagg(K) for varying numbers of clusters, K, are typically made after first invoking NUcluster() to hierarchically cluster patients in X-space and then invoking NUsetup() to specify a Y Outcome variable and a continuous, numerical treatment Exposure: trex. lrcagg() computes an observed LRC Distribution, updates information stored in its envir object, and outputs an object that is typically saved in the user's .GlobalEnv to allow subsequent use by print.lrcagg(), plot.lrcagg(), confirm() or KSperm(). Uninformative Clusters (those containing only 1 or 2 experimental units) contribute NA values to the LRCtabl$LRC and LRCdist$LRC objects within the lrcagg() output list.

Value

An output list of 12 objects, of class lrcagg:

hclobj

Name of clustering dendrogram object created by NUcluster().

dframe

Name of data.frame containing X, trex & Y variables.

trex

Name of numerical treatment/exposure level variable.

yvar

Name of outcome Y variable.

K

Number of Clusters Requested.

actclust

Number of Clusters delivered.

LRCtabl

data.frame with 5 columns and K rows for Clusters.

LRCtabl$c

Cluster ID Factor, "1", "2", ..., "K".

LRCtabl$LRC

Numerical value of Local Treatment Difference for a Cluster.

LRCtabl$w

Integer value of "weight" = Cluster Size.

LRCtabl$LAO

Numerical value of within-cluster Local Average Outcome (Y-value).

LRCtabl$PS

Numerical value of Local Relative Propensity for Exposure, 0.0 to 1.0.

LRCdist

data.frame with 5 columns and same number of rows as the data: dframe.

LRCdist$c.K

Cluster ID Variable of the form: "c.K"

LRCdist$ID

Observation ID Variable for the rows of the input dframe.

LRCdist$y

Numerical values of Y-Outcomes for Experimental Units.

LRCdist$t

Numerical values of Treatment-Exposure Levels for Experimental Units.

LRCdist$LRC

Numerical values of the LRC for the Cluster containing each Unit.

infoclus

Integer value of Number of Informative Clusters.

infounits

Integer value of Number of Units within Informative Clusters.

LRCmean

Numerical value of mean(LRCdist$LRC) = Weighted Average of LRCtabl$LRC values.

LRCstde

Numerical value of sqrt(var(LRCdist$LRC)) = Weighted Standard Deviation of LRCtabl$LRC values.

Author(s)

Bob Obenchain <wizbob@att.net>

References

Obenchain RL. (2010) The Local Control Approach using JMP. Chapter 7 of Analysis of Observational Health Care Data using SAS, Cary, NC:SAS Press, pages 151-192.

Obenchain RL. (2019) NU.Learning_in_R.pdf http://localcontrolstatistics.org

Examples

    data(radon)
    xvars = c("obesity", "over65", "cursmoke")
    hclobj = NUcluster(radon, xvars)
    e = NUsetup(hclobj, radon, lnradon, lcanmort)
    lrc050 = lrcagg(50, e)
    lrc050
    plot(lrc050, e)

Calculate the Observed Distribution of LTDs in NU.Learning

Description

For a given number, K, of Clusters of Experimental Units in baseline X-covariate space, ltdagg() calculates the observed distribution of "Local Treatment Differences" (LTDs) of the form LTD = (( mean(Y) for units receiving trtm==1 ) - ( mean(Y) for units receiving trtm==0 )).

Usage

  ltdagg(K, envir)

Arguments

K

Number of Clusters in baseline X-covariate space.

envir

R environment output by a previous call to NUsetup().

Details

Multiple calls to ltdagg(K) for varying numbers of clusters, K, are typically made after first invoking NUcluster() to hierarchically cluster patients in X-space and then invoking NUsetup() to specify a Y Outcome variable and a two-level, numerical treatment variable: trtm. ltdagg() computes an observed LTD Distribution, updates information stored in its envir object, and outputs an object that is typically saved in the user's .GlobalEnv to allow subsequent use by print.ltdagg(), plot.ltdagg(), confirm() or KSperm(). Uninformative Clusters (those containing either only trtm==1 units or else only trtm==0 units) contribute NA values to the LTDtabl$LTD and LTDdist$LTD objects within the ltdagg() output list object.

Value

An output list of 12 objects, of class ltdagg:

hiclus

Name of clustering object created by NUcluster().

dframe

Name of data.frame containing X, trtm & Y variables.

trtm

Name of treatment factor variable.

yvar

Name of outcome Y variable.

K

Number of Clusters Requested.

actclust

Number of Clusters delivered.

LTDtabl

data.frame with 5 columns and K rows for Clusters.

LTDtabl$c

Cluster ID Factor, "1", "2", ..., "K".

LTDtabl$LTD

Numerical value of Local Treatment Difference for a Cluster.

LTDtabl$w

Integer value of "weight" = Cluster Size.

LTDtabl$LAO

Numerical value of within-cluster Local Average Outcome (Y-value).

LTDtabl$PS

Numerical value of Propensity Score = Local Fraction of Experimental Units receiving trtm==1; 0.0 <= PS <= 1.0.

LTDdist

data.frame with 5 columns and same number of rows as the data: dframe.

LTDdist$c.K

Factor values within c("1", "2", ..., "K").

LTDdist$ID

Observation ID Variable for the rows of the input dframe.

LTDdist$y

Numerical value of the Y-Outcome for an Experimental Unit.

LTDdist$t

Numerical value of trtm (0 or 1) for an Experimental Unit.

LTDdist$LTD

Numerical value of the LTD for the Cluster containing each Exp. Unit.

infoclus

Integer value of Number of Informative Clusters.

infounits

Integer value of Number of Units within Informative Clusters.

LTDmean

Numerical value of mean(LTDdist$LTD) = Weighted Average of LTDtabl$LTD values.

LTDstde

Numerical value of sqrt(var(LTDdist$LTD)) = Weighted Standard Deviation of LTDtabl$LTD values.

Author(s)

Bob Obenchain <wizbob@att.net>

References

Obenchain RL. (2010) Local Control Approach using JMP. Chapter 7 of Analysis of Observational Health Care Data using SAS, Cary, NC:SAS Press, pages 151-192.

Obenchain RL. (2019) NU.Learning_in_R.pdf http://localcontrolstatistics.org

Examples

  
  # Running takes more than 7 seconds...
  data(pci15k)
  xvars = c("stent", "height", "female", "diabetic", "acutemi", "ejfract", "ves1proc")
  hclobj = NUcluster(pci15k, xvars)
  NUe = NUsetup(hclobj, pci15k, thin, surv6mo)
  surv050 = ltdagg(50, NUe)
  surv050
  plot(surv050, NUe)

Create a <<Most-Like-Me>> data.frame for a specified X-Confounder vector: xvec

Description

For a Given X-confounder Vector (xvec), sort all experimental units (eUnits) in an ltdagg() or lrcagg() output object into the strictly non-decreasing order of their distances from this X-Vector, which defines the TARGET eUnit: "Me". Plots of mlme() objects and displays of mlme.stats() are then used to Visualize and Summarize "Mini-" << LOCAL effect-size Distributions >> for different Numbers of "Nearest Neighbor" eUnits.

Usage

  mlme(envir, hcl, NUagg, xvec )

Arguments

envir

Environment output by a call to the NUsetup() function.

hcl

Name of a NUcluster() output object created using a cluster::diana or stats::hclust method.

NUagg

A data.frame object output by ltdagg() or lrcagg() containing LOCAL effect-size Estimates for eUnits within Clusters defined in X-covariate space.

xvec

A suitable vector of the Numerical values for the X-Confounder variables, used in the current CLUSTERING, that define the eUnit: "Me".

Details

For example, in demo(radon), the eUnits are 2881 US "Counties", and the NUagg object is of type lrcagg() because radon exposure is a continuous variable. But, in demo(pci15k), the eUnits are 15487 "Patients," and the NUagg object is of type ltdagg() because treatment choice (thin) is Binary (0 = "No", 1 = "Yes").

Value

An output list object of class mlme:

xvec

The xvec vector input to mlme().

Type

Either "LTD" or "LRC".

xvars

Names of the X-Confounder variables specified in NUsetup().

varx

The vector of Variances of the xvars variabes, used in rescaling distances.

outdf

The output data.frame of sorted "Nearest Neighbor" candidate eUnits.

Author(s)

Bob Obenchain <wizbob@att.net>

References

Obenchain RL. NU.Learning-vignette. (2023) NU.Learning_in_R.pdf http://localcontrolstatistics.org

Examples

  
  # Running takes about 7 seconds...
  data(pci15k)
  xvars = c("stent", "height", "female", "diabetic", "acutemi", "ejfract", "ves1proc")
  hclobj = NUcluster(pci15k, xvars)
  NU.env = NUsetup(hclobj, pci15k, thin, surv6mo)
  surv0500 = ltdagg(500, NU.env)
  xvec11870 = c( 0, 162, 1, 1, 0, 57, 1) 
  mlmeC5H = mlme(envir = NU.env, hcl = hclobj, NUagg = surv0500, xvec = xvec11870 ) 	
  plot(mlmeC5H) # using default "NN" and "breaks" settings...

Print Summary Statistics for One or More "Most-Like-Me" Histogram Pairs.

Description

Print Summary Statistics for Local effect-size (LTD or LRC) Distributions associated with given Numbers of "Nearest-Neighbors" in X-confounder Space.

Usage

mlme.stats(x, NN = 50, ...)

Arguments

x

An object output by mlme.data().

NN

Number(s) of "Nearest Neighbors" displayed in Histogram(s). NN can be either a single integer like NN = 40 or a combination of integers like NN = c( 50, 250, 2500 ).

...

Other arguments passed on to print().

Value

NULL

Author(s)

Bob Obenchain <wizbob@att.net>

Six-month Survival, Cardiac cost and Baseline Covariate data for 15,487 PCI patients.

Description

Using observational data on 996 patients who received a Percutaneous Coronary Intervention (PCI) at Ohio Heart Health, Lindner Center, Christ Hospital, Cincinnati (Kereiakes et al, 2000), we generated this much larger dataset via "plasmode simulation."

Usage

data(pci15k)

Format

A data frame of 11 variables on 15,487 patients; no NAs.

patid: Patient ID number: 1 to 15487.
surv6mo: Binary PCI Survival variable: 1 => Survival for at least 6 months following PCI, 0 => Survival for less than 6 months.
cardcost: Cardiac related costs incurred within 6 months of patient's initial PCI; numeric value in 1998 dollars; costs were truncated by death for the 404 patients with surv6mo == 0.
thin: Numeric treatment selection indicator: thin = 0 implies usual PCI care alone; thin = 1 implies usual PCI care augmented by either planned or rescue treatment with a new blood thinning agent.
stent: Coronary stent deployment; numeric, with 1 meaning YES and 0 meaning NO.
height: Height in centimeters; numeric integer from 133 to 198.
female: Female gender; numeric, with 1 meaning YES and 0 meaning NO.
diabetic: Diabetes mellitus diagnosis; numeric, with 1 meaning YES and 0 meaning NO.
acutemi: Acute myocardial infarction within the previous 7 days; numeric, with 1 meaning YES and 0 meaning NO.
ejfract: Left ejection fraction; numeric value from 17 percent to 77 percent.
ves1proc: Number of vessels involved in the patient's initial PCI procedure; numeric integer from 0 to 5.

References

Kereiakes DJ, Obenchain RL, Barber BL, et al. Abciximab provides cost effective survival advantage in high volume interventional practice. Am Heart J 2000; 140: 603-610.

Gadbury GL, Xiang Q, Yang L, Barnes S, Page GP, Allison DB. Evaluating Statistical Methods Using Plasmode Data Sets in the Age of Massive Public Databases: An Illustration Using False Discovery Rates. PLOS Genetics 2008; 4: 1-8, e1000098 (Open Access).

Obenchain RL. (2023) NU.Learning_in_R.pdf http://localcontrolstatistics.org

Examples

    data(pci15k)
    str(pci15k)

Display an Instrumental Variable (LAO) plot with Linear and smooth.spline Fits

Description

For a given number of patient clusters, K, in baseline X-covariate space and a specified Y-outcome variable, display the distribution of Local Average Outcomes (LAOs) plotted versus Within-Cluster Propensity-like Scores: Treatment Selection Fractions or Relative Exposure Levels.

Usage

  ## S3 method for class 'ivadj'
plot(x, maxsiz = 0.15, ...)

Arguments

x

An object output by ivadj() for K given Clusters in baseline X-covariate space.

maxsiz

Radius of the Circle plotting symbol for the largest Cluster. Usually < 0.6

...

Other arguments passed on to plot().

Value

NULL

Author(s)

Bob Obenchain <wizbob@att.net>

Display Visualizations of an Observed LRC Distribution in NU.Learning

Description

Display a Histogram, Box-Whisker Diagram and/or empirical Cumulative Distribution Function depicting the Observed Local Rank Correlation (LRC) Distribution across K Clusters.

Usage

  ## S3 method for class 'lrcagg'
plot(x, envir, show="all", breaks="Sturges", ...)

Arguments

x

An object output by lrcagg() for K = Number of Clusters in baseline X-covariate space.

envir

R environment output by a previous call to NUsetup().

show

Choice of "all", "seq", "hist", "boxp", or "ecdf".

breaks

Parameter setting for hist(); May be an integer value ...like 25 or 50.

...

Other arguments passed on to plot().

Value

NULL

Author(s)

Bob Obenchain <wizbob@att.net>

Display Visualizations of an Observed LTD Distribution in NU.Learning

Description

Display a Histogram, Box-Whisker Diagram and/or empirical Cumulative Distribution Function depicting the Observed Local Treatment Difference (LTD) Distribution across K Clusters.

Usage

  ## S3 method for class 'ltdagg'
plot(x, envir, show="all", breaks="Sturges", ...)

Arguments

x

An object output by ltdagg() for K = Number of Clusters in baseline X-covariate space.

envir

R environment output by a previous call to NUsetup().

show

Choice of "all", "seq", "hist", "boxp", or "ecdf".

breaks

Parameter setting for hist(); May be an integer value ...like 25 or 50.

...

Other arguments passed on to plot().

Value

NULL

Author(s)

Bob Obenchain <wizbob@att.net>

Display a Pair (or Pairs) of Histograms showing LOCAL effect-sizes for Patients "Most-Like-Me".

Description

Display Pair(s) of Histograms of Local effect-size (LTD or LRC) Distributions for a specified Number (or combinations of Numbers) of "Nearest-Neighbors in X-confounder Space.

Usage

  ## S3 method for class 'mlme'
plot(x, NN=50, breaks=50, ...)

Arguments

x

An object output by mlme().

NN

Number(s) of Nearest Neighbors displayed in Bottom Histogram(s). NN can be a single integer like NN = 40 or a combination of integers like NN = c( 50, 250, 2500 ).

breaks

Integer number of breaks in the Top Histogram for the full LTD or LRC distribution. Because the Bottom Histogram may include only a few Nearest Neighbors, it is always displayed using breaks = "Sturges".

...

Other arguments passed on to plot().

Value

NULL

Author(s)

Bob Obenchain <wizbob@att.net>

Particulate Matter, Mortality and Other data for 2980 US Counties

Description

This data.frame combines 122 variables from the 5 sources referenced below. Several PM variables appear to be predictions from EPA “CMAQ” models rather than values from validated measuring instruments. NU.Learning concepts are illustrated in demo(pmdata) using Clustering of 2973 Counties and Parishes within the contiguous 48 US States and Washington, D.C.

Usage

data(pmdata)

Format

This data.frame contains 122 variables for 2,980 US counties. A total of 738 "NA"s imply that only about two tenths of one percent of these 363,560 values are missing.

fips: Federal Information Processing Standard code; 4 or 5 digits; 2980 unique values
C50: Cluster ID Number between 1 and 50. Total of 50 unique values
LRC50: Local (Spearman) Rank Correlation between Bvoc and AACRmort within Cluster
County: County or Parish name is a Factor variable (character code)
State: State name is a 2-Character Factor code; 49 unique levels
Deaths: CDC: Total number of Deaths in the County in 2016
Population: CDC: Total population of County in 2016
CRmort: CDC: Crude Rate of Circulatory-Respiratory Mortality for the County in 2016
CrudeL95: CDC: Lower 95% confidence limit for Crude Rate of CR Mortality
CrudeU95: CDC: Upper 95% confidence limit for Crude Rate of CR Mortality
CrudeSE: CDC: Standard Error for Crude Rate of CR Mortality
AACRmort: CDC: Age Adjusted Rate of Circulatory-Respiratory Mortality in 2016
AACRL95: CDC: Lower 95% confidence limit for Age Adjusted Rate of CR Mortality
AACRU95: CDC: Upper 95% confidence limit for Age Adjusted Rate of CR Mortality
AACR_SE: CDC: Standard Error for Age Adjusted Rate of CR Mortality
TotDeathPct: CDC: Total Death Percentage - Circulatory-Respiratory Mortality in 2016
lat: EPA: County Latitude used in EPA "CMAQ" model calculations
long: EPA: County Longitude used in EPA "CMAQ" model calculations
RHpct: EPA: Relative Humidity Percentage for 2016
SFCtmpC: EPA: Surface Temperature in Degrees Centigrade for 2016
NO2: EPA: Nitrogen Dioxide level (NO2.ppbV) for 2016
O3: EPA: Ozone level (O3.ppbV) for 2016
pmCL: EPA: Chlorine level in Particulate Matter (PM25_CL.ugm3) for 2016
pmEC: EPA: Ethylene Carbonate level in Particulate Matter (PM25_EC.ugm3) for 2016
pmNA: EPA: Sodium level in Particulate Matter (PM25_NA.ugm3) for 2016
pmMG: EPA: Magnesium level in Particulate Matter (PM25_MG.ugm3) for 2016
pmK: EPA: Potassium level in Particulate Matter (PM25_K.ugm3) for 2016
pmCA: EPA: Calcium level in Particulate Matter (PM25_CA.ugm3) for 2016
pmNH4: EPA: Ammonium level in Particulate Matter (PM25_NH4.ugm3) for 2016
pmNO3: EPA: Nitrate level in Particulate Matter (PM25_NO3.ugm3) for 2016
pmOC: EPA: Organic Compounds in pmTOT [fine particulate matter] (PM25_OC.ugm3) for 2016
pmOM: EPA: OM compounds in pmTOT (PM25_OM.ugm3) for 2016
pmOTHR: EPA: Other Compounds in pmTOT (PM25_OTHR.ugm3) for 2016
pmSO4: EPA: Sulfate Compounds in pmTOT (PM25_SO4.ugm3) for 2016
pmFE: EPA: Ferrous Compounds in pmTOT (PM25_FE.ugm3) for 2016
pmSI: EPA: Silicon Compounds in pmTOT (PM25_SI.ugm3) for 2016
pmTI: EPA: Titanium Compounds in pmTOT (PM25_TI.ugm3) for 2016
pmMN: EPA: Manganese Compounds in pmTOT (PM25_MN.ugm3) for 2016
pmAL: EPA: Aluminum Compounds in pmTOT (PM25_AL.ugm3) for 2016
pmUNSPCRS: EPA: UNSPCRS Compounds in pmTOT (PM25_UNSPCRS.ugm3) for 2016
pmPOA: EPA: Primary Organic Aerosols in pmTOT (PM25_POA.ugm3) for 2016
pmSOA: EPA: Secondary Organic Aerosols in pmTOT (PM25_SOA.ugm3) for 2016; pmSOA = Avoc+Bvoc
pmGLY: EPA: Glycemic Secondary Organic Aerosols in pmTOT (PM25_GLYSOA.ugm3) for 2016
pmOLGB: EPA: OLGB compounds in pmTOT (PM25_OLGB.ugm3) for 2016
pmISOP: EPA: ISOP compounds in pmTOT (PM25_ISOP.ugm3) for 2016
pmEPOX: EPA: EPOX compounds in pmTOT (PM25_EPOX.ugm3) for 2016
pmSQT: EPA: SQT compounds in pmTOT (PM25_SQT.ugm3) for 2016
pmMTN: EPA: MTN compounds in pmTOT (PM25_MTN.ugm3) for 2016
pmMT: EPA: MT compounds in pmTOT (PM25_MT.ugm3) for 2016
pmTOT: EPA: Total (fine) Particulate Matter (PM25_TOT.ugm3) for 2016
SFCtmpK: EPA: Surface Temperature in Degrees Kelvin for 2016
CardioRes: CDC: Cardio Respiratory Rate (rate I00J98.per100000.cdc) for 2016
POPcdc: CDC: County Population (population.cdc) for 2016
POP5yracs: CDC: 5yracs Population (population.people.5yracs) for 2016
PREMdeath: Premature Deaths per 100K Residents ...UWPHI for 2018
POFHealth: Poor or Fair Health rate (Poor.or.fair.health) ...UWPHI for 2018
PPHdays: Poor Physical Health days (Poor.physical.health.days) ...UWPHI for 2018
PMHdays: Poor Mental Health days (Poor.mental.health.days) ...UWPHI for 2018
LBW: Low Birth Weight rate (Low.birthweight) ...UWPHI for 2018
ASmoke: Adult Smoking Percentage (Adult.smoking) ...UWPHI for 2018
AObes: Adult Obesity Percentage (Adult.obesity) ...UWPHI for 2018
FEnv: Food Environment Index (Food.environment.index) ...UWPHI for 2018
PhysInAct: Physical Inactivity (Physical.inactivity) ...UWPHI for 2018
ExercOPS: Access to Exercise Opportunities ...UWPHI for 2018
ExsDrink: Excessive Drinking Rate (Excessive.drinking) ...UWPHI for 2018
AIDrivD: Alcohol Impaired Driving Deaths ...UWPHI for 2018
STInfect: Sexually Transmitted Infections ...UWPHI for 2018
TBirths: Teenage Births (Teen.births) ...UWPHI for 2018
Uninsur: Uninsured Residences (Uninsured) ...UWPHI for 2018
PCDocs: Primary Care Physicians (Primary.care.physicians) ...UWPHI for 2018
Dentists: Dentists (Dentists) ...UWPHI for 2018
PrevntHS: Preventable Hospital Stays (Preventable.hospital.stays) ...UWPHI for 2018
DiabMNT: Diabetes Monitoring (Diabetes.monitoring) ...UWPHI for 2018
MammoSC: Mammography Screening (Mammography.screening) ...UWPHI for 2018
SomCOL: Some College Education (Some.college) ...UWPHI for 2018
UnEMP: Unemployment Rate (Unemployment) ...UWPHI for 2018
ChildPOV: Children Living in Poverty (Children.in.poverty) ...UWPHI for 2018
IncomIEQ: Income Inequality (Income.inequality) ...UWPHI for 2018
ChildSPH: Children In Single-Parent Households ...UWPHI for 2018
SocASOC: Social Associations (Social.associations) ...UWPHI for 2018
VioCRM: Violent Crime Rate (Violent.crime) ...UWPHI for 2018
InjyDths: Injury Death Rate (Injury.deaths) ...UWPHI for 2018
AirPolpm: Air Pollution Particulate Matter ...UWPHI for 2018
DrnkWtVi: Drinking Water Violations (Drinking.water.violations) ...UWPHI for 2018
SevrHOUS: Severe Housing Problems (Severe.housing.problems) ...UWPHI for 2018
DrivATW: Driving Alone to Work (Driving.alone.to.work) ...UWPHI for 2018
LComutA: Long Commute - Driving Alone to Work ...UWPHI for 2018
PAAM: Premature Age Adjusted Mortality ...UWPHI for 2018
FrqPhysD: Frequent Physical Distress ...UWPHI for 2018
FrqMentD: Frequent Mental Distress ...UWPHI for 2018
DiabPrev: Diabetes Prevalence (Diabetes.prevalence) ...UWPHI for 2018
FoodInSec: Food Insecurity (Food.insecurity) ...UWPHI for 2018
LimAHFood: Limited Access to Healthy Foods ...UWPHI for 2018
DrugOdDM: Drug Overdose Deaths Model predictions ...UWPHI for 2018
InsufSlp: Insufficient Sleep (Insufficient.sleep) ...UWPHI for 2018
UnInsAds: Uninsured Adults (Uninsured.adults) ...UWPHI for 2018
UnInsCls: Uninsured Children (Uninsured.children) ...UWPHI for 2018
HCareCost: Health Care Costs (Health.care.costs) ...UWPHI for 2018
OthPrimCP: Other Primary Care Providers ...UWPHI for 2018
MHHIncome: Median Household Income ...UWPHI for 2018
ChildFRPL: Children Eligible for Free or Reduced-Price Lunch ...UWPHI for 2018
Population: County Population (Population) ...UWPHI for 2018
AGELess18: Residents below 18 Years of Age ...UWPHI for 2018
A65oOVR: Residents 65 or Older (X..65.and.older) ...UWPHI for 2018
NHispAfA: Non-Hispanic African-American Residents ...UWPHI for 2018
AmINalsN: American Indian or Alaskan Natives ...UWPHI for 2018
Asian: Asian Residents (X..Asian) ...UWPHI for 2018
NHawOPI: Native Hawaiian and Other Pacific Islanders ...UWPHI for 2018
Hispanic: Hispanic Residents (X..Hispanic) ...UWPHI for 2018
NHispWht: Non-Hispanic White Residents ...UWPHI for 2018
LoProEngl: Low Proficiency in English (not.proficient.in.English) ...UWPHI for 2018
Females: Female Residents ...UWPHI for 2018
Rural: Rural Residents ...UWPHI for 2018
pmOA: EPA: Organic Aerosols in pmTOT (PM25_OA.ugm3) for 2016
Avoc: EPA: Anthroprogenic [man-made] Volatile Organic Compounds in pmTOT for 2016
pmSEAspry: EPA: Sea Spray components in pmTOT (PM25_SOAAVOC.ugm3) for 2016
pmDUST: EPA: Dust components in pmTOT (PM25_DUST.ugm3) for 2016
pmNH4NO3: EPA: Ammonium Nitrate components in pmTOT (PM25_NH4NO3.ugm3) for 2016
pmSOOT: EPA: Soot components in pmTOT (PM25_SOOT.ugm3) for 2016
isop: EPA: SOA Isoprenes (PM25_SOAISOPRENE.ugm3) for 2016
terp: EPA: SOA Terpenes (PM25_SOATERPENE.ugm3) for 2016
Bvoc: EPA: Biogenic (natural) Volatile Organic Compounds for 2016; Bvoc = isop + terp

References

Obenchain RL. and Young SS. (2022), EPA Particulate Matter Data - Analyses using Local Control Strategy. (24 pages, 22 figures) https://doi.org/10.48550/arXiv.2209.05461

Pye, H., Ward-Caviness, C., Murphy, B., Appel, K., and Seltzer, K. (2021). Secondary organic aerosol association with cardiorespiratory disease mortality in the united states. Nature Communications, 12.7215 https://doi.org/10.1038/s41467-021-27484-1

Pye, H. [EPA] (2021), Data For Secondary Organic Aerosol and Cardiorespiratory Disease Mortality. https://doi.org/10.5281/zenodo.5713903

University of Wisconsin, Population Health Institute. https://uwphi.pophealth.wisc.edu [UWPHI] UWPHI@med.wisc.edu

Young SS. and Obenchain RL. (2022), "EPA particulate matter data...Analyses using Local Control Strategy" https://doi.org/10.5061/dryad.63xsj3v58

Examples

    data(pmdata)
    str(pmdata, list.len=122)

Print Summary Statistics on Local effect-size Estimates for Patients "Most-Like-Me".

Description

Display "Most-Like-Me" Summary Statistics for LOCAL effect-size (LTD or LRC) Distributions of "Nearest-Neighbors" in X-confounder Space.

Usage

  ## S3 method for class 'mlme'
print(x, ...)

Arguments

x

An object output by mlme().

...

Other arguments passed on to print().

Value

NULL

Author(s)

Bob Obenchain <wizbob@att.net>

Radon exposure and lung cancer mortality data for 2,881 US counties in 46 States.

Description

Federal EPA and state government agencies have been reporting observational data at the US County level since about 1980. The data given here include 5 potential X-confounder variables of the relationship between lung cancer mortality and radon exposure; they were amassed and checked by Goran Krstic, Fraser Health Authority, Vancouver, BC, Canada.

Usage

data(radon)

Format

A data frame of 11 variables for 2881 US counties. One Missing Value; row 778 for Shannon County, SD, fips == 46113, has hhincome == NA.

fips: County FIPS code. Codes are 4 or 5 digit integers; 2881 unique values.
state: State Factor variable (2-character codes); 46 unique levels.
county: County or Parish Factor variable (character codes); 1703 unique levels.
lcanmort: Lung Cancer Mortality rate (deaths per 100,000 person-years), 1980-2004.
radon: County Radon Exposure level in picocuries per liter (pCi/L) for some unspecified period within 1986-1992; rounded to nearest single decimal place.
lnradon: Natural logarithm of County Radon Exposure level. Radon levels reported as 0.0 for 10 US counties are Windsorized here to ln(0.05), which is roughly -3.
obesity: Percentage of County Residents considered Obese (age adjusted), 2008.
over65: Percentage of County Residents of Age 65 and over, 2000 Census.
cursmoke: Percentage of County Residents who Currently Smoke, 1997-2003.
evrsmoke: Percentage of County Residents who Ever Smoked, 1997-2003.
hhincome: Average Median HouseHold Income in Thousands ($1,000), 1989-2004.

References

Krstic G, Obenchain RL. (2016) Radon dataset documentation and downloads. http://localcontrolstatistics.org

Obenchain RL. (2018) RADON_short.pdf http://localcontrolstatistics.org 40 PPT Slides and Commentary in Notes Pages format.

Examples

    data(radon)
    str(radon)

Create a data.frame for use in Prediction of a LTD/LRC effect-size Distribution

Description

reveal.data() forms a data.frame by sorting and appending the LTD or LRC exposure effect-size measures from ltdagg() or lrcagg() – as well as a Cluster membership-number variable – to a copy of the data.frame specified in NUsetup(). In the fourth and final REVEAL Phase of NU.Learning, a stretch-goal is to predict variation in LTD/LRC effect-size distributions using the known (baseline) X-covariate characteristics of experimental units. For example, the data.frame output by reveal.data() is suitable for input to party::ctree() as well as to a number of other "less Visual" prediction methods available in R.

Usage

reveal.data(x, clus.var="Clus", effe.var="eSiz")

Arguments

x

An output object resulting from a call to ltdagg() or lrcagg().

clus.var

Quoted NAME for the Cluster-Number variable.

effe.var

Quoted NAME for the LTD/LRC effect-size variable.

Value

The desired data.frame:

outdf

A data.frame containing clus.var, effe.var plus (X, trex & Y) variables.

Author(s)

Bob Obenchain <wizbob@att.net>

References

Obenchain RL. (2023) NU.Learning_in_R.pdf http://localcontrolstatistics.org

NU.Learning: Nonparametric and Unsupervised Adjustment for Bias and Confounding

Description

Details

Author(s)

References

Simulate a p-value for the significance of the Kolmogorov-Smirnov D-statistic from confirm().

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Internal LocalControlStrategy functions.

Description

Author(s)

Hierarchical Clustering of experimental units (such as patients) in X-covariate Space

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Display NU Sensitivity Graphic for help in choice of K = Number of Clusters

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Specify KEY parameters used in NU.Learning to "design" analyses of Observational Data.

Description

Usage

Arguments

Value

Author(s)

References

See Also

Examples

Confirm that Clustering in Covariate X-space yields an "adjusted" LTD/LRC effect-size Distribution

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Instrumental Variable LAO Fitting and Smoothing

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Calculate the observed Distribution of LRCs in NU.Learning

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Calculate the Observed Distribution of LTDs in NU.Learning

Description

Usage

Arguments

Details