Title: Forward Selection using Concordance/C-Index
Version: 0.2.0
Description: Performs forward model selection, using the C-index/concordance in survival analysis models.
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.2
Imports: survival, dplyr, stats, magrittr, tibble
URL: https://github.com/muschellij2/cforward
BugReports: https://github.com/muschellij2/cforward/issues
Depends: R (≥ 2.10)
Suggests: testthat
NeedsCompilation: no
Packaged: 2025-04-01 15:41:46 UTC; johnmuschelli
Author: John Muschelli
Maintainer: John Muschelli <muschellij2@gmail.com>
Repository: CRAN
Date/Publication: 2025-04-01 16:50:06 UTC
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Forward Selection Based on C-Index/Concordance
Description
Forward Selection Based on C-Index/Concordance
Usage
cforward(
data,
event_time = "event_time_years",
event_status = "mortstat",
weight_column = "WTMEC4YR_norm",
variables = NULL,
included_variables = NULL,
n_folds = 10,
seed = 1989,
max_model_size = 50,
c_threshold = NULL,
verbose = TRUE,
cfit_args = list(),
save_memory = FALSE,
...
)
cforward_one(
data,
event_time = "event_time_years",
event_status = "mortstat",
weight_column = "WTMEC4YR_norm",
variables,
included_variables = NULL,
verbose = TRUE,
cfit_args = list(),
save_memory = FALSE,
...
)
make_folds(data, event_status = "mortstat", n_folds = 10, verbose = TRUE)
Arguments
- data
A data set on which to perform model selection and cross-validation.
- event_time
Character vector of length 1 naming the column of event times, passed to Surv.
- event_status
Character vector of length 1 naming the column of event status, passed to Surv.
- weight_column
Character vector of length 1 naming the column of weights for the model. If no weights are available, set to NULL.
- variables
Character vector of variables on which to perform selection. Must be in data.
- included_variables
Character vector of variables forced into the model. Must be in data.
- n_folds
Number of folds for cross-validation. To run on the full data, set to 1.
- seed
Seed set before folds are created.
- max_model_size
Maximum number of variables in the model. Selection stops if this is reached. Note that this does not correspond to the number of coefficients, because a categorical variable can contribute several coefficients.
- c_threshold
Threshold for concordance. If the difference between the best concordance and the current one does not reach this threshold, selection stops.
- verbose
Print diagnostic messages.
- cfit_args
Arguments passed to concordancefit.
- save_memory
Save only a minimal amount of information and discard the fitted models.
- ...
Additional arguments to pass to coxph.
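The manual does not spell out how make_folds assigns rows to folds. A minimal sketch of one common approach, assuming fold labels are balanced within each event-status group so that every fold has a similar event rate. The helper name sketch_folds is hypothetical, and this is not necessarily what the package does internally:

```r
# Hypothetical helper (not part of cforward): assign each row to one of
# n_folds folds, balancing fold membership within each event-status group.
sketch_folds <- function(status, n_folds = 10, seed = 1989) {
  set.seed(seed)  # make the fold assignment reproducible
  folds <- integer(length(status))
  for (s in unique(status)) {
    idx <- which(status == s)
    # Recycle the labels 1..n_folds over the group, then shuffle them
    labels <- rep(seq_len(n_folds), length.out = length(idx))
    folds[idx] <- labels[sample.int(length(idx))]
  }
  folds
}

# Toy status vector: 80 censored (0) and 20 events (1)
status <- rep(c(0, 1), times = c(80, 20))
folds <- sketch_folds(status, n_folds = 5)
table(folds, status)
```

Setting the seed before the folds are drawn mirrors the documented role of the seed argument.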
Value
A list of lists, each with the elements:
- full_concordance
Concordance when fit on the full data
- models
Cox model from full data set fit, stripped of large memory elements
- cv_concordance
Cross-validated Concordance
- included_variables
Variables included in the model, other than those being selected upon
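To illustrate the general idea of forward selection on concordance, here is a simplified sketch that uses only the survival package and its lung data. It is an assumption-laden illustration, not the package's implementation: it scores models on the full data rather than by cross-validation, ignores weights, and the helper name forward_cindex and its stopping rule are hypothetical.

```r
library(survival)

# Hypothetical helper (not part of cforward): greedy forward selection on
# the full data, scoring each candidate Cox model by its concordance.
forward_cindex <- function(data, time, status, candidates, c_threshold = 0) {
  selected <- character(0)
  best_c <- 0.5  # concordance expected from a model with no information
  while (length(candidates) > 0) {
    # Concordance after adding each remaining candidate in turn
    scores <- sapply(candidates, function(v) {
      rhs <- paste(c(selected, v), collapse = " + ")
      f <- as.formula(paste0("Surv(", time, ", ", status, ") ~ ", rhs))
      concordance(coxph(f, data = data))$concordance
    })
    # Stop when the best candidate does not improve concordance enough
    if (max(scores) - best_c <= c_threshold) break
    best_c <- max(scores)
    selected <- c(selected, names(which.max(scores)))
    candidates <- setdiff(candidates, selected)
  }
  list(selected = selected, concordance = best_c)
}

# Drop incomplete rows so every model is fit on the same observations
dat <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])
fit <- forward_cindex(dat, "time", "status", c("age", "sex", "ph.ecog"))
fit$selected
fit$concordance
```

Scoring on the same data used for fitting tends to be optimistic, which is the motivation for the cross-validated concordance that cforward reports.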
Examples
# Candidate variables for forward selection
variables = c("gender", "age_years_interview", "education_adult")
res = cforward(nhanes_example,
event_time = "event_time_years",
event_status = "mortstat",
weight_column = "WTMEC4YR_norm",
variables = variables,
included_variables = NULL,
n_folds = 5,
c_threshold = 0.02,
seed = 1989,
max_model_size = 50,
verbose = TRUE)
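# Concordance of the best model at each selection step
conc = sapply(res, `[[`, "best_concordance")
# Repeat the selection without a concordance threshold
res = cforward(nhanes_example,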
event_time = "event_time_years",
event_status = "mortstat",
weight_column = "WTMEC4YR_norm",
variables = variables,
included_variables = NULL,
n_folds = 5,
seed = 1989,
max_model_size = 50,
verbose = TRUE)
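conc = sapply(res, `[[`, "best_concordance")
# Keep variables whose gain in concordance over the previous step exceeds
# the threshold (the leading 1 always keeps the first selected variable)
threshold = 0.01
included_variables = names(conc)[c(1, diff(conc)) > threshold]
# Second round: select among new variables with the retained ones forced in
new_variables = c("diabetes", "stroke")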
second_level = cforward(nhanes_example,
event_time = "event_time_years",
event_status = "mortstat",
weight_column = "WTMEC4YR_norm",
variables = new_variables,
included_variables = included_variables,
n_folds = 5,
seed = 1989,
max_model_size = 50,
verbose = TRUE)
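second_conc = sapply(second_level, `[[`, "best_concordance")
# Take the selection step with the highest concordance, then the model
# within that step with the highest cross-validated concordance
result = second_level[[which.max(second_conc)]]
final_model = result$models[[which.max(result$cv_concordance)]]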
Estimate Out-of-Sample Concordance
Description
Estimate Out-of-Sample Concordance
Usage
estimate_concordance(
train,
test = train,
event_time = "event_time_years",
event_status = "mortstat",
weight_column = "WTMEC4YR_norm",
all_variables = NULL,
cfit_args = list(),
...
)
Arguments
- train
A data set on which to perform model training.
- test
A data set on which to estimate concordance, using the model fit with train.
- event_time
Character vector of length 1 naming the column of event times, passed to Surv.
- event_status
Character vector of length 1 naming the column of event status, passed to Surv.
- weight_column
Character vector of length 1 naming the column of weights for the model. If no weights are available, set to NULL.
- all_variables
Character vector of variables to put in the model. All must be columns of the data.
- cfit_args
Arguments passed to concordancefit.
- ...
Additional arguments to pass to coxph.
Value
A list containing the concordance and the model fit on the training data
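The train/test idea behind out-of-sample concordance can be sketched with the survival package alone. This is an illustration on the lung data under assumed choices (a random one-third test split and a fixed set of covariates), not the internals of estimate_concordance:

```r
library(survival)

# Drop incomplete rows so train and test use the same covariates
dat <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])

# Simple random split into training and test rows
set.seed(1989)
test_rows <- sample(nrow(dat), size = floor(nrow(dat) / 3))
train <- dat[-test_rows, ]
test <- dat[test_rows, ]

# Fit the Cox model on the training data only
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = train)

# Score the held-out rows using the training-data coefficients
oos <- concordance(fit, newdata = test)
oos$concordance
```

With the default test = train, the analogous calculation would reduce to the in-sample concordance of the fitted model.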
Example Data from National Health and Nutrition Examination Survey ('NHANES')
Description
Example Data from National Health and Nutrition Examination Survey ('NHANES')
Usage
nhanes_example
Format
A data.frame with 7 columns, which are:
- SEQN
ID of participant
- mortstat
mortality status: 1 = died, 0 = censored
- event_time_years
time observed
- WTMEC4YR_norm
weights normalized for survey
- gender
gender
- age_years_interview
age in years at interview
- education_adult
educational status
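As a rough illustration of how these columns map onto a survival model, the following sketch fits a weighted Cox model directly with the survival package. It assumes the cforward package is installed so that nhanes_example is available, and the choice of covariates is only an example:

```r
library(survival)
library(cforward)  # assumed installed; provides nhanes_example

# event_time_years and mortstat form the survival outcome;
# WTMEC4YR_norm supplies the case weights
fit <- coxph(
  Surv(event_time_years, mortstat) ~
    gender + age_years_interview + education_adult,
  data = nhanes_example,
  weights = WTMEC4YR_norm
)

# In-sample concordance of the weighted model
concordance(fit)$concordance
```

These are the same column names used as the defaults for event_time, event_status, and weight_column in cforward.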