The recodes() function makes it easy to recode one or more variables in your data frame. The format is
newdata <- recodes(olddata, variables, from values, to values)
Consider the following data set. Let's make the following changes.
| sex | race | outcome | Q1 | Q2 | age | rating |
|---|---|---|---|---|---|---|
| 1 | b | better | 20 | 15 | 12 | 1 |
| 2 | w | worse | 30 | 23 | 20 | 2 |
| 1 | a | same | 44 | 18 | 33 | 5 |
| 2 | b | same | 15 | 86 | 55 | 3 |
| 2 | w | better | 50 | 99 | 30 | 4 |
| 2 | h | worse | 99 | 35 | 100 | 5 |
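To follow along, the example data can be entered as a data frame. This is a minimal sketch: it assumes the data frame is named df, as in the calls that follow, and that race and outcome are stored as character strings rather than factors.

```r
# Example data from the table above; the name df matches the later calls.
# Assumption: text columns are plain character vectors, not factors.
df <- data.frame(
  sex     = c(1, 2, 1, 2, 2, 2),
  race    = c("b", "w", "a", "b", "w", "h"),
  outcome = c("better", "worse", "same", "same", "better", "worse"),
  Q1      = c(20, 30, 44, 15, 50, 99),
  Q2      = c(15, 23, 18, 86, 99, 35),
  age     = c(12, 20, 33, 55, 30, 100),
  rating  = c(1, 2, 5, 3, 4, 5),
  stringsAsFactors = FALSE
)
str(df)
```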
For sex, set 1 to “Male” and 2 to “Female”.
df <- recodes(data=df, vars="sex",
              from=c(1,2), to=c("Male", "Female"))

| sex | race | outcome | Q1 | Q2 | age | rating |
|---|---|---|---|---|---|---|
| Male | b | better | 20 | 15 | 12 | 1 |
| Female | w | worse | 30 | 23 | 20 | 2 |
| Male | a | same | 44 | 18 | 33 | 5 |
| Female | b | same | 15 | 86 | 55 | 3 |
| Female | w | better | 50 | 99 | 30 | 4 |
| Female | h | worse | 99 | 35 | 100 | 5 |
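For comparison, a one-to-one from/to lookup of this kind can be sketched in base R with match(). This only illustrates the idea of the mapping; it is not a description of how recodes() is implemented, and the helper name lookup_recode is hypothetical.

```r
# Hypothetical helper (not part of any package): replace each value found in
# `from` with the value at the same position in `to`. Values that do not
# appear in `from` are left as they are.
lookup_recode <- function(x, from, to) {
  idx <- match(x, from)        # position of each x in `from`, NA if absent
  hit <- !is.na(idx)
  out <- x
  out[hit] <- to[idx[hit]]     # assigning strings coerces `out` to character
  out
}

sex <- c(1, 2, 1, 2, 2, 2)
sex_labels <- lookup_recode(sex, from = c(1, 2), to = c("Male", "Female"))
sex_labels
```

The same helper would cover the race recode below, since several from values can map to the same to value.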
Recode race to “White” vs. “Other”.
df <- recodes(data=df, vars="race",
              from=c("w", "b", "a", "h"),
              to=c("White", "Other", "Other", "Other"))

| sex | race | outcome | Q1 | Q2 | age | rating |
|---|---|---|---|---|---|---|
| Male | Other | better | 20 | 15 | 12 | 1 |
| Female | White | worse | 30 | 23 | 20 | 2 |
| Male | Other | same | 44 | 18 | 33 | 5 |
| Female | Other | same | 15 | 86 | 55 | 3 |
| Female | White | better | 50 | 99 | 30 | 4 |
| Female | Other | worse | 99 | 35 | 100 | 5 |
Recode outcome to 1 (better) vs. 0 (not better).
df <- recodes(data=df, vars="outcome",
              from=c("better", "same", "worse"),
              to=c(1, 0, 0))

| sex | race | outcome | Q1 | Q2 | age | rating |
|---|---|---|---|---|---|---|
| Male | Other | 1 | 20 | 15 | 12 | 1 |
| Female | White | 0 | 30 | 23 | 20 | 2 |
| Male | Other | 0 | 44 | 18 | 33 | 5 |
| Female | Other | 0 | 15 | 86 | 55 | 3 |
| Female | White | 1 | 50 | 99 | 30 | 4 |
| Female | Other | 0 | 99 | 35 | 100 | 5 |
For Q1 and Q2 set values of 86 and 99 to missing.
df <- recodes(data=df, vars=c("Q1", "Q2"),
              from=c(86, 99), to=NA)
#> Note: 'from' is longer than 'to', so 'to' was recycled.

| sex | race | outcome | Q1 | Q2 | age | rating |
|---|---|---|---|---|---|---|
| Male | Other | 1 | 20 | 15 | 12 | 1 |
| Female | White | 0 | 30 | 23 | 20 | 2 |
| Male | Other | 0 | 44 | 18 | 33 | 5 |
| Female | Other | 0 | 15 | NA | 55 | 3 |
| Female | White | 1 | 50 | NA | 30 | 4 |
| Female | Other | 0 | NA | 35 | 100 | 5 |
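The effect of applying one rule to several variables can be sketched in base R with lapply() and replace(). This is an illustration of the result, not of the recodes() internals; the data frame name q is made up for the example.

```r
# Stand-alone copy of the two questionnaire columns (name `q` is illustrative).
q <- data.frame(Q1 = c(20, 30, 44, 15, 50, 99),
                Q2 = c(15, 23, 18, 86, 99, 35))

vars <- c("Q1", "Q2")

# Apply the same rule to every listed column: 86 and 99 become NA.
# The single NA plays the role of the recycled `to` value.
q[vars] <- lapply(q[vars], function(x) replace(x, x %in% c(86, 99), NA))
q
```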
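For age, set values below 20 or above 90 to missing, values from 20 to 30 to “Younger”, values above 30 and up to 50 to “Middle Aged”, and values above 50 and up to 90 to “Older”.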
You can use expressions in your from fields. When they are TRUE, the corresponding to values will be applied. We will use the dollar sign ($) to represent the variable (age in this case). The symbols ( |, & ) mean OR and AND respectively.
df <- recodes(data=df, vars="age",
              from=c("$ < 20 | $ > 90",
                     "$ >= 20 & $ <= 30",
                     "$ > 30 & $ <= 50",
                     "$ > 50 & $ <= 90"),
              to=c(NA, "Younger", "Middle Aged", "Older"))

We can also write this as
df <- recodes(data=df, vars="age",
              from=c("$ < 20", "$ <= 30", "$ <= 50", "$ <= 90", "$ > 90"),
              to=c(NA, "Younger", "Middle Aged", "Older", NA))

This works because once the age value for an observation meets a criterion that is TRUE (working left to right), it is recoded. It isn’t changed again by later criteria in the same recodes statement.
| sex | race | outcome | Q1 | Q2 | age | rating |
|---|---|---|---|---|---|---|
| Male | Other | 1 | 20 | 15 | NA | 1 |
| Female | White | 0 | 30 | 23 | Younger | 2 |
| Male | Other | 0 | 44 | 18 | Middle Aged | 5 |
| Female | Other | 0 | 15 | NA | Older | 3 |
| Female | White | 1 | 50 | NA | Younger | 4 |
| Female | Other | 0 | NA | 35 | NA | 5 |
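The left-to-right, first-match-wins behavior described above can be sketched in base R. This is a sketch under stated assumptions: it guesses that each expression is evaluated with `$` replaced by the variable, and it tracks which values have already been recoded so later criteria skip them. The actual recodes() internals may differ.

```r
age  <- c(12, 20, 33, 55, 30, 100)
from <- c("$ < 20", "$ <= 30", "$ <= 50", "$ <= 90", "$ > 90")
to   <- c(NA, "Younger", "Middle Aged", "Older", NA)

result <- rep(NA_character_, length(age))
done   <- rep(FALSE, length(age))      # TRUE once a value has been recoded

for (i in seq_along(from)) {
  # Assumed mechanism: swap the `$` placeholder for the variable name,
  # then evaluate the resulting expression as R code.
  cond <- eval(parse(text = gsub("$", "age", from[i], fixed = TRUE)))
  hit  <- cond & !done                 # skip values matched by an earlier rule
  result[hit] <- to[i]
  done <- done | hit
}
result
```

Without the done vector, an age of 12 would also satisfy `$ <= 30` and be overwritten with "Younger", which is why the order of the criteria matters in the shorter form.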
Finally, for the rating variable, reverse the scoring so that 1 to 5 becomes 5 to 1.
df <- recodes(data=df, vars="rating", from=1:5, to=5:1)

| sex | race | outcome | Q1 | Q2 | age | rating |
|---|---|---|---|---|---|---|
| Male | Other | 1 | 20 | 15 | NA | 5 |
| Female | White | 0 | 30 | 23 | Younger | 4 |
| Male | Other | 0 | 44 | 18 | Middle Aged | 1 |
| Female | Other | 0 | 15 | NA | Older | 3 |
| Female | White | 1 | 50 | NA | Younger | 2 |
| Female | Other | 0 | NA | 35 | NA | 1 |
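As a quick sanity check on the reversed scores, the same from/to pairs can be applied in base R with match() and compared against the arithmetic shortcut 6 - rating. This is only a cross-check of the table above, not part of the recodes() workflow.

```r
rating <- c(1, 2, 5, 3, 4, 5)

# Look up each rating in 1:5 and take the value at the same position in 5:1.
reversed <- (5:1)[match(rating, 1:5)]
reversed

# For a 1-to-5 scale, reversing is the same as subtracting from 6.
all(reversed == 6 - rating)
```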
Remember that recodes returns a data frame, not a variable.
df <- recodes(data=df, vars="rating", from=1:5, to=5:1) is correct.
df$rating <- recodes(data=df, vars="rating", from=1:5, to=5:1) is not.
This allows you to apply the same recoding scheme to more than one variable at a time (e.g., Q1 and Q2 above).
And that’s it (APPLAUSE, APPLAUSE)!