Oftentimes the best way to keep code working is to just not touch it. And while even the best, most stable code can’t escape tweaking forever, some types of changes can at least be made without even opening an otherwise stable and static codebase, assuming it’s been set up to allow that.
Suppose, for example, that we have an estimation pipeline that runs every year. In most years there are no changes to the methods or to the structure of inputs and outputs, but every year brings some unavoidable changes to recode specifications. In this scenario we need to be able to update the process, ideally in a way that minimizes both the effort required to QC the changes and the probability that something breaks. We can minimize the breakage potential by not opening the code at all, and we can minimize QC time by extracting only the affected code into a parameter file.
Here are two ways to do that.
Option 1 - A Separate Script
In this option, we store the recode logic in a separate R script. Here we define two new recodes, cyl.rec and mpg.rec, based on the mtcars data frame. The rules are stored in vectors, with each vector position containing, as a string, an individual case_when() condition and assignment.
library(tidyverse)
#this part can exist in a separate script
parms<-tribble(
  ~newvar, ~rules,
  "cyl.rec", c("cyl==4~1","cyl==6~2","cyl==8~3"),
  "mpg.rec", c("mpg<15~'very bad'","mpg<20~'bad'","mpg<25~'good'","TRUE~'very good'")
)
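Before turning to the static code, it may help to see what happens to one of these rule strings. The following is a minimal standalone sketch, assuming dplyr and rlang are installed; rules, rule_exprs, and demo are illustrative names rather than part of the original pipeline. rlang::parse_exprs() turns each string into an unevaluated expression, !!newvar := injects a string as the new column name, and !!! splices the list of expressions into case_when() as separate arguments.

```r
library(dplyr)
library(rlang)

# Rule strings for a single recode, in the same form as the parameter file
# (an illustrative copy of the cyl.rec rules)
rules <- c("cyl==4~1", "cyl==6~2", "cyl==8~3")
newvar <- "cyl.rec"

# Each string becomes an unevaluated `~` call such as cyl == 4 ~ 1
rule_exprs <- parse_exprs(rules)

# !!newvar := injects the string as the column name;
# !!! splices the expressions in as separate case_when() arguments
demo <- mtcars %>%
  mutate(!!newvar := case_when(!!!rule_exprs)) %>%
  select(cyl, all_of(newvar))

head(demo, 3)
```

Because the strings are only parsed, never evaluated, until they reach case_when() inside mutate(), the column names they mention are resolved against the data, just as if the rules had been typed directly into the call.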
We then have a static codebase that walks over the parameter file, creating recodes according to whatever code is found there.
To achieve this, we use purrr::pwalk() to iterate over the parameter file parms, applying to each row an anonymous function that creates the recode corresponding to that row. The recode is created by injecting parms$newvar as the new variable name and splicing (via !!!) the vector of conditions from parms$rules, parsed into expressions with rlang::parse_exprs(), into the body of case_when(). Notably, on each iteration cars is read in from the global environment, the recode is created, and cars is written back to the global environment. Alternatively, we could create within the function body a data frame containing only the newly defined column, capture these data frames across iterations in a list (using purrr::pmap() instead of purrr::pwalk()), and column-bind the list to cars. I’ve done it both ways, but I prefer the global environment overwrite approach used below.
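For comparison, the pmap() alternative described above might look like the following minimal sketch. It assumes the tidyverse is loaded and reuses the Option 1 parameter layout; new_cols and cars.recoded are hypothetical names, and the use of transmute() to return a data frame holding only the new column is an assumption, not the author's code.

```r
library(tidyverse)

# Parameter file in the Option 1 shape (rules held in a list column)
parms <- tribble(
  ~newvar, ~rules,
  "cyl.rec", c("cyl==4~1", "cyl==6~2", "cyl==8~3"),
  "mpg.rec", c("mpg<15~'very bad'", "mpg<20~'bad'", "mpg<25~'good'", "TRUE~'very good'")
)

cars <- mtcars %>%
  rownames_to_column("car")

# For each parameter row, return a one-column data frame holding only the
# new recode; nothing is written to the global environment inside the loop
new_cols <- pmap(
  parms,
  function(newvar, rules, df = cars) {
    transmute(df, !!newvar := case_when(!!!rlang::parse_exprs(rules)))
  }
)

# Column-bind the original data with every new single-column data frame
cars.recoded <- bind_cols(cars, new_cols)

select(cars.recoded, car, cyl, cyl.rec, mpg, mpg.rec) %>% head()
```

One trade-off of this design: because nothing is overwritten mid-loop, an error partway through leaves cars untouched, at the cost of an extra binding step at the end.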
#this part represents a static codebase that would follow a source() call
# to the parameter file-generating script
cars<-mtcars %>%
  rownames_to_column("car")
# for each row of parms, build the recode and overwrite cars in the global environment
pwalk(
  parms,
  function(newvar,rules,df=cars){
    # name of the data frame to overwrite ("cars" when the default is used)
    df.name<-deparse(substitute(df))
    df %>%
      mutate(!!newvar:=case_when(!!!rlang::parse_exprs(rules))) %>%
      assign(df.name,.,envir=globalenv())
  }
)
select(cars,car,cyl,cyl.rec,mpg,mpg.rec)
car cyl cyl.rec mpg mpg.rec
1 Mazda RX4 6 2 21.0 good
2 Mazda RX4 Wag 6 2 21.0 good
3 Datsun 710 4 1 22.8 good
4 Hornet 4 Drive 6 2 21.4 good
5 Hornet Sportabout 8 3 18.7 bad
6 Valiant 6 2 18.1 bad
7 Duster 360 8 3 14.3 very bad
8 Merc 240D 4 1 24.4 good
9 Merc 230 4 1 22.8 good
10 Merc 280 6 2 19.2 bad
11 Merc 280C 6 2 17.8 bad
12 Merc 450SE 8 3 16.4 bad
13 Merc 450SL 8 3 17.3 bad
14 Merc 450SLC 8 3 15.2 bad
15 Cadillac Fleetwood 8 3 10.4 very bad
16 Lincoln Continental 8 3 10.4 very bad
17 Chrysler Imperial 8 3 14.7 very bad
18 Fiat 128 4 1 32.4 very good
19 Honda Civic 4 1 30.4 very good
20 Toyota Corolla 4 1 33.9 very good
21 Toyota Corona 4 1 21.5 good
22 Dodge Challenger 8 3 15.5 bad
23 AMC Javelin 8 3 15.2 bad
24 Camaro Z28 8 3 13.3 very bad
25 Pontiac Firebird 8 3 19.2 bad
26 Fiat X1-9 4 1 27.3 very good
27 Porsche 914-2 4 1 26.0 very good
28 Lotus Europa 4 1 30.4 very good
29 Ford Pantera L 8 3 15.8 bad
30 Ferrari Dino 6 2 19.7 bad
31 Maserati Bora 8 3 15.0 bad
32 Volvo 142E 4 1 21.4 good
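The read-modify-overwrite pattern used in the loop above can be isolated in a few lines. In this minimal sketch, add_flag() is a hypothetical helper and the small cars data frame is a stand-in; the point is that, when the default argument is used, deparse(substitute(df)) yields the string "cars", which assign() then uses to overwrite the global object.

```r
# Stand-in data frame; the name `cars` is reused deliberately so the
# default argument below resolves to it, mirroring the pipeline above
cars <- data.frame(car = c("a", "b", "c"), mpg = c(14, 22, 30))

# Hypothetical helper illustrating the pattern
add_flag <- function(df = cars) {
  # With the default argument, substitute(df) returns the default
  # expression `cars`, so df.name is the string "cars"
  df.name <- deparse(substitute(df))
  df$high.mpg <- df$mpg > 20
  # Overwrite the object of that name in the global environment
  assign(df.name, df, envir = globalenv())
}

add_flag()
cars
```

A caveat of this design: if the function were instead called with an expression such as add_flag(head(cars)), the derived name would be "head(cars)", and assign() would create an oddly named object rather than updating cars.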
Option 2 - Code Stored as Text in a Separate File (like a csv)
Option 2 does the same thing (creating recodes metaprogrammatically by storing the code as data) but may be a better fit if we want to store the code in a text-based, tabular format rather than in an R script. This can be useful, for example, if we want a subject-matter expert who is not an R programmer to write or review the recode logic. In that case we could even break the conditions in the parameter file down further, stripping out the case_when() syntax and reassembling it as necessary in the static codebase.
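As a rough illustration of that further breakdown, the sketch below assumes a hypothetical parameter layout with a condition column and a plain-text label column, reads it from inline CSV text with read.csv() to stay self-contained, and reassembles each row into a case_when() rule in the static code. It handles a single recode with character labels only; numeric outputs such as those of cyl.rec would need their own handling.

```r
library(tidyverse)

# Hypothetical parameter file with the case_when() syntax stripped out:
# one row per rule, a condition column, and a plain-text label column
param_csv <- "newvar,condition,label
mpg.rec,mpg < 15,very bad
mpg.rec,mpg < 20,bad
mpg.rec,mpg < 25,good
mpg.rec,TRUE,very good"

parms.split <- read.csv(text = param_csv, stringsAsFactors = FALSE)

# Static code: rebuild each rule as "condition ~ 'label'";
# encodeString() adds the single quotes and escapes any embedded ones
parms.split <- parms.split %>%
  mutate(rules = paste0(condition, " ~ ", encodeString(label, quote = "'")))

# Apply the reassembled rules the same way as in the main examples
cars <- mtcars %>%
  rownames_to_column("car") %>%
  mutate(mpg.rec = case_when(!!!rlang::parse_exprs(parms.split$rules)))

select(cars, car, mpg, mpg.rec) %>% head()
```

The subject-matter expert then only edits conditions and labels, while the quoting and the ~ syntax live in code that never changes.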
#this part can exist in .csv or .xlsx file
parms.alt<-tribble(
  ~newvar, ~rules,
  "cyl.rec", "cyl==4~1",
  "cyl.rec", "cyl==6~2",
  "cyl.rec", "cyl==8~3",
  "mpg.rec", "mpg<15~'very bad'",
  "mpg.rec", "mpg<20~'bad'",
  "mpg.rec", "mpg<25~'good'",
  "mpg.rec", "TRUE~'very good'"
)
The main difference on the static codebase side is that we group the parameter file by newvar and use group_walk() to apply our anonymous function, manually extracting the new variable name from the group key and the rules vector from the group's data.
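Because the two parameter formats carry the same information, one could also collapse the long format into the nested Option 1 shape instead of using group_walk(). The following is a minimal sketch, assuming dplyr's summarise() with a list() column; parms.nested is a hypothetical name.

```r
library(tidyverse)

# Long-format parameter file, one rule per row (same content as parms.alt)
parms.alt <- tribble(
  ~newvar, ~rules,
  "cyl.rec", "cyl==4~1",
  "cyl.rec", "cyl==6~2",
  "cyl.rec", "cyl==8~3",
  "mpg.rec", "mpg<15~'very bad'",
  "mpg.rec", "mpg<20~'bad'",
  "mpg.rec", "mpg<25~'good'",
  "mpg.rec", "TRUE~'very good'"
)

# Collapse each recode's rules into one character vector in a list column.
# Row order within each group is preserved, which matters because
# case_when() uses the first matching condition.
parms.nested <- parms.alt %>%
  group_by(newvar) %>%
  summarise(rules = list(rules), .groups = "drop")

parms.nested
```

Under that assumption, the Option 1 pwalk() loop could consume parms.nested without modification, so a single static codebase could serve both storage formats; whether that beats group_walk() is largely a matter of taste.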
#this part represents a static codebase that would follow an ingestion step
# that reads in the parameter file from wherever it's stored
cars<-mtcars %>%
  rownames_to_column("car")
# one call per recode: .x (the group's rules) and .y (the group key) arrive positionally
parms.alt %>%
  group_by(newvar) %>%
  group_walk(
    function(rules,group,df=cars){
      df.name<-deparse(substitute(df))
      newvar<-pull(group,newvar)
      rules<-pull(rules,rules)
      df %>%
        mutate(!!newvar:=case_when(!!!rlang::parse_exprs(rules))) %>%
        assign(df.name,.,envir=globalenv())
    }
  )
select(cars,car,cyl,cyl.rec,mpg,mpg.rec)
car cyl cyl.rec mpg mpg.rec
1 Mazda RX4 6 2 21.0 good
2 Mazda RX4 Wag 6 2 21.0 good
3 Datsun 710 4 1 22.8 good
4 Hornet 4 Drive 6 2 21.4 good
5 Hornet Sportabout 8 3 18.7 bad
6 Valiant 6 2 18.1 bad
7 Duster 360 8 3 14.3 very bad
8 Merc 240D 4 1 24.4 good
9 Merc 230 4 1 22.8 good
10 Merc 280 6 2 19.2 bad
11 Merc 280C 6 2 17.8 bad
12 Merc 450SE 8 3 16.4 bad
13 Merc 450SL 8 3 17.3 bad
14 Merc 450SLC 8 3 15.2 bad
15 Cadillac Fleetwood 8 3 10.4 very bad
16 Lincoln Continental 8 3 10.4 very bad
17 Chrysler Imperial 8 3 14.7 very bad
18 Fiat 128 4 1 32.4 very good
19 Honda Civic 4 1 30.4 very good
20 Toyota Corolla 4 1 33.9 very good
21 Toyota Corona 4 1 21.5 good
22 Dodge Challenger 8 3 15.5 bad
23 AMC Javelin 8 3 15.2 bad
24 Camaro Z28 8 3 13.3 very bad
25 Pontiac Firebird 8 3 19.2 bad
26 Fiat X1-9 4 1 27.3 very good
27 Porsche 914-2 4 1 26.0 very good
28 Lotus Europa 4 1 30.4 very good
29 Ford Pantera L 8 3 15.8 bad
30 Ferrari Dino 6 2 19.7 bad
31 Maserati Bora 8 3 15.0 bad
32 Volvo 142E 4 1 21.4 good
In either case, the recode changes are easy to QC, and because the stable code never has to be opened, we eliminate the chance of breaking it.
Citation
@online{couzens2025,
author = {Couzens, Lance},
title = {Decoupling {Dynamic} {Code} from a {Static} {R} {Codebase}},
date = {2025-03-20},
url = {https://mostlyunoriginal.github.io/posts/2025-03-20-2-Options-Parameterizing-R-w-Code/},
langid = {en}
}