Oftentimes the best way to keep code working is to just not touch it. And while even the best, most stable code can’t escape tweaking forever, there are some types of changes that can at least be made without even opening an otherwise stable and static codebase, assuming it’s been set up to allow that.

Suppose, for example, that we have an estimation pipeline that runs every year. In most years there are no changes to the methods or the structure of inputs/outputs, but every year there are some unavoidable changes to recode specifications. In this scenario, we have to be able to update the process, ideally in a way that minimizes both the effort required to QC the changes and the probability that something breaks. We can minimize the breakage potential by not opening the code at all, and we can minimize QC time by extracting only the affected code into a parameter file.

Here are two ways to do that.

Option 1 - A Separate Script

In this option, we store the recode logic in a separate R script. Here we define two new recodes, cyl.rec and mpg.rec, based on the mtcars data frame. The rules are stored in a list column: each element is a character vector, and each entry of that vector holds one case_when() condition and assignment as a string.

library(tidyverse)

#this part can exist in a separate script
parms<-tribble(
  ~newvar,     ~rules,
  "cyl.rec",   c("cyl==4~1","cyl==6~2","cyl==8~3"),
  "mpg.rec",   c("mpg<15~'very bad'","mpg<20~'bad'","mpg<25~'good'","TRUE~'very good'")
)

We then have a static codebase that walks over the parameter file, creating recodes according to whatever code is found there.

To achieve this, we use purrr::pwalk() to iterate over the parameter file parms, applying to each row an anonymous function that creates the recode corresponding to that row.

The recode is created by injecting parms$newvar as the new variable name, and splicing (via !!!) the vector of conditions from parms$rules into the body of case_when(). Notably, for each iteration, cars is read in from the global environment (through the default argument df=cars), the recode is created, and cars is written back to the global environment. Alternatively, we could create within the function body a data frame containing only the newly defined column, capture these data frames across iterations in a list (using purrr::pmap() instead of purrr::pwalk()), and column-bind the list to cars. I’ve done it both ways, but I prefer the global environment overwrite approach used below.
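To make the injection and splicing step concrete, here is a minimal sketch that applies a single rule set outside of any loop. The object names exprs, demo, and by.hand are mine and purely illustrative; the only assumption is that the rule strings parse as two-sided formulas, as they do in the parameter file above.

```r
library(dplyr)
library(rlang)

# One rule set, written as strings exactly as in the parameter file
rules <- c("cyl==4~1", "cyl==6~2", "cyl==8~3")

# parse_exprs() turns each string into an unevaluated expression
exprs <- parse_exprs(rules)

# Illustrative stand-in for a value of parms$newvar
newvar <- "cyl.rec"

# !!newvar supplies the column name; !!!exprs splices the expressions
# in as separate arguments to case_when()
demo <- mutate(mtcars, !!newvar := case_when(!!!exprs))

# The same recode written out by hand, for comparison
by.hand <- mutate(
  mtcars,
  cyl.rec = case_when(cyl == 4 ~ 1, cyl == 6 ~ 2, cyl == 8 ~ 3)
)

identical(demo$cyl.rec, by.hand$cyl.rec)
```

The comparison at the end is a quick check that the spliced version and the hand-written version produce the same column.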

#this part represents a static codebase that would follow a source() call 
# to the parameter file-generating script

cars<-mtcars %>%
  rownames_to_column("car")

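pwalk(
  parms,
  function(newvar,rules,df=cars){

    #recover the name of the data frame ("cars") from the default argument
    df.name<-deparse(substitute(df))
    
    #create the recode, then overwrite the data frame in the global environment
    df %>%
      mutate(!!newvar:=case_when(!!!rlang::parse_exprs(rules))) %>% 
      assign(df.name,.,envir=globalenv())
    
  }
)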

select(cars,car,cyl,cyl.rec,mpg,mpg.rec)

                   car cyl cyl.rec  mpg   mpg.rec
1            Mazda RX4   6       2 21.0      good
2        Mazda RX4 Wag   6       2 21.0      good
3           Datsun 710   4       1 22.8      good
4       Hornet 4 Drive   6       2 21.4      good
5    Hornet Sportabout   8       3 18.7       bad
6              Valiant   6       2 18.1       bad
7           Duster 360   8       3 14.3  very bad
8            Merc 240D   4       1 24.4      good
9             Merc 230   4       1 22.8      good
10            Merc 280   6       2 19.2       bad
11           Merc 280C   6       2 17.8       bad
12          Merc 450SE   8       3 16.4       bad
13          Merc 450SL   8       3 17.3       bad
14         Merc 450SLC   8       3 15.2       bad
15  Cadillac Fleetwood   8       3 10.4  very bad
16 Lincoln Continental   8       3 10.4  very bad
17   Chrysler Imperial   8       3 14.7  very bad
18            Fiat 128   4       1 32.4 very good
19         Honda Civic   4       1 30.4 very good
20      Toyota Corolla   4       1 33.9 very good
21       Toyota Corona   4       1 21.5      good
22    Dodge Challenger   8       3 15.5       bad
23         AMC Javelin   8       3 15.2       bad
24          Camaro Z28   8       3 13.3  very bad
25    Pontiac Firebird   8       3 19.2       bad
26           Fiat X1-9   4       1 27.3 very good
27       Porsche 914-2   4       1 26.0 very good
28        Lotus Europa   4       1 30.4 very good
29      Ford Pantera L   8       3 15.8       bad
30        Ferrari Dino   6       2 19.7       bad
31       Maserati Bora   8       3 15.0       bad
32          Volvo 142E   4       1 21.4      good
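For comparison, the pmap() alternative mentioned earlier might look something like the sketch below. This is one way to write it rather than the version I actually run; the names new.cols and cars.rec are illustrative, and the parameter file is repeated so the block stands on its own.

```r
library(tidyverse)

parms <- tribble(
  ~newvar,     ~rules,
  "cyl.rec",   c("cyl==4~1", "cyl==6~2", "cyl==8~3"),
  "mpg.rec",   c("mpg<15~'very bad'", "mpg<20~'bad'", "mpg<25~'good'", "TRUE~'very good'")
)

cars <- mtcars %>%
  rownames_to_column("car")

# Each iteration returns a one-column data frame holding only the new recode
new.cols <- pmap(
  parms,
  function(newvar, rules, df = cars) {
    mutate(
      df,
      !!newvar := case_when(!!!rlang::parse_exprs(rules)),
      .keep = "none"
    )
  }
)

# Column-bind the original data and the list of recode columns
cars.rec <- bind_cols(cars, new.cols)

select(cars.rec, car, cyl, cyl.rec, mpg, mpg.rec)
```

One trade-off to keep in mind: here every recode is computed against the original cars, so a rule cannot refer to a recode created in an earlier row of the parameter file. With the global overwrite approach, cars is re-read on each iteration, so later rules can build on earlier recodes.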

Option 2 - Code Stored as Text in a Separate File (like a csv)

Option 2 does the same thing (creating recodes metaprogrammatically by storing the code as data) but may be a better fit if we want to store the code in a text-based, tabular format rather than in an R script. This can be useful, for example, if we want a subject-matter expert who is not an R programmer to write or review the recode logic. In that case, we could even break the conditions in the parameter file down further to strip out the case_when() syntax, then reassemble it as needed in the static codebase.

#this part can exist in a .csv or .xlsx file
parms.alt<-tribble(
  ~newvar,     ~rules,
  "cyl.rec",   "cyl==4~1",
  "cyl.rec",   "cyl==6~2",
  "cyl.rec",   "cyl==8~3",
  "mpg.rec",   "mpg<15~'very bad'",
  "mpg.rec",   "mpg<20~'bad'",
  "mpg.rec",   "mpg<25~'good'",
  "mpg.rec",   "TRUE~'very good'"
)
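Since the tribble above only stands in for a file on disk, here is a rough sketch of the round trip through a .csv. The temp file path and the name parms.in are placeholders for wherever the real parameter file lives; the table itself is repeated so the block runs on its own.

```r
library(tidyverse)

parms.alt <- tribble(
  ~newvar,     ~rules,
  "cyl.rec",   "cyl==4~1",
  "cyl.rec",   "cyl==6~2",
  "cyl.rec",   "cyl==8~3",
  "mpg.rec",   "mpg<15~'very bad'",
  "mpg.rec",   "mpg<20~'bad'",
  "mpg.rec",   "mpg<25~'good'",
  "mpg.rec",   "TRUE~'very good'"
)

# Placeholder location: a temp file stands in for the real parameter file
parm.path <- tempfile(fileext = ".csv")
write_csv(parms.alt, parm.path)

# Read every column as character so nothing is guessed to be another type
parms.in <- read_csv(parm.path, col_types = cols(.default = col_character()))

# The rules should survive the round trip unchanged
identical(parms.in$rules, parms.alt$rules)
```

If a rule contains a comma, for example a condition using %in% c(4,6), that field has to be quoted in the .csv. write_csv() handles the quoting automatically, but it is something to watch for when the file is edited by hand.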

The main difference on the static codebase side is that we group the parameter file by newvar and use group_walk() to apply our anonymous function to each group. Inside the function, we manually extract the new variable name from the group key and the rules vector from the group's rows.

#this part represents a static codebase that would follow an ingestion step
# that reads in the parameter file from wherever it's stored

cars<-mtcars %>%
  rownames_to_column("car")

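parms.alt %>%
  group_by(newvar) %>%
  group_walk(
    #rules receives the rows for one group; group receives that group's key
    function(rules,group,df=cars){
      
      df.name<-deparse(substitute(df))
      
      #extract the new variable name and the character vector of rules
      newvar<-pull(group,newvar)
      rules<-pull(rules,rules)

      #create the recode, then overwrite the data frame in the global environment
      df %>%
        mutate(!!newvar:=case_when(!!!rlang::parse_exprs(rules))) %>%
        assign(df.name,.,envir=globalenv())
      
    }
  )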

select(cars,car,cyl,cyl.rec,mpg,mpg.rec)

                   car cyl cyl.rec  mpg   mpg.rec
1            Mazda RX4   6       2 21.0      good
2        Mazda RX4 Wag   6       2 21.0      good
3           Datsun 710   4       1 22.8      good
4       Hornet 4 Drive   6       2 21.4      good
5    Hornet Sportabout   8       3 18.7       bad
6              Valiant   6       2 18.1       bad
7           Duster 360   8       3 14.3  very bad
8            Merc 240D   4       1 24.4      good
9             Merc 230   4       1 22.8      good
10            Merc 280   6       2 19.2       bad
11           Merc 280C   6       2 17.8       bad
12          Merc 450SE   8       3 16.4       bad
13          Merc 450SL   8       3 17.3       bad
14         Merc 450SLC   8       3 15.2       bad
15  Cadillac Fleetwood   8       3 10.4  very bad
16 Lincoln Continental   8       3 10.4  very bad
17   Chrysler Imperial   8       3 14.7  very bad
18            Fiat 128   4       1 32.4 very good
19         Honda Civic   4       1 30.4 very good
20      Toyota Corolla   4       1 33.9 very good
21       Toyota Corona   4       1 21.5      good
22    Dodge Challenger   8       3 15.5       bad
23         AMC Javelin   8       3 15.2       bad
24          Camaro Z28   8       3 13.3  very bad
25    Pontiac Firebird   8       3 19.2       bad
26           Fiat X1-9   4       1 27.3 very good
27       Porsche 914-2   4       1 26.0 very good
28        Lotus Europa   4       1 30.4 very good
29      Ford Pantera L   8       3 15.8       bad
30        Ferrari Dino   6       2 19.7       bad
31       Maserati Bora   8       3 15.0       bad
32          Volvo 142E   4       1 21.4      good
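As for breaking the parameter file down further for a non-programmer, one possible layout is sketched below. The column names condition and value and the objects parms.simple and parms.rebuilt are hypothetical; the idea is only that the reviewer sees a condition and an assigned value, and the static codebase glues them back into the formula syntax that case_when() expects.

```r
library(tidyverse)

# Hypothetical layout: one row per rule, with the condition and the
# assigned value in separate columns and no case_when() syntax
parms.simple <- tribble(
  ~newvar,     ~condition,  ~value,
  "cyl.rec",   "cyl==4",    "1",
  "cyl.rec",   "cyl==6",    "2",
  "cyl.rec",   "cyl==8",    "3",
  "mpg.rec",   "mpg<15",    "'very bad'",
  "mpg.rec",   "mpg<20",    "'bad'",
  "mpg.rec",   "mpg<25",    "'good'",
  "mpg.rec",   "TRUE",      "'very good'"
)

# Reassemble the condition~value strings used in the Option 2 parameter file
parms.rebuilt <- parms.simple %>%
  mutate(rules = paste0(condition, "~", value)) %>%
  select(newvar, rules)

parms.rebuilt
```

The result has the same two columns as parms.alt, so it could be passed to the group_walk() step unchanged. In this sketch, character values still carry their quotes; stripping those as well would require the static codebase to know each recode's type.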

In either case, the recode changes are easy to QC, and because we never have to open the stable code, we eliminate the chance of breaking it.

Citation

BibTeX citation:

@online{couzens2025,
  author = {Couzens, Lance},
  title = {Decoupling {Dynamic} {Code} from a {Static} {R} {Codebase}},
  date = {2025-03-20},
  url = {https://mostlyunoriginal.github.io/posts/2025-03-20-2-Options-Parameterizing-R-w-Code/},
  langid = {en}
}

For attribution, please cite this work as:

Couzens, Lance. 2025. “Decoupling Dynamic Code from a Static R Codebase.” March 20, 2025. https://mostlyunoriginal.github.io/posts/2025-03-20-2-Options-Parameterizing-R-w-Code/.