Running an R Script with BOA

This notebook demonstrates how to:

Write a basic wrapper script in R and have BOA launch your optimization through the BOA command-line interface (CLI). See the instructions for creating a model wrapper for more details about writing a wrapper script, and the instructions for configuration files for more details on creating a configuration file.

Configuration File Overview

config.yaml

optimization_options:
    objective_options:
        objectives:
            - name: metric
    experiment:
        name: "r_streamlined_run"
    trials: 15

parameters:
    x0:
        'bounds': [0, 1]
        'type': 'range'
        'value_type': 'float'
    x1:
        'bounds': [0, 1]
        'type': 'range'
        'value_type': 'float'
    x2:
        'bounds': [0, 1]
        'type': 'range'
        'value_type': 'float'
    x3:
        'bounds': [0, 1]
        'type': 'range'
        'value_type': 'float'
    x4:
        'bounds': [0, 1]
        'type': 'range'
        'value_type': 'float'
    x5:
        'bounds': [0, 1]
        'type': 'range'
        'value_type': 'float'

script_options:
    # Note that this is a shell command:
    # it is what BOA runs to launch your script.
    # BOA also passes the directory of the current trial
    # (the trial being parameterized) as a command-line argument.

    # The script path can be either relative or absolute.
    # (By default, when BOA launches from a config file,
    # it uses the config file's directory as the working directory.)
    # Here, config.yaml and run_model.R are in the same directory.
    run_model: Rscript run_model.R


# options only needed by the model and not BOA
# You can put anything here that your model might need
# We don't need anything extra so we leave it commented out
# model_options:
    # the_question: 42
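To try run_model.R by hand before involving BOA, you could create a throwaway trial directory containing a parameters.json with one key per parameter defined in this config. The base-R sketch below does that; the helper name make_fake_trial_dir is hypothetical, and the file only mimics the flat object keyed by parameter name that run_model.R reads through data$x0 and so on, not necessarily everything BOA writes into a real trial directory.

```r
# Hypothetical helper (not part of BOA): build a throwaway trial directory
# with a parameters.json whose keys match x0..x5 from config.yaml.
make_fake_trial_dir <- function(n_params = 6, seed = 1) {
  set.seed(seed)
  # Sample each parameter inside its [0, 1] bounds
  values <- runif(n_params, min = 0, max = 1)
  names(values) <- paste0("x", seq_len(n_params) - 1)  # "x0", ..., "x5"

  # Write a flat JSON object by hand so the sketch needs only base R
  body <- paste(sprintf('  "%s": %.17g', names(values), values), collapse = ",\n")
  trial_dir <- tempfile("trial_")
  dir.create(trial_dir)
  writeLines(c("{", body, "}"), file.path(trial_dir, "parameters.json"))
  trial_dir
}

trial_dir <- make_fake_trial_dir()
cat(readLines(file.path(trial_dir, "parameters.json")), sep = "\n")
```

You could then run `Rscript run_model.R <trial_dir>` from the directory containing run_model.R, which mirrors how the run_model command above receives the trial directory as a command-line argument.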

Wrapper Run Script

run_model.R

# load in any libraries and modules we need
library(jsonlite)
source("../r_utils/hartman6.R")

# Read the command-line argument that BOA passes in.
# If your script takes any other command-line arguments, note that
# BOA's trial_dir should generally be the last one,
# so taking the last argument should generally be safe.
args <- commandArgs(trailingOnly=TRUE)
trial_dir <- args[length(args)]

# In this trial_dir folder, BOA supplies 2 files:
# parameters.json, which has just the parameters, and trial.json,
# which includes the parameters and much more in case you need it.
# Most people will only need parameters.json.
param_path <- file.path(trial_dir, "parameters.json")
data <- read_json(path=param_path)

# The parameter keys come from whatever you named them in your
# config file, which you are free to change.
x0 <- data$x0
x1 <- data$x1
x2 <- data$x2
x3 <- data$x3
x4 <- data$x4
x5 <- data$x5
X <- c(x0, x1, x2, x3, x4, x5)

# This is where we actually run our "model".
# Here we use a synthetic function called hartman6,
# but you could replace it with your own model in
# a number of ways.
res <- hartman6(X)

# In this case, we ran the model directly, so we get back either a number
# or NA, which tells us whether it succeeded or failed. If you are submitting
# a job to an HPC (supercomputer) queue, this might work, or you might have to
# rely on another method, such as log file output or information from
# querying the queue itself, though those may be better handled by
# standalone `Set Trial Status Scripts`.
if (!is.na(res)) {

    # If it was a success, we don't even need to write out a trial status;
    # the trial is assumed to be a success if we write out data and don't fail.
    out_data <- list(
        metric=res
        # trial_status=unbox("COMPLETED")  # this is optional if it succeeds
    )

} else {

    # If we fail, then we do need to include a trial status, and mark it as failed.
    out_data <- list(
        trial_status=unbox("FAILED")
    )
}

json_data <- toJSON(out_data, pretty = TRUE)
write(json_data, file.path(trial_dir, "output.json"))
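If run_model.R is launched without any trailing argument (for example while testing it interactively), `args[length(args)]` silently yields `character(0)` and the problem only surfaces later, inside `read_json`. A slightly more defensive way to pick up the trial directory might look like the sketch below; `get_trial_dir` is a hypothetical helper name, and it keeps the same "take the last argument" convention as the script above.

```r
# Hypothetical helper: return BOA's trial directory, assumed to be the last
# trailing command-line argument, and fail early with a clear message otherwise.
get_trial_dir <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) == 0) {
    stop("No trial directory given; expected it as the last command-line argument.")
  }
  trial_dir <- args[length(args)]
  if (!dir.exists(trial_dir)) {
    stop(sprintf("Trial directory does not exist: %s", trial_dir))
  }
  if (!file.exists(file.path(trial_dir, "parameters.json"))) {
    stop(sprintf("Missing parameters.json in %s", trial_dir))
  }
  trial_dir
}

# In run_model.R, this one line could replace the two that set `args` and `trial_dir`:
# trial_dir <- get_trial_dir()
```

Stopping with an error here makes Rscript exit with a non-zero status instead of failing further down with a less obvious message.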

We also use a function called hartman6, a six-dimensional version of the synthetic Hartmann function, as a stand-in for our model. Its code is below. You would replace it with a call to your own model, whether that is a local call to your own R model, a system call to a Fortran model wrapped in your R script, or code that launches an HPC job and collects the results.
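For reference, the quantity hartman6.R computes can be written as follows, where α = (1.0, 1.2, 3.0, 3.2) and the 4×6 matrices A and P are the constants hard-coded in the script (P already scaled by 10^(-4)):

```latex
f(\mathbf{x}) = -\sum_{i=1}^{4} \alpha_i \exp\left( -\sum_{j=1}^{6} A_{ij} \, (x_j - P_{ij})^2 \right),
\qquad \mathbf{x} \in [0, 1]^6
```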

hartman6.R

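hartman6 <- function(X) {
    out <- tryCatch(
        {
            # Guard against a missing parameter: with fewer than 6 values,
            # matrix() below would recycle X with only a warning, not an error
            if (length(X) != 6) stop("hartman6 expects exactly 6 inputs")

            alpha <- c(1.0, 1.2, 3.0, 3.2)
            A <- c(10, 3, 17, 3.5, 1.7, 8,
                   0.05, 10, 17, 0.1, 8, 14,
                   3, 3.5, 1.7, 10, 17, 8,
                   17, 8, 0.05, 10, 0.1, 14)
            A <- matrix(A, 4, 6, byrow=TRUE)
            P <- 10^(-4) * c(1312, 1696, 5569, 124, 8283, 5886,
                             2329, 4135, 8307, 3736, 1004, 9991,
                             2348, 1451, 3522, 2883, 3047, 6650,
                             4047, 8828, 8732, 5743, 1091, 381)
            P <- matrix(P, 4, 6, byrow=TRUE)

            # Repeat X as each of the 4 rows so it lines up with A and P
            Xmat <- matrix(rep(X, times=4), 4, 6, byrow=TRUE)
            inner_sum <- rowSums(A * (Xmat - P)^2)
            outer_sum <- sum(alpha * exp(-inner_sum))
            -outer_sum
        },
        # Any error (e.g. non-numeric input) yields NA,
        # which run_model.R reports as a FAILED trial
        error=function(cond) {
            NA
        }
    )
    return(out)
}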
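Before pointing BOA at a stand-in like this, it can help to confirm the function is wired up correctly. The Hartmann 6 function has a published global minimum of about -3.32237 at roughly x* = (0.20169, 0.150011, 0.476874, 0.275332, 0.311652, 0.6573), so evaluating it there is a cheap check. The sketch below restates the computation compactly under the hypothetical name `hartman6_check` so the block runs on its own; in practice you would `source()` hartman6.R and call `hartman6` directly.

```r
# Compact restatement of hartman6() (hypothetical name) so this block is self-contained
hartman6_check <- function(X) {
  alpha <- c(1.0, 1.2, 3.0, 3.2)
  A <- matrix(c(10, 3, 17, 3.5, 1.7, 8,
                0.05, 10, 17, 0.1, 8, 14,
                3, 3.5, 1.7, 10, 17, 8,
                17, 8, 0.05, 10, 0.1, 14), 4, 6, byrow = TRUE)
  P <- 1e-4 * matrix(c(1312, 1696, 5569, 124, 8283, 5886,
                       2329, 4135, 8307, 3736, 1004, 9991,
                       2348, 1451, 3522, 2883, 3047, 6650,
                       4047, 8828, 8732, 5743, 1091, 381), 4, 6, byrow = TRUE)
  # sweep() subtracts X[j] from column j of P; squaring makes the sign irrelevant
  inner <- rowSums(A * sweep(P, 2, X)^2)
  -sum(alpha * exp(-inner))
}

# Known global minimizer of the 6-D Hartmann function (rounded)
x_star <- c(0.20169, 0.150011, 0.476874, 0.275332, 0.311652, 0.6573)
print(hartman6_check(x_star))  # should be close to -3.32237
```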

Running our script

To run our script, we just need to pass the config file to BOA's CLI:

boa --config-file path/to/config.yaml

or

boa -c path/to/config.yaml

Running either command produces output similar to the following:

[WARNING 07-13 14:30:49] ax.service.utils.with_db_settings_base: Ax currently requires a sqlalchemy version below 2.0. This will be addressed in a future release. Disabling SQL storage in Ax for now, if you would like to use SQL storage please install Ax with mysql extras via `pip install ax-platform[mysql]`.
[INFO 07-13 14:30:50] ax.service.utils.instantiation: Created search space: SearchSpace(parameters=[RangeParameter(name='x0', parameter_type=FLOAT, range=[0.0, 1.0]), RangeParameter(name='x1', parameter_type=FLOAT, range=[0.0, 1.0]), RangeParameter(name='x2', parameter_type=FLOAT, range=[0.0, 1.0]), RangeParameter(name='x3', parameter_type=FLOAT, range=[0.0, 1.0]), RangeParameter(name='x4', parameter_type=FLOAT, range=[0.0, 1.0]), RangeParameter(name='x5', parameter_type=FLOAT, range=[0.0, 1.0])], parameter_constraints=[]).
[INFO 07-13 14:30:50] ax.modelbridge.dispatch_utils: Using Models.GPEI since there are more ordered parameters than there are categories for the unordered categorical parameters.
[INFO 07-13 14:30:50] ax.modelbridge.dispatch_utils: Calculating the number of remaining initialization trials based on num_initialization_trials=None max_initialization_trials=None num_tunable_parameters=6 num_trials=None use_batch_trials=False
[INFO 07-13 14:30:50] ax.modelbridge.dispatch_utils: calculated num_initialization_trials=12
[INFO 07-13 14:30:50] ax.modelbridge.dispatch_utils: num_completed_initialization_trials=0 num_remaining_initialization_trials=12
[INFO 07-13 14:30:50] ax.modelbridge.dispatch_utils: Using Bayesian Optimization generation strategy: GenerationStrategy(name='Sobol+GPEI', steps=[Sobol for 12 trials, GPEI for subsequent trials]). Iterations after 12 will take longer to generate due to model-fitting.
[INFO 07-13 14:30:50] Scheduler: `Scheduler` requires experiment to have immutable search space and optimization config. Setting property immutable_search_space_and_opt_config to `True` on experiment.
[INFO 2023-07-13 14:30:50,865 MainProcess] boa: 

##############################################


BOA Experiment Run
Output Experiment Dir: [/path/to/your/dir/]/r_streamlined_run_20230713T143050
Start Time: 20230713T143050
Version: 0.8.7.dev4+gae30cf2.d20230713

##############################################

[INFO 07-13 14:30:50] Scheduler: Running trials [0]...
[INFO 07-13 14:30:52] Scheduler: Running trials [1]...
[INFO 07-13 14:30:54] Scheduler: Running trials [2]...
[INFO 07-13 14:30:56] Scheduler: Running trials [3]...
[INFO 07-13 14:30:57] Scheduler: Running trials [4]...
[INFO 07-13 14:30:59] Scheduler: Running trials [5]...
[INFO 07-13 14:31:00] Scheduler: Running trials [6]...
[INFO 07-13 14:31:01] Scheduler: Running trials [7]...
[INFO 07-13 14:31:04] Scheduler: Running trials [8]...
[INFO 07-13 14:31:07] Scheduler: Running trials [9]...
[INFO 07-13 14:31:08] Scheduler: Retrieved COMPLETED trials: 0 - 9.
[INFO 07-13 14:31:08] Scheduler: Fetching data for trials: 0 - 9.
[INFO 2023-07-13 14:31:08,745 MainProcess] boa: Saved JSON-serialized state of optimization to `[/path/to/your/dir/]/r_streamlined_run_20230713T143050/scheduler.json`.
Boa version: 0.8.7.dev4+gae30cf2.d20230713
[INFO 2023-07-13 14:31:08,803 MainProcess] boa: Trials so far: 10
Running trials: 
Will Produce next trials from generation step: Sobol
Best trial so far: {9: {'metric': -0.5032}}
[INFO 07-13 14:31:08] Scheduler: Running trials [10]...
[INFO 07-13 14:31:09] Scheduler: Running trials [11]...
[INFO 07-13 14:31:15] Scheduler: Running trials [12]...
[INFO 07-13 14:31:16] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.
[INFO 07-13 14:31:21] Scheduler: Running trials [13]...
[INFO 07-13 14:31:22] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.
[INFO 07-13 14:31:24] Scheduler: Running trials [14]...
[INFO 07-13 14:31:26] Scheduler: Retrieved COMPLETED trials: 10 - 14.
[INFO 07-13 14:31:26] Scheduler: Fetching data for trials: 10 - 14.
[INFO 2023-07-13 14:31:26,333 MainProcess] boa: Saved JSON-serialized state of optimization to `[/path/to/your/dir/]/r_streamlined_run_20230713T143050/scheduler.json`.
Boa version: 0.8.7.dev4+gae30cf2.d20230713
[INFO 2023-07-13 14:31:26,360 MainProcess] boa: Trials so far: 15
Running trials: 
Will Produce next trials from generation step: GPEI
Best trial so far: {12: {'metric': -0.6201}}
[INFO 2023-07-13 14:31:26,375 MainProcess] boa: Saved JSON-serialized state of optimization to `[/path/to/your/dir/]/r_streamlined_run_20230713T143050/scheduler.json`.
Boa version: 0.8.7.dev4+gae30cf2.d20230713
[INFO 2023-07-13 14:31:26,400 MainProcess] boa: Trials so far: 15
Running trials: 
Will Produce next trials from generation step: GPEI
Best trial so far: {12: {'metric': -0.6201}}
[INFO 2023-07-13 14:31:26,422 MainProcess] boa: 

##############################################

Trials Completed!
BOA Experiment Run
Output Experiment Dir: [/path/to/your/dir/]/r_streamlined_run_20230713T143050
Start Time: 20230713T143050
Version: 0.8.7.dev4+gae30cf2.d20230713
End Time: 20230713T143126
Total Run Time: 35.53495001792908

    trial_index arm_name trial_status  ...        x3        x4        x5
0             0      0_0    COMPLETED  ...  0.755183  0.548820  0.555407
1             1      1_0    COMPLETED  ...  0.441893  0.777394  0.218111
2             2      2_0    COMPLETED  ...  0.833717  0.685323  0.358227
3             3      3_0    COMPLETED  ...  0.337164  0.706950  0.410019
4             4      4_0    COMPLETED  ...  0.842163  0.989052  0.966872
5             5      5_0    COMPLETED  ...  0.081540  0.828128  0.128949
6             6      6_0    COMPLETED  ...  0.138704  0.598400  0.496507
7             7      7_0    COMPLETED  ...  0.723201  0.459715  0.961720
8             8      8_0    COMPLETED  ...  0.614263  0.018990  0.856079
9             9      9_0    COMPLETED  ...  0.324603  0.855542  0.100329
10           10     10_0    COMPLETED  ...  0.955197  0.150812  0.603905
11           11     11_0    COMPLETED  ...  0.646193  0.756861  0.555784
12           12     12_0    COMPLETED  ...  0.301131  0.897643  0.021916
13           13     13_0    COMPLETED  ...  0.379400  0.892975  0.053183
14           14     14_0    COMPLETED  ...  0.280032  0.808321  0.174706

[15 rows x 11 columns]

##############################################