Running an R Script with BOA

This notebook demonstrates how to write a basic wrapper script in R and have BOA launch your optimization through the BOA CLI. See the instructions for creating a model wrapper for more details on wrapper scripts, and the instructions for configuration files for more details on creating a configuration file.

Configuration File Overview

config.yaml

optimization_options:
    objective_options:
        objectives:
            - name: metric
    experiment:
        name: "r_streamlined_run"
    trials: 15

parameters:
    x0:
        'bounds': [ 0, 1 ]
        'type': 'range'
        'value_type': 'float'
    x1:
        'bounds': [ 0, 1]
        'type': 'range'
        'value_type': 'float'
    x2:
        'bounds': [ 0, 1 ]
        'type': 'range'
        'value_type': 'float'
    x3:
        'bounds': [ 0, 1]
        'type': 'range'
        'value_type': 'float'
    x4:
        'bounds': [ 0, 1 ]
        'type': 'range'
        'value_type': 'float'
    x5:
        'bounds': [ 0, 1]
        'type': 'range'
        'value_type': 'float'

script_options:
    # Note that this is a shell command: it is what BOA runs to launch
    # your script. BOA also passes the current trial directory (the one
    # being parameterized) as a command line argument.

    # The script path can be either relative or absolute.
    # By default, when BOA launches from a config file,
    # it uses the config file's directory as your working directory.
    # Here config.yaml and run_model.R are in the same directory.
    run_model: Rscript run_model.R


# options only needed by the model and not BOA
# You can put anything here that your model might need
# We don't need anything extra so we leave it commented out
# model_options:
    # the_question: 42

Wrapper Run Script

run_model.R

# load in any libraries and modules we need
library(jsonlite)
source("../r_utils/hartman6.R")

# This is where we read the command line argument passed in by BOA.
# If your script uses any other command line arguments,
# BOA's trial_dir should generally be the last command line argument,
# so taking the last one is generally safe.
args <- commandArgs(trailingOnly=TRUE)
trial_dir <- args[length(args)]

# In this trial_dir folder there are 2 files supplied by BOA:
# a parameters.json that has just the parameters, and a trial.json
# that includes the parameters and a lot more in case you need it.
# Most people will only need parameters.json.
param_path <- file.path(trial_dir, "parameters.json")
data <- read_json(path=param_path)

# The parameter keys come from whatever you named them in your
# config file, which you are free to change.
x0 <- data$x0
x1 <- data$x1
x2 <- data$x2
x3 <- data$x3
x4 <- data$x4
x5 <- data$x5
X <- c(x0, x1, x2, x3, x4, x5)

# This is where we actually run our "model".
# Here we are using a synthetic function called hartman6,
# but you could substitute your own model for it in
# a number of ways.
res <- hartman6(X)

# In this case, we ran the model directly, so we get back either a number
# or NA, which tells us whether it succeeded or failed. If you are submitting
# a job to an HPC (supercomputer) queue, this might work, or you might have to
# rely on another method. Other options include relying on log file output
# or on information from querying the queue itself,
# though those may be better as standalone `Set Trial Status Scripts`.
if (!is.na(res)) {

    # On success, we don't even need to write out a trial status;
    # the trial is assumed successful if we write out data and don't fail.
    out_data <- list(
        metric=res
        # trial_status=unbox("COMPLETED")  #  this is optional if it succeeds
    )

} else {

    # If we fail, then we do need to include a trial status, and mark it as failed.
    out_data <- list(
        trial_status=unbox("FAILED")
    )
}

json_data <- toJSON(out_data, pretty = TRUE)
write(json_data, file.path(trial_dir, "output.json"))
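
Before handing the wrapper to BOA, it can help to exercise the same file exchange by hand. The sketch below is not part of BOA. It fakes a trial directory in a temporary folder, writes a parameters.json with made-up values, and then reads the parameters and writes output.json the same way run_model.R does. The helper names `read_params` and `write_output`, the folder name `trial_dry_run`, and the stand-in model (a sum of squares) are assumptions for illustration only, and the flat layout of parameters.json is inferred from how the wrapper reads it.

```r
library(jsonlite)

# Hypothetical helper: read x0..x5 from parameters.json into a numeric vector.
read_params <- function(trial_dir, names = paste0("x", 0:5)) {
    data <- read_json(file.path(trial_dir, "parameters.json"))
    unlist(data[names])
}

# Hypothetical helper: write output.json following the same rules as run_model.R.
write_output <- function(trial_dir, res) {
    if (!is.na(res)) {
        # success: the metric alone is enough
        out_data <- list(metric = res)
    } else {
        # failure: the trial status must be written explicitly
        out_data <- list(trial_status = unbox("FAILED"))
    }
    write(toJSON(out_data, pretty = TRUE), file.path(trial_dir, "output.json"))
}

# Fake a trial directory with made-up parameter values (not produced by BOA).
trial_dir <- file.path(tempdir(), "trial_dry_run")
dir.create(trial_dir, showWarnings = FALSE)
fake_params <- list(x0 = 0.1, x1 = 0.2, x2 = 0.3, x3 = 0.4, x4 = 0.5, x5 = 0.6)
write(toJSON(fake_params, auto_unbox = TRUE),
      file.path(trial_dir, "parameters.json"))

# Success path: a stand-in "model" instead of hartman6.
X <- read_params(trial_dir)
res <- sum(X^2)
write_output(trial_dir, res)
out_ok <- read_json(file.path(trial_dir, "output.json"))

# Failure path: NA stands for a model run that did not succeed.
write_output(trial_dir, NA)
out_fail <- read_json(file.path(trial_dir, "output.json"))
```

If both paths leave a sensible output.json in the folder, the wrapper's input and output handling is probably in good shape before BOA is involved at all.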

We also use a function called hartman6, a 6-dimensional version of the synthetic Hartmann function, as the stand-in for our model function. The code is below. You would replace this with any call to your model, be it a local call to your own R model, a system call to a Fortran model wrapped in your R script, or perhaps some code that launches an HPC job and collects the results.

hartman6.R

hartman6 <- function(X) {
    # Any error raised while evaluating the function is turned into NA,
    # which the wrapper script then reports as a failed trial.
    out <- tryCatch(
        {
            # weights of the four exponential terms
            alpha <- c(1.0, 1.2, 3.0, 3.2)
            # 4 x 6 matrix of scaling coefficients
            A <- c(10, 3, 17, 3.5, 1.7, 8,
                   0.05, 10, 17, 0.1, 8, 14,
                   3, 3.5, 1.7, 10, 17, 8,
                   17, 8, 0.05, 10, 0.1, 14)
            A <- matrix(A, 4, 6, byrow=TRUE)
            # 4 x 6 matrix of centers
            P <- 10^(-4) * c(1312, 1696, 5569, 124, 8283, 5886,
                             2329, 4135, 8307, 3736, 1004, 9991,
                             2348, 1451, 3522, 2883, 3047, 6650,
                             4047, 8828, 8732, 5743, 1091, 381)
            P <- matrix(P, 4, 6, byrow=TRUE)

            # repeat X on each of the 4 rows so it lines up with A and P
            Xmat <- matrix(rep(X, times=4), 4, 6, byrow=TRUE)
            inner_sum <- rowSums(A[,1:6] * (Xmat - P[,1:6])^2)
            outer_sum <- sum(alpha * exp(-inner_sum))
            y <- -outer_sum
            return(y)
        },
        error=function(cond) {
            return(NA)
        }
    )
    return(out)
}
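
A quick way to gain confidence in a transcription of a synthetic test function is to evaluate it at a point with a known value. The sketch below assumes the commonly published global minimum of the six-dimensional Hartmann function, roughly -3.32237 at the point stored in `x_star`; that reference value comes from standard optimization test-function tables, not from BOA. The function is restated compactly (without the tryCatch guard) only so the snippet runs on its own, and the names `x_star` and `f_star` are illustrative.

```r
# Compact restatement of hartman6 so this snippet is self-contained.
hartman6 <- function(X) {
    alpha <- c(1.0, 1.2, 3.0, 3.2)
    A <- matrix(c(10, 3, 17, 3.5, 1.7, 8,
                  0.05, 10, 17, 0.1, 8, 14,
                  3, 3.5, 1.7, 10, 17, 8,
                  17, 8, 0.05, 10, 0.1, 14), 4, 6, byrow = TRUE)
    P <- 1e-4 * matrix(c(1312, 1696, 5569, 124, 8283, 5886,
                         2329, 4135, 8307, 3736, 1004, 9991,
                         2348, 1451, 3522, 2883, 3047, 6650,
                         4047, 8828, 8732, 5743, 1091, 381), 4, 6, byrow = TRUE)
    # X is recycled so that every row of Xmat equals X
    Xmat <- matrix(X, 4, 6, byrow = TRUE)
    -sum(alpha * exp(-rowSums(A * (Xmat - P)^2)))
}

# Assumed reference point: the commonly cited location of the global minimum.
x_star <- c(0.20169, 0.150011, 0.476874, 0.275332, 0.311652, 0.6573)
f_star <- hartman6(x_star)
print(f_star)

# The value should agree with the published minimum to about three decimals.
stopifnot(abs(f_star + 3.32237) < 1e-3)
```

Knowing this reference value also gives a sense of scale for the `metric` values that BOA reports for its trials.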

Running our script

To run our script, we just need to pass the config file to BOA’s CLI, using either the long or the short form of the flag:

python -m boa --config-file path/to/config.yaml

python -m boa -c path/to/config.yaml

[WARNING 08-09 18:51:27] ax.service.utils.with_db_settings_base: Ax currently requires a sqlalchemy version below 2.0. This will be addressed in a future release. Disabling SQL storage in Ax for now, if you would like to use SQL storage please install Ax with mysql extras via `pip install ax-platform[mysql]`.
[INFO 08-09 18:51:28] ax.service.utils.instantiation: Created search space: SearchSpace(parameters=[RangeParameter(name='x0', parameter_type=FLOAT, range=[0.0, 1.0]), RangeParameter(name='x1', parameter_type=FLOAT, range=[0.0, 1.0]), RangeParameter(name='x2', parameter_type=FLOAT, range=[0.0, 1.0]), RangeParameter(name='x3', parameter_type=FLOAT, range=[0.0, 1.0]), RangeParameter(name='x4', parameter_type=FLOAT, range=[0.0, 1.0]), RangeParameter(name='x5', parameter_type=FLOAT, range=[0.0, 1.0])], parameter_constraints=[]).
[INFO 08-09 18:51:28] ax.modelbridge.dispatch_utils: Using Models.GPEI since there are more ordered parameters than there are categories for the unordered categorical parameters.
[INFO 08-09 18:51:28] ax.modelbridge.dispatch_utils: Calculating the number of remaining initialization trials based on num_initialization_trials=None max_initialization_trials=None num_tunable_parameters=6 num_trials=None use_batch_trials=False
[INFO 08-09 18:51:28] ax.modelbridge.dispatch_utils: calculated num_initialization_trials=12
[INFO 08-09 18:51:28] ax.modelbridge.dispatch_utils: num_completed_initialization_trials=0 num_remaining_initialization_trials=12
[INFO 08-09 18:51:28] ax.modelbridge.dispatch_utils: Using Bayesian Optimization generation strategy: GenerationStrategy(name='Sobol+GPEI', steps=[Sobol for 12 trials, GPEI for subsequent trials]). Iterations after 12 will take longer to generate due to model-fitting.
[INFO 08-09 18:51:28] Scheduler: `Scheduler` requires experiment to have immutable search space and optimization config. Setting property immutable_search_space_and_opt_config to `True` on experiment.
[INFO 2023-08-09 18:51:28,752 MainProcess] boa: 

##############################################


BOA Experiment Run
Output Experiment Dir: [/path/to/your/dir/]/r_streamlined_run_20230809T185128
Start Time: 20230809T185128
Version: 0.8.8.dev0+gd6e453f.d20230809

##############################################

[INFO 08-09 18:51:28] Scheduler: Running trials [0]...
[INFO 08-09 18:51:30] Scheduler: Running trials [1]...
[INFO 08-09 18:51:31] Scheduler: Running trials [2]...
[INFO 08-09 18:51:32] Scheduler: Running trials [3]...
[INFO 08-09 18:51:33] Scheduler: Running trials [4]...
[INFO 08-09 18:51:34] Scheduler: Running trials [5]...
[INFO 08-09 18:51:36] Scheduler: Running trials [6]...
[INFO 08-09 18:51:37] Scheduler: Running trials [7]...
[INFO 08-09 18:51:38] Scheduler: Running trials [8]...
[INFO 08-09 18:51:39] Scheduler: Running trials [9]...
[INFO 08-09 18:51:41] Scheduler: Retrieved COMPLETED trials: 0 - 9.
[INFO 08-09 18:51:41] Scheduler: Fetching data for trials: 0 - 9.
[INFO 2023-08-09 18:51:41,229 MainProcess] boa: Saved JSON-serialized state of optimization to `[/path/to/your/dir/]/r_streamlined_run_20230809T185128/scheduler.json`.
Boa version: 0.8.8.dev0+gd6e453f.d20230809
[INFO 2023-08-09 18:51:41,247 MainProcess] boa: Saved optimization parametrization and objective to `[/path/to/your/dir/]/r_streamlined_run_20230809T185128/optimization.csv`.
[INFO 2023-08-09 18:51:41,261 MainProcess] boa: Trials so far: 10
Running trials: 
Will Produce next trials from generation step: Sobol
Best trial so far: {3: {'metric': -0.8066}}
[INFO 08-09 18:51:41] Scheduler: Running trials [10]...
[INFO 08-09 18:51:42] Scheduler: Running trials [11]...
[INFO 08-09 18:51:45] Scheduler: Running trials [12]...
[INFO 08-09 18:51:46] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.
[INFO 08-09 18:51:48] Scheduler: Running trials [13]...
[INFO 08-09 18:51:49] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.
[INFO 08-09 18:51:51] Scheduler: Running trials [14]...
[INFO 08-09 18:51:52] Scheduler: Retrieved COMPLETED trials: 10 - 14.
[INFO 08-09 18:51:52] Scheduler: Fetching data for trials: 10 - 14.
[INFO 2023-08-09 18:51:52,889 MainProcess] boa: Saved JSON-serialized state of optimization to `[/path/to/your/dir/]/r_streamlined_run_20230809T185128/scheduler.json`.
Boa version: 0.8.8.dev0+gd6e453f.d20230809
[INFO 2023-08-09 18:51:52,907 MainProcess] boa: Saved optimization parametrization and objective to `[/path/to/your/dir/]/r_streamlined_run_20230809T185128/optimization.csv`.
[INFO 2023-08-09 18:51:52,923 MainProcess] boa: Trials so far: 15
Running trials: 
Will Produce next trials from generation step: GPEI
Best trial so far: {14: {'metric': -1.0227}}
[INFO 2023-08-09 18:51:52,939 MainProcess] boa: Saved JSON-serialized state of optimization to `[/path/to/your/dir/]/r_streamlined_run_20230809T185128/scheduler.json`.
Boa version: 0.8.8.dev0+gd6e453f.d20230809
[INFO 2023-08-09 18:51:52,957 MainProcess] boa: Saved optimization parametrization and objective to `[/path/to/your/dir/]/r_streamlined_run_20230809T185128/optimization.csv`.
[INFO 2023-08-09 18:51:52,972 MainProcess] boa: Trials so far: 15
Running trials: 
Will Produce next trials from generation step: GPEI
Best trial so far: {14: {'metric': -1.0227}}
[INFO 2023-08-09 18:51:53,001 MainProcess] boa: 

##############################################

Trials Completed!
BOA Experiment Run
Output Experiment Dir: [/path/to/your/dir/]/r_streamlined_run_20230809T185128
Start Time: 20230809T185128
Version: 0.8.8.dev0+gd6e453f.d20230809
End Time: 20230809T185152
Total Run Time: 24.220675230026245

    trial_index arm_name trial_status  ...        x3        x4        x5
0             0      0_0    COMPLETED  ...  0.629275  0.137796  0.515604
1             1      1_0    COMPLETED  ...  0.602257  0.457678  0.505580
2             2      2_0    COMPLETED  ...  0.930436  0.081166  0.682230
3             3      3_0    COMPLETED  ...  0.213464  0.156547  0.999392
4             4      4_0    COMPLETED  ...  0.889072  0.460256  0.870208
5             5      5_0    COMPLETED  ...  0.507055  0.845420  0.125827
6             6      6_0    COMPLETED  ...  0.355159  0.336532  0.021124
7             7      7_0    COMPLETED  ...  0.657354  0.645192  0.058214
8             8      8_0    COMPLETED  ...  0.198422  0.875165  0.143583
9             9      9_0    COMPLETED  ...  0.255256  0.807304  0.604236
10           10     10_0    COMPLETED  ...  0.538395  0.418263  0.199643
11           11     11_0    COMPLETED  ...  0.957110  0.653261  0.482938
12           12     12_0    COMPLETED  ...  0.217348  0.181392  0.974875
13           13     13_0    COMPLETED  ...  0.176546  0.106343  1.000000
14           14     14_0    COMPLETED  ...  0.286782  0.250568  1.000000

[15 rows x 11 columns]

##############################################
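
The log above reports that the parametrizations and objective values are saved to optimization.csv in the experiment directory. A short sketch of picking out the best trial from that file in R follows. It assumes the file has `trial_index` and `metric` columns (the metric name comes from config.yaml, but the full column layout is truncated in the table above, so treat the column names as an assumption) and that lower is better, as the "Best trial so far" lines suggest. The helper name `best_trial` is hypothetical. To stay runnable, the example writes a tiny stand-in CSV holding only the two "best trial" values that appear in the log.

```r
# Hypothetical helper: return the row with the lowest value in the metric column.
best_trial <- function(csv_path, metric_col = "metric") {
    trials <- read.csv(csv_path)
    trials[which.min(trials[[metric_col]]), , drop = FALSE]
}

# Stand-in file: just the two values the log reports as "best trial so far".
# A real optimization.csv written by BOA has more rows and more columns.
csv_path <- file.path(tempdir(), "optimization.csv")
write.csv(data.frame(trial_index = c(3, 14),
                     metric = c(-0.8066, -1.0227)),
          csv_path, row.names = FALSE)

best <- best_trial(csv_path)
print(best)
```

For a real run, `csv_path` would point at the optimization.csv inside the experiment directory named in the log, for example `[/path/to/your/dir/]/r_streamlined_run_20230809T185128/optimization.csv`.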