Overview Information Here: boa.wrappers
- class boa.wrappers.script_wrapper.ScriptWrapper(config_path: Optional[PathLike] = None, config: Optional[BOAConfig] = None, setup=True, *args, **kwargs)[source]#
Bases: BaseWrapper
This is the wrapper that controls calling the scripts you specify in your configuration file.
On every script it calls, it adds one additional command-line argument at the end: the path to the trial directory for the trial being run (you can’t rely on the newest directory created, since trials are run in parallel). It places a number of JSON data files in this directory, which should include all the information you need to run your scripts.
- parameters.json includes all of the parameters for that trial.
- trial.json includes the complete JSON serialization of the current trial (including the parameters). This is usually more than you need, but it has lots of information, such as the trial index (which you also know from the trial directory path you are passed).
- metric_properties.json includes the metric_properties you custom-configure for any individual metric in your configuration. This file is only available in the final stages, when fetch_trial_status is being called.
- Parameters
config_path (PathLike) –
config (BOAConfig) –
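A minimal sketch of how a script called by this wrapper might pick up its inputs, assuming only what is described above: the trial directory path arrives as the last command-line argument and contains parameters.json. The function name read_trial_parameters is hypothetical, and the sketch assumes parameters.json holds a single JSON object.

```python
import json
from pathlib import Path


def read_trial_parameters(argv):
    """Return (trial_dir, parameters) for the trial this script was called for.

    Assumes the trial directory is the last command-line argument and that it
    contains a parameters.json file, as described above.
    """
    trial_dir = Path(argv[-1])
    with open(trial_dir / "parameters.json") as f:
        parameters = json.load(f)
    return trial_dir, parameters
```

Inside a script, this would typically be called as read_trial_parameters(sys.argv), so the script never has to guess which trial directory belongs to it.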
- write_configs(trial: BaseTrial) None[source]#
It can be convenient to separate writing out your model configuration files from your run_model script. If so, include a script option in your configuration file for this command, and use it to output whatever configuration files your model might need. Maybe your model needs certain configuration files in certain places, or your parameters are used to create files such as NetCDF. Whatever it is, if you want to separate the logic for creating your model’s configuration from the logic for running your model, write a script to do it, and in your script_options section put the command that runs said script before the run_model command.
BOA will write out some data files to the trial directory for your script to process.
- Parameters
trial (BaseTrial) –
- Return type
None
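As an illustration of the separation described above, a configuration-writing script could translate parameters.json into whatever file the model expects. This is only a sketch: the function name write_model_config, the output file name model_config.txt, and its key = value layout are all hypothetical, and it assumes parameters.json holds a flat JSON object.

```python
import json
from pathlib import Path


def write_model_config(trial_dir):
    """Translate BOA's parameters.json into a file the model can read.

    The output name (model_config.txt) and its "key = value" layout are
    hypothetical; use whatever your model expects.
    """
    trial_dir = Path(trial_dir)
    parameters = json.loads((trial_dir / "parameters.json").read_text())

    # One "name = value" line per parameter, sorted for reproducible output.
    lines = [f"{name} = {value}" for name, value in sorted(parameters.items())]
    config_path = trial_dir / "model_config.txt"
    config_path.write_text("\n".join(lines) + "\n")
    return config_path
```

A run_model script that runs afterwards could then read model_config.txt from the same trial directory.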
- run_model(trial: BaseTrial) None[source]#
This script is the one that runs your model. If your model is in the same language as your wrapper, you might run it directly in your wrapper. If it is in another language, you might call system commands or start a shell script from your wrapper to start your model, or you might submit a batch job to an HPC system, to be collected later.
Certain model and wrapper combinations have easy access to information about whether the model succeeded or failed. For example, if you are running the model directly in your language and not as a batch job, you can use error handling to know whether it failed. If you are running it as its own process, but also not as a batch job, it will often return an exit code, and you can use that (0 for success, non-zero for various types of errors). In these cases, it may be advisable to write out your trial_status.json file directly here, instead of in a separate set_trial_status script. See set_trial_status() for formatting and options.
- Parameters
trial (BaseTrial) –
- Return type
None
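The exit-code approach described above can be sketched as follows, assuming the model is started as its own process and not as a batch job. The function name run_and_record_status is hypothetical. The trial_status.json layout follows the format documented under set_trial_status().

```python
import json
import subprocess
from pathlib import Path


def run_and_record_status(command, trial_dir):
    """Run the model command and write trial_status.json from its exit code."""
    result = subprocess.run(command)
    # Exit code 0 means success; any non-zero code is treated as a failure.
    status = "COMPLETED" if result.returncode == 0 else "FAILED"
    status_path = Path(trial_dir) / "trial_status.json"
    status_path.write_text(json.dumps({"trial_status": status}))
    return status
```

For example, run_and_record_status(["./my_model", str(trial_dir)], trial_dir) would run a hypothetical my_model executable and record whether it succeeded.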
- set_trial_status(trial: BaseTrial) None[source]#
Marks the status of a trial to reflect the status of the model run for the trial.
To mark the trial status, you have two options. First, you can write your output data to an output.json file, which is the file that also contains the objective metrics data, optionally including a trial_status key as detailed below. If there is no trial_status.json file and no trial_status key inside output.json, BOA will assume the trial passed, since you wrote out data. Second, you can directly write out a trial_status.json file: a JSON file with the key trial_status and a value that is one of the trial statuses below. See below for the proper format.
Each script is passed the path to the current trial directory as a command-line argument. That is also the directory you write the JSON file to, naming it trial_status.json.
Each trial is polled periodically to determine its status (completed, failed, still running, etc.). This function defines the criteria for determining that status, and the trial status is updated accordingly when the trial is polled.
The approach for determining the trial status depends on the structure of the particular model and its outputs. If your model is run directly in the same language, or as a direct system call rather than a submission to a batch job system, you may be able to set the status easily in run_model(). Other methods include checking your model’s log files for things like “run complete” and “run crashed”. You can also check for output files, though if your model crashes, that can leave you waiting indefinitely because the files are never written. This is therefore a less ideal option and should be paired with timeouts in BOA or in your scripts.
- Parameters
trial (BaseTrial) –
- Return type
None
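Relevant ENUM list
You can set it to either the text version or the numerical equivalent.
=============  ====================
Text           Numerical Equivalent
=============  ====================
FAILED         2
COMPLETED      3
ABANDONED      4
EARLY_STOPPED  7
=============  ====================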
Format
Format for the trial_status.json file:
{ "trial_status": "COMPLETED" }
Format for the output.json file:
{
  "obj_metric1": ...,  # data for obj_metric1
  "trial_status": "COMPLETED"
}
Alternative format for the output.json file: if the trial succeeded, you can skip marking the status in the JSON file (the file’s existence is enough to show completion if there is no trial_status.json file or trial_status key):
{
  "obj_metric1": ...,  # data for obj_metric1
}
See also
run_model()
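The log-checking approach mentioned above can be sketched as follows. The marker strings are the ones quoted in the text; the log file name model.log and both function names are hypothetical. The sketch writes nothing while neither marker is present, on the assumption that a trial without a status file is still treated as running when it is polled.

```python
import json
from pathlib import Path


def status_from_log(log_text):
    """Return a trial status based on marker strings in the model log.

    Returns None when neither marker is present, meaning the run has not
    finished yet (or stopped without logging anything).
    """
    if "run crashed" in log_text:
        return "FAILED"
    if "run complete" in log_text:
        return "COMPLETED"
    return None


def write_status_if_finished(trial_dir, log_name="model.log"):
    """Write trial_status.json once the log shows that the run has ended."""
    trial_dir = Path(trial_dir)
    log_path = trial_dir / log_name
    if not log_path.exists():
        return None
    status = status_from_log(log_path.read_text())
    if status is not None:
        status_path = trial_dir / "trial_status.json"
        status_path.write_text(json.dumps({"trial_status": status}))
    return status
```

Because a crashed model may never write either marker, a check like this is best combined with a timeout, as noted above.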
- fetch_trial_data(trial: BaseTrial, metric_properties: dict, *args, **kwargs) dict[source]#
Retrieves the trial data and prepares it for the metric(s) used in the objective function.
For example, for a case where you are minimizing the error between a model and observations, using RMSE as a metric, this function would load the model output and the corresponding observation data that will be passed to the RMSE metric.
The return value of this function is a dictionary whose keys match the names of the metrics used in the objective function.
{ "mean": { "a": [-0.3691, 4.6544, 1.2675, -0.4327] } }
We use “mean” as the key in the above example because we assumed the metric specified in the config under objectives was mean. mean is a wrapper around numpy.mean(), which takes as an argument an array called a.
Multiple metrics can be specified for a multi-objective optimization:
{
  "mean": { "a": [-0.3691, 4.6544, 1.2675, -0.4327] },
  "MSE": {
    "y_true": [1.12, 1.25, 2.54, 4.52],
    "y_pred": [1.51, 1.01, 2.21, 4.50]
  }
}
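A minimal sketch of the shape of this return value, assuming each inner dictionary is handed to the matching metric as keyword arguments. That is how the examples above read, but it is not confirmed here. The function build_trial_data and the stand-in mean and mse functions are hypothetical and written in plain Python; they are not BOA's own metrics.

```python
def build_trial_data(y_true, y_pred):
    """Return a dictionary keyed by metric name, shaped like the examples above."""
    return {
        "mean": {"a": list(y_pred)},
        "MSE": {"y_true": list(y_true), "y_pred": list(y_pred)},
    }


# Stand-in metric functions for illustration only.
def mean(a):
    return sum(a) / len(a)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


data = build_trial_data(
    y_true=[1.12, 1.25, 2.54, 4.52],
    y_pred=[1.51, 1.01, 2.21, 4.50],
)
# Each inner dictionary supplies the keyword arguments for its metric.
results = {"mean": mean(**data["mean"]), "MSE": mse(**data["MSE"])}
```

The point of the layout is that the inner key names (a, y_true, y_pred) must match the argument names the metric expects.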
- load_config(config_path: PathLike, *args, **kwargs) BOAConfig#
load_config takes the path to a configuration file, either JSON or YAML, and returns your configuration dataclass.
load_config will (unless overridden in a subclass) apply some basic “normalizations” to your configuration for convenience. See normalize_config() for more information about how the normalization works and what config options you can control.
This default implementation should work for most JSON or YAML files, but it can be overridden in subclasses if need be.
- Parameters
config_path (PathLike) – File path for the experiment configuration file
- Returns
loaded_config
- Return type
BOAConfig
- mk_experiment_dir(experiment_dir: PathLike = None, output_dir: PathLike = None, experiment_name: str = None, append_timestamp: bool = None, **kwargs) Path#
Make the experiment directory that BOA will write all of its trials and logs to.
All parameters can be set in your configuration file as well:
- experiment_dir -> optimization_options -> experiment_dir
- experiment_name -> optimization_options -> experiment -> name
- append_timestamp -> script_options -> append_timestamp
- Parameters
experiment_dir (PathLike) – Path to the directory for the output of the experiment. You may specify this or output_dir in your configuration file instead. (Defaults to your configuration file and then None)
output_dir (PathLike) – Output directory of the project. If you specify output_dir, then output will be saved in output_dir / experiment_name. Because of this, only one of experiment_dir or output_dir may be specified. (If neither experiment_dir nor output_dir is specified, output_dir defaults to whatever pwd returns, or the equivalent on Windows.)
experiment_name (str) – Name of the experiment, used for creating the path to the experiment directory within the output directory. (Defaults to your configuration file and then boa_runs)
append_timestamp (bool) – Whether to append a timestamp to the end of the experiment directory name to ensure uniqueness. (Defaults to your configuration file and then True)
- Return type