Configuration Guide
Example Configuration
# ###########
# objective #
# ###########
# Your objective to be optimized by BOA.
# This can be a single objective, a scalarized objective, or a multi-objective (Pareto objective).
# For a single objective, list a single metric in the metrics field.
# For a multi-objective, list multiple metrics in the metrics field.
# For a scalarized objective, list multiple metrics in the metrics field and specify the
# weight for each metric in that metric's weight field.
objective:
  # A list of BOAMetric objects that represent the metrics to be used in the objective.
  metrics:
    - name: metric1 # Name of the metric. This is used to identify the metric in your wrapper script.
      # Metric to be used for optimization. You can list any metric built into BOA.
      # Those metrics can be found here: :mod:`Metrics <boa.metrics.metrics>`.
      # If no metric is specified, a :class:`pass through<.PassThrough>` metric will be used,
      # which means that the metric will be computed by the user and passed to BOA.
      # You can also use any metric from sklearn by passing in the name of the metric
      # and metric type as `sklearn_metric`.
      # You can also use any metric from Ax's or BoTorch's synthetic metrics modules by
      # passing in the name of the metric and metric type as `synthetic_metric`.
      metric: RMSE
  # String representation of outcome constraints on metrics.
  # Each constraint bounds a metric (or linear combination of metrics)
  # by some bound (>= or <=).
  # (ex. ['metric1 >= 0.0', 'metric2 <= 1.0', '2*metric1 + .5*metric2 <= 1.0'])
  outcome_constraints: '...'
  # String representation of objective thresholds for multi-objective optimization.
  # An objective threshold represents the threshold for an objective metric
  # to contribute to hypervolume calculations. The objective thresholds
  # for all metrics collectively form a reference point.
  # Because the objective thresholds are used to calculate hypervolume, they
  # can only be used for multi-objective optimization.
  # (ex. ['metric1 >= 0.0', 'metric2 <= 1.0'])
  objective_thresholds: '...'
  # A boolean that indicates whether the scalarized objective should be minimized or maximized.
  # Only used for scalarized objectives because each metric can have its own minimize flag.
  # Will be ignored for non-scalarized objectives.
  minimize: '...'
# ############
# parameters #
# ############
# Parameters to optimize over. This can be expressed in two ways. The first is a list of dictionaries, where each
# dictionary represents a parameter. The second is a dictionary of dictionaries, where the key is the name of the
# parameter and the value is the dictionary representing the parameter.
# .. code-block:: yaml
#     ## Dictionary of dictionaries
#     x1:
#         type: range
#         bounds: [0, 1]
#         value_type: float
#     x2:
#         type: range
#         bounds: [0.0, 1.0] # value_type is inferred from bounds
# .. code-block:: yaml
#     ## List of dictionaries
#     - name: x1
#       type: range
#       bounds: [0, 1]
#       value_type: float
# .. code-block:: yaml
#     ## Fixed Types
#     x3: 4.0 # Fixed type, value is 4.0
#     x4:
#         type: fixed
#         value: "some string" # Fixed type, value is "some string"
#     ## Choice Options
#     x5:
#         type: choice
#         values: ["a", "b"]
parameters:
  x1:
    type: range
    bounds:
      - 0
      - 1
    value_type: float
  x2:
    type: choice
    values:
      - a
      - b
      - c
  x3: 4.0
# #####################
# generation_strategy #
# #####################
# Your generation strategy is how new trials will be generated, that is, what acquisition function
# will be used to select the next trial, what kernel will be used to model the objective function,
# as well as other options such as max parallelism.
# This is an optional section. If not specified, Ax will choose a generation strategy for you
# based on your objective, parameters, and other options. You can influence how Ax chooses
# a generation strategy by passing options under `generation_strategy`.
# Taken from Ax's documentation:
# Select an appropriate generation strategy based on the properties of
# the search space and expected settings of the experiment, such as number of
# arms per trial, optimization algorithm settings, expected number of trials
# in the experiment, etc.
# Args:
# search_space: SearchSpace, based on the properties of which to select the
# generation strategy.
# use_batch_trials: Whether this generation strategy will be used to generate
# batched trials instead of 1-arm trials.
# enforce_sequential_optimization: Whether to enforce that 1) the generation
# strategy needs to be updated with ``min_trials_observed`` observations for
# a given generation step before proceeding to the next one and 2) maximum
# number of trials running at once (max_parallelism) is enforced for the
# BayesOpt step. NOTE: ``max_parallelism_override`` and
# ``max_parallelism_cap`` settings will still take their effect on max
# parallelism even if ``enforce_sequential_optimization=False``, so if those
# settings are specified, max parallelism will be enforced.
# random_seed: Fixed random seed for the Sobol generator.
# torch_device: The device to use for generation steps implemented in PyTorch
# (e.g. via BoTorch). Some generation steps (in particular EHVI-based ones
# for multi-objective optimization) can be sped up by running candidate
# generation on the GPU. If not specified, uses the default torch device
# (usually the CPU).
# no_winsorization: Whether to skip the winsorization transform that is otherwise
# applied prior to other transforms for fitting the BoTorch model.
# winsorization_config: Explicit winsorization settings, if winsorizing. Usually
# only `upper_quantile_margin` is set when minimizing, and only
# `lower_quantile_margin` when maximizing.
# derelativize_with_raw_status_quo: Whether to derelativize using the raw status
# quo values in any transforms. This argument is primarily to allow automatic
# Winsorization when relative constraints are present. Note: automatic
# Winsorization will fail if this is set to `False` (or unset) and there
# are relative constraints present.
# no_bayesian_optimization: If True, Bayesian optimization generation
# strategy will not be suggested and quasi-random strategy will be used.
# num_trials: Total number of trials in the optimization, if
# known in advance.
# num_initialization_trials: Specific number of initialization trials, if wanted.
# Typically, initialization trials are generated quasi-randomly.
# max_initialization_trials: If ``num_initialization_trials`` unspecified, it
# will be determined automatically. This arg provides a cap on that
# automatically determined number.
# num_completed_initialization_trials: The final calculated number of
# initialization trials is reduced by this number. This is useful when
# warm-starting an experiment, to specify what number of completed trials
# can be used to satisfy the initialization_trial requirement.
# max_parallelism_cap: Integer cap on parallelism in this generation strategy.
# If specified, ``max_parallelism`` setting in each generation step will be
# set to the minimum of the default setting for that step and the value of
# this cap. ``max_parallelism_cap`` is meant to just be a hard limit on
# parallelism (e.g. to avoid overloading machine(s) that evaluate the
# experiment trials). Specify only if not specifying
# ``max_parallelism_override``.
# max_parallelism_override: Integer, with which to override the default max
# parallelism setting for all steps in the generation strategy returned from
# this function. Each generation step has a ``max_parallelism`` value, which
# restricts how many trials can run simultaneously during a given generation
# step. By default, the parallelism setting is chosen as appropriate for the
# model in a given generation step. If ``max_parallelism_override`` is -1,
# no max parallelism will be enforced for any step of the generation
# strategy. Be aware that parallelism is limited to improve performance of
# Bayesian optimization, so only disable its limiting if necessary.
# optimization_config: used to infer whether to use MOO and will be passed in to
# ``Winsorize`` via its ``transform_config`` in order to determine default
# winsorization behavior when necessary.
# should_deduplicate: Whether to deduplicate the parameters of proposed arms
# against those of previous arms via rejection sampling. If this is True,
# the generation strategy will discard generator runs produced from the
# generation step that has `should_deduplicate=True` if they contain arms
# already present on the experiment and replace them with new generator runs.
# If no generator run with entirely unique arms could be produced in 5
# attempts, a `GenerationStrategyRepeatedPoints` error will be raised, as we
# assume that the optimization converged when the model can no longer suggest
# unique arms.
# use_saasbo: Whether to use SAAS prior for any GPEI generation steps.
# verbose: Whether GP model should produce verbose logs. If not ``None``, its
# value gets added to ``model_kwargs`` during ``generation_strategy``
# construction. Defaults to ``True`` for SAASBO, else ``None``. Verbose
# outputs are currently only available for SAASBO, so if ``verbose is not
# None`` for a different model type, it will be overridden to ``None`` with
# a warning.
# disable_progbar: Whether GP model should produce a progress bar. If not
# ``None``, its value gets added to ``model_kwargs`` during
# ``generation_strategy`` construction. Defaults to ``True`` for SAASBO, else
# ``None``. Progress bars are currently only available for SAASBO, so if
# ``disable_progbar is not None`` for a different model type, it will be
# overridden to ``None`` with a warning.
# jit_compile: Whether to use jit compilation in Pyro when SAASBO is used.
# experiment: If specified, ``_experiment`` attribute of the generation strategy
# will be set to this experiment (useful for associating a generation
# strategy with a given experiment before it's first used to ``gen`` with
# that experiment). Can also provide `optimization_config` if it is not
# provided as an arg to this function.
# use_update: Whether to use ``ModelBridge.update`` to update the model with
# new data rather than fitting it from scratch. This is much more efficient,
# particularly when running trials in parallel. Note that this is not
# compatible with metrics that are available while running.
# It will default to True if using SAASBO and the given experiment does not
# have any metrics that are available while running.
#
# See https://ax.dev/tutorials/generation_strategy.html and
# https://ax.dev/api/modelbridge.html#ax.modelbridge.dispatch_utils.choose_generation_strategy
# for specific options.
# If you want to specify your own generation strategy, you can do so by passing a list of
# steps under `generation_strategy.steps`.
# .. code-block:: yaml
#     generation_strategy:
#         # Use Ax's SAASBO algorithm, which is particularly well suited for high dimensional problems
#         use_saasbo: true
#         max_parallelism_cap: 10 # Maximum number of trials allowed to run in parallel
# Other options are possible;
# see https://ax.dev/tutorials/generation_strategy.html#1A.-Manually-configured-generation-strategy
# and Models from ax.modelbridge.registry.py for more options.
# Some options include SOBOL, GPEI, Thompson, GPKG (knowledge gradient), and others.
# See https://ax.dev/api/modelbridge.html#ax.modelbridge.generation_node.GenerationStep
# for specific options you can pass to each step.
# .. code-block:: yaml
#     generation_strategy:
#         steps:
#             - model: SOBOL
#               num_trials: 20
#             - model: GPEI # Gaussian Process with Expected Improvement
#               num_trials: -1
#               max_parallelism: 10 # Maximum number of trials allowed to run in parallel
generation_strategy: {}
# ###########
# scheduler #
# ###########
# Settings for a scheduler instance.
# Attributes:
# max_pending_trials: Maximum number of pending trials the scheduler
# can have ``STAGED`` or ``RUNNING`` at once, required. If looking
# to use ``Runner.poll_available_capacity`` as a primary guide for
# how many trials should be pending at a given time, set this limit
# to a high number, as an upper bound on number of trials that
# should not be exceeded.
# trial_type: Type of trials (1-arm ``Trial`` or multi-arm ``BatchTrial``)
# that will be deployed using the scheduler. Defaults
# to 1-arm `Trial`. NOTE: use ``BatchTrial`` only if you need to
# evaluate multiple arms *together*, e.g. in an A/B-test
# influenced by data nonstationarity. For cases where just
# deploying multiple arms at once is beneficial but the trials
# are evaluated *independently*, implement ``run_trials`` method
# in scheduler subclass, to deploy multiple 1-arm trials at
# the same time.
# batch_size: If using BatchTrial the number of arms to be generated and
# deployed per trial.
# total_trials: Limit on number of trials a given ``Scheduler``
# should run. If no stopping criteria are implemented on
# a given scheduler, exhaustion of this number of trials
# will be used as default stopping criterion in
# ``Scheduler.run_all_trials``. Required to be non-null if
# using ``Scheduler.run_all_trials`` (not required for
# ``Scheduler.run_n_trials``).
# tolerated_trial_failure_rate: Fraction of trials in this
# optimization that are allowed to fail without the whole
# optimization ending. Expects value between 0 and 1.
# NOTE: Failure rate checks begin once
# min_failed_trials_for_failure_rate_check trials have
# failed; after that point if the ratio of failed trials
# to total trials run so far exceeds the failure rate,
# the optimization will halt.
# min_failed_trials_for_failure_rate_check: The minimum number
# of trials that must fail in `Scheduler` in order to start
# checking failure rate.
# log_filepath: File, to which to write optimization logs.
# logging_level: Minimum level of logging statements to log,
# defaults to ``logging.INFO``.
# ttl_seconds_for_trials: Optional TTL for all trials created
# within this ``Scheduler``, in seconds. Trials that remain
# ``RUNNING`` for more than their TTL seconds will be marked
# ``FAILED`` once the TTL elapses and may be re-suggested by
# the Ax optimization models.
# init_seconds_between_polls: Initial wait between rounds of
# polling, in seconds. Relevant if using the default wait-
# for-completed-runs functionality of the base ``Scheduler``
# (if ``wait_for_completed_trials_and_report_results`` is not
# overridden). With the default waiting, every time a poll
# returns that no trial evaluations completed, wait
# time will increase; once some completed trial evaluations
# are found, it will reset back to this value. Specify 0
# to not introduce any wait between polls.
# min_seconds_before_poll: Minimum number of seconds between
# beginning to run a trial and the first poll to check
# trial status.
# timeout_hours: Number of hours after which the optimization will abort.
# seconds_between_polls_backoff_factor: The rate at which the poll
# interval increases.
# run_trials_in_batches: If True and ``poll_available_capacity`` is
# implemented to return non-null results, trials will be dispatched
# in groups via `run_trials` instead of one-by-one via ``run_trial``.
# This can save time, IO calls, or computation in cases where
# dispatching trials in groups is more efficient than sequential
# deployment. The size of the groups will be determined as
# the minimum of ``self.poll_available_capacity()`` and the number
# of generator runs that the generation strategy is able to produce
# without more data or reaching its allowed max parallelism limit.
# debug_log_run_metadata: Whether to log run_metadata for debugging purposes.
# early_stopping_strategy: A ``BaseEarlyStoppingStrategy`` that determines
# whether a trial should be stopped given the current state of
# the experiment. Used in ``should_stop_trials_early``.
# global_stopping_strategy: A ``BaseGlobalStoppingStrategy`` that determines
# whether the full optimization should be stopped or not.
# suppress_storage_errors_after_retries: Whether to fully suppress SQL
# storage-related errors if encountered, after retrying the call
# multiple times. Only use if SQL storage is not important for the given
# use case, since this will only log, but not raise, an exception if
# it's encountered while saving to DB or loading from it.
#
# n_trials: Run only this many trials each time the scheduler is run.
# In contrast to `total_trials`, which is a hard limit that still applies after reloading the
# scheduler, `n_trials` trials will be run every time you reload the scheduler,
# making it easier to reload the scheduler and continue running trials.
scheduler:
  max_pending_trials: '...'
  trial_type: '...'
  batch_size: '...'
  total_trials: '...'
  tolerated_trial_failure_rate: '...'
  min_failed_trials_for_failure_rate_check: '...'
  log_filepath: '...'
  logging_level: '...'
  ttl_seconds_for_trials: '...'
  init_seconds_between_polls: '...'
  min_seconds_before_poll: '...'
  seconds_between_polls_backoff_factor: '...'
  timeout_hours: '...'
  run_trials_in_batches: '...'
  debug_log_run_metadata: '...'
  early_stopping_strategy: '...'
  global_stopping_strategy: '...'
  suppress_storage_errors_after_retries: '...'
# #######################
# parameter_constraints #
# #######################
parameter_constraints: []
# ###############
# model_options #
# ###############
model_options: '...'
# ################
# script_options #
# ################
script_options:
  # Whether to use the config file as the base path for all relative paths.
  # If True, all relative paths will be relative to the config file directory.
  # Defaults to True if not specified.
  # If launched through BOA CLI, this will be set to True automatically.
  # rel_to_config and rel_to_launch cannot both be specified.
  rel_to_config: '...'
  # Whether to use the CLI launch directory as the base path for all relative paths.
  # If True, all relative paths will be relative to the CLI launch directory.
  # Defaults to `rel_to_config` argument if not specified.
  # rel_to_config and rel_to_launch cannot both be specified.
  rel_to_launch: '...'
  # Name of the python wrapper class. Used for python interface only.
  # Defaults to `Wrapper` if not specified.
  wrapper_name: '...'
  # Path to the python wrapper file. Used for python interface only.
  # Defaults to `wrapper.py` if not specified.
  wrapper_path: '...'
  # Path to the working directory. Defaults to `.` (current working directory) if not specified.
  working_dir: '...'
  # Path to the directory for the output of the experiment.
  # You may specify this or output_dir in your configuration file instead.
  experiment_dir: '...'
  # Output directory of the project.
  # If you specify output_dir, then output will be saved in
  # output_dir / experiment_name.
  # Because of this, only one of experiment_dir or output_dir may be specified.
  # (If neither experiment_dir nor output_dir is specified, output_dir defaults
  # to whatever pwd returns, or its equivalent on Windows.)
  output_dir: '...'
  # Name of the experiment. Used with output_dir to create the experiment directory
  # if experiment_dir is not specified.
  exp_name: '...'
  # Whether to append a timestamp to the output directory to ensure uniqueness.
  # Defaults to `True` if not specified.
  append_timestamp: '...'
  # Shell command to run the model. Used for the language-agnostic interface only.
  # This is what BOA will run to launch your script.
  # It will also pass, as a command line argument, the current trial directory
  # that is parameterized by BOA.
  # `run_model` is the only required shell command of these 4, because you
  # can also use it to write your config, run your model, set your trial status,
  # and fetch your trial data all in one script if you so choose. The
  # other scripts are provided as a convenience to segment out your logic.
  # This can be either a relative path or an absolute path.
  run_model: '...'
  # Shell command to write your configs out. See `run_model` for more details.
  write_configs: '...'
  # Shell command to set your trial status. See `run_model` for more details.
  set_trial_status: '...'
  # Shell command to fetch your trial data. See `run_model` for more details.
  fetch_trial_data: '...'
  base_path: '...'
# ################
# parameter_keys #
# ################
parameter_keys: '...'
# #############
# config_path #
# #############
config_path: '...'
# ##########
# n_trials #
# ##########
n_trials: '...'
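The scheduler comments in the example above describe how `tolerated_trial_failure_rate` and `min_failed_trials_for_failure_rate_check` interact. The rule can be sketched in a few lines of Python. This is only an illustration of the rule as worded in those comments, not the actual Ax or BOA implementation; the function name `should_halt` and its signature are hypothetical.

```python
def should_halt(
    num_failed: int,
    num_total: int,
    tolerated_trial_failure_rate: float,
    min_failed_trials_for_failure_rate_check: int,
) -> bool:
    """Return True if the optimization should stop because too many trials failed.

    Hypothetical helper that mirrors the rule described in the scheduler
    comments; it is not the real scheduler code.
    """
    # The failure rate is only checked once enough trials have failed.
    if num_failed < min_failed_trials_for_failure_rate_check:
        return False
    if num_total == 0:
        return False
    # Halt when the ratio of failed trials to trials run so far exceeds the rate.
    return num_failed / num_total > tolerated_trial_failure_rate


# Example values below are illustrative only.
settings = dict(
    tolerated_trial_failure_rate=0.5,
    min_failed_trials_for_failure_rate_check=5,
)
print(should_halt(num_failed=3, num_total=4, **settings))   # too few failures to check
print(should_halt(num_failed=6, num_total=20, **settings))  # ratio 0.3, keeps going
print(should_halt(num_failed=6, num_total=10, **settings))  # ratio 0.6, halts
```

Keeping the minimum-failures guard separate from the ratio check makes it clear why a run with three failures out of four trials is not stopped: the ratio is high, but the check has not started yet.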
Jinja2 Templating
BOA supports Jinja2 templating in the configuration file. This allows for
the use of variables and conditionals in the configuration file. For example,
the following configuration file uses Jinja2 templating to set the
parameters and parameter_constraints fields based on a loop.
Much more complex templating is possible, including the use of conditionals.
See the Jinja2 documentation for more information on Jinja2 templating,
additional options, and examples.
A number of variables are available by default from BOA in your Jinja2-style config.
These variables are listed in JinjaTemplateVars.
# We can set up a list of parameter names to be used throughout the config file
# This will give us a list from x0 to x9
{% set param_names = [] %}
{% for i in range(10) %}
  {% do param_names.append("x" + i|string) %}
{% endfor %}
# We could also do this directly instead of in a loop
{% set param_names2 = ["y0", "y1", "y2"] %}
objective:
  metrics:
    # List all of your metrics here,
    # only list 1 metric for a single objective optimization
    - name: rmse
      metric: RootMeanSquaredError
parameters:
  # We can use the list of parameter names we created above
  # and loop through them to create our parameters
  {% for param in param_names %}
  {{ param }}:
    type: range
    bounds: [0, 1]
    value_type: float
  {% endfor %}
  {% for param in param_names2 %}
  {{ param }}:
    type: range
    bounds: [0.0, 1.0]
  {% endfor %}
parameter_constraints:
  # We can also use the list of parameter names we created above
  # and loop through them to create our parameter constraints
  {% for param in param_names %}
  - {{ param }} <= .5
  {% endfor %}
  {% for param in param_names2 %}
  - {{ param }} >= .5
  {% endfor %}
scheduler:
  n_trials: 1
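To make it easier to see what the template above expands to, here is a plain-Python sketch that builds the same `parameters` and `parameter_constraints` entries with ordinary loops. It uses neither Jinja2 nor BOA; the function name `build_config` is hypothetical, and the result is simply the dictionary one would expect after rendering the template and loading the resulting YAML.

```python
def build_config() -> dict:
    """Build the structure the Jinja2 template above is expected to render to."""
    param_names = [f"x{i}" for i in range(10)]  # x0 ... x9, as in the template loop
    param_names2 = ["y0", "y1", "y2"]

    parameters = {}
    for name in param_names:
        parameters[name] = {"type": "range", "bounds": [0, 1], "value_type": "float"}
    for name in param_names2:
        # No value_type here; the template leaves it to be inferred from the bounds.
        parameters[name] = {"type": "range", "bounds": [0.0, 1.0]}

    constraints = [f"{name} <= .5" for name in param_names]
    constraints += [f"{name} >= .5" for name in param_names2]

    return {
        "objective": {"metrics": [{"name": "rmse", "metric": "RootMeanSquaredError"}]},
        "parameters": parameters,
        "parameter_constraints": constraints,
        "scheduler": {"n_trials": 1},
    }


config = build_config()
print(len(config["parameters"]))            # 10 x-parameters + 3 y-parameters
print(config["parameter_constraints"][0])   # first generated constraint
```

The template version keeps the whole configuration in one file, which is convenient when the number of parameters changes often; the Python version is shown only to spell out the expansion.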
- class boa.config.BOAConfig(**config)
Bases: _Utils
Base doc string
Method generated by attrs for class _Utils.
- objective: dict | boa.config.config.BOAObjective
Your objective to be optimized by BOA. This can be a single objective, a scalarized objective, or a multi-objective (Pareto objective). For a single objective, list a single metric in the metrics field. For a multi-objective, list multiple metrics in the metrics field. For a scalarized objective, list multiple metrics in the metrics field and specify the weight for each metric in that metric's weight field.
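For intuition about the scalarized case, the sketch below combines several metric values into a single number using per-metric weights. It assumes the scalarized objective is a plain weighted sum; the function `scalarize` and the example metric names and values are hypothetical and are not part of BOA.

```python
def scalarize(metric_values: dict, weights: dict) -> float:
    """Combine metric values into one objective value as a weighted sum.

    Hypothetical helper for illustration; BOA and Ax handle this internally.
    """
    missing = set(weights) - set(metric_values)
    if missing:
        raise KeyError(f"no value reported for metrics: {sorted(missing)}")
    return sum(weights[name] * metric_values[name] for name in weights)


values = {"metric1": 2.0, "metric2": 4.0}    # made-up metric results for one trial
weights = {"metric1": 0.5, "metric2": 0.25}  # made-up weights
print(scalarize(values, weights))  # 0.5 * 2.0 + 0.25 * 4.0
```

Because the weighted sum collapses all metrics into one value, a single minimize flag applies to the combined objective, which matches the description of the `minimize` option for scalarized objectives.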
- parameters: dict[str, dict] | list[Dict[str, Union[str, bool, float, int, NoneType, Sequence[Union[str, bool, float, int, NoneType]], Dict[str, List[str]]]]]
Parameters to optimize over. This can be expressed in two ways. The first is a list of dictionaries, where each dictionary represents a parameter. The second is a dictionary of dictionaries, where the key is the name of the parameter and the value is the dictionary representing the parameter.

    ## Dictionary of dictionaries
    x1:
        type: range
        bounds: [0, 1]
        value_type: float
    x2:
        type: range
        bounds: [0.0, 1.0] # value_type is inferred from bounds

    ## List of dictionaries
    - name: x1
      type: range
      bounds: [0, 1]
      value_type: float

    ## Fixed Types
    x3: 4.0 # Fixed type, value is 4.0
    x4:
        type: fixed
        value: "some string" # Fixed type, value is "some string"
    ## Choice Options
    x5:
        type: choice
        values: ["a", "b"]
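The two accepted layouts carry the same information, so one can be converted into the other. The following sketch normalizes either layout into a dictionary of dictionaries, expands the fixed-value shorthand, and fills in `value_type` from the bounds when it is missing. It is an illustration only: `normalize_parameters` is a hypothetical helper, not BOA's own normalization code, and the inference rule used here (any float bound means `float`, otherwise `int`) is an assumption.

```python
from typing import Union


def normalize_parameters(params: Union[dict, list]) -> dict:
    """Return a {name: spec} mapping from either accepted layout (hypothetical helper)."""
    if isinstance(params, list):
        # List of dictionaries: pull the name out of each entry.
        named = {}
        for entry in params:
            spec = dict(entry)  # copy so the caller's data is not mutated
            named[spec.pop("name")] = spec
    else:
        named = dict(params)

    normalized = {}
    for name, spec in named.items():
        if not isinstance(spec, dict):
            # Shorthand such as ``x3: 4.0`` means a fixed parameter.
            spec = {"type": "fixed", "value": spec}
        else:
            spec = dict(spec)
        if spec.get("type") == "range" and "value_type" not in spec:
            # Assumed rule: any float bound makes the parameter a float.
            is_float = any(isinstance(b, float) for b in spec["bounds"])
            spec["value_type"] = "float" if is_float else "int"
        normalized[name] = spec
    return normalized


as_dict = {"x2": {"type": "range", "bounds": [0.0, 1.0]}, "x3": 4.0}
as_list = [{"name": "x1", "type": "range", "bounds": [0, 1], "value_type": "float"}]
print(normalize_parameters(as_dict))
print(normalize_parameters(as_list))
```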
- generation_strategy: Optional[dict]
Your generation strategy is how new trials will be generated, that is, what acquisition function will be used to select the next trial, what kernel will be used to model the objective function, as well as other options such as max parallelism.
This is an optional section. If not specified, Ax will choose a generation strategy for you based on your objective, parameters, and other options. You can influence how Ax chooses a generation strategy by passing options under generation_strategy.
Taken from Ax's documentation: Select an appropriate generation strategy based on the properties of the search space and expected settings of the experiment, such as number of arms per trial, optimization algorithm settings, expected number of trials in the experiment, etc.
- Args:
- search_space: SearchSpace, based on the properties of which to select the generation strategy.
- use_batch_trials: Whether this generation strategy will be used to generate batched trials instead of 1-arm trials.
- enforce_sequential_optimization: Whether to enforce that 1) the generation strategy needs to be updated with ``min_trials_observed`` observations for a given generation step before proceeding to the next one and 2) maximum number of trials running at once (max_parallelism) is enforced for the BayesOpt step. NOTE: ``max_parallelism_override`` and ``max_parallelism_cap`` settings will still take their effect on max parallelism even if ``enforce_sequential_optimization=False``, so if those settings are specified, max parallelism will be enforced.
- random_seed: Fixed random seed for the Sobol generator.
- torch_device: The device to use for generation steps implemented in PyTorch (e.g. via BoTorch). Some generation steps (in particular EHVI-based ones for multi-objective optimization) can be sped up by running candidate generation on the GPU. If not specified, uses the default torch device (usually the CPU).
- no_winsorization: Whether to skip the winsorization transform that is otherwise applied prior to other transforms for fitting the BoTorch model.
- winsorization_config: Explicit winsorization settings, if winsorizing. Usually only `upper_quantile_margin` is set when minimizing, and only `lower_quantile_margin` when maximizing.
- derelativize_with_raw_status_quo: Whether to derelativize using the raw status quo values in any transforms. This argument is primarily to allow automatic Winsorization when relative constraints are present. Note: automatic Winsorization will fail if this is set to `False` (or unset) and there are relative constraints present.
- no_bayesian_optimization: If True, Bayesian optimization generation strategy will not be suggested and quasi-random strategy will be used.
- num_trials: Total number of trials in the optimization, if known in advance.
- num_initialization_trials: Specific number of initialization trials, if wanted. Typically, initialization trials are generated quasi-randomly.
- max_initialization_trials: If ``num_initialization_trials`` is unspecified, it will be determined automatically. This arg provides a cap on that automatically determined number.
- num_completed_initialization_trials: The final calculated number of initialization trials is reduced by this number. This is useful when warm-starting an experiment, to specify what number of completed trials can be used to satisfy the initialization_trial requirement.
- max_parallelism_cap: Integer cap on parallelism in this generation strategy. If specified, ``max_parallelism`` setting in each generation step will be set to the minimum of the default setting for that step and the value of this cap. ``max_parallelism_cap`` is meant to just be a hard limit on parallelism (e.g. to avoid overloading machine(s) that evaluate the experiment trials). Specify only if not specifying ``max_parallelism_override``.
- max_parallelism_override: Integer, with which to override the default max parallelism setting for all steps in the generation strategy returned from this function. Each generation step has a ``max_parallelism`` value, which restricts how many trials can run simultaneously during a given generation step. By default, the parallelism setting is chosen as appropriate for the model in a given generation step. If ``max_parallelism_override`` is -1, no max parallelism will be enforced for any step of the generation strategy. Be aware that parallelism is limited to improve performance of Bayesian optimization, so only disable its limiting if necessary.
- optimization_config: Used to infer whether to use MOO and will be passed in to ``Winsorize`` via its ``transform_config`` in order to determine default winsorization behavior when necessary.
- should_deduplicate: Whether to deduplicate the parameters of proposed arms against those of previous arms via rejection sampling. If this is True, the generation strategy will discard generator runs produced from the generation step that has `should_deduplicate=True` if they contain arms already present on the experiment and replace them with new generator runs. If no generator run with entirely unique arms could be produced in 5 attempts, a `GenerationStrategyRepeatedPoints` error will be raised, as we assume that the optimization converged when the model can no longer suggest unique arms.
- use_saasbo: Whether to use SAAS prior for any GPEI generation steps.
- verbose: Whether GP model should produce verbose logs. If not ``None``, its value gets added to ``model_kwargs`` during ``generation_strategy`` construction. Defaults to ``True`` for SAASBO, else ``None``. Verbose outputs are currently only available for SAASBO, so if ``verbose is not None`` for a different model type, it will be overridden to ``None`` with a warning.
- disable_progbar: Whether GP model should produce a progress bar. If not ``None``, its value gets added to ``model_kwargs`` during ``generation_strategy`` construction. Defaults to ``True`` for SAASBO, else ``None``. Progress bars are currently only available for SAASBO, so if ``disable_progbar is not None`` for a different model type, it will be overridden to ``None`` with a warning.
- jit_compile: Whether to use jit compilation in Pyro when SAASBO is used.
- experiment: If specified, ``_experiment`` attribute of the generation strategy will be set to this experiment (useful for associating a generation strategy with a given experiment before it's first used to ``gen`` with that experiment). Can also provide `optimization_config` if it is not provided as an arg to this function.
- use_update: Whether to use ``ModelBridge.update`` to update the model with new data rather than fitting it from scratch. This is much more efficient, particularly when running trials in parallel. Note that this is not compatible with metrics that are available while running. It will default to True if using SAASBO and the given experiment does not have any metrics that are available while running.
See https://ax.dev/tutorials/generation_strategy.html and https://ax.dev/api/modelbridge.html#ax.modelbridge.dispatch_utils.choose_generation_strategy for specific options.
If you want to specify your own generation strategy, you can do so by passing a list of steps under generation_strategy.steps.

    generation_strategy:
        # Use Ax's SAASBO algorithm, which is particularly well suited for high dimensional problems
        use_saasbo: true
        max_parallelism_cap: 10 # Maximum number of trials allowed to run in parallel

Other options are possible; see https://ax.dev/tutorials/generation_strategy.html#1A.-Manually-configured-generation-strategy and Models from ax.modelbridge.registry.py for more options. Some options include SOBOL, GPEI, Thompson, GPKG (knowledge gradient), and others. See https://ax.dev/api/modelbridge.html#ax.modelbridge.generation_node.GenerationStep for specific options you can pass to each step.

    generation_strategy:
        steps:
            - model: SOBOL
              num_trials: 20
            - model: GPEI # Gaussian Process with Expected Improvement
              num_trials: -1
              max_parallelism: 10 # Maximum number of trials allowed to run in parallel
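As a rough illustration of how a list of steps is consumed, the sketch below picks the model for a given trial index from a steps list shaped like the SOBOL/GPEI example, treating `num_trials: -1` as "no limit". It deliberately ignores everything else Ax considers when moving between steps (for example ``min_trials_observed``), and `model_for_trial` is a hypothetical helper, not an Ax or BOA function.

```python
def model_for_trial(steps: list, trial_index: int) -> str:
    """Return the model name of the step that would generate the given trial.

    Simplified, hypothetical logic: steps are used in order, each for
    ``num_trials`` trials, and ``num_trials == -1`` means the step never ends.
    """
    first_index_of_step = 0
    for step in steps:
        num_trials = step["num_trials"]
        if num_trials == -1 or trial_index < first_index_of_step + num_trials:
            return step["model"]
        first_index_of_step += num_trials
    raise ValueError(f"no generation step covers trial {trial_index}")


steps = [
    {"model": "SOBOL", "num_trials": 20},
    {"model": "GPEI", "num_trials": -1, "max_parallelism": 10},
]
print(model_for_trial(steps, 0))   # SOBOL
print(model_for_trial(steps, 19))  # SOBOL
print(model_for_trial(steps, 20))  # GPEI
```

With this layout the first 20 trials come from the quasi-random SOBOL step and every later trial comes from GPEI, which is the usual pattern of initialization followed by Bayesian optimization.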
- scheduler: Optional[Union[dict, SchedulerOptions]]#
Settings for a scheduler instance.
- Attributes:
- max_pending_trials: Maximum number of pending trials the scheduler can have STAGED or RUNNING at once, required. If looking to use Runner.poll_available_capacity as a primary guide for how many trials should be pending at a given time, set this limit to a high number, as an upper bound on number of trials that should not be exceeded.
- trial_type: Type of trials (1-arm Trial or multi-arm BatchTrial) that will be deployed using the scheduler. Defaults to 1-arm Trial. NOTE: use BatchTrial only if need to evaluate multiple arms together, e.g. in an A/B-test influenced by data nonstationarity. For cases where just deploying multiple arms at once is beneficial but the trials are evaluated independently, implement run_trials method in scheduler subclass, to deploy multiple 1-arm trials at the same time.
- batch_size: If using BatchTrial the number of arms to be generated and deployed per trial.
- total_trials: Limit on number of trials a given Scheduler should run. If no stopping criteria are implemented on a given scheduler, exhaustion of this number of trials will be used as default stopping criterion in Scheduler.run_all_trials. Required to be non-null if using Scheduler.run_all_trials (not required for Scheduler.run_n_trials).
- tolerated_trial_failure_rate: Fraction of trials in this optimization that are allowed to fail without the whole optimization ending. Expects value between 0 and 1. NOTE: Failure rate checks begin once min_failed_trials_for_failure_rate_check trials have failed; after that point, if the ratio of failed trials to total trials run so far exceeds the failure rate, the optimization will halt.
- min_failed_trials_for_failure_rate_check: The minimum number of trials that must fail in Scheduler in order to start checking failure rate.
- log_filepath: File, to which to write optimization logs.
- logging_level: Minimum level of logging statements to log, defaults to logging.INFO.
- ttl_seconds_for_trials: Optional TTL for all trials created within this Scheduler, in seconds. Trials that remain RUNNING for more than their TTL seconds will be marked FAILED once the TTL elapses and may be re-suggested by the Ax optimization models.
- init_seconds_between_polls: Initial wait between rounds of polling, in seconds. Relevant if using the default wait-for-completed-runs functionality of the base Scheduler (if wait_for_completed_trials_and_report_results is not overridden). With the default waiting, every time a poll returns that no trial evaluations completed, wait time will increase; once some completed trial evaluations are found, it will reset back to this value. Specify 0 to not introduce any wait between polls.
- min_seconds_before_poll: Minimum number of seconds between beginning to run a trial and the first poll to check trial status.
- timeout_hours: Number of hours after which the optimization will abort.
- seconds_between_polls_backoff_factor: The rate at which the poll interval increases.
- run_trials_in_batches: If True and poll_available_capacity is implemented to return non-null results, trials will be dispatched in groups via run_trials instead of one-by-one via run_trial. This saves time, IO calls, or computation in cases where dispatching trials in groups is more efficient than sequential deployment. The size of the groups will be determined as the minimum of self.poll_available_capacity() and the number of generator runs that the generation strategy is able to produce without more data or reaching its allowed max parallelism limit.
- debug_log_run_metadata: Whether to log run_metadata for debugging purposes.
- early_stopping_strategy: A BaseEarlyStoppingStrategy that determines whether a trial should be stopped given the current state of the experiment. Used in should_stop_trials_early.
- global_stopping_strategy: A BaseGlobalStoppingStrategy that determines whether the full optimization should be stopped or not.
- suppress_storage_errors_after_retries: Whether to fully suppress SQL storage-related errors if encountered, after retrying the call multiple times. Only use if SQL storage is not important for the given use case, since this will only log, but not raise, an exception if it's encountered while saving to DB or loading from it.
- n_trials: Run only this many trials per invocation. In contrast to total_trials, which is a hard limit that persists even after reloading the scheduler, n_trials trials are run every time you reload the scheduler. This makes it easier to reload the scheduler and continue running trials.
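Two of the scheduler settings above describe small decision rules: the failure rate check and the polling backoff. The sketch below restates them as plain functions. Both `should_halt` and `next_poll_wait` are hypothetical helpers written for illustration, not the scheduler's actual code, and they assume the poll interval grows by simple multiplication with the backoff factor.

```python
def should_halt(
    num_failed: int,
    num_ran: int,
    tolerated_trial_failure_rate: float,
    min_failed_trials_for_failure_rate_check: int,
) -> bool:
    """Return True if the optimization should stop because too many trials failed."""
    # The rate is not checked until enough trials have failed.
    if num_failed < min_failed_trials_for_failure_rate_check:
        return False
    if num_ran == 0:
        return False
    return num_failed / num_ran > tolerated_trial_failure_rate


def next_poll_wait(
    current_wait: float,
    init_seconds_between_polls: float,
    seconds_between_polls_backoff_factor: float,
    found_completed_trials: bool,
) -> float:
    """Return the wait, in seconds, before the next poll."""
    # Completed trials reset the wait; an empty poll lengthens it (assumed multiplicative).
    if found_completed_trials:
        return init_seconds_between_polls
    return current_wait * seconds_between_polls_backoff_factor


# 6 failures out of 10 trials is a 0.6 failure rate, above the tolerated 0.5.
print(should_halt(6, 10, 0.5, 5))  # True
```

For example, with `tolerated_trial_failure_rate=0.5` and `min_failed_trials_for_failure_rate_check=5`, four failures never trigger a halt regardless of the ratio, because the check has not started yet.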
- script_options: Optional[Union[dict, BOAScriptOptions]]#
- classmethod from_jsonlike(file, rel_to_config: Optional[bool] = None, template_kw: Optional[dict] = None, **kwargs)[source]#
- property trials#
- class boa.config.BOAObjective(**config)[source]#
Bases: _Utils
Your objective to be optimized by BOA. This can be a single objective, scalarized objective, or a multi-objective (pareto objective). For a single objective, list a single metric in the metrics field. For a multi-objective, list multiple metrics in the metrics field. For a scalarized objective, list multiple metrics in the metrics field and specify the weight for each metric in that metric's weight field.
Method generated by attrs for class _Utils.
- metrics: list[boa.config.config.BOAMetric]#
A list of BOAMetric objects that represent the metrics to be used in the objective.
- outcome_constraints: Optional[list[str]]#
- String representation of outcome constraints on metrics.
This bounds a metric (or linear combination of metrics) by some bound (>= or <=). (ex. ['metric1 >= 0.0', 'metric2 <= 1.0', '2*metric1 + .5*metric2 <= 1.0'])
- objective_thresholds: Optional[list[str]]#
- String representation of Objective Thresholds for multi-objective optimization.
An objective threshold represents the threshold for an objective metric to contribute to hypervolume calculations. A list containing the objective threshold for each metric collectively forms a reference point. Because the objective thresholds are used to calculate hypervolume, they can only be used for multi-objective optimization. (ex. ['metric1 >= 0.0', 'metric2 <= 1.0'])
- minimize: Optional[bool]#
- A boolean that indicates whether the scalarized objective should be minimized or maximized.
Only used for scalarized objectives, because for other objectives each metric can have its own minimize flag. Will be ignored for non-scalarized objectives.
- property metric_names#
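The three objective types described for BOAObjective differ only in how many metrics are listed and whether weights are given. The sketch below classifies a list of metric dicts and computes a weighted sum for the scalarized case. `objective_kind` and `scalarize` are hypothetical helpers for illustration, not BOA functions. The sketch assumes a scalarized objective is recognized by every metric carrying a weight, and it does not handle info_only metrics.

```python
def objective_kind(metrics: list[dict]) -> str:
    """Classify an objective from its list of metric configurations."""
    if not metrics:
        raise ValueError("An objective needs at least one metric.")
    if len(metrics) == 1:
        return "single"
    # Assumption: weights on every metric signal a scalarized objective.
    if all("weight" in metric for metric in metrics):
        return "scalarized"
    return "multi-objective"


def scalarize(metrics: list[dict], values: dict[str, float]) -> float:
    """Combine several metric values into one number using their weights."""
    return sum(metric["weight"] * values[metric["name"]] for metric in metrics)


weighted = [
    {"name": "metric1", "weight": 0.75},
    {"name": "metric2", "weight": 0.25},
]
print(objective_kind(weighted))                             # scalarized
print(scalarize(weighted, {"metric1": 2.0, "metric2": 4.0}))  # 2.5
```

Dropping the weights from the same two metrics would make this a multi-objective (pareto) problem instead.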
- class boa.config.BOAScriptOptions(**config)[source]#
Bases: _Utils
Method generated by attrs for class _Utils.
- rel_to_config: Optional[bool]#
Whether to use the config file as the base path for all relative paths. If True, all relative paths will be relative to the config file directory. Defaults to True if not specified. If launched through BOA CLI, this will be set to True automatically. rel_to_config and rel_to_launch cannot both be specified.
- rel_to_launch: Optional[bool]#
Whether to use the CLI launch directory as the base path for all relative paths. If True, all relative paths will be relative to the CLI launch directory. Defaults to rel_to_config argument if not specified. rel_to_config and rel_to_launch cannot both be specified.
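The effect of rel_to_config and rel_to_launch on a relative path can be sketched as follows. `resolve_path` is a hypothetical helper, not BOA's implementation. It assumes that absolute paths are left untouched and that, when neither flag is set, paths resolve against the config file directory, matching the stated default.

```python
from pathlib import Path
from typing import Optional


def resolve_path(
    path: str,
    config_file: str,
    launch_dir: str,
    rel_to_config: Optional[bool] = None,
    rel_to_launch: Optional[bool] = None,
) -> Path:
    """Resolve a possibly relative path from the configuration."""
    if rel_to_config and rel_to_launch:
        raise ValueError("rel_to_config and rel_to_launch cannot both be specified.")
    candidate = Path(path)
    if candidate.is_absolute():
        return candidate
    # Base directory: the launch directory if requested, else the config file's directory.
    base = Path(launch_dir) if rel_to_launch else Path(config_file).parent
    return base / candidate


print(resolve_path("wrapper.py", "/proj/cfg/config.yaml", "/home/me"))
print(resolve_path("wrapper.py", "/proj/cfg/config.yaml", "/home/me", rel_to_launch=True))
```

The first call resolves next to the config file, the second next to the directory the CLI was launched from.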
- wrapper_name: str#
- Name of the python wrapper class. Used for python interface only.
Defaults to Wrapper if not specified.
- wrapper_path: str#
- Path to the python wrapper file. Used for python interface only.
Defaults to wrapper.py if not specified.
- working_dir: str#
Path to the working directory. Defaults to . (Current working directory) if not specified.
- experiment_dir: Optional[PathLike]#
- Path to the directory for the output of the experiment.
You may specify either this or output_dir in your configuration file.
- output_dir: Optional[PathLike]#
- Output directory of the project.
If you specify output_dir, then output will be saved in output_dir / experiment_name. Because of this, only one of experiment_dir or output_dir may be specified. If neither experiment_dir nor output_dir is specified, output_dir defaults to whatever pwd returns (or the equivalent on Windows).
- exp_name: Optional[str]#
- Name of the experiment. Used with output_dir to create the experiment directory
if experiment_dir is not specified.
- append_timestamp: bool#
- Whether to append a timestamp to the output directory to ensure uniqueness.
Defaults to True if not specified.
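The interaction between experiment_dir, output_dir, exp_name, and append_timestamp can be sketched as below. `resolve_experiment_dir` is a hypothetical helper, not BOA's code. The timestamp format, the `now` parameter, and the choice to apply the timestamp only when the directory is built from output_dir are assumptions made for this illustration.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def resolve_experiment_dir(
    exp_name: str,
    experiment_dir: Optional[str] = None,
    output_dir: Optional[str] = None,
    append_timestamp: bool = True,
    now: Optional[datetime] = None,
) -> Path:
    """Pick the directory where experiment output would be written."""
    if experiment_dir is not None and output_dir is not None:
        raise ValueError("Specify only one of experiment_dir or output_dir.")
    if experiment_dir is not None:
        return Path(experiment_dir)
    # Fall back to the current working directory when output_dir is not given.
    base = Path(output_dir) if output_dir is not None else Path.cwd()
    name = exp_name
    if append_timestamp:
        # Assumed timestamp format; the real format may differ.
        stamp = (now or datetime.now()).strftime("%Y%m%dT%H%M%S")
        name = f"{exp_name}_{stamp}"
    return base / name


print(resolve_experiment_dir("demo", output_dir="out", append_timestamp=False))
```

Passing both experiment_dir and output_dir raises an error here, mirroring the rule that only one of them may be specified.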
- run_model: Optional[str]#
- Shell command to run the model. Used for the language-agnostic interface only.
This is what BOA will run to launch your script. It will also pass, as a command line argument, the current trial directory that is being parameterized by BOA. run_model is the only required shell command of these 4, because you can also use it to write your config, run your model, set your trial status, and fetch your trial data all in one script if you so choose. The other scripts are provided as a convenience to segment out your logic. This can be either a relative path or an absolute path.
- write_configs: Optional[str]#
Shell command to write your configs out. See run_model for more details.
- set_trial_status: Optional[str]#
Shell command to set your trial status. See run_model for more details.
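A script launched through run_model receives the trial directory as its command line argument. The sketch below shows one possible shape for such a script. The file names `parameters.json` and `output.json`, the JSON layout, and the stand-in quadratic "model" are all assumptions made for illustration; consult the BOA documentation for the files your version actually reads and writes.

```python
import json
import sys
from pathlib import Path

# Hypothetical file names, assumed for this sketch only.
PARAMETERS_FILE = "parameters.json"
OUTPUT_FILE = "output.json"


def run_trial(trial_dir: Path) -> dict:
    """Read the trial's parameters, run a stand-in model, and write the result."""
    params = json.loads((trial_dir / PARAMETERS_FILE).read_text())
    # Stand-in "model": sum of squared parameter values.
    value = sum(float(v) ** 2 for v in params.values())
    result = {"metric1": value}
    (trial_dir / OUTPUT_FILE).write_text(json.dumps(result))
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # The trial directory is passed as the first command line argument.
    run_trial(Path(sys.argv[1]))
```

Under these assumptions, the configuration would set run_model to something like `python run_model.py`, with the trial directory appended at launch.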
- class boa.config.BOAMetric(*args, lower_is_better: Optional[bool] = None, **kwargs)[source]#
Bases: _Utils
Method generated by attrs for class _Utils.
- Parameters
lower_is_better (Optional[bool]) –
- metric: Optional[str | ModularMetric]#
- Metric to be used for optimization. You can use any metric built into BOA.
Those metrics can be found here: Metrics. If no metric is specified, a pass through metric will be used, which means that the metric will be computed by the user and passed to BOA. You can also use any metric from sklearn by passing in the name of the metric and the metric type as sklearn_metric. You can also use any metric from Ax's or BoTorch's synthetic metrics modules by passing in the name of the metric and the metric type as synthetic_metric.
- name: Optional[str]#
Name of the metric. This is used to identify the metric in your wrapper script.
- metric_type: Optional[MetricType | str]#
- Type of metric. Built-in BOA metrics are of type metric.
By using sklearn_metric you can use any metric from the sklearn.metrics module, and by using synthetic_metric you can use any synthetic function from Ax's or BoTorch's synthetic metrics modules. You can also specify pass_through to use a metric that is computed by the user.
- noise_sd: Optional[float]#
- Standard deviation of the noise to be added to the metric.
This is useful when you want to simulate noisy metrics. If None, interpret the function as noisy with unknown noise level. Defaults to 0 (noiseless).
- minimize: Optional[bool]#
- Whether to minimize or maximize the metric.
Defaults to True (minimize) for a general metric, but every built-in metric in BOA (Mean, RMSE, etc.) has its own default value.
- info_only: bool#
- Whether the metric is used for informational purposes only. It will still be reported.
This means that the metric will not be used for optimization.
- weight: Optional[float]#
- Weight of the metric. Used in scalarized optimization, which combines multiple metrics into one metric.
Scalarized optimization is a way to turn a multi-objective optimization problem into a single objective optimization problem and significantly reduce the computational cost.
- properties: Optional[dict]#
- Arbitrary properties of the metric. This is used to pass additional information about the metric
to your wrapper. You can pass whatever information you want to your wrapper and use it in your wrapper functions.
- metric_func_kwargs: Optional[dict]#
- Additional keyword arguments to be passed to the metric function.
This is useful when you are setting up a metric and only want to pass the metric function additional arguments. Example: passing metric_func_kwargs={"squared": false} to sklearn's mean_squared_error to get the root mean squared error instead of the mean squared error (though BOA already has RMSE available from sklearn built in if needed).
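The idea behind metric_func_kwargs can be illustrated by binding extra keyword arguments to a metric function. To stay dependency-free, the sketch below uses a small stand-in for sklearn's mean_squared_error that accepts the same `squared` flag. How BOA forwards the keyword arguments internally is not shown in this document, so the use of `functools.partial` here is an assumption. Recent scikit-learn releases deprecate the `squared` flag in favor of a separate root_mean_squared_error function, so check your installed version.

```python
from functools import partial
from math import sqrt


def mean_squared_error(y_true, y_pred, squared=True):
    """Stand-in for sklearn's function of the same name, for illustration only."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return mse if squared else sqrt(mse)


# Equivalent of `metric_func_kwargs: {squared: false}` in the configuration.
metric_func_kwargs = {"squared": False}
rmse = partial(mean_squared_error, **metric_func_kwargs)

print(mean_squared_error([0.0, 0.0], [3.0, 3.0]))  # 9.0
print(rmse([0.0, 0.0], [3.0, 3.0]))                # 3.0
```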
- class boa.config.MetricType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
Bases: StrEnum
- METRIC = 'metric'#
- BOA_METRIC = 'boa_metric'#
- SKLEARN_METRIC = 'sklearn_metric'#
- SYNTHETIC_METRIC = 'synthetic_metric'#
- PASSTHROUGH = 'pass_through'#
- INSTANTIATED = 'instantiated'#