Configuration Guide#
Example Configurations#
This configuration can be generated from BOA with the following command:
python -m boa.config --output-path [path to output]
Default Configuration#
# ###########
# objective #
# ###########
# Your objective to be optimized by BOA.
# This can be a single objective, scalarized objective, or a multi-objective (pareto objective).
# For a single objective, list a single metric in the metrics field.
# For a multi-objective, list multiple metrics in the metrics field.
# For a scalarized objective, list multiple metrics in the metrics field and specify the
# weights for each metric in each metrics weight field.
objective:
# A list of BOAMetric objects that represent the metrics to be used in the objective.
metrics:
- name: metric1 # Name of the metric. This is used to identify the metric in your wrapper script.
# metrics to be used for optimization. You can use list any metric in built into BOA.
# Those metrics can be found here: :mod:`Metrics <boa.metrics.metrics>`.
# If no metric is specified, a :class:`pass through<.PassThrough>` metric will be used.
# Which means that the metric will be computed by the user and passed to BOA.
# You can also use any metric from sklearn by passing in the name of the metric
# and metric type as `sklearn_metric`.
# You can also use any metric from the Ax's or BoTorch's synthetic metrics modules by
# passing in the name of the metric and metric type as `synthetic_metric`.
metric: RMSE
# String representation of outcome constraint of metrics.
# This bounds a metric (or linear combination of metrics)
# by some bound (>= or <=).
# (ex. ['metric1 >= 0.0', 'metric2 <= 1.0', '2*metric1 + .5*metric2 <= 1.0'])
outcome_constraints: '...'
# String representation of Objective Thresholds for multi-objective optimization.
# An objective threshold represents the threshold for an objective metric
# to contribute to hypervolume calculations. A list containing the objective
# threshold for each metric collectively form a reference point.
# Because the objective thresholds are used to calculate hypervolume, they
# can only be used for multi-objective optimization.
# (ex. ['metric1 >= 0.0', 'metric2 <= 1.0'])
objective_thresholds: '...'
# A boolean that indicates whether the scalarized objective should be minimized or maximized.
# Only used for scalarized objectives because each metric can have its own minimize flag.
# Will be ignored for non scalarized objectives.
minimize: '...'
# ############
# parameters #
# ############
# Parameters to optimize over. This can be expressed in two ways. The first is a list of dictionaries, where each
# dictionary represents a parameter. The second is a dictionary of dictionaries, where the key is the name of the
# parameter and the value is the dictionary representing the parameter.
# .. code-block:: yaml
# ## Dictionary of dictionaries
# x1:
# type: range
# bounds: [0, 1]
# value_type: float
# x2:
# type: range
# bounds: [0.0, 1.0] # value_type is inferred from bounds
# .. code-block:: yaml
# ## List of dictionaries
# - name: x1
# type: range
# bounds: [0, 1]
# value_type: float
# .. code-block:: yaml
# ## Fixed Types
# x3: 4.0 # Fixed type, value is 4.0
# x4:
# type: fixed
# value: "some string" # Fixed type, value is "some string"
# ## Choice Options
# x5:
# type: choice
# values: ["a", "b"]
parameters:
x1:
type: range
bounds:
- 0
- 1
value_type: float
x2:
type: choice
values:
- a
- b
- c
x3: 4.0
# #####################
# generation_strategy #
# #####################
# Your generation strategy is how new trials will be generated, that is, what acquisition function
# will be used to select the next trial, what kernel will be used to model the objective function,
# as well as other options such as max parallelism.
# This is an optional section. If not specified, Ax will choose a generation strategy for you.
# Based on your objective, parameters, and other options. You can pass options to how Ax chooses
# a generation strategy by passing options under `generation_strategy`.
# Taken from Ax's documentation:
# Select an appropriate generation strategy based on the properties of
# the search space and expected settings of the experiment, such as number of
# arms per trial, optimization algorithm settings, expected number of trials
# in the experiment, etc.
# Args:
# search_space: SearchSpace, based on the properties of which to select the
# generation strategy.
# use_batch_trials: Whether this generation strategy will be used to generate
# batched trials instead of 1-arm trials.
# enforce_sequential_optimization: Whether to enforce that 1) the generation
# strategy needs to be updated with ``min_trials_observed`` observations for
# a given generation step before proceeding to the next one and 2) maximum
# number of trials running at once (max_parallelism) if enforced for the
# BayesOpt step. NOTE: ``max_parallelism_override`` and
# ``max_parallelism_cap`` settings will still take their effect on max
# parallelism even if ``enforce_sequential_optimization=False``, so if those
# settings are specified, max parallelism will be enforced.
# random_seed: Fixed random seed for the Sobol generator.
# torch_device: The device to use for generation steps implemented in PyTorch
# (e.g. via BoTorch). Some generation steps (in particular EHVI-based ones
# for multi-objective optimization) can be sped up by running candidate
# generation on the GPU. If not specified, uses the default torch device
# (usually the CPU).
# no_winsorization: Whether to apply the winsorization transform
# prior to applying other transforms for fitting the BoTorch model.
# winsorization_config: Explicit winsorization settings, if winsorizing. Usually
# only `upper_quantile_margin` is set when minimizing, and only
# `lower_quantile_margin` when maximizing.
# derelativize_with_raw_status_quo: Whether to derelativize using the raw status
# quo values in any transforms. This argument is primarily to allow automatic
# Winsorization when relative constraints are present. Note: automatic
# Winsorization will fail if this is set to `False` (or unset) and there
# are relative constraints present.
# no_bayesian_optimization: If True, Bayesian optimization generation
# strategy will not be suggested and quasi-random strategy will be used.
# num_trials: Total number of trials in the optimization, if
# known in advance.
# num_initialization_trials: Specific number of initialization trials, if wanted.
# Typically, initialization trials are generated quasi-randomly.
# max_initialization_trials: If ``num_initialization_trials`` unspecified, it
# will be determined automatically. This arg provides a cap on that
# automatically determined number.
# num_completed_initialization_trials: The final calculated number of
# initialization trials is reduced by this number. This is useful when
# warm-starting an experiment, to specify what number of completed trials
# can be used to satisfy the initialization_trial requirement.
# max_parallelism_cap: Integer cap on parallelism in this generation strategy.
# If specified, ``max_parallelism`` setting in each generation step will be
# set to the minimum of the default setting for that step and the value of
# this cap. ``max_parallelism_cap`` is meant to just be a hard limit on
# parallelism (e.g. to avoid overloading machine(s) that evaluate the
# experiment trials). Specify only if not specifying
# ``max_parallelism_override``.
# max_parallelism_override: Integer, with which to override the default max
# parallelism setting for all steps in the generation strategy returned from
# this function. Each generation step has a ``max_parallelism`` value, which
# restricts how many trials can run simultaneously during a given generation
# step. By default, the parallelism setting is chosen as appropriate for the
# model in a given generation step. If ``max_parallelism_override`` is -1,
# no max parallelism will be enforced for any step of the generation
# strategy. Be aware that parallelism is limited to improve performance of
# Bayesian optimization, so only disable its limiting if necessary.
# optimization_config: used to infer whether to use MOO and will be passed in to
# ``Winsorize`` via its ``transform_config`` in order to determine default
# winsorization behavior when necessary.
# should_deduplicate: Whether to deduplicate the parameters of proposed arms
# against those of previous arms via rejection sampling. If this is True,
# the generation strategy will discard generator runs produced from the
# generation step that has `should_deduplicate=True` if they contain arms
# already present on the experiment and replace them with new generator runs.
# If no generator run with entirely unique arms could be produced in 5
# attempts, a `GenerationStrategyRepeatedPoints` error will be raised, as we
# assume that the optimization converged when the model can no longer suggest
# unique arms.
# use_saasbo: Whether to use SAAS prior for any GPEI generation steps.
# verbose: Whether GP model should produce verbose logs. If not ``None``, its
# value gets added to ``model_kwargs`` during ``generation_strategy``
# construction. Defaults to ``True`` for SAASBO, else ``None``. Verbose
# outputs are currently only available for SAASBO, so if ``verbose is not
# None`` for a different model type, it will be overridden to ``None`` with
# a warning.
# disable_progbar: Whether GP model should produce a progress bar. If not
# ``None``, its value gets added to ``model_kwargs`` during
# ``generation_strategy`` construction. Defaults to ``True`` for SAASBO, else
# ``None``. Progress bars are currently only available for SAASBO, so if
# ``disable_probar is not None`` for a different model type, it will be
# overridden to ``None`` with a warning.
# jit_compile: Whether to use jit compilation in Pyro when SAASBO is used.
# experiment: If specified, ``_experiment`` attribute of the generation strategy
# will be set to this experiment (useful for associating a generation
# strategy with a given experiment before it's first used to ``gen`` with
# that experiment). Can also provide `optimization_config` if it is not
# provided as an arg to this function.
# use_update: Whether to use ``ModelBridge.update`` to update the model with
# new data rather than fitting it from scratch. This is much more efficient,
# particularly when running trials in parallel. Note that this is not
# compatible with metrics that are available while running.
# It will default to True if using SAASBO and the given experiment does not
# have any metrics that are available while running.
#
# See https://ax.dev/tutorials/generation_strategy.html and
# https://ax.dev/api/modelbridge.html#ax.modelbridge.dispatch_utils.choose_generation_strategy
# For specific options.
# If you want to specify your own generation strategy, you can do so by passing a list of
# steps under `generation_strategy.steps`
# .. code-block:: yaml
# generation_strategy:
# # Use Ax's SAASBO algorithm, which is particularly well suited for high dimensional problems
# use_saasbo: true
# max_parallelism_cap: 10 # Maximum number of trials allowed to run in parallel
# Other options are possible,
# see https://ax.dev/tutorials/generation_strategy.html#1A.-Manually-configured-generation-strategy
# and Models from ax.modelbridge.registry.py for more options
# Some options include SOBOL, GPEI, Thompson, GPKG (knowledge gradient), and others.
# See https://ax.dev/api/modelbridge.html#ax.modelbridge.generation_node.GenerationStep
# For specific options you can pass to each step
# .. code-block:: yaml
# generation_strategy:
# steps:
# - model: SOBOL
# num_trials: 20
# - model: GPEI # Gaussian Process with Expected Improvement
# num_trials: -1
# max_parallelism: 10 # Maximum number of trials allowed to run in parallel
generation_strategy: {}
# ###########
# scheduler #
# ###########
# Settings for a scheduler instance.
# Attributes:
# max_pending_trials: Maximum number of pending trials the scheduler
# can have ``STAGED`` or ``RUNNING`` at once, required. If looking
# to use ``Runner.poll_available_capacity`` as a primary guide for
# how many trials should be pending at a given time, set this limit
# to a high number, as an upper bound on number of trials that
# should not be exceeded.
# trial_type: Type of trials (1-arm ``Trial`` or multi-arm ``Batch
# Trial``) that will be deployed using the scheduler. Defaults
# to 1-arm `Trial`. NOTE: use ``BatchTrial`` only if need to
# evaluate multiple arms *together*, e.g. in an A/B-test
# influenced by data nonstationarity. For cases where just
# deploying multiple arms at once is beneficial but the trials
# are evaluated *independently*, implement ``run_trials`` method
# in scheduler subclass, to deploy multiple 1-arm trials at
# the same time.
# batch_size: If using BatchTrial the number of arms to be generated and
# deployed per trial.
# total_trials: Limit on number of trials a given ``Scheduler``
# should run. If no stopping criteria are implemented on
# a given scheduler, exhaustion of this number of trials
# will be used as default stopping criterion in
# ``Scheduler.run_all_trials``. Required to be non-null if
# using ``Scheduler.run_all_trials`` (not required for
# ``Scheduler.run_n_trials``).
# tolerated_trial_failure_rate: Fraction of trials in this
# optimization that are allowed to fail without the whole
# optimization ending. Expects value between 0 and 1.
# NOTE: Failure rate checks begin once
# min_failed_trials_for_failure_rate_check trials have
# failed; after that point if the ratio of failed trials
# to total trials ran so far exceeds the failure rate,
# the optimization will halt.
# min_failed_trials_for_failure_rate_check: The minimum number
# of trials that must fail in `Scheduler` in order to start
# checking failure rate.
# log_filepath: File, to which to write optimization logs.
# logging_level: Minimum level of logging statements to log,
# defaults to ``logging.INFO``.
# ttl_seconds_for_trials: Optional TTL for all trials created
# within this ``Scheduler``, in seconds. Trials that remain
# ``RUNNING`` for more than their TTL seconds will be marked
# ``FAILED`` once the TTL elapses and may be re-suggested by
# the Ax optimization models.
# init_seconds_between_polls: Initial wait between rounds of
# polling, in seconds. Relevant if using the default wait-
# for-completed-runs functionality of the base ``Scheduler``
# (if ``wait_for_completed_trials_and_report_results`` is not
# overridden). With the default waiting, every time a poll
# returns that no trial evaluations completed, wait
# time will increase; once some completed trial evaluations
# are found, it will reset back to this value. Specify 0
# to not introduce any wait between polls.
# min_seconds_before_poll: Minimum number of seconds between
# beginning to run a trial and the first poll to check
# trial status.
# timeout_hours: Number of hours after which the optimization will abort.
# seconds_between_polls_backoff_factor: The rate at which the poll
# interval increases.
# run_trials_in_batches: If True and ``poll_available_capacity`` is
# implemented to return non-null results, trials will be dispatched
# in groups via `run_trials` instead of one-by-one via ``run_trial``.
# This allows to save time, IO calls or computation in cases where
# dispatching trials in groups is more efficient then sequential
# deployment. The size of the groups will be determined as
# the minimum of ``self.poll_available_capacity()`` and the number
# of generator runs that the generation strategy is able to produce
# without more data or reaching its allowed max paralellism limit.
# debug_log_run_metadata: Whether to log run_metadata for debugging purposes.
# early_stopping_strategy: A ``BaseEarlyStoppingStrategy`` that determines
# whether a trial should be stopped given the current state of
# the experiment. Used in ``should_stop_trials_early``.
# global_stopping_strategy: A ``BaseGlobalStoppingStrategy`` that determines
# whether the full optimization should be stopped or not.
# suppress_storage_errors_after_retries: Whether to fully suppress SQL
# storage-related errors if encounted, after retrying the call
# multiple times. Only use if SQL storage is not important for the given
# use case, since this will only log, but not raise, an exception if
# it's encountered while saving to DB or loading from it.
#
# n_trials: Only run this many trials,
# in contrast to `total_trials` which is a hard limit, even after reloading the
# scheduler, this will run n_trials trials every time you reload the scheduler.
# Making it easier to use when reloading the scheduler and continuing to run trials.
scheduler:
max_pending_trials: '...'
trial_type: '...'
batch_size: '...'
total_trials: '...'
tolerated_trial_failure_rate: '...'
min_failed_trials_for_failure_rate_check: '...'
log_filepath: '...'
logging_level: '...'
ttl_seconds_for_trials: '...'
init_seconds_between_polls: '...'
min_seconds_before_poll: '...'
seconds_between_polls_backoff_factor: '...'
timeout_hours: '...'
run_trials_in_batches: '...'
debug_log_run_metadata: '...'
early_stopping_strategy: '...'
global_stopping_strategy: '...'
suppress_storage_errors_after_retries: '...'
# ######
# name #
# ######
name: '...'
# #######################
# parameter_constraints #
# #######################
parameter_constraints: []
# ###############
# model_options #
# ###############
model_options: '...'
# ################
# script_options #
# ################
script_options:
# Whether to use the config file as the base path for all relative paths.
# If True, all relative paths will be relative to the config file directory.
# Defaults to True if not specified.
# If launched through BOA CLI, this will be set to True automatically.
# rel_to_config and rel_to_launch cannot both be specified.
rel_to_config: '...'
# Whether to use the CLI launch directory as the base path for all relative paths.
# If True, all relative paths will be relative to the CLI launch directory.
# Defaults to `rel_to_config` argument if not specified.
# rel_to_config and rel_to_launch cannot both be specified.
rel_to_launch: '...'
# Name of the python wrapper class. Used for python interface only.
# Defaults to `Wrapper` if not specified.
wrapper_name: '...'
# Path to the python wrapper file. Used for python interface only.
# Defaults to `wrapper.py` if not specified.
wrapper_path: '...'
# Path to the working directory. Defaults to `.` (Current working directory) if not specified.
working_dir: '...'
# Path to the directory for the output of the experiment
# You may specify this or output_dir in your configuration file instead.
experiment_dir: '...'
# Output directory of project,
# If you specify output_dir, then output will be saved in
# output_dir / experiment_name
# Because of this only either experiment_dir or output_dir may be specified.
# (if neither experiment_dir nor output_dir are specified, output_dir defaults
# to whatever pwd returns (and equivalent on windows))
output_dir: '...'
# Whether to append a timestamp to the output directory to ensure uniqueness.
# Defaults to `True` if not specified.
append_timestamp: '...'
# Shell command to run the model. Used for the language-agnostic interface only.
# this is what BOA will do to launch your script.
# it will also pass as a command line argument the current trial directory
# that is be parameterized by BOA.
# `run_model` is the only needed shell command of these 4, because you
# can use it also to write your config, run your model, set your trial status,
# and fetch your trial data all in one script if you so choose. The
# other scripts are provided as a convenience to segment out your logic.
# This can either be a relative path or absolute path.
run_model: '...'
# Shell command to write your configs out. See `run_model` for more details.
write_configs: '...'
# Shell command to set your trial status. See `run_model` for more details.
set_trial_status: '...'
# Shell command to fetch your trial data. See `run_model` for more details.
fetch_trial_data: '...'
base_path: '...'
# ################
# parameter_keys #
# ################
parameter_keys: '...'
# #############
# config_path #
# #############
config_path: '...'
# ##########
# n_trials #
# ##########
n_trials: '...'