Configuration Guide

Example Configurations

This configuration can be generated from BOA with the following command:

python -m boa.config --output-path [path to output]

Default Configuration

# ###########
# objective #
# ###########

# Your objective to be optimized by BOA.
# This can be a single objective, a scalarized objective, or a multi-objective (Pareto) objective.
# For a single objective, list a single metric in the metrics field.
# For a multi-objective, list multiple metrics in the metrics field.
# For a scalarized objective, list multiple metrics in the metrics field and specify
# the weight for each metric in that metric's weight field.
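
# A minimal sketch of a scalarized objective, assuming the weight field is named
# `weight` and that metrics without a `metric` entry fall back to the pass through
# metric described below. The metric names and weights are illustrative only.

# .. code-block:: yaml

#     objective:
#         metrics:
#         -   name: error_a  # hypothetical metric computed by your own script
#             weight: 0.7
#         -   name: error_b  # hypothetical metric computed by your own script
#             weight: 0.3
#         minimize: true  # minimize the weighted sum 0.7 * error_a + 0.3 * error_b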
objective:
  # A list of BOAMetric objects that represent the metrics to be used in the objective.
  metrics:
  - name: metric1  # Name of the metric. This is used to identify the metric in your wrapper script.
      # Metric to be used for optimization. You can use any metric built into BOA.
      # Those metrics can be found here: :mod:`Metrics <boa.metrics.metrics>`.
      # If no metric is specified, a :class:`pass through<.PassThrough>` metric will be used,
      # which means that the metric will be computed by the user and passed to BOA.
      # You can also use any metric from sklearn by passing in the name of the metric
      # and setting the metric type to `sklearn_metric`.
      # You can also use any metric from Ax's or BoTorch's synthetic metrics modules by
      # passing in the name of the metric and setting the metric type to `synthetic_metric`.
    metric: RMSE
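
  # A sketch of the sklearn option above, assuming the metric type is set with a
  # `metric_type` key (the key name is an assumption). `mean_absolute_error` is a
  # function in `sklearn.metrics`; `mae` is an illustrative metric name.

  # .. code-block:: yaml

  #     metrics:
  #     -   name: mae
  #         metric: mean_absolute_error
  #         metric_type: sklearn_metric  # assumed key name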
  # String representations of outcome constraints on metrics.
  # Each constraint bounds a metric (or a linear combination of metrics)
  # with >= or <=.
  # (ex. ['metric1 >= 0.0', 'metric2 <= 1.0', '2*metric1 + .5*metric2 <= 1.0'])
  outcome_constraints: '...'
  # String representations of objective thresholds for multi-objective optimization.
  # An objective threshold is the value an objective metric must meet
  # to contribute to hypervolume calculations. Together, the objective
  # thresholds for all metrics form the reference point.
  # Because the objective thresholds are used to calculate hypervolume, they
  # can only be used for multi-objective optimization.
  # (ex. ['metric1 >= 0.0', 'metric2 <= 1.0'])
  objective_thresholds: '...'
  # A boolean that indicates whether the scalarized objective should be minimized or maximized.
  # Only used for scalarized objectives, because in the other objective types each metric
  # has its own minimize flag. Ignored for non-scalarized objectives.
  minimize: '...'
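
  # A hypothetical multi-objective sketch: two metrics, each with its own
  # `minimize` flag (assumed key name, following the per-metric minimize flag
  # mentioned above), and one objective threshold per metric. Together the
  # thresholds form the hypervolume reference point. Names and bounds are
  # illustrative only.

  # .. code-block:: yaml

  #     objective:
  #         metrics:
  #         -   name: error
  #             minimize: true
  #         -   name: throughput
  #             minimize: false
  #         objective_thresholds:
  #         -   error <= 0.5  # minimized metric, so the threshold is an upper bound
  #         -   throughput >= 10.0  # maximized metric, so the threshold is a lower bound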

# ############
# parameters #
# ############

# Parameters to optimize over. This can be expressed in two ways. The first is a list of dictionaries, where each
# dictionary represents a parameter. The second is a dictionary of dictionaries, where the key is the name of the
# parameter and the value is the dictionary representing the parameter.

# .. code-block:: yaml

#     ## Dictionary of dictionaries
#     x1:
#         type: range
#         bounds: [0, 1]
#         value_type: float
#     x2:
#         type: range
#         bounds: [0.0, 1.0]  # value_type is inferred from bounds

# .. code-block:: yaml

#     ## List of dictionaries 
#     -   name: x1
#         type: range
#         bounds: [0, 1]
#         value_type: float

# .. code-block:: yaml

#     ## Fixed Types
#     x3: 4.0  # Fixed type, value is 4.0
#     x4:
#         type: fixed
#         value: "some string"  # Fixed type, value is "some string"

#     ## Choice Options 
#     x5:
#         type: choice
#         values: ["a", "b"]
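
# One more sketch, assuming BOA passes Ax's optional range-parameter keys such as
# `log_scale` through unchanged: an integer parameter searched on a log scale.
# `x6` is an illustrative name; log-scaled bounds must be positive.

# .. code-block:: yaml

#     x6:
#         type: range
#         bounds: [1, 1000]
#         value_type: int
#         log_scale: true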
parameters:
  x1:
    type: range
    bounds:
    - 0
    - 1
    value_type: float
  x2:
    type: choice
    values:
    - a
    - b
    - c
  x3: 4.0

# #####################
# generation_strategy #
# #####################

# Your generation strategy is how new trials will be generated, that is, what acquisition function
# will be used to select the next trial, what kernel will be used to model the objective function,
# as well as other options such as max parallelism.

# This is an optional section. If not specified, Ax will choose a generation strategy for you
# based on your objective, parameters, and other options. You can control how Ax chooses
# a generation strategy by passing options under `generation_strategy`.

# Taken from Ax's documentation:
# Select an appropriate generation strategy based on the properties of
#     the search space and expected settings of the experiment, such as number of
#     arms per trial, optimization algorithm settings, expected number of trials
#     in the experiment, etc.

#     Args:
#         search_space: SearchSpace, based on the properties of which to select the
#             generation strategy.
#         use_batch_trials: Whether this generation strategy will be used to generate
#             batched trials instead of 1-arm trials.
#         enforce_sequential_optimization: Whether to enforce that 1) the generation
#             strategy needs to be updated with ``min_trials_observed`` observations for
#             a given generation step before proceeding to the next one and 2) maximum
#             number of trials running at once (max_parallelism) is enforced for the
#             BayesOpt step. NOTE: ``max_parallelism_override`` and
#             ``max_parallelism_cap`` settings will still take their effect on max
#             parallelism even if ``enforce_sequential_optimization=False``, so if those
#             settings are specified, max parallelism will be enforced.
#         random_seed: Fixed random seed for the Sobol generator.
#         torch_device: The device to use for generation steps implemented in PyTorch
#             (e.g. via BoTorch). Some generation steps (in particular EHVI-based ones
#             for multi-objective optimization) can be sped up by running candidate
#             generation on the GPU. If not specified, uses the default torch device
#             (usually the CPU).
#         no_winsorization: Whether to apply the winsorization transform
#             prior to applying other transforms for fitting the BoTorch model.
#         winsorization_config: Explicit winsorization settings, if winsorizing. Usually
#             only `upper_quantile_margin` is set when minimizing, and only
#             `lower_quantile_margin` when maximizing.
#         derelativize_with_raw_status_quo: Whether to derelativize using the raw status
#             quo values in any transforms. This argument is primarily to allow automatic
#             Winsorization when relative constraints are present. Note: automatic
#             Winsorization will fail if this is set to `False` (or unset) and there
#             are relative constraints present.
#         no_bayesian_optimization: If True, Bayesian optimization generation
#             strategy will not be suggested and quasi-random strategy will be used.
#         num_trials: Total number of trials in the optimization, if
#             known in advance.
#         num_initialization_trials: Specific number of initialization trials, if wanted.
#             Typically, initialization trials are generated quasi-randomly.
#         max_initialization_trials: If ``num_initialization_trials`` unspecified, it
#             will be determined automatically. This arg provides a cap on that
#             automatically determined number.
#         num_completed_initialization_trials: The final calculated number of
#             initialization trials is reduced by this number. This is useful when
#             warm-starting an experiment, to specify what number of completed trials
#             can be used to satisfy the initialization_trial requirement.
#         max_parallelism_cap: Integer cap on parallelism in this generation strategy.
#             If specified, ``max_parallelism`` setting in each generation step will be
#             set to the minimum of the default setting for that step and the value of
#             this cap. ``max_parallelism_cap`` is meant to just be a hard limit on
#             parallelism (e.g. to avoid overloading machine(s) that evaluate the
#             experiment trials). Specify only if not specifying
#             ``max_parallelism_override``.
#         max_parallelism_override: Integer, with which to override the default max
#             parallelism setting for all steps in the generation strategy returned from
#             this function. Each generation step has a ``max_parallelism`` value, which
#             restricts how many trials can run simultaneously during a given generation
#             step. By default, the parallelism setting is chosen as appropriate for the
#             model in a given generation step. If ``max_parallelism_override`` is -1,
#             no max parallelism will be enforced for any step of the generation
#             strategy. Be aware that parallelism is limited to improve performance of
#             Bayesian optimization, so only disable its limiting if necessary.
#         optimization_config: used to infer whether to use MOO and will be passed in to
#             ``Winsorize`` via its ``transform_config`` in order to determine default
#             winsorization behavior when necessary.
#         should_deduplicate: Whether to deduplicate the parameters of proposed arms
#             against those of previous arms via rejection sampling. If this is True,
#             the generation strategy will discard generator runs produced from the
#             generation step that has `should_deduplicate=True` if they contain arms
#             already present on the experiment and replace them with new generator runs.
#             If no generator run with entirely unique arms could be produced in 5
#             attempts, a `GenerationStrategyRepeatedPoints` error will be raised, as we
#             assume that the optimization converged when the model can no longer suggest
#             unique arms.
#         use_saasbo: Whether to use SAAS prior for any GPEI generation steps.
#         verbose: Whether GP model should produce verbose logs. If not ``None``, its
#             value gets added to ``model_kwargs`` during ``generation_strategy``
#             construction. Defaults to ``True`` for SAASBO, else ``None``. Verbose
#             outputs are currently only available for SAASBO, so if ``verbose is not
#             None`` for a different model type, it will be overridden to ``None`` with
#             a warning.
#         disable_progbar: Whether GP model should produce a progress bar. If not
#             ``None``, its value gets added to ``model_kwargs`` during
#             ``generation_strategy`` construction. Defaults to ``True`` for SAASBO, else
#             ``None``. Progress bars are currently only available for SAASBO, so if
#             ``disable_progbar is not None`` for a different model type, it will be
#             overridden to ``None`` with a warning.
#         jit_compile: Whether to use jit compilation in Pyro when SAASBO is used.
#         experiment: If specified, ``_experiment`` attribute of the generation strategy
#             will be set to this experiment (useful for associating a generation
#             strategy with a given experiment before it's first used to ``gen`` with
#             that experiment). Can also provide `optimization_config` if it is not
#             provided as an arg to this function.
#         use_update: Whether to use ``ModelBridge.update`` to update the model with
#             new data rather than fitting it from scratch. This is much more efficient,
#             particularly when running trials in parallel. Note that this is not
#             compatible with metrics that are available while running.
#             It will default to True if using SAASBO and the given experiment does not
#             have any metrics that are available while running.
#     

# See https://ax.dev/tutorials/generation_strategy.html and
# https://ax.dev/api/modelbridge.html#ax.modelbridge.dispatch_utils.choose_generation_strategy
# for specific options.

# For example, the following passes options to Ax's automatic selection:

# .. code-block:: yaml

#     generation_strategy:
#         # Use Ax's SAASBO algorithm, which is particularly well suited for high dimensional problems
#         use_saasbo: true
#         max_parallelism_cap: 10  # Maximum number of trials allowed to run in parallel

# If you want to specify your own generation strategy instead, pass a list of
# steps under `generation_strategy.steps`.
# For the available models, see
# https://ax.dev/tutorials/generation_strategy.html#1A.-Manually-configured-generation-strategy
# and the `Models` registry in `ax.modelbridge.registry`.
# Some options include SOBOL, GPEI, Thompson, GPKG (knowledge gradient), and others.
# See https://ax.dev/api/modelbridge.html#ax.modelbridge.generation_node.GenerationStep
# for the specific options you can pass to each step.

# .. code-block:: yaml

#     generation_strategy:
#         steps:
#             -   model: SOBOL
#                 num_trials: 20
#             -   model: GPEI  # Gaussian Process with Expected Improvement
#                 num_trials: -1  # -1 means no limit on the number of trials in this step
#                 max_parallelism: 10  # Maximum number of trials allowed to run in parallel
generation_strategy: {}

# ###########
# scheduler #
# ###########
# Settings for a scheduler instance.

#     Attributes:
#         max_pending_trials: Maximum number of pending trials the scheduler
#             can have ``STAGED`` or ``RUNNING`` at once, required. If looking
#             to use ``Runner.poll_available_capacity`` as a primary guide for
#             how many trials should be pending at a given time, set this limit
#             to a high number, as an upper bound on number of trials that
#             should not be exceeded.
#         trial_type: Type of trials (1-arm ``Trial`` or multi-arm
#             ``BatchTrial``) that will be deployed using the scheduler. Defaults
#             to 1-arm ``Trial``. NOTE: use ``BatchTrial`` only if you need to
#             evaluate multiple arms *together*, e.g. in an A/B-test
#             influenced by data nonstationarity. For cases where just
#             deploying multiple arms at once is beneficial but the trials
#             are evaluated *independently*, implement ``run_trials`` method
#             in scheduler subclass, to deploy multiple 1-arm trials at
#             the same time.
#         batch_size: If using ``BatchTrial``, the number of arms to be generated and
#             deployed per trial.
#         total_trials: Limit on number of trials a given ``Scheduler``
#             should run. If no stopping criteria are implemented on
#             a given scheduler, exhaustion of this number of trials
#             will be used as default stopping criterion in
#             ``Scheduler.run_all_trials``. Required to be non-null if
#             using ``Scheduler.run_all_trials`` (not required for
#             ``Scheduler.run_n_trials``).
#         tolerated_trial_failure_rate: Fraction of trials in this
#             optimization that are allowed to fail without the whole
#             optimization ending. Expects value between 0 and 1.
#             NOTE: Failure rate checks begin once
#             min_failed_trials_for_failure_rate_check trials have
#             failed; after that point if the ratio of failed trials
#             to total trials ran so far exceeds the failure rate,
#             the optimization will halt.
#         min_failed_trials_for_failure_rate_check: The minimum number
#             of trials that must fail in `Scheduler` in order to start
#             checking failure rate.
#         log_filepath: File to which optimization logs are written.
#         logging_level: Minimum level of logging statements to log,
#             defaults to ``logging.INFO``.
#         ttl_seconds_for_trials: Optional TTL for all trials created
#             within this ``Scheduler``, in seconds. Trials that remain
#             ``RUNNING`` for more than their TTL seconds will be marked
#             ``FAILED`` once the TTL elapses and may be re-suggested by
#             the Ax optimization models.
#         init_seconds_between_polls: Initial wait between rounds of
#             polling, in seconds. Relevant if using the default wait-
#             for-completed-runs functionality of the base ``Scheduler``
#             (if ``wait_for_completed_trials_and_report_results`` is not
#             overridden). With the default waiting, every time a poll
#             returns that no trial evaluations completed, wait
#             time will increase; once some completed trial evaluations
#             are found, it will reset back to this value. Specify 0
#             to not introduce any wait between polls.
#         min_seconds_before_poll: Minimum number of seconds between
#             beginning to run a trial and the first poll to check
#             trial status.
#         timeout_hours: Number of hours after which the optimization will abort.
#         seconds_between_polls_backoff_factor: The rate at which the poll
#             interval increases.
#         run_trials_in_batches: If True and ``poll_available_capacity`` is
#             implemented to return non-null results, trials will be dispatched
#             in groups via `run_trials` instead of one-by-one via ``run_trial``.
#             This can save time, IO calls or computation in cases where
#             dispatching trials in groups is more efficient than sequential
#             deployment. The size of the groups will be determined as
#             the minimum of ``self.poll_available_capacity()`` and the number
#             of generator runs that the generation strategy is able to produce
#             without more data or reaching its allowed max parallelism limit.
#         debug_log_run_metadata: Whether to log run_metadata for debugging purposes.
#         early_stopping_strategy: A ``BaseEarlyStoppingStrategy`` that determines
#             whether a trial should be stopped given the current state of
#             the experiment. Used in ``should_stop_trials_early``.
#         global_stopping_strategy: A ``BaseGlobalStoppingStrategy`` that determines
#             whether the full optimization should be stopped or not.
#         suppress_storage_errors_after_retries: Whether to fully suppress SQL
#             storage-related errors if encountered, after retrying the call
#             multiple times. Only use if SQL storage is not important for the given
#             use case, since this will only log, but not raise, an exception if
#             it's encountered while saving to DB or loading from it.
#     
#         n_trials: Only run this many trials. In contrast to `total_trials`, which is a
#             hard limit, `n_trials` runs this many trials every time you reload the
#             scheduler, which makes it easier to reload the scheduler and continue
#             running trials.
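
# A hypothetical scheduler section that sets a few of the attributes above.
# The values are illustrative, not recommendations.

# .. code-block:: yaml

#     scheduler:
#         total_trials: 100  # stop after 100 trials in total
#         max_pending_trials: 5  # at most 5 trials STAGED or RUNNING at once
#         tolerated_trial_failure_rate: 0.2  # halt if more than 20% of trials fail...
#         min_failed_trials_for_failure_rate_check: 5  # ...once at least 5 have failed
#         timeout_hours: 24  # abort the optimization after 24 hours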
scheduler:
  max_pending_trials: '...'
  trial_type: '...'
  batch_size: '...'
  total_trials: '...'
  tolerated_trial_failure_rate: '...'
  min_failed_trials_for_failure_rate_check: '...'
  log_filepath: '...'
  logging_level: '...'
  ttl_seconds_for_trials: '...'
  init_seconds_between_polls: '...'
  min_seconds_before_poll: '...'
  seconds_between_polls_backoff_factor: '...'
  timeout_hours: '...'
  run_trials_in_batches: '...'
  debug_log_run_metadata: '...'
  early_stopping_strategy: '...'
  global_stopping_strategy: '...'
  suppress_storage_errors_after_retries: '...'

# ######
# name #
# ######

name: '...'

# #######################
# parameter_constraints #
# #######################

parameter_constraints: []
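
# A sketch of parameter constraints, assuming BOA accepts the string format of
# Ax's parameter constraints: linear inequalities over range parameters, or
# order constraints between two of them. `alpha` and `beta` are hypothetical
# float range parameters.

# .. code-block:: yaml

#     parameter_constraints:
#     -   alpha + beta <= 1.0  # linear constraint
#     -   alpha <= beta  # order constraint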

# ###############
# model_options #
# ###############

model_options: '...'

# ################
# script_options #
# ################

script_options:

  # Whether to use the config file as the base path for all relative paths.
  # If True, all relative paths will be relative to the config file directory.
  # Defaults to True if not specified.
  # If launched through BOA CLI, this will be set to True automatically.
  # rel_to_config and rel_to_launch cannot both be specified.
  rel_to_config: '...'

  # Whether to use the CLI launch directory as the base path for all relative paths.
  # If True, all relative paths will be relative to the CLI launch directory.
  # Defaults to `rel_to_config` argument if not specified.
  # rel_to_config and rel_to_launch cannot both be specified.
  rel_to_launch: '...'
  # Name of the Python wrapper class. Used for the Python interface only.
  # Defaults to `Wrapper` if not specified.
  wrapper_name: '...'
  # Path to the Python wrapper file. Used for the Python interface only.
  # Defaults to `wrapper.py` if not specified.
  wrapper_path: '...'
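
  # A hypothetical Python-interface sketch: the wrapper class `MyWrapper` defined
  # in the file `my_wrapper.py` (both names are illustrative).

  # .. code-block:: yaml

  #     script_options:
  #         wrapper_path: my_wrapper.py
  #         wrapper_name: MyWrapper
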
  # Path to the working directory. Defaults to `.` (the current working directory) if not specified.
  working_dir: '...'
  # Path to the directory for the output of the experiment.
  # You may specify either this or output_dir in your configuration file, but not both.
  experiment_dir: '...'
  # Output directory of the project.
  # If you specify output_dir, output will be saved in
  # output_dir / experiment_name.
  # Because of this, only one of experiment_dir or output_dir may be specified.
  # (If neither experiment_dir nor output_dir is specified, output_dir defaults
  # to the current working directory, i.e. whatever `pwd` returns or its equivalent on Windows.)
  output_dir: '...'
  # Whether to append a timestamp to the output directory to ensure uniqueness.
  # Defaults to `True` if not specified.
  append_timestamp: '...'
  # Shell command to run the model. Used for the language-agnostic interface only.
  # This is the command BOA runs to launch your script.
  # BOA also passes the directory of the current trial, which BOA
  # parameterizes, to the command as a command-line argument.
  # `run_model` is the only required shell command of these 4, because you
  # can use it to write your configs, run your model, set your trial status,
  # and fetch your trial data all in one script if you so choose. The
  # other scripts are provided as a convenience to segment out your logic.
  # This can be either a relative path or an absolute path.
  run_model: '...'
  # Shell command to write your configs out. See `run_model` for more details.
  write_configs: '...'
  # Shell command to set your trial status. See `run_model` for more details.
  set_trial_status: '...'
  # Shell command to fetch your trial data. See `run_model` for more details.
  fetch_trial_data: '...'
  base_path: '...'
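
  # A hypothetical language-agnostic sketch, assuming an R script `run_model.R`
  # next to the config file that writes its configs, runs the model, sets the
  # trial status, and fetches trial data on its own. BOA runs the command with
  # the current trial directory as a command-line argument. `output` is an
  # illustrative output directory, resolved relative to the config file.

  # .. code-block:: yaml

  #     script_options:
  #         rel_to_config: true  # resolve relative paths against the config file directory
  #         run_model: Rscript run_model.R
  #         output_dir: output
  #         append_timestamp: true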

# ################
# parameter_keys #
# ################

parameter_keys: '...'

# #############
# config_path #
# #############

config_path: '...'

# ##########
# n_trials #
# ##########

n_trials: '...'

Configuration Detailed Reference