
Non-myopic H-Entropy Search

This repo supports Bayesian optimization experiments with non-myopic H-Entropy Search. Bayesian optimization is a widely used approach for making optimal decisions under uncertainty by acquiring information through costly experiments. Many real-world applications can be cast as instances of this problem, from designing biological sequences to conducting ground surveys. In these contexts, the cost of each experiment can be dynamic and non-uniform. For instance, when each experiment corresponds to a location, there is a variable travel cost that depends on the distance between successive experiments. Conventional Bayesian optimization techniques, which often rely on myopic acquisition functions and assume a fixed cost structure, yield suboptimal results in dynamic cost environments. To address these limitations, we introduce a scalable non-myopic acquisition function grounded in a decision-theoretic extension of mutual information. Our empirical evaluations demonstrate that our method outperforms numerous baseline approaches across a range of global optimization tasks.
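The cost-related options listed further below (for example --cost_p_norm and --cost_discount) control how moving between successive query points is penalized. As a rough illustration of the underlying idea only, and not the repository's actual cost module, a travel cost can be modeled as the p-norm distance between consecutive queries; every name in this sketch is hypothetical:

import torch

def travel_cost(prev_x: torch.Tensor, next_x: torch.Tensor, p: float = 2.0) -> torch.Tensor:
    # Illustrative p-norm travel cost between successive query points.
    # This is a sketch of the general idea, not the repository's cost function.
    return torch.linalg.vector_norm(next_x - prev_x, ord=p, dim=-1)

# Moving from (0, 0) to (3, 4) under the Euclidean norm (p=2) costs 5.
print(travel_cost(torch.tensor([0.0, 0.0]), torch.tensor([3.0, 4.0])))  # tensor(5.)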

There are two main experiments:

  1. Synthetic experiments: We consider synthetic environments with the following settings (see the code sketch after this list):
    • 2D environment: Ackley, Alpine, Beale, Branin, EggHolder, Griewank, HolderTable, Levy, SixHumpCamel, StyblinskiTang, and SynGP
    • 4D environment: Powell
    • 6D environment: Hartmann
    • 8D environment: Cosine8
  2. Real-world experiments: We consider a real-world protein sequence optimization task.
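Several of the environments listed above correspond to standard synthetic benchmarks that also ship with BoTorch. The sketch below only shows how such functions can be instantiated and evaluated with BoTorch directly; it does not use the repository's own environment wrappers (for example, SynGP is specific to this repo):

import torch
from botorch.test_functions import Ackley, Branin, Cosine8, Hartmann

# A few of the benchmark functions named above, at their listed dimensionalities.
envs = {
    "Ackley": Ackley(dim=2),
    "Branin": Branin(),
    "Hartmann": Hartmann(dim=6),
    "Cosine8": Cosine8(),
}

for name, fn in envs.items():
    # Evaluate each function at the midpoint of its bounding box.
    x = fn.bounds.mean(dim=0, keepdim=True)  # bounds has shape (2, dim)
    print(name, fn(x).item())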

How to reproduce

  1. Install the requirements
 pip install -r requirements.txt
 or 
 conda env create -f environment.yml
  2. Run the experiments with the bash script scripts.sh. The underlying entry point is:
python _0_main.py [-h] [--seeds SEEDS [SEEDS ...]] [--task TASK] [--env_names ENV_NAMES [ENV_NAMES ...]] [--env_noise ENV_NOISE] [--env_discretized] [--algos ALGOS [ALGOS ...]]
                  [--algo_ts] [--algo_n_iterations ALGO_N_ITERATIONS] [--algo_lookahead_steps ALGO_LOOKAHEAD_STEPS] [--cost_spotlight_k COST_SPOTLIGHT_K] [--cost_p_norm COST_P_NORM]
                  [--cost_max_noise COST_MAX_NOISE] [--cost_discount COST_DISCOUNT] [--cost_discount_threshold COST_DISCOUNT_THRESHOLD] [--gpu_id GPU_ID [GPU_ID ...]]
                  [--continue_once CONTINUE_ONCE] [--test_only]

options:
  -h, --help            show this help message and exit
  --seeds SEEDS [SEEDS ...]
  --task TASK
  --env_names ENV_NAMES [ENV_NAMES ...]
  --env_noise ENV_NOISE
  --env_discretized
  --algos ALGOS [ALGOS ...]
  --algo_ts
  --algo_n_iterations ALGO_N_ITERATIONS
  --algo_lookahead_steps ALGO_LOOKAHEAD_STEPS
  --cost_spotlight_k COST_SPOTLIGHT_K
  --cost_p_norm COST_P_NORM
  --cost_max_noise COST_MAX_NOISE
  --cost_discount COST_DISCOUNT
  --cost_discount_threshold COST_DISCOUNT_THRESHOLD
  --gpu_id GPU_ID [GPU_ID ...]
  --continue_once CONTINUE_ONCE
  --test_only
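For example, a single synthetic run might look like the following. The flag values here, including the algorithm name, are illustrative placeholders; consult scripts.sh for the exact configurations used in the experiments:

# Flag values below are illustrative placeholders; see scripts.sh for the real configurations.
python _0_main.py \
    --seeds 0 1 2 \
    --env_names Ackley \
    --env_noise 0.01 \
    --algos HES \
    --algo_n_iterations 100 \
    --algo_lookahead_steps 15 \
    --gpu_id 0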
  3. Plot the results with:
python draw_regrets.py [ENV_NAMES]
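For instance, assuming the script accepts one or more environment names as positional arguments (as the synopsis above suggests):

python draw_regrets.py Ackley Branin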

Analyzing world models

python _0_main_gp.py [-h] [--seeds SEEDS [SEEDS ...]] [--env_names ENV_NAMES [ENV_NAMES ...]] [--env_noise ENV_NOISE] [--env_discretized] [--gpu_id GPU_ID]

options:
  -h, --help            show this help message and exit
  --seeds SEEDS [SEEDS ...]
  --env_names ENV_NAMES [ENV_NAMES ...]
  --env_noise ENV_NOISE
  --env_discretized
  --gpu_id GPU_ID
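For example, with illustrative flag values:

python _0_main_gp.py \
    --seeds 0 1 2 \
    --env_names Ackley \
    --env_noise 0.0 \
    --gpu_id 0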

Running the real-world experiments

  1. Train the oracle model
accelerate launch --main_process_port 29505 src/train_bash.py \
    --stage oracle \
    --do_train \
    --template default \
    --model_name_or_path facebook/esm2_t36_3B_UR50D \
    --use_fast_tokenizer True \
    --finetuning_type freeze \
    --flash_attn False \
    --dataset proteinea/fluorescence \
    --preprocessing_num_workers 32 \
    --num_train_epochs 10.0 \
    --bf16 False \
    --tf32 False \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --learning_rate 5e-05 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 1 \
    --warmup_ratio 0.01 \
    --save_steps 1000 \
    --output_dir ckpts/oracle2_test \
    --save_total_limit 5 \
    --report_to none \
    --plot_loss True
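The command above fine-tunes facebook/esm2_t36_3B_UR50D on the proteinea/fluorescence dataset through the repo's src/train_bash.py entry point. Purely as orientation for what that oracle is built on, the sketch below loads the same base model and dataset directly from the Hugging Face Hub; it is not the repo's training code, and the regression head here (a single-label sequence classification head) is an assumption:

from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Base protein language model used by the oracle stage (note: ~3B parameters).
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t36_3B_UR50D")
model = AutoModelForSequenceClassification.from_pretrained(
    "facebook/esm2_t36_3B_UR50D",
    num_labels=1,  # assumption: a scalar regression head for fluorescence
)

# Fluorescence dataset referenced by the --dataset flag above.
dataset = load_dataset("proteinea/fluorescence")
print(dataset)  # inspect splits and column names

# Tokenize a short dummy peptide (not a real GFP variant) and score it.
inputs = tokenizer("MKTAYIAKQR", return_tensors="pt")
print(model(**inputs).logits)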
  2. Run the experiments
