Design-Bench
Design-Bench is a benchmarking framework for solving automatic design problems that involve choosing an input that maximizes a black-box function. This type of optimization is used across scientific and engineering disciplines for designing proteins and DNA sequences with particular functions, chemical formulas and molecule substructures, the morphology and controllers of robots, and many other applications.
These applications have significant potential to accelerate research in biochemistry, chemical engineering, materials science, robotics, and many other disciplines. We hope this framework serves as a robust platform to drive these applications and create widespread excitement for model-based optimization.
Offline Model-Based Optimization
The goal of model-based optimization is to find an input x that maximizes an unknown black-box function f. This function is frequently difficult or costly to evaluate, such as when it requires wet-lab experiments in the case of protein design. In these cases, f is described by a set of function evaluations: D = {(x_0, y_0), (x_1, y_1), ... (x_n, y_n)}, and optimization is performed without querying f on new data points.
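As a toy illustration of this setting (this sketch does not use the Design-Bench API; the quadratic `true_function` and the nearest-neighbor surrogate are invented for the example), one can fit a crude surrogate model to the static dataset D and rank candidate designs with that surrogate instead of querying f:

```python
import numpy as np

# Toy offline MBO: the true black-box f is only used to generate the
# static dataset D, and is never queried during optimization.
rng = np.random.default_rng(0)

def true_function(x):
    # hidden ground truth; optimum at x = 0.5
    return -np.sum((x - 0.5) ** 2, axis=-1)

# the offline dataset D = {(x_0, y_0), ..., (x_n, y_n)}
x_data = rng.uniform(0.0, 1.0, size=(128, 2))
y_data = true_function(x_data)

# a crude surrogate for f: nearest-neighbor lookup into D
def surrogate(x):
    dists = np.linalg.norm(x[:, None, :] - x_data[None, :, :], axis=-1)
    return y_data[dists.argmin(axis=1)]

# propose candidates by perturbing the best observed designs, then rank
# them with the surrogate rather than the (unavailable) black-box f
top = x_data[np.argsort(y_data)[-8:]]
candidates = (top[:, None, :] + 0.05 * rng.normal(size=(8, 16, 2))).reshape(-1, 2)
x_star = candidates[surrogate(candidates).argmax()]
```

Real offline MBO methods replace the nearest-neighbor lookup with a learned model and a more careful proposal strategy, but the constraint is the same: only D is available at optimization time.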
Installation
Design-Bench can be installed with the complete set of benchmarks via our pip package.
conda create -n mbo -c conda-forge rdkit
pip install design-bench[all]==2.0.20
pip install morphing-agents==1.5.1
Alternatively, if you do not have MuJoCo, you may opt for a minimal install.
conda create -n mbo -c conda-forge rdkit
pip install design-bench==2.0.20
Available Tasks
In the table below, we list the supported datasets and objective functions for model-based optimization, where a ✓ indicates that a particular combination of dataset and oracle is supported.
| Dataset \ Oracle | Exact | Gaussian Process | Random Forest | Fully Connected | LSTM | ResNet | Transformer |
| --- | --- | --- | --- | --- | --- | --- | --- |
| TF Bind 8 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| GFP | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| ChEMBL | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| UTR | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Hopper Controller | ✓ | ✓ | ✓ | ✓ | | | |
| Superconductor | | ✓ | ✓ | ✓ | | | |
| Ant Morphology | ✓ | ✓ | ✓ | ✓ | | | |
| D'Kitty Morphology | ✓ | ✓ | ✓ | ✓ | | | |
Combinations of datasets and oracles that are not available for download from our server are automatically trained on your machine on task creation. This currently only affects approximate oracles on user-defined MBO tasks. Below we provide the preferred oracle for each task, as well as metadata such as the number of data points measured.
| Task Name | Dataset | Oracle | Dataset Size | Spearman's ρ |
| --- | --- | --- | --- | --- |
| TFBind8-Exact-v0 | TF Bind 8 | Exact | 65792 | |
| GFP-Transformer-v0 | GFP | Transformer | 56086 | 0.8497 |
| ChEMBL-ResNet-v0 | ChEMBL | ResNet | 40516 | 0.3208 |
| UTR-ResNet-v0 | UTR | ResNet | 280000 | 0.8617 |
| HopperController-Exact-v0 | Hopper Controller | Exact | 3200 | |
| Superconductor-RandomForest-v0 | Superconductor | Random Forest | 21263 | 0.9155 |
| AntMorphology-Exact-v0 | Ant Morphology | Exact | 25009 | |
| DKittyMorphology-Exact-v0 | D'Kitty Morphology | Exact | 25009 | |
Performance Of Baselines
We benchmark a set of 9 methods for solving offline model-based optimization problems. Performance is reported in normalized form, where the 100th percentile score of 128 candidate designs is evaluated and normalized such that 1.0 corresponds to performance equivalent to the best-performing design in the full unobserved dataset associated with each model-based optimization task, and 0.0 corresponds to performance equivalent to the worst-performing design in that dataset. In circumstances where an exact oracle is not available, this full unobserved dataset is used for training the approximate oracle that evaluates the candidate designs proposed by each method. The symbol ± indicates the empirical standard deviation of reported performance across 8 trials.
| Method \ Task | GFP | TF Bind 8 | UTR | ChEMBL |
| --- | --- | --- | --- | --- |
| Auto. CbAS | 0.865 ± 0.000 | 0.910 ± 0.044 | 0.650 ± 0.006 | 0.470 ± 0.000 |
| CbAS | 0.865 ± 0.000 | 0.927 ± 0.051 | 0.650 ± 0.002 | 0.517 ± 0.055 |
| BO-qEI | 0.254 ± 0.352 | 0.798 ± 0.083 | 0.659 ± 0.000 | 0.333 ± 0.035 |
| CMA-ES | 0.054 ± 0.002 | 0.953 ± 0.022 | 0.666 ± 0.004 | 0.350 ± 0.017 |
| Grad. | 0.864 ± 0.001 | 0.977 ± 0.025 | 0.639 ± 0.009 | 0.360 ± 0.029 |
| Grad. Min | 0.864 ± 0.000 | 0.984 ± 0.012 | 0.647 ± 0.007 | 0.361 ± 0.004 |
| Grad. Mean | 0.864 ± 0.000 | 0.986 ± 0.012 | 0.647 ± 0.005 | 0.373 ± 0.013 |
| MINs | 0.865 ± 0.001 | 0.905 ± 0.052 | 0.649 ± 0.004 | 0.473 ± 0.057 |
| REINFORCE | 0.865 ± 0.000 | 0.948 ± 0.028 | 0.646 ± 0.005 | 0.459 ± 0.036 |
Performance On Discrete Tasks.
| Method \ Task | Superconductor | Ant Morphology | D'Kitty Morphology | Hopper Controller |
| --- | --- | --- | --- | --- |
| Auto. CbAS | 0.421 ± 0.045 | 0.884 ± 0.046 | 0.906 ± 0.006 | 0.137 ± 0.005 |
| CbAS | 0.503 ± 0.069 | 0.879 ± 0.032 | 0.892 ± 0.008 | 0.141 ± 0.012 |
| BO-qEI | 0.402 ± 0.034 | 0.820 ± 0.000 | 0.896 ± 0.000 | 0.550 ± 0.118 |
| CMA-ES | 0.465 ± 0.024 | 1.219 ± 0.738 | 0.724 ± 0.001 | 0.604 ± 0.215 |
| Grad. | 0.518 ± 0.024 | 0.291 ± 0.023 | 0.874 ± 0.022 | 1.035 ± 0.482 |
| Grad. Min | 0.506 ± 0.009 | 0.478 ± 0.064 | 0.889 ± 0.011 | 1.391 ± 0.589 |
| Grad. Mean | 0.499 ± 0.017 | 0.444 ± 0.081 | 0.892 ± 0.011 | 1.586 ± 0.454 |
| MINs | 0.469 ± 0.023 | 0.916 ± 0.036 | 0.945 ± 0.012 | 0.424 ± 0.166 |
| REINFORCE | 0.481 ± 0.013 | 0.263 ± 0.032 | 0.562 ± 0.196 | 0.020 ± 0.067 |
Performance On Continuous Tasks.
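The normalization used in the tables above amounts to a min-max rescaling against the worst and best scores in the full unobserved dataset. A small sketch with made-up numbers (this is not the actual table-generation code):

```python
import numpy as np

# hypothetical raw scores of a few candidate designs (made-up numbers),
# plus the worst and best scores in the full unobserved dataset
y_min, y_max = -3.0, 7.0
candidate_scores = np.array([2.0, 7.0, -3.0, 4.5])

# min-max normalize: 1.0 matches the best design in the full dataset,
# 0.0 matches the worst
normalized = (candidate_scores - y_min) / (y_max - y_min)

# the reported value is the 100th percentile of the normalized scores
reported = normalized.max()  # → 1.0 for these made-up numbers
```

Note that normalized scores above 1.0 (as for CMA-ES on Ant Morphology) indicate a candidate design that outperforms the best design in the unobserved dataset.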
Reproducing Baseline Performance
In order to reproduce these tables, you must first install the implementations of the baseline algorithms.
git clone https://github.com/brandontrabucco/design-baselines
conda env create -f design-baselines/environment.yml
conda activate design-baselines
You may then run the following series of commands in a bash terminal, using the command-line interface exposed by design-baselines. Please ensure that the design-baselines conda environment is activated in the bash session from which you run these commands, so that the design-baselines command-line interface is accessible.
# set up machine parameters
NUM_CPUS=32
NUM_GPUS=8

for TASK_NAME in \
    gfp \
    tf-bind-8 \
    utr \
    chembl \
    superconductor \
    ant \
    dkitty \
    hopper; do
  for ALGORITHM_NAME in \
      autofocused-cbas \
      cbas \
      bo-qei \
      cma-es \
      gradient-ascent \
      gradient-ascent-min-ensemble \
      gradient-ascent-mean-ensemble \
      mins \
      reinforce; do

    # launch several model-based optimization algorithms using the command-line interface
    # for example:
    # (design-baselines) name@computer:~/$ cbas gfp \
    #                                        --local-dir ~/db-results/cbas-gfp \
    #                                        --cpus 32 \
    #                                        --gpus 8 \
    #                                        --num-parallel 8 \
    #                                        --num-samples 8
    $ALGORITHM_NAME $TASK_NAME \
      --local-dir ~/db-results/$ALGORITHM_NAME-$TASK_NAME \
      --cpus $NUM_CPUS \
      --gpus $NUM_GPUS \
      --num-parallel 8 \
      --num-samples 8

  done
done

# generate the main performance table of the paper
design-baselines make-table --dir ~/db-results/ --percentile 100th

# generate the performance tables in the appendix
design-baselines make-table --dir ~/db-results/ --percentile 50th
design-baselines make-table --dir ~/db-results/ --percentile 100th --no-normalize
These commands will run several model-based optimization algorithms (such as CbAS) contained in design-baselines on all tasks released with the design-bench benchmark, then generate three performance tables from those results and print a LaTeX rendition of these tables to stdout.
The Train-Test Discrepancy
For tasks where an exact numerical ground truth is not available for evaluating the performance of previously unseen candidate designs, we provide several families of approximate oracle models that have been trained using a larger held-out dataset of designs x and corresponding scores y.
Using a learned oracle for evaluation while training an MBO method on real data creates a train-test discrepancy. This discrepancy can be avoided by relabelling the y values in an offline MBO dataset with the predictions of the learned oracle, which is controlled by the following parameter when building a task.
import design_bench

# instantiate the task using y values generated from the learned oracle
task = design_bench.make('GFP-Transformer-v0', relabel=True)

# instantiate the task using y values generated from real experiments
task = design_bench.make('GFP-Transformer-v0', relabel=False)
Task API
Design-Bench tasks share a common interface specified in design_bench/task.py, which exposes a set of input designs task.x and a set of output predictions task.y. In addition, the performance of a new set of input designs (such as those output by a model-based optimization algorithm) can be found using y = task.predict(x).
import design_bench

task = design_bench.make('TFBind8-Exact-v0')

def solve_optimization_problem(x0, y0):
    return x0  # solve a model-based optimization problem

# solve for the best input x_star and evaluate it
x_star = solve_optimization_problem(task.x, task.y)
y_star = task.predict(x_star)
Many datasets of interest to practitioners are too large to load into memory all at once, and so the task interface defines several iterables that load samples from the dataset incrementally.
import design_bench

task = design_bench.make('TFBind8-Exact-v0')

for x, y in task:
    pass  # train a model here

for x, y in task.iterate_batches(32):
    pass  # train a model here

for x, y in task.iterate_samples():
    pass  # train a model here
Certain optimization algorithms require a particular input format, and so tasks support normalization of both task.x and task.y, as well as conversion of task.x from discrete tokens to the logits of a categorical probability distribution, which is needed when optimizing x with a gradient-based model-based optimization algorithm.
import design_bench
task = design_bench.make('TFBind8-Exact-v0')
# convert x to logits of a categorical probability distribution
task.map_to_logits()
discrete_x = task.to_integers(task.x)
# normalize the inputs to have zero mean and unit variance
task.map_normalize_x()
original_x = task.denormalize_x(task.x)
# normalize the outputs to have zero mean and unit variance
task.map_normalize_y()
original_y = task.denormalize_y(task.y)
# remove the normalization applied to the outputs
task.map_denormalize_y()
normalized_y = task.normalize_y(task.y)
# remove the normalization applied to the inputs
task.map_denormalize_x()
normalized_x = task.normalize_x(task.x)
# convert x back to integers
task.map_to_integers()
continuous_x = task.to_logits(task.x)
Each task provides access to the model-based optimization dataset used to learn the oracle (where applicable) as well as the oracle itself, which includes metadata about how it was trained (where applicable). These provide fine-grained control over the data distribution for model-based optimization.
import design_bench
task = design_bench.make('GFP-GP-v0')
# an instance of the DatasetBuilder class from design_bench.datasets.dataset_builder
dataset = task.dataset
# modify the distribution of the task dataset
dataset.subsample(max_samples=10000,
                  distribution="uniform",
                  min_percentile=10,
                  max_percentile=90)
# an instance of the OracleBuilder class from design_bench.oracles.oracle_builder
oracle = task.oracle
# check how the model was fit
print(oracle.params["rank_correlation"],
      oracle.params["model_kwargs"],
      oracle.params["split_kwargs"])
Dataset API
Datasets provide a model-based optimization algorithm with information about the black-box function, and are used in design-bench to fit approximate oracle models when an exact oracle is not available. All datasets inherit from the DatasetBuilder class defined in design_bench.datasets.dataset_builder.
All datasets implement methods for modifying the format and distribution of the dataset, including normalization, subsampling, relabelling the outputs, and (for discrete datasets) converting discrete inputs to a real-valued representation. There are also special methods for splitting the dataset into a training and validation set.
from design_bench.datasets.discrete.gfp_dataset import GFPDataset
dataset = GFPDataset()
# convert x to logits of a categorical probability distribution
dataset.map_to_logits()
discrete_x = dataset.to_integers(dataset.x)
# normalize the inputs to have zero mean and unit variance
dataset.map_normalize_x()
original_x = dataset.denormalize_x(dataset.x)
# normalize the outputs to have zero mean and unit variance
dataset.map_normalize_y()
original_y = dataset.denormalize_y(dataset.y)
# remove the normalization applied to the outputs
dataset.map_denormalize_y()
normalized_y = dataset.normalize_y(dataset.y)
# remove the normalization applied to the inputs
dataset.map_denormalize_x()
normalized_x = dataset.normalize_x(dataset.x)
# convert x back to integers
dataset.map_to_integers()
continuous_x = dataset.to_logits(dataset.x)
# modify the distribution of the dataset
dataset.subsample(max_samples=10000,
                  distribution="uniform",
                  min_percentile=10,
                  max_percentile=90)

# change the outputs as a function of their old values
dataset.relabel(lambda x, y: y ** 2 - 2.0 * y)
# split the dataset into a validation set
training, validation = dataset.split(val_fraction=0.1)
If you would like to define your own dataset for use with design-bench, you can directly instantiate a continuous dataset or a discrete dataset depending on the input format you are using. The DiscreteDataset and ContinuousDataset classes are built with this in mind, and accept two numpy arrays containing inputs x and outputs y.
from design_bench.datasets.discrete_dataset import DiscreteDataset
from design_bench.datasets.continuous_dataset import ContinuousDataset
import numpy as np
# create dummy inputs and outputs for modelbased optimization
x = np.random.randint(500, size=(5000, 43))
y = np.random.uniform(size=(5000, 1))
# create a discrete dataset for those inputs and outputs
dataset = DiscreteDataset(x, y)
# create dummy inputs and outputs for modelbased optimization
x = np.random.uniform(size=(5000, 871))
y = np.random.uniform(size=(5000, 1))
# create a continuous dataset for those inputs and outputs
dataset = ContinuousDataset(x, y)
In the event that you are using a dataset that is saved to a set of sharded numpy files (ending in .npy), you may also create a dataset by providing lists of shard files represented using the DiskResource class. The DiscreteDataset and ContinuousDataset classes accept two lists of sharded inputs x and outputs y represented by DiskResource objects.
from design_bench.disk_resource import DiskResource
from design_bench.datasets.discrete_dataset import DiscreteDataset
from design_bench.datasets.continuous_dataset import ContinuousDataset
import os
import numpy as np
# list the disk resource for each shard
os.makedirs("new_dataset/", exist_ok=True)
x = [DiskResource("new_dataset/shard-x-0.npy"),
     DiskResource("new_dataset/shard-x-1.npy")]
y = [DiskResource("new_dataset/shard-y-0.npy"),
     DiskResource("new_dataset/shard-y-1.npy")]
# create dummy inputs and outputs for modelbased optimization
xs = np.random.randint(500, size=(5000, 43))
ys = np.random.uniform(size=(5000, 1))
# save the dataset to a set of shard files
np.save("new_dataset/shard-x-0.npy", xs[:3000])
np.save("new_dataset/shard-x-1.npy", xs[3000:])
np.save("new_dataset/shard-y-0.npy", ys[:3000])
np.save("new_dataset/shard-y-1.npy", ys[3000:])
# create a discrete dataset for those inputs and outputs
dataset = DiscreteDataset(x, y)
# create dummy inputs and outputs for modelbased optimization
xs = np.random.uniform(size=(5000, 871))
ys = np.random.uniform(size=(5000, 1))
# save the dataset to a set of shard files
np.save("new_dataset/shard-x-0.npy", xs[:3000])
np.save("new_dataset/shard-x-1.npy", xs[3000:])
np.save("new_dataset/shard-y-0.npy", ys[:3000])
np.save("new_dataset/shard-y-1.npy", ys[3000:])
# create a continuous dataset for those inputs and outputs
dataset = ContinuousDataset(x, y)
Oracle API
Oracles provide a way of measuring the performance of candidate solutions to a model-based optimization problem, found by a model-based optimization algorithm, without having to perform additional real-world experiments. To this end, oracles implement a prediction function oracle.predict(x) that takes a set of designs and makes a prediction about their performance. The goal of model-based optimization is to maximize the predictions of the oracle.
from design_bench.datasets.discrete.gfp_dataset import GFPDataset
from design_bench.oracles.tensorflow import TransformerOracle
# create a dataset and a noisy oracle
dataset = GFPDataset()
oracle = TransformerOracle(dataset, noise_std=0.1)

def solve_optimization_problem(x0, y0):
    return x0  # solve a model-based optimization problem

# evaluate the performance of the solution x_star
x_star = solve_optimization_problem(dataset.x, dataset.y)
y_star = oracle.predict(x_star)
In order to handle cases where an exact ground truth is unknown or not tractable to evaluate, Design-Bench provides a set of approximate oracles, including a Gaussian Process, a Random Forest, and several deep neural network architectures specialized to particular data modalities. These approximate oracles may have the following parameters.
from design_bench.datasets.discrete.gfp_dataset import GFPDataset
from design_bench.oracles.tensorflow import TransformerOracle
# parameters for the transformer architecture
model_kwargs = dict(
    hidden_size=64,
    feed_forward_size=256,
    activation='relu',
    num_heads=2,
    num_blocks=4,
    epochs=20,
    shuffle_buffer=60000,
    learning_rate=0.0001,
    dropout_rate=0.1)

# parameters for building the validation set
split_kwargs = dict(
    val_fraction=0.1,
    subset=None,
    shard_size=5000,
    to_disk=True,
    disk_target="gfp/split",
    is_absolute=False)

# create a transformer oracle for the GFP dataset
dataset = GFPDataset()
oracle = TransformerOracle(
    dataset,
    noise_std=0.0,
    # parameters for ApproximateOracle subclasses
    disk_target="new_model.zip",
    is_absolute=True,
    fit=True,
    max_samples=None,
    distribution=None,
    max_percentile=100,
    min_percentile=0,
    model_kwargs=model_kwargs,
    split_kwargs=split_kwargs)

def solve_optimization_problem(x0, y0):
    return x0  # solve a model-based optimization problem

# evaluate the performance of the solution x_star
x_star = solve_optimization_problem(dataset.x, dataset.y)
y_star = oracle.predict(x_star)
Defining New MBO Tasks
New model-based optimization tasks are simple to create and register with design-bench. By subclassing either DiscreteDataset or ContinuousDataset, and providing either a pair of numpy arrays containing inputs and outputs or a pair of lists of DiskResource shards containing inputs and outputs, you can define your own model-based optimization dataset class. Once a custom dataset class is created, you can register it as a model-based optimization task by choosing an appropriate oracle type and making a call to the register function. After doing so, subsequent calls to design_bench.make can find your newly registered model-based optimization task.
from design_bench.datasets.continuous_dataset import ContinuousDataset
import design_bench
import numpy as np
# define a custom dataset subclass of ContinuousDataset
class QuadraticDataset(ContinuousDataset):

    def __init__(self, **kwargs):
        # define a set of inputs and outputs of a quadratic function
        x = np.random.normal(0.0, 1.0, (5000, 7))
        y = (x ** 2).sum(axis=1, keepdims=True)
        # pass inputs and outputs to the base class
        super(QuadraticDataset, self).__init__(x, y, **kwargs)
# parameters used for building the validation set
# parameters used for building the validation set
split_kwargs = dict(
    val_fraction=0.1,
    subset=None,
    shard_size=5000,
    to_disk=True,
    disk_target="quadratic/split",
    is_absolute=True)

# parameters used for building the model
model_kwargs = dict(
    hidden_size=512,
    activation='relu',
    num_layers=2,
    epochs=5,
    shuffle_buffer=5000,
    learning_rate=0.001)

# keyword arguments for building the dataset
dataset_kwargs = dict(
    max_samples=None,
    distribution=None,
    max_percentile=80,
    min_percentile=0)

# keyword arguments for training the FullyConnected oracle
oracle_kwargs = dict(
    noise_std=0.0,
    max_samples=None,
    distribution=None,
    max_percentile=100,
    min_percentile=0,
    split_kwargs=split_kwargs,
    model_kwargs=model_kwargs)

# register the new dataset with design_bench
design_bench.register(
    'Quadratic-FullyConnected-v0', QuadraticDataset,
    'design_bench.oracles.tensorflow:FullyConnectedOracle',
    dataset_kwargs=dataset_kwargs, oracle_kwargs=oracle_kwargs)

# build the new task (and train a model)
task = design_bench.make('Quadratic-FullyConnected-v0')

def solve_optimization_problem(x0, y0):
    return x0  # solve a model-based optimization problem
# evaluate the performance of the solution x_star
x_star = solve_optimization_problem(task.x, task.y)
y_star = task.predict(x_star)
Citation
Thanks for using our benchmark, and please cite our paper!
@misc{
    trabucco2021designbench,
    title={Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization},
    author={Brandon Trabucco and Aviral Kumar and Xinyang Geng and Sergey Levine},
    year={2021},
    url={https://openreview.net/forum?id=cQzf26aA3vM}
}