This project implements a Bayesian Optimization (BO) framework to optimize the structural features of a drug candidate. The goal is to maximize bioavailability while keeping the synthesizability score below a fixed threshold.
Let $x^*$ denote the best feasible candidate:

$$x^* = \arg\max_{x \,:\, v(x) < \kappa} f(x)$$

where:
- $f(x)$: Bioavailability ($\log P$).
- $v(x)$: Synthesizability score ($SA$).
- $\kappa = 4$: Maximum allowed synthesizability.
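The observed values are noisy:

$$y_f = f(x) + \varepsilon_f, \qquad y_v = v(x) + \varepsilon_v$$

where $\varepsilon_f$ and $\varepsilon_v$ are the observation noise terms.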
An example problem looks as follows:
Two Gaussian Processes (GPs) are used:
- For $f(x)$ (bioavailability):
  - Kernel options: Matérn ($\nu = 2.5$) or RBF.
  - Tunable parameters: variance and length scale.
- For $v(x)$ (synthesizability):
  - Kernel: additive (Linear + Matérn or RBF).
  - Prior mean: $4$.
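The two surrogate models can be sketched in plain NumPy. This is a minimal illustration and not the project's actual implementation: it assumes a one-dimensional input, and the function names (`matern52`, `linear_plus_matern`, `gp_posterior`), the hyperparameter defaults, the noise level, and the sample data are all hypothetical.

```python
import numpy as np


def matern52(a, b, variance=1.0, length_scale=1.0):
    """Matérn kernel with nu = 2.5 between two 1-D input arrays."""
    r = np.abs(a[:, None] - b[None, :]) / length_scale
    s = np.sqrt(5.0) * r
    return variance * (1.0 + s + s**2 / 3.0) * np.exp(-s)


def linear_plus_matern(a, b, linear_variance=0.1, **matern_kwargs):
    """Additive kernel: a linear (dot-product) term plus a Matérn 2.5 term."""
    linear = linear_variance * a[:, None] * b[None, :]
    return linear + matern52(a, b, **matern_kwargs)


def gp_posterior(x_train, y_train, x_test, kernel, noise_std, prior_mean=0.0):
    """Posterior mean and standard deviation of a GP with a constant prior mean."""
    k_tt = kernel(x_train, x_train) + noise_std**2 * np.eye(len(x_train))
    k_ts = kernel(x_train, x_test)
    k_ss_diag = np.diag(kernel(x_test, x_test))
    chol = np.linalg.cholesky(k_tt)
    # Solve K^{-1} (y - m) with two triangular systems.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - prior_mean))
    mean = prior_mean + k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    var = np.clip(k_ss_diag - np.sum(v**2, axis=0), 0.0, None)
    return mean, np.sqrt(var)


# Hypothetical observations, for illustration only.
x_obs = np.array([1.0, 2.5, 4.0])
logp_obs = np.array([0.2, 0.6, 0.4])
sa_obs = np.array([3.5, 3.9, 4.3])
x_grid = np.linspace(0.0, 5.0, 51)

# f(x): Matérn 2.5 kernel, zero prior mean.
mu_f, sigma_f = gp_posterior(x_obs, logp_obs, x_grid, matern52, noise_std=0.1)
# v(x): Linear + Matérn kernel, prior mean 4.
mu_v, sigma_v = gp_posterior(
    x_obs, sa_obs, x_grid, linear_plus_matern, noise_std=0.1, prior_mean=4.0
)
```

Centering the constraint model on a prior mean of $4$ means that, far from any data, the Matérn component of $v(x)$ reverts to the threshold $\kappa$ rather than to zero, so unexplored regions are not automatically treated as safe.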
The algorithm implements three acquisition functions for constrained optimization:
- Upper Confidence Bound (UCB): $\text{UCB}(x) = \mu_f(x) + \beta \cdot \sigma_f(x) - \max(0, \mu_v(x) - \kappa)$
- Constrained Expected Improvement (CEI): $\text{CEI}(x) = EI(x) \cdot P(v(x) < \kappa)$, where $EI(x)$ is the Expected Improvement.
- Probability of Feasibility (PoF): $\text{PoF}(x) = \Phi\left(\frac{\kappa - \mu_v(x)}{\sigma_v(x)}\right)$, where $\Phi$ is the CDF of the standard normal distribution.
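The three acquisition functions can be written as scalar functions of the GP posterior means and standard deviations. The following is a sketch using only the Python standard library. The function names and the default $\beta = 2$ are assumptions, as are the closed-form EI for maximization and the use of `best_f` as the incumbent value, since the text does not spell out either.

```python
from statistics import NormalDist

KAPPA = 4.0  # maximum allowed synthesizability, from the problem statement
_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def ucb(mu_f, sigma_f, mu_v, beta=2.0, kappa=KAPPA):
    """UCB on f, minus a penalty when the predicted constraint mean exceeds kappa."""
    return mu_f + beta * sigma_f - max(0.0, mu_v - kappa)


def probability_of_feasibility(mu_v, sigma_v, kappa=KAPPA):
    """P(v(x) < kappa) under a Gaussian posterior on v(x)."""
    if sigma_v <= 0.0:
        # Degenerate posterior: feasibility is deterministic.
        return 1.0 if mu_v < kappa else 0.0
    return _STD_NORMAL.cdf((kappa - mu_v) / sigma_v)


def expected_improvement(mu_f, sigma_f, best_f):
    """Closed-form EI for maximization, relative to the incumbent best_f."""
    if sigma_f <= 0.0:
        return max(0.0, mu_f - best_f)
    z = (mu_f - best_f) / sigma_f
    return (mu_f - best_f) * _STD_NORMAL.cdf(z) + sigma_f * _STD_NORMAL.pdf(z)


def cei(mu_f, sigma_f, mu_v, sigma_v, best_f, kappa=KAPPA):
    """Constrained EI: expected improvement weighted by probability of feasibility."""
    return expected_improvement(mu_f, sigma_f, best_f) * probability_of_feasibility(
        mu_v, sigma_v, kappa
    )
```

For example, a candidate whose predicted synthesizability sits exactly at the threshold (`mu_v == KAPPA`) has a probability of feasibility of 0.5, so its CEI is half of its unconstrained EI.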
- The optimization starts from an initial safe point $x_\triangle$ where $v(x_\triangle) < \kappa$.
- Unsafe evaluations ($v(x) \geq \kappa$) are penalized.
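One way to keep track of the safety constraint during a run is to split the evaluation history into safe and unsafe points and to recommend only from the safe set. The sketch below assumes that the history is stored as `(x, f_obs, v_obs)` tuples and that the final recommendation is the safe point with the highest observed objective. Both are illustrative choices, and the helper names are hypothetical.

```python
KAPPA = 4.0  # maximum allowed synthesizability, from the problem statement


def split_evaluations(history, kappa=KAPPA):
    """Split (x, f_obs, v_obs) tuples into safe (v < kappa) and unsafe (v >= kappa)."""
    safe = [h for h in history if h[2] < kappa]
    unsafe = [h for h in history if h[2] >= kappa]
    return safe, unsafe


def recommend(history, kappa=KAPPA):
    """Return the x of the safe evaluation with the highest observed objective."""
    safe, _ = split_evaluations(history, kappa)
    if not safe:
        raise ValueError("no safe evaluation available")
    return max(safe, key=lambda h: h[1])[0]


# Hypothetical run: the first entry plays the role of the initial safe point.
history = [
    (1.0, 0.20, 3.5),  # safe starting point
    (2.5, 0.60, 3.9),  # safe, better objective
    (4.0, 0.90, 4.3),  # unsafe: v >= kappa, so it cannot be recommended
]
safe, unsafe = split_evaluations(history)
best_x = recommend(history)
```

Because the initial point is safe by construction, the safe set is never empty in practice, and the number of entries in `unsafe` is the quantity that the penalty acts on.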
The algorithm's performance is evaluated using normalized regret:
where
The final score combines regret, penalties for unsafe evaluations, and trivial solutions:
where
- Final Score: 0.863 (ranked 90/273)
The algorithm generates (here):
- Posterior plots of $f(x)$ and $v(x)$, including uncertainty bounds.
- Acquisition function values for each iteration.
- Safe and unsafe evaluation regions.
