Support Multi-Output, Multi-Fidelity optimization#705
Conversation
|
One issue with just adding another acqf is that, when building the qMFHVKG acqf you must first optimize the current posterior (eg. in this tutorial). This then means you need to pass in some extra info. I still want to avoid creating many many new classes, but maybe it's required to make a |
|
Hi @TobyBoyne, what is the timeframe from your side for this? Until when do you need answer(s)? I need to first have a look into this as I never did anything with knowledge gradient acqfs. I can try to look into this tmr. @Jimbo994: would this PR also fix the issue that you mentioned recently? Best, Johannes |
|
Regarding a timeframe, there is no rush. I will keep contributing to this PR and I'll likely be using the code for my project, but it doesn't need to land in BoFire particularly soon. But of course any feedback/advice for implementing this nicely would be appreciated :) I haven't worked with KG much before either, but it seems like the only solution for MOMF problems. There is also the |
I will try to dive into this a bit over the weekend and try to give useful feedback then ;) |
|
I've run into another problem, this time with the DownsamplingKernel. This kernel is only defined for Is there any way to force the bounds of the input scaler to use the bounds of the feature? I see that this line has been commented out, why? I imagine this might also be an issue for the Spherical kernel, which assumes that inputs lie between I'm happy to open a separate issue to discuss/fix this, @jduerholt. Thanks :) |
|
Hi guys, Sorry for my late reply to the recent mention. I was playing around with a dataset for which a I actually checked just now in 0.3.2, and here the ValidationError was removed, and the setup seems to work! So it turns out there was no issue @jduerholt. TLDR; I do think Multi-Output, Multi-Fidelity that @TobyBoyne describes is interesting and useful. |
|
cc @R-M-Lee |
|
Hi @TobyBoyne: I overlooked your comment here, until a few weeks ago, we were enforcing that the bounds of the normalize transform were always the one applied to the feature, I released this and relied on the learning of the bounds when I was implementing the engineered features, as I do not know the bounds there (but it would be possible to calculate possible lower and upper bounds also there), back then I thought this is not necessary and this was simplifying the implementation. It was changed in this PR: #678 So this means that we have to bring this behavior back? Should I file a PR for this? We can also setup a quick call for this. |
|
Hi @jduerholt, I see why you made that change! I'm free to have a quick call now to discuss? Feel free to send me a Teams invite. |
|
With #719, the surrogate now works properly, and the thing runs successfully :) I've answered some of my own questions in implementing this strategy. My remaining questions for this PR before I take it out of draft are:
|
|
Hi @TobyBoyne, I will have a look and answer the questions! Best, Johannes |
|
Associating a cost with each fidelity makes a lot of sense. Assuming that the costs are in [0.0, 1.0], maybe the default could be And a notebook-style tutorial is also a very good idea in my opinion, even if you don't think it's very different from some existing tutorials. It can make it a lot easier for people to get started. Great work Toby |
|
Reading Toby's last message, this looks as if it was waiting for review. @jduerholt, since you are entered as reviewer, do you want to review this? Or should I do it? |
|
Ah, totally overlooked this. Yeah, I will handle it. @TobyBoyne: can you get main merged in to resolve the conflicts? I will then review! Sorry! |
|
Merged @jduerholt :) I will check back here to make sure the tests are passing! |
|
Tests are failing but I think those are related to this (meta-pytorch/botorch#3291) change to how fully bayesian models are exposed by BoTorch. The tests seem to be working on my machine at least! |
Perfect, I will also ask Max for a new botorch release ;) |
jduerholt
left a comment
There was a problem hiding this comment.
Hi @TobyBoyne,
thank you very much, this looks really impressive and I think is almost done.
I let a lot of small comments mostly focusing on small stuff. Main things, that I see:
- Code duplication from generate_surrogate_specs, we should further modularize it to not repeat always the boilerplate code.
- Code duplication from MOBO, we should either have a mixin for the mobo functionality or solve in a different way, duplicating this can lead to a lot of errors ...
Just go over it, in case that you do not have time etc. for the code duplication stuff, just tell me. Then I try to fix it ;)
Best,
Johannes
| return float(base_prices[base] * mmol_base) | ||
|
|
||
|
|
||
| class MOMFBraninCurrin(Benchmark): |
There was a problem hiding this comment.
I recently build a generic interaface for botorch test functions to not have so much code duplication, have a look here:
bofire/bofire/benchmarks/benchmark.py
Line 273 in f9ffad5
I would prefer if we extend it to MOMF optimization, because then we can use all kind of botorch MOMF test functions. Should also not be complicated. At some point we also should tidy up the whole thing.
What do you think?
There was a problem hiding this comment.
Yes I like that, I will have a look into this!
There was a problem hiding this comment.
Why are we getting the WedgeKernel stuff in here? This is already merged in another PR, or?
There was a problem hiding this comment.
Ah, we did not had it in the aggregations? Then we maybe need to update the register functions ...
There was a problem hiding this comment.
It was causing an issue because I was combining the WedgeKernel with the DownsamplingKernel, and getting some issues: as you say, I think it was missing from some of the aggregations. I'm happy to remove it from this PR for now if you want to move the change to another PR?
| RBFKernel, | ||
| SphericalLinearKernel, | ||
| ) | ||
| from bofire.data_models.kernels.fidelity import DownsamplingKernel, FidelityKernel |
There was a problem hiding this comment.
FidelityKernek is imported but not used?
There was a problem hiding this comment.
There used to be something like AbstractKernel here that included the FidelityKernel type. You can also see that AggregationKernel, FeatureSpecificKernel, and MolecularKernel are all imported but not used. Happy to remove any/all of the above?
There was a problem hiding this comment.
I think we also have to properly register the stuff here, with AnyPrior etc. Maybe we have to build an automatic machine for this. Need some discussion with Claude on this.
There was a problem hiding this comment.
Okay - I'll leave this one with you because I'm less familiar with the registration+Pydantic side of things. test_(de)serialization.py is passing so I'm not sure what is currently broken that needs to be fixed.
There was a problem hiding this comment.
Fine for me, before final merge I will make a roundtrip regarding this.
| return self | ||
|
|
||
| @staticmethod | ||
| def _generate_fidelity_cost_model_spec( |
There was a problem hiding this comment.
We could also simplify it and let the user generate the fidelity cost model and do not provide a default, what do you think?
There was a problem hiding this comment.
I like having the default for a few reasons:
- makes it clearer to the user what a fidelity cost model needs to look like.
- reduces the time to "first-working-prototype" for a user, which they can then refine with their own model
- since the default is encapsulated in its own function, it doesn't really increase the complexity of the class in my opinion.
Do those reasons make sense?
| return project | ||
|
|
||
|
|
||
| def get_acquisition_function_qMFHVKG( |
There was a problem hiding this comment.
What do you think about putting this code directly here in botorch:
There was a problem hiding this comment.
Yes I think that's a good idea. Should I raise an issue, or just go straight in with a PR?
There was a problem hiding this comment.
As you want, my latest changes to this function were all accepted, so I think the chances are quite high. But we can also keep it here first and make the change to botorch in parallel.
|
|
||
| X_train, X_pending = self.get_acqf_input_tensors() | ||
|
|
||
| # We must subset the model here, to remove the deterministic cost model from the |
There was a problem hiding this comment.
Ok, now I also see the point of why not to have it in the real surrogates, this would be getting really complicated here at the end, on how botorch is composing the stuff.
There was a problem hiding this comment.
Ah yes that comment is outdated and needs to be removed, but it is a perfect example of some of the subtle issues that came up when including it in the real surrgoates!
| ) | ||
| ).tolist() | ||
|
|
||
| def get_current_value(self, target_fidelities: dict[int, float]): |
There was a problem hiding this comment.
Can you explain to me what this optimization is doing? What are we acutally optimizing?
There was a problem hiding this comment.
(I will write this in the docstring to better explain this).
The thing we are trying to find in get_current_value is
My understanding is this:
In standard, single-output Knowledge Gradient (KG), the acquisition function is
where
In the multi-output setting, this is exactly the same, except instead of optimizing the posterior mean under the model, we optimize the hypervolume of the posterior mean.
It's also worth noting that that
|
|
||
| def _get_domain_with_fixed_task_inputs( | ||
| domain: Domain, | ||
| target_fidelities: dict[int, float], |
There was a problem hiding this comment.
Why not pass in the target_fidelitis as dict[str, float], would be cleaner or?
There was a problem hiding this comment.
But it is also okay like this.
There was a problem hiding this comment.
Let me have a think. I wanted to match the BoTorch convention but actually maybe your suggestion is cleaner - I might make this change
There was a problem hiding this comment.
Decided to keep it as-is.
| X_pending_evaluation_mask=None, | ||
| current_value=current_value, | ||
| cost_aware_utility=cost_aware_utility, | ||
| project=get_project(target_fidelities, X_observed.shape[-1]), |
There was a problem hiding this comment.
Why this additonal get_project? we could just hand it in as a callable via functools.partial, or? No need for the wrapper above.
|
@jduerholt, thanks for the review! I have made the requested changes. All that is left is a change to the inheritance structure such that the new strategy inherits from I haven't made the change to the benchmark to inherit from |
- Mark `TaskInput` and `FidelityKernel` as abstract via `type: Any` with docstrings, so they no longer pretend to be concrete classes outside the `AnyFeature`/`AnyKernel` unions. - Add a valid spec for `ContinuousTaskInput` so the serialization roundtrip test actually exercises it. - Wire `MultiFidelityVarianceBasedStrategy` and `MultiFidelityHVKGStrategy` into the `AnyPredictive` union. - Add a `validate_ref_point` validator to `MultiFidelityHVKGStrategy` matching `MoboStrategy`, so `dict[str, float]` and `None` reference points are normalized to `ExplicitReferencePoint` before the functional class consumes them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Hi @TobyBoyne — I just pushed a small follow-up commit (b23bb5d) to tighten a few registration / roundtrip loose ends I spotted while reviewing:
The bigger thing I wanted to discuss is the duplication between Option A — inherit from
Option B — extract a
Option C — composition: pull the ref-point logic into
My current preference: Option A for the functional class (the duplication there is genuinely line-for-line) combined with Option C-lite for the data model (a free helper Tangential thought: What's your preference? This was written by my agent ;) |
|
Thanks for committing those change @jduerholt :) All the options make sense. I definitely agree with using A for the functional class. I would personally like to use A for the data model as well, if possible. It would be nice if the data model structure reflects the functional model structure as well. For example, The option you've given for data models does also make sense. If this seems like the best way forward to you, then I can get behind that. Let me know if you would like me to take this on, or if you'd like to do it. And for your tangential comment, I think that can definitely be refactored in line with what you've suggested. I actually had some more ideas about that - I will speak to you separately. But yes, outside the scope of this PR. My solution here would be to use Generics (which I think I have said for any problem I have with Python type hinting - I may be over-relying on them...). I've given a toy example below. The key change here is that you can narrow the type of (To be clear, all of the redefinitions of from pydantic import BaseModel
from typing import Generic, TypeVar
class Acqf(BaseModel): pass
class LogEIAcqf(Acqf): pass
class UCBAcqf(Acqf): pass
class MFHVKGAcqf(Acqf): pass
AnyAcqf = LogEIAcqf | UCBAcqf | MFHVKGAcqf
AcqfT = TypeVar("AcqfT", bound=AnyAcqf)
class MoboStrategy(BaseModel, Generic[AcqfT]):
acqf: AcqfT
class MFHVKGStrategy(MoboStrategy[MFHVKGAcqf]):
pass
strategy1 = MFHVKGStrategy(acqf=MFHVKGAcqf()) # this is fine
strategy2 = MoboStrategy(acqf=MFHVKGAcqf()) # this is also fine
strategy3 = MFHVKGStrategy(acqf=LogEIAcqf()) # this raises an error! |
|
Oh, this is really nice with the Generics. For me it is fine, if you would go behind it and unify MOBO and MFHVKGStrategy, I would love it. Let us start with the functional model and then do it with the generics maybe on the data model. But I think, we should merge this is in now, and do it in a seperate, one or? |
|
So, would this be fine for you? |
|
I have changed the inheritance so that we now inherit from Mobo. I ended up not doing any of the Generic things because my type checker is already complaining about "Variable not allowed in type expression" in I have tried to create an instance of the MFHVKG strategy with qLogEI acquisition function, and Pydantic correctly raises an error. So:
Maybe we can come back and add the generics at some point, but for now I think this should be good to merge? |
|
Looks good to me, can you have a look to the failing test, there is something with the make method. |
|
This should fix it. There was a mismatch between |
Motivation
The current support for multi-fidelity (MF) in BoFire is limited to the
MultiFidelityStrategy, which is limited to (1) discrete fidelities, and (2) single output. I would like to be able to optimize in a multi-output MF setting, with a continuous fidelity parameter (eg. "simulation resolution"). This is implemented in botorch with theqMultiFidelityHypervolumeKnowledgeGradientAF.I had a few questions I wanted to ask before I dived too deep into this:
Strategy? In my opinion, it should work directly withMoboStrategy, but this might require some changes.SingleTaskMultiFidelityGPbe a new surrogate, based on this? Or should it just be aSingleTaskGPwith the custom kernel?botorch.acquisition.get_acquisition_functionto build the AF objects.qMFHVKGis not available through that function. What do you think about creating aget_acquisition_function_custom, which we can register with new AFs so that we can be more active in adding AFs to the list?TaskInputintoContinuousTaskInputandCategoricalTaskInput? Will this break the API for existing users of MF models, and how big of an issue is that?For (1) and (2) above, I am conscious about polluting BoFire with too many classes. But it might prove difficult to make the existing classes flexible enough.
There is an old PR, #533, which I will close, But maybe there is some good code in there that can be salvaged.
For reference, see https://botorch.org/docs/tutorials/Multi_objective_multi_fidelity_BO/
@jduerholt
Have you read the Contributing Guidelines on pull requests?
Yes
Test Plan
I will write tests to cover all new code, once the approach has been decided.