Context

PR #146 added A/B testing support for plugin automations. The current implementation embeds all variant configs in a single tarball and performs variant selection at runtime inside `sdk_main.py`. This works well for plugin presets but has a structural limitation: it only supports A/B testing within plugin preset automations, not arbitrary custom tarballs or other automation types.
In his review of #146, @malhotra5 noted this gap and proposed an alternative architecture for a future iteration.
Current approach (PR #146)

- Variants are defined as part of the `POST /v1/preset/plugin` request body.
- A single tarball is generated containing an `experiment_config.json` with all variant plugin configs.
- At runtime, `sdk_main.py` reads the config, performs weighted-random selection, and loads the chosen variant's plugins.
- Experiment metadata (`experiment_id`, `variant`) is passed as conversation tags.
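To make the runtime path concrete, here is a minimal sketch of the selection step that `sdk_main.py` performs. The `experiment_config.json` schema shown (an `experiment_id` plus a list of variants with `name`, `weight`, and `plugins`) and the helper names `select_variant` and `experiment_tags` are hypothetical; PR #146 may structure them differently.

```python
import json
import random
from typing import Any, Dict


def select_variant(config: Dict[str, Any], rng: random.Random) -> Dict[str, Any]:
    """Pick one variant with probability proportional to its weight.

    Assumes a hypothetical schema: config["variants"] is a list of dicts,
    each with "name", "weight", and "plugins" keys.
    """
    variants = config["variants"]
    weights = [v["weight"] for v in variants]
    if not variants or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("variants need non-negative weights with a positive sum")
    # random.choices does weighted sampling; k=1 returns a one-element list.
    return rng.choices(variants, weights=weights, k=1)[0]


def experiment_tags(config: Dict[str, Any], variant: Dict[str, Any]) -> Dict[str, str]:
    """Build the conversation tags that carry the experiment metadata."""
    return {"experiment_id": config["experiment_id"], "variant": variant["name"]}


# Hypothetical contents of experiment_config.json inside the single tarball.
EXAMPLE_CONFIG = json.loads("""
{
  "experiment_id": "exp-example",
  "variants": [
    {"name": "control", "weight": 0.5, "plugins": ["plugin-a"]},
    {"name": "treatment", "weight": 0.5, "plugins": ["plugin-b"]}
  ]
}
""")

if __name__ == "__main__":
    chosen = select_variant(EXAMPLE_CONFIG, random.Random())
    # The script would now load chosen["plugins"] and attach the tags.
    print(chosen["plugins"], experiment_tags(EXAMPLE_CONFIG, chosen))
```

Because the choice happens inside the script, only an automation whose script contains this logic can take part in an experiment, which is the limitation described above.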
Proposed evolution
Move variant support to the automation definition level:
- Multiple tarballs per automation: each variant maps to a separate tarball rather than packing all variants into one.
- Server-side variant selection: the automation server picks the variant at dispatch time and runs the corresponding tarball, instead of the script choosing at runtime.
- Run-level experiment tracking: experiment metadata (which variant was selected, weights, etc.) is stored on the automation run record by the server.
- Universal A/B support: since selection happens before tarball execution, this works for any automation type, including plugin presets, prompt presets, and custom scripts.
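A minimal sketch of the proposed flow, assuming a hypothetical automation definition in which each variant names its own tarball and weight. The server picks a variant at dispatch time and records the choice on the run record, so the tarball itself carries no experiment logic. `Variant`, `AutomationDefinition`, `AutomationRun`, `dispatch`, and the tarball paths are illustrative names, not the automation server's actual API.

```python
import random
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Variant:
    name: str
    tarball_uri: str  # hypothetical: each variant points at its own tarball
    weight: float


@dataclass(frozen=True)
class AutomationDefinition:
    automation_id: str
    variants: Tuple[Variant, ...]
    experiment_id: Optional[str] = None  # None means "not an experiment"


@dataclass
class AutomationRun:
    run_id: str
    automation_id: str
    tarball_uri: str
    # Run-level experiment tracking, written by the server, not the script.
    experiment: Dict[str, object] = field(default_factory=dict)


def dispatch(defn: AutomationDefinition, rng: random.Random) -> AutomationRun:
    """Select a variant before execution and record the choice on the run."""
    if not defn.variants:
        raise ValueError("automation has no variants")
    weights = [v.weight for v in defn.variants]
    chosen = rng.choices(defn.variants, weights=weights, k=1)[0]
    run = AutomationRun(
        run_id=str(uuid.uuid4()),
        automation_id=defn.automation_id,
        tarball_uri=chosen.tarball_uri,
    )
    if defn.experiment_id is not None:
        run.experiment = {
            "experiment_id": defn.experiment_id,
            "variant": chosen.name,
            "weights": {v.name: v.weight for v in defn.variants},
        }
    # An executor would now fetch and run run.tarball_uri as-is, whatever
    # kind of automation (plugin preset, prompt preset, custom script) it holds.
    return run
```

In this sketch, a non-experiment automation is simply a definition with one variant and no `experiment_id`, so every automation type goes through the same dispatch path.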
Trade-offs
What this would require
Open questions
This issue was created by an AI agent (OpenHands) on behalf of csmith49.