"The more you try to control something, the more it controls you."
- The controller is initialized with a pair of names (-e experiment_name, -f trial_name).
- An empty worker_control_panel and an empty scheduler are initialized.
- The start method takes an instance of ExperimentSetup and a specified partition. It is called automatically by the main_start method.
- The workers returned by ExperimentSetup are then scheduled, connected to, configured, and instructed to start running.
- The controller will stop the trial if one of the following happens:
  - The trial times out.
  - Any worker completes or exits with an error.
  - The controller process receives the SIGINT signal (KeyboardInterrupt).
- Instead of starting a new experiment, a controller can connect to a running experiment through reconnect.
- Note that multiple controllers can be connected to the same trial. If one controller stops the trial, the others will also quit.
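The start-and-monitor flow in the list above can be sketched as follows. This is a minimal, hypothetical illustration, not the controller's actual implementation: `WorkerStatus`, `FakeWorker`, and `run_trial` are assumed names, scheduling and connecting are omitted, and workers are polled in-process rather than over the network.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class WorkerStatus(Enum):
    # Hypothetical status values; the real controller may track more states.
    RUNNING = "running"
    COMPLETED = "completed"
    ERROR = "error"


@dataclass
class FakeWorker:
    """Hypothetical in-process stand-in for a scheduled, connected worker."""

    name: str
    steps_until_done: Optional[int] = None  # None means the worker never finishes
    fails: bool = False  # True: the worker ends with an error instead of completing
    events: List[str] = field(default_factory=list)

    def configure(self) -> None:
        self.events.append("configured")

    def start(self) -> None:
        self.events.append("started")

    def poll(self) -> WorkerStatus:
        # Each poll counts down one step until the worker finishes.
        if self.steps_until_done is None:
            return WorkerStatus.RUNNING
        self.steps_until_done -= 1
        if self.steps_until_done > 0:
            return WorkerStatus.RUNNING
        return WorkerStatus.ERROR if self.fails else WorkerStatus.COMPLETED


def run_trial(workers: List[FakeWorker], timeout_s: float, poll_interval_s: float = 0.01) -> str:
    """Configure and start all workers, then block until a stop condition is met.

    Returns a short description of why the trial should be stopped.
    """
    for w in workers:
        w.configure()
    for w in workers:
        w.start()

    deadline = time.monotonic() + timeout_s
    try:
        while True:
            # Condition 1: the trial times out.
            if time.monotonic() >= deadline:
                return "timeout"
            # Condition 2: any worker completes or exits with an error.
            for w in workers:
                status = w.poll()
                if status is WorkerStatus.COMPLETED:
                    return f"{w.name} completed"
                if status is WorkerStatus.ERROR:
                    return f"{w.name} exited with error"
            time.sleep(poll_interval_s)
    except KeyboardInterrupt:
        # Condition 3: SIGINT, which Python's default handler raises as KeyboardInterrupt.
        return "interrupted"
```

For example, `run_trial([FakeWorker("actor", steps_until_done=3), FakeWorker("learner")], timeout_s=5.0)` returns `"actor completed"` on the third poll. Using `time.monotonic()` for the deadline keeps the timeout unaffected by wall-clock adjustments.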
To stop a trial, the controller will try the following two methods in order, until the trial is stopped successfully:
1. The controller will instruct all workers to stop through worker control.
2. If step 1 fails, the controller will ask the scheduler to kill all the workers. This should guarantee that all workers are stopped.
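The two-step stop procedure above can be sketched as follows, again as a hypothetical illustration: `stop_trial`, `request_stop`, and `scheduler_kill_all` are assumed names, with the worker-control and scheduler calls passed in as plain callables rather than the controller's real interfaces.

```python
from typing import Callable, Iterable, List


def stop_trial(
    worker_names: Iterable[str],
    request_stop: Callable[[str], bool],
    scheduler_kill_all: Callable[[List[str]], None],
) -> str:
    """Stop a trial, escalating from worker control to a scheduler kill.

    request_stop(name) is a hypothetical worker-control call that returns True
    if the worker confirmed it stopped. scheduler_kill_all(names) is a
    hypothetical scheduler call that forcibly terminates the given workers.
    """
    names = list(worker_names)
    try:
        # Step 1: ask every worker to stop. A list (not a generator) is built so
        # that all workers are asked even if an early one does not confirm.
        all_stopped = all([request_stop(n) for n in names])
    except Exception:
        # An unreachable or misbehaving worker counts as a failed step 1.
        all_stopped = False
    if all_stopped:
        return "stopped via worker control"
    # Step 2: fall back to the scheduler and kill all the workers.
    scheduler_kill_all(names)
    return "killed via scheduler"
```

Killing all workers in step 2, rather than only the unresponsive ones, follows the description above and avoids leaving a partially stopped trial behind.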
Finally, the controller exits.