Improvements to the demo #1

@LequeuISIR

I've now gained a good understanding of experimaestro, and this demo has been very helpful. However, there are a few aspects that I found unclear. I'm raising this issue to discuss potential improvements without interfering with any ongoing changes to the code.

Demo flow

1 - Configs

The demo should start by introducing Configs. Specifically, I think the CNN class should be placed in a separate model.py file and inherit from Config, as shown below.

One important clarification: the term Config might be misleading, as it does not mean that the class is just a "configuration object." Instead, the demo should make it clear that inheriting from Config is simply a way for users to declare which parameters can be varied in their experiments, using Param, and which ones have no effect on the results, using Meta. This distinction is why I added ckpt_path as a Meta field.

Additionally, the demo should emphasize that these parameters should not be declared in __init__. Instead, they should be class attributes, with __post_init__ handling the initialization logic.

import torch.nn as nn
from experimaestro import Config, Meta, Param


class CNN(nn.Module, Config):
    n_layers: Param[int] = 3
    hidden_dim: Param[int] = 64
    kernel_size: Param[int] = 3
    ckpt_path: Meta[str] = "path/to/checkpoint"

    def __post_init__(self):
        """Simple CNN module with n_layers hidden layers and hidden_dim hidden units"""
        super(CNN, self).__init__()

        # create a sequence of hidden CNN layers with ReLU activation;
        # parameters are instance attributes, so they are read through self
        self.layers = nn.Sequential()
        for i in range(self.n_layers):
            self.layers.add_module(
                f'conv{i}',
                nn.Conv2d(
                    in_channels=1 if i == 0 else self.hidden_dim,
                    out_channels=self.hidden_dim,
                    kernel_size=self.kernel_size,
                    padding='same'))
            self.layers.add_module(f'relu{i}', nn.ReLU())

        # pooling layer to reduce the 28x28 feature maps to 14x14
        self.layers.add_module('pool', nn.MaxPool2d(kernel_size=2))

        # output layer: hidden_dim channels of 14x14 features, 10 digit classes
        self.output = nn.Linear(self.hidden_dim * 14 * 14, 10)

    def forward(self, x):
         ....
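
To illustrate why the Param/Meta split matters, here is a minimal sketch that does not use experimaestro at all. It assumes, as described above, that only Param-like fields should influence how an experiment is identified. The names CNNSettings, meta and identifier are hypothetical stand-ins, and the hashing scheme is an illustration, not experimaestro's actual algorithm.

```python
import hashlib
import json
from dataclasses import dataclass, field, fields


def meta(default):
    """Mark a field as metadata that must not influence the identifier."""
    return field(default=default, metadata={"meta": True})


@dataclass
class CNNSettings:  # hypothetical stand-in for a Config subclass
    n_layers: int = 3
    hidden_dim: int = 64
    kernel_size: int = 3
    ckpt_path: str = meta("path/to/checkpoint")  # plays the role of Meta


def identifier(cfg) -> str:
    """Hash only the result-affecting fields (the Param-like ones)."""
    params = {
        f.name: getattr(cfg, f.name)
        for f in fields(cfg)
        if not f.metadata.get("meta")
    }
    payload = json.dumps(params, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]


base = CNNSettings()
moved = CNNSettings(ckpt_path="another/location")  # only metadata differs
deeper = CNNSettings(n_layers=5)                   # a real parameter differs

same_run = identifier(base) == identifier(moved)
new_run = identifier(base) != identifier(deeper)
```

In this sketch, moving the checkpoint leaves the identifier unchanged, while changing n_layers yields a different one, which is the behaviour the demo should make explicit for Meta versus Param.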

2 - Tasks

Once Configs are introduced, the demo can present Task objects—essentially configs that implement an execute method. This method defines the logic the user wants to run, such as processing data or training a model.

A key point: Task parameters are declared the same way as Config parameters, and they can include other Configs as parameters. The current demo declares parameters like n_layers and hidden_dim inside the task, but I think it would be more modular to have a model: Param[CNN] instead, like this:

from experimaestro import Constant, Param, Task


class TrainOnMNIST(Task):
    """Main Task that trains a CNN model on MNIST"""
    # experimaestro Task parameters
    ## Model
    model: Param[CNN]

    ## Training
    epochs: Param[int] = 1      # number of epochs to train the model
    n_val: Param[int] = 100     # number of steps between validation and logging
    lr: Param[float] = 1e-2     # learning rate
    batch_size: Param[int] = 64 # batch size

    ## Task version (not mandatory)
    version: Constant[str] = '1.0'

    def execute(self):
         ....

This highlights experimaestro's modularity and keeps the demo clean.
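
The composition idea can be sketched with plain dataclasses, independently of experimaestro. ModelConfig and TrainTask are hypothetical stand-ins for a Config and a Task; the point is only that the task receives the model as a single parameter and reaches its fields through self.model, instead of redeclaring each of them.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:  # hypothetical stand-in for a Config subclass such as CNN
    n_layers: int = 3
    hidden_dim: int = 64
    kernel_size: int = 3


@dataclass
class TrainTask:  # hypothetical stand-in for a Task subclass
    model: ModelConfig
    epochs: int = 1
    lr: float = 1e-2

    def execute(self) -> str:
        # A real task would build and train the network here; this only shows
        # that the model's parameters are reachable through self.model.
        return (
            f"train {self.model.n_layers}-layer model "
            f"(hidden_dim={self.model.hidden_dim}) for {self.epochs} epoch(s)"
        )


small = TrainTask(model=ModelConfig(n_layers=2, hidden_dim=32))
large = TrainTask(model=ModelConfig(n_layers=4, hidden_dim=128), epochs=3)

summary_small = small.execute()
summary_large = large.execute()
```

Swapping in a different model then only requires changing the object passed as model; the task definition itself stays untouched.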

3 - Experiments

After explaining Configs and Tasks, the experiment.py file can (should) be introduced as the orchestrator of the experiment. It handles:

  • Launching tasks on a cluster
  • Saving results
  • (Potentially other things—I’m still not 100% sure)

Points that should be clearer in the demo:

  • The experiment file must contain a run function.
  • While Configurations are not mandatory, it’s probably a good idea to include one.
  • A Configuration object is not the same as a Config object—it’s specific to experiments.
  • The relationship between Configuration and params.yaml should be clarified: Configuration defines which parameters can be changed in the YAML file.
  • The purpose of tag(...) should be explained: why is n_layers tagged but not batch_size?
  • How do you launch experiments without a GPU cluster (e.g., for local debugging)?

Also, the following example should be updated to reflect the changes made earlier:

for n_layers in cfg.n_layers:
    for hidden_dim in cfg.hidden_dim:
        for kernel_size in cfg.kernel_size:
            # Create a task with the given parameters
            task = TrainOnMNIST(
                # Model params are 'tagged' for later monitoring
                model=CNN(
                    n_layers=tag(n_layers),
                    hidden_dim=tag(hidden_dim),
                    kernel_size=tag(kernel_size),
                ),
                # Training params are not tagged
                epochs=cfg.epochs,
                n_val=cfg.n_val,
                lr=cfg.lr,
                batch_size=cfg.batch_size,
            )
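
As a possible simplification, the three nested loops above could be flattened with itertools.product. The sketch below is self-contained and does not call experimaestro; cfg is a hypothetical stand-in with made-up value lists, and each dictionary in grid holds the arguments that would be passed to CNN for one run.

```python
from itertools import product
from types import SimpleNamespace

# Hypothetical stand-in for the experiment configuration; the value lists
# are illustrative, not taken from the demo's YAML file.
cfg = SimpleNamespace(n_layers=[1, 2], hidden_dim=[32, 64], kernel_size=[3])

# One entry per combination, in the same order the nested loops would visit.
grid = [
    {"n_layers": n_layers, "hidden_dim": hidden_dim, "kernel_size": kernel_size}
    for n_layers, hidden_dim, kernel_size in product(
        cfg.n_layers, cfg.hidden_dim, cfg.kernel_size
    )
]
```

Each entry of grid corresponds to one task submission, so the length of grid is the number of runs the experiment would launch.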

4 - Launching

The demo suggests launching the experiment using:

experimaestro run-experiment debug.yaml

However, running this raises a workspace error. This suggests that workspaces should be introduced earlier in the demo.
The easiest way to do this might be to add a "Setting Up the Experiment Environment" section, covering:

  • The launchers.py file
  • The settings.yaml file

That said, launchers.py seems quite complex, so I’m not sure how best to present it in the demo.

Conclusion

These are just recommendations based on my experience with the demo. Many of these points can be discussed further, but I think they provide a solid starting point for improvements. I'm happy to help with any changes if needed, but I also don't want to step on the toes of anyone already working on this. Let me know how we can move forward!
