New APIs for Transform, Random Transform and SSL, new modules, and transition proposal. #100

@GabrielBG0

API Change: New APIs for Transform, Random Transform and SSL, new modules, and transition proposal.

Hello everyone, Gabriel here.

As many of you know, we are gearing up to launch Minerva 1.0. Alongside the launch, we are proposing several changes to fix some of Minerva's current problems.

This issue has been written to present some of the changes that have been made and to collect feedback on them. I ask all of you to read this with attention and consideration. If you have problems, questions or suggestions, this is the time to voice them.

TL;DR

Minerva is exiting beta and launching version 1.0, bringing several breaking changes:

  • Transforms now accept both data and labels together and return a dictionary instead of a tuple, fixing synchronization issues and race conditions in segmentation and contrastive SSL workflows.
  • Random Transforms follow the same new API, eliminating the need to sync separate function calls.
  • A new SSL interface (_SSLTechnique) standardizes how SSL techniques are implemented across the library.
  • Two new modules are being introduced — SSL (for Lightning SSL techniques) and Components (for nn.Module building blocks) — to reduce clutter in the existing Models module.
  • Other breaking changes (e.g., variable renames) are also welcome now, as this is the last window for them until version 2.0.
  • Workflow change: development moves entirely to public Minerva via forks and PRs, which also fixes the authorship attribution problem caused by the previous private-repo merge process.

Context

As said before, we are preparing to launch Minerva 1.0. Some of you may not be aware, but all this time Minerva has been in beta. This was a way for us to evaluate which features are important, what works, and what doesn't.

Because of that, some of the implementation decisions we made have not panned out as intended. We also did not establish a design pattern for some of our features, because we did not yet know the best way to implement them.

Now the time has come to apply what we learned while first building Minerva to make a better version of it.

Changes

This post mainly covers changes in API and architecture. I will also propose a way to transfer things from our private repo to this one, since the private repo will be archived. With that said, let's start with the changes.

Changes on how Transforms work

Transforms in Minerva currently follow the PyTorch style, applying a transform to one object at a time. This works fine in the general PyTorch context, but since we often work with semantic segmentation, we need synchronized transforms for both data and label. The solution we implemented was less than optimal and suffered from race conditions.

There is also a problem when using contrastive SSL techniques. We needed to create alternative ways for a transform to return multiple views of the same data, because the way we initially implemented transforms did not cover this use case.

With these problems in mind we created a proposal that aims to solve these friction points. The new _Transform API would look like this:

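from typing import Any


class _Transform:
    """Base class for transforms: takes a sample dict, returns a sample dict."""

    def __init__(self, *args, **kwargs):
        pass

    def __call__(self, data: dict[str, Any], *args, **kwargs) -> dict[str, Any]:
        # Always return at least the "data" and "label" keys.
        return {"data": data["data"], "label": data["label"]}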

The changes to transforms are twofold:

  1. Transforms now return a dictionary.
    1. The dictionary always has data and label keys (even if label is None).
    2. If you need multiple views of the data, just add them to the dict!
    3. Your model now accesses keys of this dict instead of positions in a tuple.
  2. Transforms now accept both data and labels at the same time.
    1. If a transform should apply only to the data or only to the label, you can add a flag to control this behavior.
    2. Transforms can now be synced much more easily for data and labels, since both are processed together.
    3. No race conditions.
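
To make the two points above concrete, here is a minimal sketch of what a transform under the proposed API could look like. The class name HorizontalFlip and the apply_to flag are hypothetical (they are not part of the proposal), and plain nested lists stand in for tensors so the sketch runs without any dependencies.

```python
from typing import Any


class HorizontalFlip:
    """Hypothetical transform following the proposed dict-in, dict-out API."""

    def __init__(self, apply_to: tuple[str, ...] = ("data", "label")):
        # Assumed flag: which keys of the sample the flip should touch.
        self.apply_to = apply_to

    def __call__(self, sample: dict[str, Any]) -> dict[str, Any]:
        out = dict(sample)  # shallow copy, so the input dict is not mutated
        for key in self.apply_to:
            if out.get(key) is not None:  # "label" may be None
                out[key] = [row[::-1] for row in out[key]]
        return out


sample = {"data": [[1, 2, 3], [4, 5, 6]], "label": [[0, 0, 1], [1, 0, 0]]}

both = HorizontalFlip()(sample)                         # data and label flipped together
data_only = HorizontalFlip(apply_to=("data",))(sample)  # label left untouched
unlabeled = HorizontalFlip()({"data": [[1, 2]], "label": None})
```

Because data and label go through the same call, there is nothing to synchronize afterwards. The flag only narrows which keys are touched.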

These changes to transforms will also impact our datasets and readers, which will return dictionaries as well. There will no longer be a need to pass one transform pipeline for data and another for labels.
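
As a rough illustration of the dataset side, the sketch below shows a dataset that builds the sample dict and runs a single pipeline over it. Pipeline, PairDataset, scale_data and reverse_both are all hypothetical names made up for this example. The actual Minerva datasets and readers may end up looking different.

```python
from typing import Any, Callable, Optional

Sample = dict[str, Any]
Transform = Callable[[Sample], Sample]


class Pipeline:
    """Hypothetical stand-in: applies dict-to-dict transforms in order."""

    def __init__(self, transforms: list[Transform]):
        self.transforms = transforms

    def __call__(self, sample: Sample) -> Sample:
        for transform in self.transforms:
            sample = transform(sample)
        return sample


class PairDataset:
    """Hypothetical dataset: one pipeline covers both data and label."""

    def __init__(self, data, labels=None, transform: Optional[Transform] = None):
        self.data = data
        self.labels = labels
        self.transform = transform

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index: int) -> Sample:
        label = self.labels[index] if self.labels is not None else None
        sample = {"data": self.data[index], "label": label}
        return self.transform(sample) if self.transform else sample


def scale_data(sample: Sample) -> Sample:
    # Touches only "data"; the label passes through unchanged.
    return {**sample, "data": [v / 10 for v in sample["data"]]}


def reverse_both(sample: Sample) -> Sample:
    # Touches both keys, so data and label stay aligned.
    return {**sample, "data": sample["data"][::-1], "label": sample["label"][::-1]}


dataset = PairDataset(
    data=[[10, 20, 30]],
    labels=[[0, 1, 2]],
    transform=Pipeline([scale_data, reverse_both]),
)
item = dataset[0]  # a dict with "data" and "label" keys
```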

Changes to Random Transforms

As stated before, syncing random transforms for data and labels has been a problem, especially in distributed environments. The changes to the transform API will allow us to manage random transforms better. This is the proposed new API for random transforms:

from typing import Any


class _RandomTransform:
    def __init__(self, parameters, p, seed):
        """Apply a random transform to the provided data.

        You can set the parameters for the transform, the probability of
        applying it, and the seed for reproducibility.
        """
        pass

    def __call__(self, data: dict[str, Any]) -> dict[str, Any]:
        """Apply the transformation to the provided data.

        Generates the random parameters for the transformation and applies
        it to the data.
        """
        return {"data": data["data"], "label": data["label"]}

Contrastive transforms and context transforms will follow this same API, only adapting to their specific needs.
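
For the contrastive case, a sketch under the same API could simply add the extra views as new keys. Everything specific here is an assumption made for illustration: the class name TwoViewJitter, the key names view_1 and view_2, and the use of additive noise as the augmentation.

```python
import random
from typing import Any, Optional


class TwoViewJitter:
    """Hypothetical contrastive transform: emits two noisy views of "data"."""

    def __init__(self, scale: float = 0.1, seed: Optional[int] = None):
        self.scale = scale
        self.rng = random.Random(seed)  # private generator, seeded for reproducibility

    def _jitter(self, values: list[float]) -> list[float]:
        return [v + self.rng.uniform(-self.scale, self.scale) for v in values]

    def __call__(self, sample: dict[str, Any]) -> dict[str, Any]:
        return {
            "data": sample["data"],       # original data is kept as is
            "label": sample.get("label"),  # may be None for unlabeled data
            "view_1": self._jitter(sample["data"]),  # assumed key name
            "view_2": self._jitter(sample["data"]),  # assumed key name
        }


views = TwoViewJitter(scale=0.1, seed=0)({"data": [1.0, 2.0, 3.0], "label": None})
```

The model would then read the views by key instead of unpacking a tuple.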

Since the call function has both data and label there will not be a need to sync the transform between multiple function calls.
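
A minimal sketch of that idea, assuming the (parameters, p, seed) constructor from the proposal: the random parameter is drawn once per call and then applied to both data and label. RandomRoll and the max_shift parameter are hypothetical, and nested lists again stand in for tensors.

```python
import random
from typing import Any, Optional


class RandomRoll:
    """Hypothetical random transform: circularly shifts each row to the right."""

    def __init__(self, parameters: dict[str, Any], p: float = 0.5,
                 seed: Optional[int] = None):
        self.max_shift = parameters["max_shift"]  # assumed parameter name
        self.p = p
        self.rng = random.Random(seed)  # private generator, independent of global state

    @staticmethod
    def _roll(row: list, shift: int) -> list:
        shift %= len(row)
        return row[-shift:] + row[:-shift] if shift else list(row)

    def __call__(self, sample: dict[str, Any]) -> dict[str, Any]:
        out = dict(sample)
        if self.rng.random() >= self.p:
            return out  # transform skipped for this sample
        shift = self.rng.randint(1, self.max_shift)  # drawn once per call
        for key in ("data", "label"):
            if out.get(key) is not None:
                out[key] = [self._roll(row, shift) for row in out[key]]
        return out


sample = {"data": [[0, 1, 2, 3, 4]], "label": [[10, 11, 12, 13, 14]]}
out = RandomRoll({"max_shift": 3}, p=1.0, seed=42)(sample)
# Whatever shift was drawn, data and label received the same one.
```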

New SSL Interface

Minerva currently lacks an SSL technique interface that standardizes our SSL implementations. As a result, the technique implementations follow no clear pattern, which makes them harder to use. With this issue in mind, here is the proposed interface for SSL techniques:

_SSLTechnique — Abstract Base Class

_SSLTechnique is an abstract base class that extends PyTorch Lightning's LightningModule. It serves as the foundation for all SSL (Self-Supervised Learning) technique implementations.

Constructor Parameters

The class is initialized with the following parameters:

  • backbone (nn.Module) — The feature-extraction network shared across all components of the technique.
  • learning_rate (float, default 1e-3) — The base learning rate passed to the default Adam optimizer.
  • train_metrics (Dict[str, Metric], default None) — A dictionary of Torchmetrics to log during training. Requires _predictions_step to be overridden.
  • val_metrics (Dict[str, Metric], default None) — Torchmetrics to log during validation.

Abstract Methods

These methods have no default implementation and must be overridden by every subclass:

  • forward(x) → Any — Defines the full forward pass. The return type is technique-specific and may be a tensor, a tuple, or another structure.
  • _loss_step(batch) → Tensor — Computes the scalar loss for a single batch. This method is invoked internally by the training, validation, and test step methods.
  • training_step(batch, idx) — Implements the Lightning training step following technique-specific logic.
  • validation_step(batch, idx) — Implements the Lightning validation step following technique-specific logic.

Overridable Hooks

These methods have sensible defaults and can optionally be overridden to customize behavior:

  • _predictions_step(batch) — Returns a (y_hat, y) tuple for metric evaluation. Defaults to None; should only be overridden when the technique includes a supervised head.
  • technique_transforms() — Returns a TransformPipeline with technique-specific data augmentations. Defaults to None (no augmentations).
  • technique_callbacks() — Returns a list of Lightning callbacks with technique-specific behavior. Defaults to None (no callbacks).
  • technique_collate_fn() — Returns a custom batch collation function for DataLoaders. Useful when the technique requires non-standard batch formatting. Defaults to None (standard collation).
  • default_train_strategy() — Returns a string indicating the recommended training strategy. Useful for techniques that require a specific distributed strategy. Defaults to "auto" (no specific recommendation).
  • configure_optimizers() — Returns the optimizer configuration. Defaults to Adam. Override to substitute SGD, LARS, or to attach a learning rate scheduler.

Concrete Methods

These methods are fully implemented by the base class and available to all subclasses without any changes:

  • test_step(batch, idx) — Runs _loss_step and any registered test metrics, then logs test_loss.
  • _compute_metrics(y_hat, y, phase) — Evaluates all metrics registered for the given phase and returns a prefixed logging dictionary.
  • get_representations(x) — Returns raw backbone embeddings. Intended for downstream evaluation tasks without requiring direct access to the model's internals.
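
To show how the three groups of methods fit together, here is a framework-free sketch of the contract. It is not the real class: the actual _SSLTechnique would extend LightningModule, while this stand-in uses a plain ABC so that it runs without PyTorch or Lightning. For the same reason configure_optimizers and _compute_metrics are left out, logging is replaced by a dict, and the batch is assumed to be a dict with a "data" key. The names SSLTechniqueSketch and ToyReconstruction are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class SSLTechniqueSketch(ABC):
    """Framework-free stand-in for the proposed _SSLTechnique (assumption)."""

    def __init__(self, backbone, learning_rate: float = 1e-3,
                 train_metrics: Optional[dict] = None,
                 val_metrics: Optional[dict] = None):
        self.backbone = backbone
        self.learning_rate = learning_rate
        self.train_metrics = train_metrics or {}
        self.val_metrics = val_metrics or {}
        self.logged: dict[str, float] = {}  # stand-in for Lightning's logger

    # Abstract methods: every technique must provide these.
    @abstractmethod
    def forward(self, x) -> Any: ...

    @abstractmethod
    def _loss_step(self, batch) -> float: ...

    @abstractmethod
    def training_step(self, batch, idx): ...

    @abstractmethod
    def validation_step(self, batch, idx): ...

    # Overridable hooks, with the defaults listed above.
    def _predictions_step(self, batch):
        return None

    def technique_transforms(self):
        return None

    def technique_callbacks(self):
        return None

    def technique_collate_fn(self):
        return None

    def default_train_strategy(self) -> str:
        return "auto"

    # Concrete methods shared by all techniques.
    def test_step(self, batch, idx):
        loss = self._loss_step(batch)
        self.logged["test_loss"] = loss
        return loss

    def get_representations(self, x):
        return self.backbone(x)


class ToyReconstruction(SSLTechniqueSketch):
    """Hypothetical technique: the backbone should reproduce its input."""

    def forward(self, x):
        return self.backbone(x)

    def _loss_step(self, batch) -> float:
        data = batch["data"]
        output = self.forward(data)
        return sum((o - d) ** 2 for o, d in zip(output, data)) / len(data)

    def training_step(self, batch, idx):
        return self._loss_step(batch)

    def validation_step(self, batch, idx):
        return self._loss_step(batch)


technique = ToyReconstruction(backbone=lambda x: [2 * v for v in x])
loss = technique.test_step({"data": [1.0, 2.0]}, 0)
```

A subclass only fills in the four abstract methods. The hooks keep their defaults unless the technique needs something specific, and test_step and get_representations come for free.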

New Modules

We also identified some problems with how our models, SSL techniques, and nn modules are organised. Currently, a single models module hosts all models (both Lightning modules and nn.Modules) as well as the SSL techniques. This has made the module very crowded, with a lot of subdivisions. To address this, we decided to create two new root modules:

  1. SSL: hosts the Lightning SSL technique implementations.
  2. Components (I'm still not too sure about the name; suggestions are welcome): houses the nn.Module implementations used to build our Lightning models.

So, the new structure of Minerva will be:

  • Models: houses the Lightning implementations of our models.
  • SSL: houses the Lightning implementations of our SSL techniques.
  • Components(?): houses the pieces (or blocks) used to build the SSL or model implementations in Lightning. These are the classes that inherit from nn.Module.

So, what changes? A few things:

  1. If you have a model implementation file that houses both nn.Module implementations and LightningModule implementations, it will be split into two files (one in the Models module and the other in the Components module).
  2. If you have an SSL technique implementation file that houses both nn.Module implementations and LightningModule implementations, it will also be split into two files (one in the SSL module and the other in the Components module).
  3. We hope that this separation will make code reuse more effective, allowing Lightning models that use the same type of backbone or prediction head to share the same component class. (We currently have 3 or 4 ViT implementations, which can make maintenance very difficult.)

Other changes

Some other breaking changes will be implemented, such as renaming the fc variable on SimpleSupervisedModel to predictions_head. If you have a change of this nature (replacing a variable name with a better one, or something similar), this is the time to make it. I only ask you to open an issue on GitHub to register the change.
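
One practical side effect of a rename like this: PyTorch state_dict keys are prefixed with the attribute name, so checkpoints saved before the rename will carry fc.* keys. The sketch below shows one possible way to remap them. The helper rename_prefix is hypothetical, and it assumes fc sits at the top level of the state dict. A plain dict with list values stands in for a real checkpoint.

```python
def rename_prefix(state_dict: dict, old: str = "fc.",
                  new: str = "predictions_head.") -> dict:
    """Return a copy of state_dict with keys starting with `old` moved to `new`."""
    renamed = {}
    for key, value in state_dict.items():
        if key.startswith(old):
            key = new + key[len(old):]
        renamed[key] = value
    return renamed


# Stand-in for a checkpoint saved before the rename.
old_checkpoint = {"backbone.weight": [1.0], "fc.weight": [0.5], "fc.bias": [0.0]}
new_checkpoint = rename_prefix(old_checkpoint)
```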

This will be the last time breaking changes are accepted for a while (until version 2.0), so the time to make them is now. If you have a problem with how the library currently works and the changes in this post won't solve it, this is the time to contact us and tell us about it. We can't guarantee that we will be able to solve your problem, but this is when we evaluate what we want to include in version 1.0.

As version 1.0 matures, other changes may be brought forward. If that happens, we will contact you for feedback again.

Transition proposal and new development practices

As you can see, there are a lot of changes to be made. A problem with authorship recognition was also brought to the maintenance team. As you know, development has been done in our private repo and merged into the public one by Otavio. As a result, every contribution to the public repo has been made in his name, and the names of the original authors have been lost in the process.

We can solve these two problems with one solution. The original author of each technique will adapt the code to the new standard, make any necessary changes, and then open a PR against public Minerva. This way their contributions will be registered under their own names. We won't be able to reattribute their past contributions, but from here on they will appear as contributors to the project. If your code is not yet in public Minerva, I advise you to contact your supervisor and check whether you can make it public.

From now on the work process will be a little different: contributions will only be accepted into public Minerva. To contribute, fork Minerva and implement your changes there. When you are done, submit your changes through a PR. If the work you are developing is confidential and should not be publicly available until publication, we strongly suggest that you make your fork private. If you have any questions about the workflow, you can contact me.

Once more I ask you to voice your opinion on these changes and any others you may consider beneficial for Minerva.

Thank you for the time and attention,

Gabriel G & Minerva Core Team
