Models_stories
US Models: Define the structure of machine learning models, including architectures and checkpoints, to standardize training and deployment.
classDiagram
%% Abstract Base Class
class Model {
<<abstract>>
+KIND: str
+get_params(deep: bool = True) Params
+set_params(**params: ParamValue) T.Self
+fit(inputs: schemas.Inputs, targets: schemas.Targets) T.Self*
+predict(inputs: schemas.Inputs) schemas.Outputs*
+explain_model() schemas.FeatureImportances
+explain_samples(inputs: schemas.Inputs) schemas.SHAPValues
+get_internal_model() T.Any
}
Model --|> pdt.BaseModel : inherits
Model --|> abc.ABC : inherits
%% Derived Class
class BaselineSklearnModel {
+KIND: T.Literal["BaselineSklearnModel"] = "BaselineSklearnModel"
+max_depth: int = 20
+n_estimators: int = 200
+random_state: int|None = 42
-_pipeline: pipeline.Pipeline|None
-_numericals: list[str]
-_categoricals: list[str]
+fit(inputs: schemas.Inputs, targets: schemas.Targets) BaselineSklearnModel
+predict(inputs: schemas.Inputs) schemas.Outputs
+explain_model() schemas.FeatureImportances
+explain_samples(inputs: schemas.Inputs) schemas.SHAPValues
+get_internal_model() pipeline.Pipeline
}
Model <|-- BaselineSklearnModel : specializes
Title:
As a machine learning engineer, I want a base Model class to standardize the implementation of machine learning models, so that I can easily integrate different frameworks and ensure a consistent interface across the project.
Description:
The Model class serves as an abstract base class for all machine learning models in the project. It defines a set of essential methods and attributes required for training, predicting, and explaining models. By implementing this base class, users can create custom models that adhere to the project's standards and integrate seamlessly into the pipeline.
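The interface described above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it leaves the pydantic base class aside so that only the abstract contract is visible, it uses plain Python lists in place of `schemas.Inputs`, `schemas.Targets`, and `schemas.Outputs`, and `MeanModel` is a hypothetical subclass invented for the example.

```python
import abc


class Model(abc.ABC):
    """Interface sketch: subclasses must implement fit and predict."""

    KIND: str

    @abc.abstractmethod
    def fit(self, inputs: list, targets: list) -> "Model":
        """Train on inputs and targets, then return the fitted instance."""

    @abc.abstractmethod
    def predict(self, inputs: list) -> list:
        """Return one prediction per input row."""

    def explain_model(self):
        # Optional hook: subclasses override it to return feature importances.
        raise NotImplementedError()

    def explain_samples(self, inputs: list):
        # Optional hook: subclasses override it to return per-sample explanations.
        raise NotImplementedError()

    def get_internal_model(self):
        # Optional hook: subclasses override it to expose the wrapped model.
        raise NotImplementedError()


class MeanModel(Model):
    """Hypothetical subclass: always predicts the mean of the training targets."""

    KIND = "MeanModel"

    def __init__(self) -> None:
        self._mean = None

    def fit(self, inputs: list, targets: list) -> "MeanModel":
        self._mean = sum(targets) / len(targets)
        return self

    def predict(self, inputs: list) -> list:
        if self._mean is None:
            raise ValueError("Model is not fitted yet!")
        return [self._mean for _ in inputs]


model = MeanModel().fit([[1.0], [2.0]], [10.0, 20.0])
print(model.KIND, model.predict([[3.0], [4.0]]))
```

Instantiating `Model` directly fails with a `TypeError` because `fit` and `predict` are abstract, while the explainability hooks only fail when they are actually called on a subclass that did not override them.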
Acceptance Criteria:
- Attributes
  - Define the `KIND` attribute to identify the type of model being implemented.
  - Ensure `KIND` is unique for each subclass of the `Model` class.
- Parameter Management
  - Get Parameters:
    - Implement the `get_params` method to retrieve all model parameters.
    - Exclude private (`_`) and uppercase attributes from the parameters list.
  - Set Parameters:
    - Implement the `set_params` method to update the model's parameters dynamically.
    - Ensure that parameter updates are applied in place and return the updated model instance.
- Core Methods
  - Define the following abstract methods:
    - `fit(inputs: schemas.Inputs, targets: schemas.Targets)`
      - Train the model on provided inputs and targets.
      - Return the fitted model instance.
    - `predict(inputs: schemas.Inputs)`
      - Generate predictions using the trained model.
      - Return outputs in the `schemas.Outputs` format.
- Model Explainability
  - Global Explainability:
    - Provide an `explain_model` method to return feature importances.
    - Raise a `NotImplementedError` if not overridden by the subclass.
  - Local Explainability:
    - Provide an `explain_samples` method to return SHAP values for individual samples.
    - Raise a `NotImplementedError` if not overridden by the subclass.
- Internal Model Access
  - Implement the `get_internal_model` method to expose the internal model object (e.g., a scikit-learn or TensorFlow model).
  - Raise a `NotImplementedError` if the method is not overridden by the subclass.
- Validation and Enforcement
  - Use `pydantic.BaseModel` to enforce strict validation of model attributes and parameters.
  - Set `strict=True`, `frozen=False`, and `extra="forbid"` to ensure data consistency while allowing parameter updates.
- Testing
  - Validate the following scenarios:
    - Successful instantiation of model subclasses.
    - Proper parameter retrieval and updates using `get_params` and `set_params`.
    - Enforcement of abstract method implementation in subclasses.
  - Write unit tests for all methods, ensuring they behave as expected.
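The parameter-management and validation criteria above might look like the following sketch. It assumes pydantic v2 (class keyword configuration and `model_dump`), omits the abstract methods for brevity, and uses a hypothetical `DemoModel` subclass; the actual project code may differ.

```python
import typing as T

import pydantic as pdt


class Model(pdt.BaseModel, strict=True, frozen=False, extra="forbid"):
    """Parameter-management sketch (abstract methods omitted for brevity)."""

    KIND: str

    def get_params(self, deep: bool = True) -> T.Dict[str, T.Any]:
        # Keep public, non-uppercase fields only; `deep` is unused in this sketch.
        return {
            key: value
            for key, value in self.model_dump().items()
            if not key.startswith("_") and not key.isupper()
        }

    def set_params(self, **params: T.Any) -> "Model":
        # Update fields in place and return the same instance.
        for key, value in params.items():
            setattr(self, key, value)
        return self


class DemoModel(Model):
    """Hypothetical subclass with two tunable parameters."""

    KIND: T.Literal["DemoModel"] = "DemoModel"
    max_depth: int = 20
    n_estimators: int = 200


model = DemoModel()
same = model.set_params(max_depth=5)
print(same is model, model.get_params())
```

Because `KIND` is uppercase it is filtered out of `get_params`, and `extra="forbid"` makes the constructor reject unknown keyword arguments with a `pydantic.ValidationError`.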
Definition of Done (DoD):
- The `Model` class is implemented with all specified methods and attributes.
- Abstract methods enforce implementation in derived classes.
- Subclass compatibility is verified with unit tests.
- The class is well-documented, including detailed examples of usage.
- The code passes all CI/CD validation checks and integrates seamlessly with existing project modules.
Title:
As a data scientist, I want a reusable baseline model implementation using scikit-learn, so that I can easily benchmark advanced models and understand the predictive performance of standard machine learning techniques.
Description:
The BaselineSklearnModel class provides a baseline regression model leveraging scikit-learn's RandomForestRegressor wrapped in a preprocessing pipeline. This class is designed to be easily integrated into the project, with capabilities for training, predicting, and model explainability. It supports automated handling of numerical and categorical features and provides SHAP-based feature importance explanations.
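A rough sketch of the pipeline this description refers to is shown below. The column names come from the acceptance criteria, but everything else is an assumption made for illustration: the `build_pipeline` helper is hypothetical, numerical columns are passed through unchanged, and the data is synthetic rather than the project's dataset.

```python
import numpy as np
import pandas as pd
from sklearn import compose, ensemble, pipeline, preprocessing

NUMERICALS = ["temp", "atemp", "hum", "windspeed"]
CATEGORICALS = ["season", "weathersit"]


def build_pipeline(max_depth=20, n_estimators=200, random_state=42) -> pipeline.Pipeline:
    """Hypothetical helper: one-hot encode categoricals, pass numericals through."""
    transformer = compose.ColumnTransformer(
        [
            ("categoricals", preprocessing.OneHotEncoder(handle_unknown="ignore"), CATEGORICALS),
            ("numericals", "passthrough", NUMERICALS),
        ],
        remainder="drop",  # columns not listed above (e.g. `registered`) are dropped
    )
    regressor = ensemble.RandomForestRegressor(
        max_depth=max_depth, n_estimators=n_estimators, random_state=random_state
    )
    return pipeline.Pipeline(steps=[("transformer", transformer), ("regressor", regressor)])


# Synthetic rows for illustration only.
rng = np.random.default_rng(0)
inputs = pd.DataFrame(
    {
        "temp": rng.random(40),
        "atemp": rng.random(40),
        "hum": rng.random(40),
        "windspeed": rng.random(40),
        "season": rng.integers(1, 5, size=40),
        "weathersit": rng.integers(1, 4, size=40),
        "registered": rng.integers(0, 100, size=40),
    }
)
targets = rng.random(40) * 100

# A small forest keeps the example fast; the documented default is 200 trees.
model = build_pipeline(n_estimators=10).fit(inputs, targets)
predictions = model.predict(inputs)
print(predictions.shape)
```

Keeping preprocessing and the regressor in a single `Pipeline` means the same transformations are applied at training and prediction time without extra bookkeeping.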
Acceptance Criteria:
- Model Parameters
  - Define the following configurable parameters for the model:
    - `max_depth` (default: 20): Controls the maximum depth of the random forest trees.
    - `n_estimators` (default: 200): Specifies the number of trees in the forest.
    - `random_state` (default: 42): Ensures reproducibility of the results.
- Feature Handling
  - Numerical Features:
    - Include attributes such as `temp`, `atemp`, `hum`, `windspeed`, and other specified columns.
  - Categorical Features:
    - Use one-hot encoding for features like `season` and `weathersit`.
  - Exclude features that are highly correlated with the target, such as `registered`, to avoid data leakage.
- Pipeline Construction
  - Combine preprocessing steps for numerical and categorical data using a `ColumnTransformer`.
  - Integrate the transformer with a `RandomForestRegressor` in a single scikit-learn `Pipeline`.
- Model Training
  - Implement the `fit` method to train the pipeline using input features and target values.
  - Ensure the model is correctly saved for future predictions.
- Prediction
  - Implement the `predict` method to generate predictions using the fitted pipeline.
  - Output predictions in the defined `schemas.Outputs` format.
- Model Explainability
  - Global Explainability:
    - Provide feature importance scores using the `explain_model` method.
    - Map feature importance to transformed feature names for clear interpretation.
  - Local Explainability:
    - Implement the `explain_samples` method to provide SHAP values for specific input samples.
    - Ensure SHAP values align with the transformed features.
- Error Handling
  - Raise a `ValueError` when the model is used before it has been trained.
- Validation and Testing
  - Validate the following scenarios:
    - Training the model with valid inputs and targets.
    - Generating predictions from trained models.
    - Explaining model structure and predictions.
  - Write unit tests to ensure correct implementation of each method.
Definition of Done (DoD):
- All methods (`fit`, `predict`, `explain_model`, `explain_samples`, `get_internal_model`) are implemented and tested.
- Documentation provides clear examples for model usage, training, prediction, and explanation.
- Code is integrated into the project and passes all CI/CD checks.
- SHAP-based explanations are verified for consistency with transformed features.
- Model outputs conform to `schemas.Outputs` and are reproducible with a fixed `random_state`.
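One way to check the reproducibility criterion is to train twice with the same seed and compare predictions. The sketch below uses a bare `RandomForestRegressor` on synthetic data rather than the project's model class, and the `fit_predict` helper is hypothetical.

```python
import numpy as np
from sklearn import ensemble

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
inputs = rng.random((50, 4))
targets = rng.random(50)


def fit_predict(random_state: int) -> np.ndarray:
    """Hypothetical helper: train a small forest and predict on the training rows."""
    model = ensemble.RandomForestRegressor(
        n_estimators=10, max_depth=5, random_state=random_state
    )
    return model.fit(inputs, targets).predict(inputs)


first = fit_predict(42)
second = fit_predict(42)
print(np.array_equal(first, second))
```

A unit test in the project could apply the same comparison to two instances of the model created with identical parameters.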