Skip to content

Conversation

@mohamedelabbas1996
Copy link

@mohamedelabbas1996 mohamedelabbas1996 commented Apr 14, 2025

Description

This PR adds support for returning model feature vectors (embeddings) alongside classification results in the Data Companion API.
The classification pipeline now supports returning a vector embedding per classification, derived from the classification model backbone.

The changes are fully backward-compatible for models that do not implement custom get_features(), as they will fallback to returningNone from the base class.

Related Issues

#752

Screenshots

Detection features clustering visualization using K-means + PCA
image

@sentry
Copy link

sentry bot commented Apr 14, 2025

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: trapdata/api/models/classification.py

Function Unhandled Issue
save_results ValidationError: 15 validation errors for ClassificationResponse ...
Event Count: 2
save_results ValidationError: 10 validation errors for ClassificationResponse ...
Event Count: 2
save_results AttributeError: 'NoneType' object has no attribute 'tolist' ...
Event Count: 1
save_results ValueError: not enough values to unpack (expected 3, got 2) ...
Event Count: 1

Did you find this useful? React with a 👍 or 👎

@mohamedelabbas1996 mohamedelabbas1996 marked this pull request as ready for review April 22, 2025 15:38
]

plotly = "^5.21.0"
scikit-learn = "^1.3.0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make these optional dependencies and just use numpy in the tests. unless we need to use them in the core app.

[tool.poetry.extras]
dev = ["plotly", "scikit-learn"]

model.eval()
return model

def get_features(self, batch_input: torch.Tensor) -> torch.Tensor:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on this method of extracting features! It seems more flexible than our current feature extractor. Perhaps we should add a comment in both feature extractors that the other one exists. And eventually update the old one to use this code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants