
Conversation

@javiermtorres
Contributor

@javiermtorres javiermtorres commented Feb 3, 2026

For CI purposes, we would need dummy models that return a predictable (e.g. constant) value, with as little processing as possible. These will honor the general interface (token classification, sequence classification, etc) but just return a dummy value. This allows testing integration and non-ML related changes efficiently.

Closes #98
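For illustration, a constant-output dummy model can be very small. Here is a minimal sketch with an HF-style forward signature; all names are hypothetical and not the PR's actual code:

```python
import torch

class DummySequenceClassifier(torch.nn.Module):
    """CI-only stand-in: honors a sequence-classification-style interface
    but always returns the same constant logits, with no real inference."""

    def __init__(self, num_labels: int = 2):
        super().__init__()
        self.num_labels = num_labels
        # Throwaway buffer so the state dict is non-empty (exporters
        # refuse models with no weights at all; see discussion below).
        self.register_buffer("dummy_weight", torch.zeros(1))

    def forward(self, input_ids, attention_mask=None):
        batch_size = input_ids.shape[0]
        # Predictable output: zero logits for every example and label.
        return torch.zeros(batch_size, self.num_labels)

model = DummySequenceClassifier(num_labels=3)
logits = model(torch.ones(4, 16, dtype=torch.long))
```

Because the forward pass does no tensor math on the inputs, the model is essentially free to run, which is the point for CI.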

@codecov-commenter

codecov-commenter commented Feb 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Files with missing lines | Coverage Δ |
| --- | --- |
| encoderfile/src/build_cli/tokenizer.rs | 92.73% <100.00%> (+1.28%) ⬆️ |
| ...i/transforms/validation/sequence_classification.rs | 89.61% <100.00%> (+0.42%) ⬆️ |
| ..._cli/transforms/validation/token_classification.rs | 89.18% <100.00%> (ø) |

... and 5 files with indirect coverage changes


@javiermtorres javiermtorres force-pushed the 98-sample-model branch 5 times, most recently from 65b8075 to de1af14 Compare February 5, 2026 18:27
@javiermtorres javiermtorres marked this pull request as ready for review February 5, 2026 18:51
@@ -0,0 +1,20 @@
encoderfile:
Member


nit: i'd put configs in the respective subfolders of ./models and name them all encoderfile.yml

Contributor Author

@javiermtorres javiermtorres Feb 6, 2026


ah, good idea 👍
rather than in whatever_encoderfile.yaml within ./models, perhaps?

Contributor Author


Hmmm, currently models is in .gitignore. Maybe we could have a specific dir for encoderfiles?

ORTModelForTokenClassification,
)

AutoConfig.register(DUMMY_SEQUENCE_ENCODER, DummySequenceConfig)
Member


Where is the code where the model weights themselves are generated?

Contributor Author


It's in

class DummySequenceClassifier(PreTrainedModel):

, which basically generates and/or tests the dummy models (usable from the command line).
I preferred to download the weights in the standard way; maybe we can optionally generate them from scratch. The procedure is there, in any case.
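For context, the `AutoConfig.register(DUMMY_SEQUENCE_ENCODER, DummySequenceConfig)` line in the diff plugs the dummy config into the transformers `Auto*` factory. A minimal sketch of that mechanism (the constant value and class internals here are illustrative, not necessarily the PR's):

```python
from transformers import AutoConfig, PretrainedConfig

DUMMY_SEQUENCE_ENCODER = "dummy-sequence-encoder"  # hypothetical model_type

class DummySequenceConfig(PretrainedConfig):
    # model_type must match the key passed to AutoConfig.register,
    # otherwise register() raises a ValueError.
    model_type = DUMMY_SEQUENCE_ENCODER

    def __init__(self, num_labels=2, **kwargs):
        super().__init__(num_labels=num_labels, **kwargs)

# After registration, AutoConfig can resolve the dummy type like any built-in.
AutoConfig.register(DUMMY_SEQUENCE_ENCODER, DummySequenceConfig)
config = AutoConfig.for_model(DUMMY_SEQUENCE_ENCODER, num_labels=3)
```

The same pattern extends to the model classes via e.g. `AutoModelForSequenceClassification.register(DummySequenceConfig, DummySequenceClassifier)`.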

Contributor Author


BTW, fun fact: since the output is dynamically generated, there are actually no weights. So the torch model exporter refuses to write anything, and the ONNX exporter fails because it sees no weights. We need to include a dummy value in the state dict so it gets exported and everything works OK.

Contributor Author


Here:

self.register_buffer(

We could have hardcoded weights (or rather hardcoded outputs) instead, if you'd prefer.
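To illustrate why the registered buffer matters (a sketch under the assumption of a purely computed output; class and buffer names are made up):

```python
import torch

class NoWeights(torch.nn.Module):
    # Output is computed on the fly; the module owns no tensors,
    # so there is nothing for torch.save / the ONNX exporter to serialize.
    def forward(self, x):
        return torch.zeros(x.shape[0], 2)

class WithDummyBuffer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # The registered buffer lands in state_dict(), giving the
        # exporters at least one tensor to write.
        self.register_buffer("dummy_weight", torch.zeros(1))

    def forward(self, x):
        return torch.zeros(x.shape[0], 2)

empty_state = NoWeights().state_dict()        # empty: exporters balk at this
saved_state = WithDummyBuffer().state_dict()  # contains "dummy_weight"
```

The buffer is never read in `forward`, so it changes nothing about the model's behavior; it only makes the state dict non-empty.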



Development

Successfully merging this pull request may close these issues.

Faster test models

3 participants