v0.17.0 #73
OnlyDeniko announced in Announcements
RePlay 0.17.0 Release notes
Highlights
We are excited to announce the release of RePlay 0.17.0!
The new version fixes serious bugs affecting LabelEncoder performance and checkpoint saving in transformers. In addition, methods have been added to save splitters and SequentialTokenizer without using pickle.
Backwards Incompatible Changes
Change in SequentialDataset behavior
When training transformers on big data, a slowdown was detected that increased the epoch time from 5 minutes to 1 hour. By default, the model trainer saves a checkpoint every 50 steps of an epoch, and each checkpoint implicitly contained not only the model but also the entire training dataset. This has been fixed by changing SequentialDataset and the related callbacks. As a result, SequentialDataset objects from older versions can no longer be used. No other interface changes were required.
Deprecations
Saving splitters and SequentialTokenizer with pickle now emits a deprecation warning. This functionality will be removed in a future version.
New Features
A new strategy in LabelEncoder
A new drop strategy has been added. It lets you drop tokens from the dataset that were not present at the training stage. If all rows are removed as a result, a corresponding warning is issued.
New Linters
We keep up with current trends in code quality control, so the set of linters used to check code quality has been updated. Pylint and PyCodestyle have been removed, and Ruff, Black and toml-sort have been added.
Improvements
PyArrow dependency
The PyArrow dependency has been relaxed. RePlay now works with any PyArrow version greater than 12.0.1.
Bug fixes
Performance fixes at the partial_fit stage in LabelEncoder
The slowdown occurred when using a Pandas DataFrame, where the partial_fit stage had quadratic running time. This has been fixed, and the running time now grows linearly with the size of the dataset.
Timestamp tokenization when using SasRec
Fixed an error that occurred when training a SasRec transformer with the ti_modification=True parameter.
Loading a checkpoint with a modified embedding in the transformers
The error occurred when loading a model on a different device after the dimensions of its transformer embeddings had been changed. The example of working with embeddings in transformers has been updated accordingly.
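These notes describe the fix but do not show the updated example. The following is a rough, non-authoritative sketch of the general pattern in plain PyTorch. It assumes a hypothetical TinySeqModel with a single item_emb table and a hypothetical helper load_with_saved_shapes; neither is part of RePlay's API. The helper does two things:

- It loads the checkpoint with map_location, so tensors saved on another device land on the target device.
- It rebuilds the embedding from the shape stored in the checkpoint, not from the size the model was originally constructed with.

```python
import io

import torch
from torch import nn


class TinySeqModel(nn.Module):
    """Hypothetical stand-in for a transformer with an item embedding table."""

    def __init__(self, n_items: int, dim: int = 8):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)


def load_with_saved_shapes(source, device: str = "cpu") -> TinySeqModel:
    """Hypothetical helper: restore a model whose embedding was resized before saving."""
    # map_location remaps tensors saved on another device (e.g. a GPU) to `device`
    state = torch.load(source, map_location=device)
    # Take the table size from the checkpoint, not from the original constructor args
    n_items, dim = state["item_emb.weight"].shape
    model = TinySeqModel(n_items=n_items, dim=dim)
    model.load_state_dict(state)
    return model.to(device)


# Build a model, then enlarge its embedding table before saving (hypothetical resize)
trained = TinySeqModel(n_items=10)
trained.item_emb = nn.Embedding(15, 8)

buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

restored = load_with_saved_shapes(buffer)
```

Suppose the same state dict were loaded into a model built with the original 10-row table. Then load_state_dict would raise a size-mismatch RuntimeError, which is the kind of failure that reading shapes from the checkpoint avoids.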
This discussion was created from the release v0.17.0.