- Utility files
  - models.py: PyTorch model definitions (LSTM and CNN)
  - params.py: constants shared by all notebooks
  - training_utils.py: utility functions for sklearn-like training and cross-validation
- Notebooks
  - notebooks/label_hierarchies: analysis of the category hierarchies
  - notebooks/process_data: data preprocessing
  - notebooks/fastText: fastText-based models
  - notebooks/tfidf: linear model with tf-idf features
  - notebooks/rnn_cross_val: LSTM-based models
  - notebooks/cross_val_сnn: CNN-based models
  - notebooks/validation: validation of the selected models
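training_utils.py is described only as providing sklearn-like training and cross-validation. As a rough illustration of that pattern (not the repository's actual code), the sketch below runs k-fold cross-validation over any object that exposes scikit-learn-style `fit`/`predict` methods. The names `kfold_indices`, `cross_val_accuracy` and `MajorityClassifier` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def kfold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k (train, test) index pairs after a seeded shuffle."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin assignment to k folds
    splits = []
    for i in range(k):
        test = folds[i]
        # every index that is not in fold i goes to the training part
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def cross_val_accuracy(make_model: Callable, X: Sequence, y: Sequence, k: int = 5) -> List[float]:
    """Return one accuracy score per fold for a model with fit/predict methods."""
    scores = []
    for train, test in kfold_indices(len(X), k):
        model = make_model()  # fresh, unfitted model for every fold
        model.fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return scores


class MajorityClassifier:
    """Hypothetical toy model: always predicts the most frequent training label."""

    def fit(self, X, y):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]
```

Building a new model through the `make_model` factory in every fold keeps fitted state from leaking between folds. `k` must not exceed the number of samples, otherwise some test folds are empty and the accuracy division fails.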
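The tfidf notebook pairs tf-idf features with a linear model. The minimal pure-Python sketch below shows how such features can be computed from already tokenized (for example, lemmatized) documents. It assumes the smoothed idf and L2 normalisation that scikit-learn's `TfidfVectorizer` applies by default; the notebook itself may use other settings, and the function name `tfidf` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one sparse {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # document frequency: in how many documents each term occurs
    df = Counter(term for doc in docs for term in set(doc))
    # smoothed idf, as in scikit-learn's default (smooth_idf=True)
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))  # L2 norm
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors
```

Rare terms receive larger weights than terms that occur in many documents. In practice, `sklearn.feature_extraction.text.TfidfVectorizer` followed by `sklearn.svm.LinearSVC` covers both the feature extraction and the linear classifier.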
Pretrained fastText embeddings from DeepPavlov (deeppavlov.ai):

`wget -O embeddings/ft_native_300_ru_wiki_lenta_lemmatize.vec http://files.deeppavlov.ai/embeddings/ft_native_300_ru_wiki_lenta_lemmatize/ft_native_300_ru_wiki_lenta_lemmatize.vec`

Accuracy on the validation dataset at different hierarchy levels:
- LSTM:
  - 0.9636787056708612
  - 0.9450686386664487
  - 0.8911586860598137
  - 0.8873079751593398
- tf-idf + LinearSVM:
  - 0.9655478836411179
  - 0.9490419186141527
  - 0.8969807158032358
  - 0.8933751429972218
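The README does not state how accuracy per hierarchy level is computed. One plausible definition, assumed here, treats every label as a path of categories from the root down and counts a prediction as correct at level k when its first k categories match the true path. `accuracy_per_level` is a hypothetical helper illustrating this.

```python
from typing import List, Sequence, Tuple

Path = Tuple[str, ...]  # assumed label format: categories from root to leaf


def accuracy_per_level(y_true: Sequence[Path], y_pred: Sequence[Path], depth: int) -> List[float]:
    """Accuracy at levels 1..depth: a prediction is correct at level k
    if its first k categories equal the first k categories of the true path."""
    n = len(y_true)
    return [
        sum(t[:k] == p[:k] for t, p in zip(y_true, y_pred)) / n
        for k in range(1, depth + 1)
    ]
```

Under this definition accuracy can never increase with depth, because matching the first k+1 categories implies matching the first k. Both result lists above decrease from one value to the next, which is consistent with (though not proof of) such a prefix-based scheme.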
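The pretrained embeddings listed above come as a `.vec` file, fastText's plain-text format: a header line with the vocabulary size and the vector dimension, then one word per line followed by its values. A minimal loader sketch, assuming UTF-8 text and single-space separators; `load_vec` and its `vocab` filter are hypothetical names, not part of the repository.

```python
import io
from typing import Dict, Iterable, List, Optional, TextIO


def load_vec(f: TextIO, vocab: Optional[Iterable[str]] = None) -> Dict[str, List[float]]:
    """Parse fastText .vec text: header '<n_words> <dim>', then '<word> <v1> ... <v_dim>'."""
    _n_words, dim = map(int, f.readline().split())
    wanted = set(vocab) if vocab is not None else None
    vectors = {}
    for line in f:
        parts = line.rstrip().split(" ")  # drop the newline and any trailing space
        word, values = parts[0], parts[1:]
        if wanted is not None and word not in wanted:
            continue  # skip words the dataset does not need, to save memory
        if len(values) != dim:
            continue  # skip malformed or empty lines
        vectors[word] = [float(v) for v in values]
    return vectors


if __name__ == "__main__":
    # tiny in-memory example instead of the real multi-gigabyte file
    demo = io.StringIO("2 3\nкот 0.1 0.2 0.3\nпёс 1 2 3\n")
    print(load_vec(demo))
```

With the downloaded file, the loader would be called on `open("embeddings/ft_native_300_ru_wiki_lenta_lemmatize.vec", encoding="utf-8")`. Passing the dataset's (lemmatized) vocabulary as `vocab` keeps only the vectors that are actually needed.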