Overfitting right now on synthetic data

Hi!
Thanks a lot for the library!

After testing the .onnx models hosted at https://huggingface.co/storia/font-classify-onnx/, I've realised that the model is overfitted to synthetic text rendered exactly the way Pillow renders it. In practice it struggles to classify fonts accurately in real-world scenarios such as browser screenshots, or images produced by other text renderers with different anti-aliasing, etc.

I'm not saying you should fix it or anything; you've given clear guidelines on how to train my own model with the help of your code.
So thanks a lot for that.

I'm creating this issue for visibility, so that future folks who try out this repo know what's going wrong if they struggle to use it for font classification.

For others:

The existing ONNX checkpoint is overfitted to synthetic images rendered with the Pillow library. If you feed it real-world screenshots or images, whether cropped to the text, aggressively processed to black and white, or anywhere in between, they will all fail, with "Zilla Slab Highlight" as the detected font. This is most probably because the logits collapse to very negative values, since the inputs are out of distribution relative to the training dataset.
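If you want to check whether you are hitting the same failure, a rough sketch like the one below may help. It assumes you have already collected the raw logits for a batch of your own images into an array of shape `(n_images, n_classes)`; how you get them (for example with onnxruntime) depends on the model's input preprocessing, which is not shown here. The function name `summarize_logits`, the class list, and the numbers are all made up for illustration and are not real outputs of the checkpoint.

```python
from collections import Counter

import numpy as np


def summarize_logits(logits, class_names):
    """Summarize a batch of classifier logits of shape (n_images, n_classes)."""
    logits = np.asarray(logits, dtype=np.float64)

    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)

    # Count how often each class wins the argmax.
    predicted = logits.argmax(axis=1)
    counts = Counter(class_names[i] for i in predicted)
    top_class, top_count = counts.most_common(1)[0]

    return {
        "most_common_prediction": top_class,
        "share_of_batch": top_count / len(predicted),
        "mean_max_logit": float(logits.max(axis=1).mean()),
        "mean_top_probability": float(probs.max(axis=1).mean()),
    }


# Hypothetical class list and made-up logits, for illustration only.
class_names = ["Font A", "Font B", "Zilla Slab Highlight"]
fake_logits = np.array([
    [-40.0, -38.5, -12.0],
    [-55.2, -51.0, -20.3],
    [-47.1, -49.9, -15.8],
])

summary = summarize_logits(fake_logits, class_names)
print(summary)
```

If nearly the whole batch lands on one font and the maximum logits are much lower than what you see on Pillow-rendered samples, that would be consistent with the out-of-distribution behaviour described above. Note that the softmax probability can still look confident in that case, so the raw logits are the more telling number.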

The only way forward I see is to retrain a new model with the help of Ms. Turc's code, basing it on real-world scenarios.
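As a stopgap while collecting real screenshots, one idea (my own assumption, not something from the repo's training code) is to perturb the synthetic renders so they look less like pristine Pillow output. The sketch below uses only NumPy on a grayscale image with values in [0, 1]; the function name `perturb_rendering` and all parameter values are hypothetical, and box blur, gamma, and noise are only crude stand-ins for real differences in anti-aliasing and rendering.

```python
import numpy as np


def perturb_rendering(image, rng, max_blur_passes=2,
                      gamma_range=(0.6, 1.6), noise_std=0.02):
    """Return a perturbed copy of a grayscale image with values in [0, 1]."""
    out = np.asarray(image, dtype=np.float64).copy()
    height, width = out.shape

    # Soften edges with 0..max_blur_passes passes of a 3x3 box blur,
    # a crude stand-in for a different anti-aliasing strategy.
    for _ in range(int(rng.integers(0, max_blur_passes + 1))):
        padded = np.pad(out, 1, mode="edge")
        out = sum(
            padded[dy:dy + height, dx:dx + width]
            for dy in range(3)
            for dx in range(3)
        ) / 9.0

    # Random gamma shifts how heavy or light the strokes appear.
    gamma = rng.uniform(*gamma_range)
    out = np.clip(out, 0.0, 1.0) ** gamma

    # Mild Gaussian noise imitates compression and capture artifacts.
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    return np.clip(out, 0.0, 1.0)


# Toy "glyph": a white bar on a black background, for illustration only.
rng = np.random.default_rng(0)
glyph = np.zeros((16, 16))
glyph[4:12, 6:10] = 1.0

augmented = perturb_rendering(glyph, rng)
print(augmented.shape, augmented.min(), augmented.max())
```

This does not replace training on real browser screenshots; it only widens the synthetic distribution a little, and whether it helps would need to be checked on real images.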
