Overfitting right now on synthetic data #2

@teito-dev

Hi!
Thanks a lot for the library!

After testing the .onnx models hosted at https://huggingface.co/storia/font-classify-onnx/, I've realised that the model is overfitted to synthetic text generated exactly the way Pillow renders it.
It struggles to classify fonts accurately in real-world scenarios such as browser screenshots, or images produced by other text renderers with different anti-aliasing, etc.

I'm not saying you should fix it or anything; you've given clear help and guidelines on how to train my own model with your code.
So thanks a lot for that.

I'm creating this issue for visibility, so that future folks who try out this repo know what is going wrong if they struggle to use it for font classification.

For others:

The pre-existing ONNX checkpoint is overfitted to synthetic images rendered with the Pillow library. If you use real-world screenshots or images, whether cropped to the text, aggressively processed to black and white, or anywhere in between, they will all fail with "Zilla Slab Highlight" as the detected font, most probably because the logits collapse to very negative values when the inputs are out of distribution relative to the training dataset.
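If you want to check whether you are hitting the same failure, here is a rough sketch of how the collapse could be spotted from the raw model outputs. It assumes you have already run your images through the checkpoint and collected one logit vector per image; that step is not shown. The function names, the three-font label list, and the logit values below are made up for illustration and are not taken from the repo or from real model output.

```python
import math
from collections import Counter


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def collapse_report(batch_logits, labels):
    """Summarize top-1 predictions over a batch of logit vectors.

    A dominant_share close to 1.0 together with a strongly negative
    mean_max_logit is the symptom described in this issue.
    """
    top_labels = []
    top_probs = []
    for logits in batch_logits:
        probs = softmax(logits)
        idx = max(range(len(probs)), key=probs.__getitem__)
        top_labels.append(labels[idx])
        top_probs.append(probs[idx])
    label, count = Counter(top_labels).most_common(1)[0]
    return {
        "dominant_label": label,
        "dominant_share": count / len(top_labels),
        "mean_top_prob": sum(top_probs) / len(top_probs),
        "mean_max_logit": sum(max(v) for v in batch_logits) / len(batch_logits),
    }


# Hypothetical labels and logits, for illustration only.
labels = ["Arial", "Roboto", "Zilla Slab Highlight"]
batch = [
    [-40.0, -41.0, -38.5],
    [-52.0, -50.5, -49.0],
    [-45.0, -47.0, -44.0],
]
report = collapse_report(batch, labels)
print(report)
```

If nearly every real-world image lands on the same label while the largest logit stays far below zero, the inputs are probably out of distribution for the checkpoint, and the predicted font should not be trusted.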

The only way forward that I see is to retrain a new model with the help of Ms. Turc's code, basing it on real-world scenarios.
