Description
The low accuracies in the experimental notebook from #3 are worrying:
| Page Detector | random | siamese | imagehash | vgg16 | annotated |
|---|---|---|---|---|---|
| Accuracy | 0.03% | 3.95% | 6.58% | 61.84% | 100.00% |
The vgg16 page detector currently uses features produced by the last hidden layer of a VGG16 model trained on ImageNet. Fine-tuning may not be an option, since our dataset is ill-suited for classification (too many document pages/classes, too few examples of each class). However, since our dataset differs significantly from ImageNet, we may have better luck using an earlier hidden layer of VGG16:
- Update the implementation.
- Optimize the parameters.
- Evaluate.
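
A minimal sketch of how the layer choice could be evaluated, assuming each candidate layer yields one feature vector per document page and per screen image, and that screens are matched to pages by cosine similarity (the actual matching used by the vgg16 detector may differ). The layer names follow the Keras VGG16 naming, and the vectors are made-up toy values, not results from our dataset:

```python
import math

def cosine(u, v):
    """Cosine similarity of two equally long vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_page(screen_vec, page_vecs):
    """Return the id of the page whose features are most similar to the screen."""
    return max(page_vecs, key=lambda page: cosine(screen_vec, page_vecs[page]))

def accuracy(page_vecs, screens):
    """Fraction of (screen features, true page id) pairs matched to the true page."""
    hits = sum(nearest_page(vec, page_vecs) == page for vec, page in screens)
    return hits / len(screens)

def best_layer(features_by_layer):
    """Score every candidate layer and return (best layer name, all scores)."""
    scores = {
        layer: accuracy(page_vecs, screens)
        for layer, (page_vecs, screens) in features_by_layer.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical toy features: layer name -> (page features, labelled screen features).
toy_features = {
    "fc2": (
        {"p1": [1.0, 0.0], "p2": [0.9, 0.1]},
        [([1.0, 0.2], "p1"), ([1.0, 0.15], "p2")],
    ),
    "block4_pool": (
        {"p1": [1.0, 0.0], "p2": [0.0, 1.0]},
        [([0.9, 0.1], "p1"), ([0.2, 0.8], "p2")],
    ),
}
layer, scores = best_layer(toy_features)
```

In practice the feature vectors would come from the real model, and the comparison should run on held-out screens so that the chosen layer is not overfitted to the evaluation set.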
The siamese page detector uses position-dependent samples to normalize the input screen images, which may explain why the performance degrades on unseen document page and screen images (86% accuracy on the training set versus 3.95% accuracy on the test set):
- Update the implementation.
- Estimate performance using CV.
- Train and optimize the parameters.
- Evaluate.
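
One way the CV estimate could be set up so that it reflects performance on new pages is to keep all samples from one document in the same fold. The sketch below assumes samples arrive as `(document_id, sample)` pairs; `grouped_folds`, `cross_validate`, and the `train_and_score` callback are hypothetical names, not part of the existing code:

```python
from collections import defaultdict

def grouped_folds(samples, num_folds):
    """Yield (train, test) splits in which no document appears on both sides.

    `samples` is a list of (document_id, sample) pairs; documents are assigned
    to folds round-robin in sorted order. Assumes num_folds does not exceed
    the number of distinct documents.
    """
    by_document = defaultdict(list)
    for document_id, sample in samples:
        by_document[document_id].append(sample)
    documents = sorted(by_document)
    for fold in range(num_folds):
        test_docs = set(documents[fold::num_folds])
        train = [s for d in documents if d not in test_docs for s in by_document[d]]
        test = [s for d in documents if d in test_docs for s in by_document[d]]
        yield train, test

def cross_validate(samples, num_folds, train_and_score):
    """Average the score returned by `train_and_score(train, test)` over all folds."""
    scores = [train_and_score(train, test)
              for train, test in grouped_folds(samples, num_folds)]
    return sum(scores) / len(scores)

# Toy usage with placeholder sample names instead of real images.
toy_samples = [("doc-a", "a1"), ("doc-a", "a2"), ("doc-b", "b1"),
               ("doc-c", "c1"), ("doc-d", "d1")]
folds = list(grouped_folds(toy_samples, 2))
```

Splitting by document rather than by individual image keeps near-duplicate screens of one page from leaking between the training and test sides, which is one plausible source of an over-optimistic training-set accuracy.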
If we manage to improve the siamese page detector, we may benefit from ensembling siamese with vgg16:
- Estimate performance using CV.
- Add `ensemble` page detector to `video699.__main__`.
- Evaluate.
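
A minimal sketch of one possible ensembling scheme, assuming both detectors can report a similarity score for every candidate page of a screen. The scores are min-max normalized per detector and combined with a weighted average; `ensemble_detect` and `siamese_weight` are hypothetical names, and the scores are toy values:

```python
def normalize(scores):
    """Rescale a {page: score} dict to the range [0, 1] (all zeros if constant)."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {page: 0.0 for page in scores}
    return {page: (score - low) / (high - low) for page, score in scores.items()}

def ensemble_detect(siamese_scores, vgg16_scores, siamese_weight=0.5):
    """Return (best page, combined scores); both dicts must share the same pages."""
    siamese = normalize(siamese_scores)
    vgg16 = normalize(vgg16_scores)
    combined = {
        page: siamese_weight * siamese[page] + (1 - siamese_weight) * vgg16[page]
        for page in siamese
    }
    return max(combined, key=combined.get), combined

# Toy scores on different scales, which is why each detector is normalized first.
toy_siamese = {"p1": 0.2, "p2": 0.9, "p3": 0.1}
toy_vgg16 = {"p1": 10.0, "p2": 8.0, "p3": 2.0}
page, combined = ensemble_detect(toy_siamese, toy_vgg16)
```

The weight is a parameter that the CV step above could tune; setting `siamese_weight=0.0` reduces the ensemble to the vgg16 detector alone, which gives a baseline to compare against.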