How to prompt a layout detector for hard/uncommon document structures #256

I'm exploring the use of a Vision-Language Model (VLM) as a layout detector for complex and uncommon document structures. In particular, we want to improve performance on documents such as Bills of Lading, which often have irregular layouts.

Any recommended approaches, prompt patterns, or references would be greatly appreciated. Thank you!

Example: 
[example](https://website-assets.studocu.com/img/document_thumbnails/5af93140f78bc188f6fc453db2ea434a/thumb_1200_1696.png)

In this example, only the customer order section is a table; all other blocks are text.
