@jean-malo
Contributor

This commit adds an example showing how to run asynchronous batch processing of PDF documents for table extraction with Mistral's OCR capabilities. The example covers:

  1. Creating a batch request with specific table extraction parameters
  2. Uploading the batch file to Mistral's API
  3. Creating and monitoring a batch job
  4. Processing the results once the job completes
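
For readers who want a feel for steps 1 and 3 without opening the example, here is a rough sketch rather than the code from this PR. It builds the JSONL batch file in memory and polls a status callable until the job reaches a terminal state. The request field names (`custom_id`, `body`, `document`, `document_annotation_format`), the `TABLE_SCHEMA` contents, and the status strings are assumptions for illustration and should be checked against the current Mistral batch and OCR documentation. The upload and job creation calls are left out on purpose so the sketch runs without network access.

```python
import json
import time
from typing import Callable, Iterable

# Hypothetical JSON schema for the tables we want extracted (assumption).
TABLE_SCHEMA = {
    "type": "object",
    "properties": {
        "tables": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "rows": {
                        "type": "array",
                        "items": {"type": "array", "items": {"type": "string"}},
                    },
                },
                "required": ["rows"],
            },
        }
    },
    "required": ["tables"],
}

# Status names assumed to mean "the job will not change any more".
TERMINAL_STATUSES = {"SUCCESS", "FAILED", "TIMEOUT_EXCEEDED", "CANCELLED"}


def build_batch_jsonl(pdf_urls: Iterable[str]) -> str:
    """Step 1: return one JSON request line per PDF URL."""
    lines = []
    for index, url in enumerate(pdf_urls):
        request = {
            # custom_id lets us match each result back to its input document.
            "custom_id": str(index),
            "body": {
                "document": {"type": "document_url", "document_url": url},
                "document_annotation_format": {
                    "type": "json_schema",
                    "json_schema": {"name": "tables", "schema": TABLE_SCHEMA},
                },
            },
        }
        lines.append(json.dumps(request))
    return "\n".join(lines)


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> str:
    """Step 3: poll until the job reaches a terminal status or the budget runs out."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("batch job did not finish within the polling budget")
```

In real use, `get_status` would wrap whatever call fetches the job from the API and returns its status string. Passing it in as a callable keeps the polling loop easy to test with a fake.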

The example uses Pydantic models to define the expected response format and handles both successful and failed requests in the batch output. Together, these steps form a complete workflow for batch OCR with Mistral's API.
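
As an illustration of the success and error handling described above, the sketch below splits a batch output file into parsed results and failures. It is not the code from this PR. It swaps the Pydantic models for a plain standard-library dataclass so it runs without extra dependencies, and the output line shape (`custom_id`, `response.status_code`, `response.body.document_annotation`, `error`) is an assumption to verify against real batch output.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TableResult:
    """Stand-in for the Pydantic response model (hypothetical shape)."""
    custom_id: str
    tables: list = field(default_factory=list)


def split_results(jsonl_text: str):
    """Return (successes, failures) parsed from batch output JSONL."""
    successes, failures = [], []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the output file
        record = json.loads(line)
        custom_id = record.get("custom_id", "")
        response = record.get("response") or {}
        error = record.get("error")

        # Treat an explicit error or a non-200 status as a failed request.
        if error or response.get("status_code") != 200:
            failures.append((custom_id, error or response.get("body")))
            continue

        annotation = (response.get("body") or {}).get("document_annotation") or {}
        # Assumption: the annotation may arrive as a JSON-encoded string.
        if isinstance(annotation, str):
            annotation = json.loads(annotation)
        successes.append(TableResult(custom_id, annotation.get("tables", [])))
    return successes, failures
```

With Pydantic available, the dataclass could be replaced by a `BaseModel` and the annotation validated with `model_validate_json`, which would also reject output that does not match the schema.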

@jean-malo changed the title from "feat(ocr): add async batch annotation for table extraction from PDFs" to "docs: add async batch annotation for table extraction from PDFs" on Dec 28, 2025
@jean-malo merged commit ee543e4 into main on Dec 29, 2025
10 checks passed
@jean-malo deleted the docs/ocr-batch-doc-annotation branch on December 29, 2025 08:16

3 participants