A CaseTemplate defines the blueprint for generating multi-document cases. It specifies what entities exist, what facts can be introduced, what document types appear, and how they're distributed across a case timeline.
```python
from synthdocs import (
    CaseTemplate,
    DocumentTypeSpec,
    EntitySchema,
    EntityField,
    FactType,
    FactField,
    StyleVariant,
)

template = CaseTemplate(
    name="lease-dispute",
    description=(
        "Tenant {{ tenant.name }} is renting {{ property.address }} "
        "from {{ landlord.name }}. They are in a dispute."
    ),
    entity_schemas=[
        EntitySchema(
            name="Tenant",
            description="The tenant involved in the dispute",
            fields=[
                EntityField(name="name", description="Full name", field_type="str"),
                EntityField(name="email", description="Email", field_type="str"),
            ],
        ),
        # ... more entity schemas
    ],
    fact_types=[
        FactType(
            name="LeaseTerms",
            description="Key lease terms",
            fields=[
                FactField(name="monthly_rent", description="Rent amount", field_type="int"),
                FactField(name="lease_start", description="Start date", field_type="date"),
            ],
            template="Lease terms: {{ monthly_rent }} per month, starting {{ lease_start }}",
        ),
        # ... more fact types
    ],
    document_types=[
        DocumentTypeSpec(
            name="Lease Agreement",
            description="Signed lease contract",
            probability=1.0,
            min_count=1,
            max_count=1,
            introduces_fact_types=["LeaseTerms"],
        ),
        # ... more document types
    ],
    target_document_count=(3, 6),
)
```

See `examples/lease_case_template.py` for a complete working example.
Entities are the "actors" in a case: people, organizations, properties, and so on. They are generated once at the start of case generation and stay consistent across all documents.
```python
EntitySchema(
    name="Tenant",
    description="The tenant involved in the lease dispute",
    fields=[
        EntityField(name="name", description="Full legal name", field_type="str"),
        EntityField(name="email", description="Email address", field_type="str"),
        EntityField(name="phone", description="Phone number", field_type="str"),
    ],
)
```

Entity values are referenced in the case description template using Jinja2 syntax, for example `{{ tenant.name }}`.
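To make the lookup concrete, here is a minimal stand-in for the rendering step. It assumes the context maps each lowercased schema name (`tenant`, `property`, `landlord`) to a dict of field values, which is the convention described for the case description. It uses a small regex instead of Jinja2 and only handles plain dotted substitutions. The `render` helper and the sample values are hypothetical and not part of synthdocs.

```python
import re
from typing import Any

# Matches plain placeholders such as {{ tenant.name }}. This stand-in for
# Jinja2 supports only dotted lookups: no filters, loops, or expressions.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace each {{ a.b }} with str(context["a"]["b"])."""

    def lookup(match: re.Match) -> str:
        value: Any = context
        for part in match.group(1).split("."):
            value = value[part]  # raises KeyError for a missing entity or field
        return str(value)

    return _PLACEHOLDER.sub(lookup, template)


# Hypothetical generated entity values, keyed by lowercased schema name.
entities = {
    "tenant": {"name": "Dana Whitfield", "email": "dana@example.com"},
    "landlord": {"name": "Harbor Property Group"},
    "property": {"address": "42 Elm Street"},
}

description = (
    "Tenant {{ tenant.name }} is renting {{ property.address }} "
    "from {{ landlord.name }}. They are in a dispute."
)
print(render(description, entities))
# Tenant Dana Whitfield is renting 42 Elm Street from Harbor Property Group. They are in a dispute.
```

Unlike this sketch, real Jinja2 has its own rules for undefined values. The sketch fails loudly with `KeyError`, which makes a misspelled entity or field name easy to spot.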
Facts are the claims that documents introduce. Unlike entities, facts are generated just-in-time before each document, allowing the LLM to make contextually appropriate choices based on:
- The case entities and description
- Previously introduced facts
- The running summary of documents so far
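The contrast between entities (generated once) and facts (generated just in time) can be sketched as a loop. This is a hypothetical outline, not synthdocs' implementation. `CaseState`, `generate_case`, and `stub_fact` are invented for illustration, and the stub stands in for the LLM call that would actually choose fact values.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class CaseState:
    """Hypothetical per-case state; not a synthdocs class."""

    entities: dict  # generated once, then reused for every document
    description: str  # rendered case overview
    facts: list = field(default_factory=list)  # rendered facts introduced so far
    summary: str = ""  # running summary of the documents generated so far


# Hypothetical hook standing in for the LLM call: (state, fact type) -> fact text.
FactGenerator = Callable[[CaseState, str], str]


def generate_case(
    state: CaseState,
    planned_docs: List[Tuple[str, List[str]]],  # (document type, fact types)
    make_fact: FactGenerator,
) -> list:
    documents = []
    for doc_type, fact_type_names in planned_docs:
        new_facts = []
        for name in fact_type_names:
            # Just in time: each fact is generated right before its document,
            # so make_fact sees the entities, description, every earlier fact,
            # and the running summary.
            fact = make_fact(state, name)
            state.facts.append(fact)
            new_facts.append(fact)
        documents.append({"type": doc_type, "facts": new_facts})
        # Stand-in for an LLM-written summary update after each document.
        state.summary += f"{doc_type}: {'; '.join(new_facts)}\n"
    return documents


def stub_fact(state: CaseState, name: str) -> str:
    # A real generator would prompt an LLM with the full state; this stub
    # only reports how many facts it could see.
    return f"{name} (after {len(state.facts)} earlier facts)"


state = CaseState(
    entities={"tenant": {"name": "Dana Whitfield"}},
    description="Tenant Dana Whitfield is in a lease dispute.",
)
docs = generate_case(
    state,
    [("Lease Agreement", ["LeaseTerms"]),
     ("Inspection Report", ["InspectionFinding", "InspectionFinding"])],
    stub_fact,
)
for doc in docs:
    print(doc["type"], doc["facts"])
```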
```python
FactType(
    name="RentPayment",
    description="Status of rent payment for a specific month",
    fields=[
        FactField(name="month", description="Month of payment", field_type="date"),
        FactField(name="amount_paid", description="Amount paid", field_type="int"),
        FactField(
            name="status",
            description="Payment status",
            field_type="enum",
            options=["paid", "partial", "late", "unpaid"],
        ),
    ],
    template="Rent payment {{ month }}: {{ status }} ({{ amount_paid }})",
)
```

The `template` field is a Jinja2 template that renders the fact as human-readable text. This rendered text is what gets located in the generated document.
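The `RentPayment` template above can be exercised with a stand-in renderer. This minimal sketch assumes flat `{{ field }}` placeholders and exact substring matching. synthdocs may well match more loosely. The hypothetical `render_fact` and `locate` helpers render a fact and find its character span in a document, and the fact values and document text are invented.

```python
import re
from typing import Optional, Tuple

# Matches flat placeholders such as {{ status }}; a stand-in for Jinja2.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_fact(template: str, values: dict) -> str:
    """Fill each {{ field }} placeholder with str(values[field])."""
    return _PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


def locate(fact_text: str, document: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) character span of an exact match, or None."""
    start = document.find(fact_text)
    return None if start == -1 else (start, start + len(fact_text))


template = "Rent payment {{ month }}: {{ status }} ({{ amount_paid }})"
fact = render_fact(template, {"month": "2024-03-01", "status": "partial", "amount_paid": 900})
print(fact)  # Rent payment 2024-03-01: partial (900)

document = "Ledger note. Rent payment 2024-03-01: partial (900). Balance carried forward."
start, end = locate(fact, document)
print(document[start:end] == fact)  # True
```

Exact matching is the simplest possible locator. It breaks as soon as the LLM rephrases the fact, which is one reason a real pipeline might normalize or fuzzy-match instead.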
Document types define what documents can appear in a case and how they're distributed.
```python
DocumentTypeSpec(
    name="Inspection Report",
    description="Report describing property condition and issues",
    style_rules=["Objective tone", "Checklist format"],
    # Distribution
    probability=0.7,  # 70% chance this doc type appears
    min_count=0,  # Can be skipped
    max_count=2,  # Up to 2 instances
    # Timing (days after case start)
    days_after_start_min=30,
    days_after_start_max=180,
    # What facts this document introduces
    introduces_fact_types=["InspectionFinding"],
    # Style variants for diversity
    style_variants=[
        StyleVariant(name="detailed", description="Thorough, itemized findings"),
        StyleVariant(name="brief", description="Short, direct observations"),
    ],
    styles_to_sample=1,  # Pick 1 style variant per instance
)
```

Both `EntityField` and `FactField` support these types:
| Type | Description | Extra fields |
|---|---|---|
| `str` | Free-form text | — |
| `int` | Integer | `min_value`, `max_value` |
| `date` | ISO date string | — |
| `enum` | One of fixed options | `options` (list of strings) |
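A generated value could be checked against these types with a small validator like the one below. It is a hypothetical sketch: `validate_value` is not a synthdocs function. Treating `min_value`/`max_value` as inclusive bounds is an assumption.

```python
from datetime import date


def validate_value(field_type, value, *, min_value=None, max_value=None, options=None):
    """Check one generated value against a field type; return it or raise ValueError.

    Hypothetical helper. Treating min_value/max_value as inclusive is an assumption.
    """
    if field_type == "str":
        if not isinstance(value, str):
            raise ValueError("expected a string")
    elif field_type == "int":
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError("expected an integer")
        if min_value is not None and value < min_value:
            raise ValueError(f"{value} is below min_value {min_value}")
        if max_value is not None and value > max_value:
            raise ValueError(f"{value} is above max_value {max_value}")
    elif field_type == "date":
        if not isinstance(value, str):
            raise ValueError("expected an ISO date string")
        date.fromisoformat(value)  # raises ValueError if not an ISO date
    elif field_type == "enum":
        if value not in (options or []):
            raise ValueError(f"{value!r} is not one of {options}")
    else:
        raise ValueError(f"unknown field type {field_type!r}")
    return value


print(validate_value("int", 1200, min_value=500, max_value=5000))  # 1200
print(validate_value("enum", "late", options=["paid", "partial", "late", "unpaid"]))  # late
print(validate_value("date", "2024-03-01"))  # 2024-03-01
```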
The `description` field on `CaseTemplate` is a Jinja2 template that produces the case overview. Reference entities by their schema name (lowercased):

```python
CaseTemplate(
    description=(
        "Tenant {{ tenant.name }} is renting {{ property.address }} "
        "{{ property.unit }}, {{ property.city }} from {{ landlord.name }}."
    ),
    # ...
)
```

This description is passed to the LLM when generating each document, so every document is written against the same case overview and entity details.
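Here is a hypothetical sketch of how one document's instructions might be assembled. It combines the shared description with the template-wide `style_preferences`, the document type's `style_rules`, and `styles_to_sample` variants drawn from `style_variants`. The function name and prompt layout are invented, and synthdocs' actual prompt format may differ.

```python
import random


def build_document_instructions(
    case_description: str,
    style_preferences: list,
    doc_style_rules: list,
    style_variants: dict,  # variant name -> variant description
    styles_to_sample: int,
    rng: random.Random,
) -> str:
    """Assemble hypothetical instructions for one document (not synthdocs' prompt)."""
    # Draw distinct style variants for this single document instance.
    chosen = rng.sample(sorted(style_variants), k=styles_to_sample)
    lines = [f"Case overview: {case_description}", "Style rules:"]
    # Template-wide preferences apply to every document; per-type rules follow.
    lines += [f"- {rule}" for rule in style_preferences + doc_style_rules]
    lines += [f"- Variant '{name}': {style_variants[name]}" for name in chosen]
    return "\n".join(lines)


description = "Tenant Dana Whitfield is renting 42 Elm Street from Harbor Property Group."
rng = random.Random(7)
for _ in range(2):  # two Inspection Report instances share the same overview
    print(build_document_instructions(
        description,
        ["Use clear, formal legal language"],
        ["Objective tone", "Checklist format"],
        {"detailed": "Thorough, itemized findings", "brief": "Short, direct observations"},
        styles_to_sample=1,
        rng=rng,
    ))
    print()
```

Only the sampled variant changes between instances. The overview line is identical in both, which is what keeps the documents anchored to one case.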
Use `target_document_count` to set bounds on the total number of documents per case:

```python
CaseTemplate(
    # ...
    target_document_count=(3, 6),  # Generate 3-6 documents per case
)
```

Use `style_preferences` to set global style rules that apply to all documents:
```python
CaseTemplate(
    # ...
    style_preferences=[
        "Use clear, formal legal language",
        "Include realistic addresses and dates",
    ],
)
```

To generate a batch of cases from a template, call `generate_case_batch`:

```python
from pathlib import Path

from synthdocs import MistralBackend, generate_case_batch

results = generate_case_batch(
    template=my_template,  # a CaseTemplate, e.g. the one built above
    count=5,
    backend=MistralBackend(),
    output_dir=Path("./output"),
    variation_hints="Mix of urban and rural addresses",
)
```

The `variation_hints` parameter guides the LLM to produce diverse entity values across cases.
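The library decides how the per-type distribution settings on `DocumentTypeSpec` combine with the case-level `target_document_count`. These settings are `probability`, `min_count`, `max_count`, and the `days_after_start_*` window. The sketch below shows one plausible reading, using rejection sampling: it redraws the whole plan until the total falls inside the bounds. It is not synthdocs' planner. `DocTypePlan`, `plan_case`, the defaults, and the third document type are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class DocTypePlan:
    """Hypothetical mirror of DocumentTypeSpec's distribution fields."""

    name: str
    probability: float = 1.0
    min_count: int = 0
    max_count: int = 1
    days_after_start_min: int = 0
    days_after_start_max: int = 0


def plan_case(specs, target_document_count, rng, max_attempts=100):
    """Return a day-sorted list of (day, doc type name) within the target bounds.

    Assumed semantics: a type with min_count > 0 always appears; otherwise it
    appears with `probability`. Its count is drawn uniformly from
    [max(min_count, 1), max_count]. Plans outside the bounds are redrawn.
    """
    low, high = target_document_count
    for _ in range(max_attempts):
        plan = []
        for spec in specs:
            if spec.min_count == 0 and rng.random() >= spec.probability:
                continue  # type skipped for this case
            count = rng.randint(max(spec.min_count, 1), spec.max_count)
            for _ in range(count):
                day = rng.randint(spec.days_after_start_min, spec.days_after_start_max)
                plan.append((day, spec.name))
        if low <= len(plan) <= high:
            return sorted(plan)  # chronological order along the case timeline
    raise RuntimeError("could not satisfy target_document_count; check the specs")


specs = [
    DocTypePlan("Lease Agreement", probability=1.0, min_count=1, max_count=1),
    DocTypePlan("Inspection Report", probability=0.7, min_count=0, max_count=2,
                days_after_start_min=30, days_after_start_max=180),
    # Hypothetical third type, added so the (3, 6) bounds are not too tight.
    DocTypePlan("Tenant Email", probability=0.9, min_count=0, max_count=3,
                days_after_start_min=0, days_after_start_max=365),
]
for day, name in plan_case(specs, (3, 6), random.Random(1)):
    print(day, name)
```

Rejection sampling keeps each type's own distribution intact at the cost of occasional redraws. A spec set that can never reach the bounds fails with a clear error instead of looping forever.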