
fix(converters): emit schema-conformant OSI documents (drop non-spec root dialects/vendors)#148

Open
andreybavt wants to merge 1 commit into
open-semantic-interchange:mainfrom
Kaelio:fix/schema-conformant-osi-documents

Conversation

@andreybavt

@andreybavt andreybavt commented Jun 8, 2026


Problem

Output from the dbt→OSI converter fails OSI's own validator (validation/validate.py)
against the published core schema:

[Schema] (root): Additional properties are not allowed ('dialects' was unexpected)

Because the CLI writes via to_osi_yaml(), every dbt→OSI conversion produces a
non-conformant document.

Root cause

OSIDocument (python/src/osi/models.py) declared optional root-level
dialects/vendors fields that aren't in core-spec/osi-schema.json. That schema's root
sets additionalProperties: false and allows only version and semantic_model. The dbt
converter set dialects=[self._dialect], so dialects was emitted at the document root.
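To make the failure mode concrete, here is a minimal, stdlib-only sketch of what additionalProperties: false does at the document root. It is not validation/validate.py and does not use a JSON Schema library. ROOT_SCHEMA is a simplified stand-in that assumes only the two root keys named above, and root_errors is a hypothetical helper whose message is modeled on the validator output quoted in the Problem section (it reports one message per unexpected key).

```python
from __future__ import annotations

# Simplified stand-in for the root of core-spec/osi-schema.json (assumption:
# only the two root keys named in this PR; nested schemas are left empty).
ROOT_SCHEMA = {
    "type": "object",
    "properties": {
        "version": {},
        "semantic_model": {},
    },
    "additionalProperties": False,
}


def root_errors(document: dict, schema: dict = ROOT_SCHEMA) -> list[str]:
    """Hypothetical helper: report root keys rejected by additionalProperties: false."""
    if schema.get("additionalProperties", True) is not False:
        return []  # extra keys are allowed, nothing to report
    allowed = set(schema.get("properties", {}))
    unexpected = sorted(key for key in document if key not in allowed)
    return [
        f"[Schema] (root): Additional properties are not allowed ({key!r} was unexpected)"
        for key in unexpected
    ]


# Before the fix: the converter put dialects at the root.
before = {"version": "<version>", "semantic_model": [], "dialects": ["ANSI_SQL"]}
# After the fix: only the keys the schema permits.
after = {"version": "<version>", "semantic_model": []}

for message in root_errors(before):
    print(message)
print(root_errors(after))
```

Running it prints the 'dialects' was unexpected message for the first document and an empty list for the second.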

Fix

  • Remove the non-spec dialects/vendors fields from OSIDocument.
  • Stop the dbt converter from populating a document-root dialect.

Per-expression dialect tagging (OSIExpression.dialects) is unchanged and remains the
schema-valid home for dialects, so no information is lost. Dialect selection still
flows end-to-end; the two affected converter tests now assert it on the per-expression path.
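The shape change can be sketched with plain dataclasses. These are hypothetical stand-ins, not the real classes in python/src/osi/models.py: the sql field name and convert function are illustrative, and expressions are placed directly in semantic_model for brevity even though the real schema nests them deeper. The point is that the selected dialect now travels on each expression while the document root carries only version and semantic_model.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field


@dataclass
class OSIExpression:
    # Hypothetical stand-in: per-expression dialect tags stay schema-valid.
    sql: str
    dialects: list[str] = field(default_factory=list)


@dataclass
class OSIDocument:
    # Hypothetical stand-in: no root-level dialects/vendors after the fix.
    version: str
    semantic_model: list[OSIExpression] = field(default_factory=list)


def convert(snippets: list[str], dialect: str, version: str = "<version>") -> OSIDocument:
    """Hypothetical converter: tag every expression with the selected dialect."""
    return OSIDocument(
        version=version,
        semantic_model=[OSIExpression(sql=s, dialects=[dialect]) for s in snippets],
    )


def dialects_in(doc: OSIDocument) -> set[str]:
    """Recover the dialect selection from the per-expression tags."""
    return {d for expr in doc.semantic_model for d in expr.dialects}


doc = convert(["SUM(amount)"], "SNOWFLAKE")
print(sorted(asdict(doc)))  # root keys of the emitted mapping
print(dialects_in(doc))
```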

Regression guard

Adds converters/dbt/tests/test_schema_conformance.py, which converts a representative
manifest and validates the emitted document (YAML and JSON, for ANSI_SQL and
SNOWFLAKE) against core-spec/osi-schema.json, reusing validation/validate.py.
CI now fails if a converter emits a non-conformant document root again.
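The test matrix can be sketched roughly as below. This is not the contents of test_schema_conformance.py: fake_convert is a hypothetical stand-in for converting the representative manifest, the check is reduced to the allowed root keys instead of calling validation/validate.py, and only the JSON round trip is shown so the sketch stays stdlib-only (the YAML leg would need a YAML library).

```python
from __future__ import annotations

import json

# Root keys permitted by core-spec/osi-schema.json, per this PR.
ALLOWED_ROOT_KEYS = {"version", "semantic_model"}
DIALECTS = ("ANSI_SQL", "SNOWFLAKE")


def fake_convert(dialect: str) -> dict:
    """Hypothetical stand-in for converting the representative dbt manifest."""
    return {
        "version": "<version>",
        "semantic_model": [{"expression": "SUM(amount)", "dialects": [dialect]}],
    }


def test_root_is_schema_conformant() -> None:
    for dialect in DIALECTS:
        # Round-trip through JSON so the check sees what would be written out.
        emitted = json.loads(json.dumps(fake_convert(dialect)))
        unexpected = set(emitted) - ALLOWED_ROOT_KEYS
        assert not unexpected, f"{dialect}: unexpected root keys {sorted(unexpected)}"
        # The dialect selection must survive on the per-expression path.
        assert emitted["semantic_model"][0]["dialects"] == [dialect]


test_root_is_schema_conformant()
```

Parametrizing over dialect (and, in the real suite, output format) keeps each combination visible as its own failure when a converter regresses.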

Testing

  • validate.py on converter output: fails before ('dialects' was unexpected),
    passes after.
  • Full dbt converter test suite green.

Out of scope

Whether OSI should support a document-level (default) dialect is the open discussion in
#52 (one dialect per document) and #16 (default dialect at dataset level). This PR takes
no position and makes no schema change; it only aligns the reference model and
converter with the schema as published today.

Drop the non-spec root `dialects`/`vendors` fields from OSIDocument and stop the
dbt converter emitting a root dialect, so dbt->OSI output validates against
core-spec/osi-schema.json. Dialects remain per-expression (no information lost).
Add a regression test that schema-validates converter output.
@andreybavt

Author

@khush-bhatia, let me know if this first PR follows the community guidelines. Happy to take on bigger scope once this first one lands.

@khush-bhatia khush-bhatia requested a review from QMalcolm June 8, 2026 18:23

@khush-bhatia khush-bhatia left a comment

Member


Thanks for the fix.

@QMalcolm QMalcolm left a comment

Member


Looks good to me! Thank you for putting this together 🙂

@khush-bhatia

Member

Thanks @andreybavt, we will merge this PR soon; we are waiting on a process to complete.

@andreybavt

Author

Thanks @khush-bhatia and @QMalcolm! Will continue with the other PRs then.
