fix: Add matching and enum_value_type fields to Datapoint schema model #150

Merged
username-still-not-available merged 1 commit into rossumai:main from stancld:ds-fix-schema-matching on Mar 16, 2026

Conversation

@stancld (Contributor) commented Mar 14, 2026

Add support for the matching and enum_value_type fields on the Datapoint schema model.

Datapoint with matching after this change:

[Datapoint(id='vendor_match',
           type='enum',
           label='Vendor Match',
           description=None,
           category='datapoint',
           disable_prediction=False,
           hidden=False,
           can_export=True,
           can_collapse=False,
           rir_field_names=[],
           default_value=None,
           constraints={'required': False},
           score_threshold=None,
           options=[],
           ui_configuration={'edit': 'enabled', 'type': 'lookup'},
           width=None,
           stretch=False,
           width_chars=None,
           formula=None,
           prompt=None,
           context=None,
           matching=Matching(type='master_data_hub',
                             configuration=MatchingConfiguration(dataset='imported-0d652b68-fd8b-4fc8-9cee-d39105b1304b',
                                                                 queries=[MatchingQuery(aggregate=[{'$addFields': {'search_vat_norm': {'$toLower': {'$trim': {'input': {'$replaceAll': {'find': ' ',
                                                                                                                                                                                        'input': '$$sender_vat_id'}}}}},
                                                                                                                   'vat_norm': {'$toLower': {'$trim': {'input': {'$replaceAll': {'find': ' ',
                                                                                                                                                                                 'input': '$VAT '
                                                                                                                                                                                          'ID'}}}}}}},
                                                                                                   {'$match': {'$expr': {'$eq': ['$vat_norm',
                                                                                                                                 '$search_vat_norm']}}},
                                                                                                   {'$limit': 5},
                                                                                                   {'$project': {'label': '$Name', 'value': '$ID'}}],
                                                                                        comment='Exact normalized match on VAT ID - primary strategy for '
                                                                                                'structured identifiers'),
                                                                          MatchingQuery(aggregate=[{'$match': {'Name': '$$sender_name'}},
                                                                                                   {'$limit': 5},
                                                                                                   {'$project': {'label': '$Name', 'value': '$ID'}}],
                                                                                        comment='Exact match on vendor name - secondary strategy for exact '
                                                                                                'name matches'),
                                                                          MatchingQuery(aggregate=[{'$search': {'text': {'fuzzy': {'maxEdits': 1},
                                                                                                                         'path': 'Name',
                                                                                                                         'query': '$$sender_name'}}},
                                                                                                   {'$addFields': {'score': {'$meta': 'searchScore'}}},
                                                                                                   {'$match': {'score': {'$gte': 2}}},
                                                                                                   {'$sort': {'score': -1}},
                                                                                                   {'$limit': 10},
                                                                                                   {'$project': {'label': '$Name', 'value': '$ID'}}],
                                                                                        comment='Fuzzy match on vendor name - tertiary strategy for name '
                                                                                                'variations and typos'),
                                                                          MatchingQuery(aggregate=[{'$search': {'text': {'fuzzy': {'maxEdits': 2},
                                                                                                                         'path': ['Name', 'Address'],
                                                                                                                         'query': '$$sender_name'}}},
                                                                                                   {'$addFields': {'score': {'$meta': 'searchScore'}}},
                                                                                                   {'$match': {'score': {'$gte': 1.5}}},
                                                                                                   {'$sort': {'score': -1}},
                                                                                                   {'$limit': 15},
                                                                                                   {'$project': {'label': '$Name', 'value': '$ID'}}],
                                                                                        comment='Broader fuzzy search across name and address fields - '
                                                                                                'fallback for partial matches')],
                                                                 variables={'sender_name': MatchingVariable(formula='default_to(field.sender_name, "UNKNOWN")'),
                                                                            'sender_vat_id': MatchingVariable(formula='default_to(field.sender_vat_id, '
                                                                                                                      '"UNKNOWN")')})),
           enum_value_type='string')]

Datapoint without matching after this change:

[Datapoint(id='document_id',
           type='string',
           label='Document ID',
           description=None,
           category='datapoint',
           disable_prediction=False,
           hidden=False,
           can_export=True,
           can_collapse=False,
           rir_field_names=[],
           default_value=None,
           constraints={'required': False},
           score_threshold=None,
           options=None,
           ui_configuration={'edit': 'enabled', 'type': 'captured'},
           width=None,
           stretch=False,
           width_chars=None,
           formula=None,
           prompt=None,
           context=None,
           matching=None,
           enum_value_type=None)]
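
The shape of the two outputs above can be approximated with plain dataclasses. The following is a minimal sketch, not the code merged in this PR: the class names Matching, MatchingConfiguration, MatchingQuery and MatchingVariable and their field names are taken from the printed repr, while DatapointSketch, parse_matching and parse_datapoint are hypothetical helpers introduced only for illustration, and the real Datapoint model has many more fields. It assumes the raw schema content arrives as nested dicts and that both new fields default to None when absent, as in the second output.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class MatchingVariable:
    formula: str


@dataclass
class MatchingQuery:
    # Aggregation pipeline stages, kept as raw dicts.
    aggregate: list[dict[str, Any]]
    comment: Optional[str] = None


@dataclass
class MatchingConfiguration:
    dataset: str
    queries: list[MatchingQuery] = field(default_factory=list)
    variables: dict[str, MatchingVariable] = field(default_factory=dict)


@dataclass
class Matching:
    type: str
    configuration: MatchingConfiguration


@dataclass
class DatapointSketch:
    # Hypothetical, trimmed-down stand-in for the real Datapoint model.
    id: str
    type: str
    label: str
    matching: Optional[Matching] = None
    enum_value_type: Optional[str] = None


def parse_matching(raw: Optional[dict[str, Any]]) -> Optional[Matching]:
    """Turn a nested dict into typed objects; None stays None."""
    if raw is None:
        return None
    cfg = raw["configuration"]
    return Matching(
        type=raw["type"],
        configuration=MatchingConfiguration(
            dataset=cfg["dataset"],
            queries=[MatchingQuery(**q) for q in cfg.get("queries", [])],
            variables={
                name: MatchingVariable(**var)
                for name, var in cfg.get("variables", {}).items()
            },
        ),
    )


def parse_datapoint(raw: dict[str, Any]) -> DatapointSketch:
    """Build the sketch model; both new fields are optional."""
    return DatapointSketch(
        id=raw["id"],
        type=raw["type"],
        label=raw["label"],
        matching=parse_matching(raw.get("matching")),
        enum_value_type=raw.get("enum_value_type"),
    )
```

Under these assumptions, a datapoint dict without a matching key parses to matching=None and enum_value_type=None, while a dict carrying a matching block yields nested typed objects, mirroring the two outputs above.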

@stancld force-pushed the ds-fix-schema-matching branch 5 times, most recently from b8f1690 to 5e41898, on March 15, 2026 13:32
Comment thread rossum_api/models/schema.py Outdated
Comment thread rossum_api/models/schema.py Outdated
Comment thread tests/models/test_schema.py
Comment thread tests/models/test_schema.py
Comment thread rossum_api/models/schema.py Outdated
Comment thread rossum_api/models/schema.py Outdated
Comment thread tests/models/test_schema.py
@stancld force-pushed the ds-fix-schema-matching branch from 5e41898 to b2e614f on March 16, 2026 12:28
@username-still-not-available merged commit 8cc1a81 into rossumai:main on Mar 16, 2026
10 checks passed
2 participants