Suggestions filtering

This step assumes that suggestions ranking is enabled. A match in match_dict can offer several alternative suggestions, which fall into 2 types:

I. grammar / syntactic variants ex. people / person, a / an

ex.

(A) Foo a fo fo.
(B) Foo an fo fo.

If A is ranked higher, you can hide B (most probably incorrect) by setting filter_suggestions to True:

lm_path='/path/to/your/kenlm/model.bin'
rmatcher = ReplaceMatcher(nlp, match_dict=match_dict, lm_path=lm_path, filter_suggestions=True)

ReplaCy does the guessing for you. Expect suggestions to be filtered if:

- variants have the same lemma, ex. kid / kids, walk / walked
- variants are DET, ex. a / an
- variants consist of more than one word, ex. think / think of
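
The three conditions above can be sketched as a small check. This is an illustration only, not replaCy's actual implementation: the `Variant` class and `looks_grammatical` function are hypothetical names, and reading the third rule as "at least one option has more than one word" is an assumption based on the think / think of example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    """Hypothetical stand-in for one suggestion option (not a replaCy type)."""
    text: str          # surface form, e.g. "think of"
    lemmas: List[str]  # one lemma per word, e.g. ["think", "of"]
    pos: List[str]     # one coarse POS tag per word, e.g. ["VERB", "ADP"]


def looks_grammatical(variants: List[Variant]) -> bool:
    """Return True if the options look like grammar variants of each other."""
    # Rule 1: every option shares the same lemma sequence (kid / kids).
    same_lemma = len({tuple(v.lemmas) for v in variants}) == 1
    # Rule 2: every option is a single determiner (a / an).
    all_det = all(v.pos == ["DET"] for v in variants)
    # Rule 3 (assumed reading): some option spans more than one word.
    multi_word = any(len(v.lemmas) > 1 for v in variants)
    return same_lemma or all_det or multi_word
```

In a real pipeline the lemmas and POS tags would come from the spaCy `nlp` object. Here they are passed in by hand so the sketch stays self-contained.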

II. lexical variants ex. tall / big / huge

ex.

(A) Foo foo big foo.
(B) Foo foo huge foo.
(C) Foo foo tall foo.

Set default_max_count to an integer n to display only the top n suggestions, ex. default_max_count=2 would display just A and B (assuming they are ranked in the order listed).

lm_path='/path/to/your/kenlm/model.bin'
rmatcher = ReplaceMatcher(nlp, match_dict=match_dict, lm_path=lm_path, filter_suggestions=True, default_max_count=2)
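
To make the effect of default_max_count concrete, here is a minimal sketch of top-n truncation over ranked options. It is an assumption about the general idea, not replaCy's code: `keep_top` is a hypothetical helper and the scores are made up, standing in for whatever the language model returns.

```python
from typing import List, Optional, Tuple


def keep_top(scored: List[Tuple[str, float]], max_count: Optional[int]) -> List[str]:
    """Rank options by score (higher is better) and keep at most max_count."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if max_count is not None:
        ranked = ranked[:max_count]
    return [text for text, _ in ranked]


# Made-up scores, chosen so that big > huge > tall as in the example above.
options = [("tall", -14.2), ("big", -11.5), ("huge", -12.9)]
print(keep_top(options, 2))  # ['big', 'huge']
```

Passing `None` as `max_count` keeps every option, which mirrors leaving default_max_count unset.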

Additionally, any suggestion item can have a MAX_COUNT property, which overrides the above rules (filter_suggestions and default_max_count can then be omitted), ex.

        "suggestions": [
            [
                {
                    "TEXT": {"IN": ["big", "huge", "tall", "high"]},
                    "MAX_COUNT": 3
                }
            ]
        ],
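
The precedence described above (a per-item MAX_COUNT wins over the global setting) could be expressed roughly as follows. `effective_max_count` is a hypothetical helper written for illustration. How replaCy resolves this internally is not shown in this page, so treat the logic as an assumption.

```python
from typing import Any, Dict, Optional


def effective_max_count(
    item: Dict[str, Any], default_max_count: Optional[int] = None
) -> Optional[int]:
    """Per-item MAX_COUNT wins; otherwise fall back to the global default."""
    return item.get("MAX_COUNT", default_max_count)


# The suggestion item from the fragment above.
item = {"TEXT": {"IN": ["big", "huge", "tall", "high"]}, "MAX_COUNT": 3}
print(effective_max_count(item, default_max_count=2))  # 3
```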

Add debug=True to get information about accepted and suppressed suggestions along with their MAX_COUNT:

lm_path='/path/to/your/kenlm/model.bin'
rmatcher = ReplaceMatcher(nlp, match_dict=match_dict, lm_path=lm_path, filter_suggestions=True, default_max_count=2, debug=True)
