Strange behavior of tokenize(.., only_ci=True)

The following snippet gives inconsistent results:

```python
from reynir_correct import tokenize

texts = ["Skúta", "300 ára gömul írsk skúta fundin við Suður-Noreg"]
for text in texts:
    # only_ci=True is the option under test
    tokens = tokenize(text, only_ci=True)
    for tok in tokens:
        if tok.txt:
            print(f"{tok.txt:12} {tok.error_code:8} {tok.error_description}")
```

Output:

```
Skúta                 
300                   
ára                   
gömul                 
írsk                  
skúta        U001     Óþekkt orð: 'skúta'
fundin                
við                   
Suður-Noreg
```

The valid word `skúta` is flagged as unknown (`U001`) when it appears inside the sentence, but not when it is passed as a standalone word. Calling `tokenize()` without any options works as expected.
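To make the difference easier to see, the two modes could be compared side by side. The following is only a sketch: `flagged` and `compare` are hypothetical helper names, not part of `reynir_correct`, and it assumes that tokens expose `txt` and `error_code` as in the snippet above. The tokenizer is passed in as an argument, so the helpers themselves do not depend on the library.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Flag = Tuple[str, str]


def flagged(
    tokenize_fn: Callable[..., Iterable[Any]], text: str, **options: Any
) -> List[Flag]:
    """Return (txt, error_code) for every token that carries an error code."""
    result: List[Flag] = []
    for tok in tokenize_fn(text, **options):
        # Assumption: tokens have .txt and .error_code; fall back to "" if not.
        txt = getattr(tok, "txt", "") or ""
        code = getattr(tok, "error_code", "") or ""
        if txt and code:
            result.append((txt, code))
    return result


def compare(tokenize_fn: Callable[..., Iterable[Any]], text: str) -> Dict[str, List[Flag]]:
    """Flagged tokens with default options versus only_ci=True."""
    return {
        "default": flagged(tokenize_fn, text),
        "only_ci": flagged(tokenize_fn, text, only_ci=True),
    }


# Usage (requires reynir_correct to be installed):
# from reynir_correct import tokenize
# for text in ["Skúta", "300 ára gömul írsk skúta fundin við Suður-Noreg"]:
#     print(text, compare(tokenize, text))
```

If the two lists differ for the sentence but not for the standalone word, that would confirm the inconsistency is tied to `only_ci`.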

It's also not clear from the documentation what exactly the option `only_ci` does.