Preprocess Text: Add Spacy POS tagger#1070

Draft
ajdapretnar wants to merge 3 commits into biolab:master from ajdapretnar:spacy

@ajdapretnar

Collaborator

Implements #596.

Description of changes

Add Spacy, starting with a POS tagger, since that is the feature most sorely missed.

Later: use Spacy for NER (also sorely missed) and for other NLP tasks (a standalone Spacy preprocessor).

Includes
  • Code changes
  • Tests
  • Documentation

@ajdapretnar

Collaborator Author

The only thing left is to discuss the additional dependencies that certain models require (Chinese, Japanese, Russian and Ukrainian). Should we remove these models or find a way to handle them gracefully?
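One way to handle these models gracefully instead of removing them is to check for their extra dependencies before loading and report what is missing. A minimal sketch, assuming the extra modules are `spacy_pkuseg` (Chinese), `sudachipy` and `sudachidict_core` (Japanese), `pymorphy3` (Russian) and `pymorphy3` plus `pymorphy3_dicts_uk` (Ukrainian); these should be checked against the spaCy documentation for the version in use, since older releases relied on `pymorphy2`. `EXTRA_DEPENDENCIES` and `missing_dependencies` are hypothetical names, not part of the PR.

```python
from importlib.util import find_spec
from typing import Dict, List, Tuple

# Assumed extra modules needed by some spaCy pipelines (hypothetical table;
# verify against the spaCy docs for the installed version).
EXTRA_DEPENDENCIES: Dict[str, Tuple[str, ...]] = {
    "Chinese": ("spacy_pkuseg",),
    "Japanese": ("sudachipy", "sudachidict_core"),
    "Russian": ("pymorphy3",),
    "Ukrainian": ("pymorphy3", "pymorphy3_dicts_uk"),
}


def missing_dependencies(language: str) -> List[str]:
    """Return the extra modules `language` needs that cannot be imported."""
    return [
        module
        for module in EXTRA_DEPENDENCIES.get(language, ())
        # find_spec returns None for a top-level module that is not installed
        if find_spec(module) is None
    ]
```

The widget could then disable such languages, or show which packages are missing, instead of failing when the model is loaded.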

@VesnaT

VesnaT commented Jul 19, 2024

Contributor

I get this if the model is not installed:
[screenshot]

```python
def __getitem__(self, language: str) -> str:
    model = find_model(language)
    if model not in self.installed_models:
        download(model)
```

Contributor


`self.installed_models` should be updated at this point; otherwise, the package keeps getting downloaded.
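A minimal sketch of the fix suggested in the review: record the model as installed right after downloading it, so later lookups for the same language skip the download. `ModelRegistry`, the `LANGUAGE_MODELS` table and this `find_model` are hypothetical stand-ins for the PR's code, and `download` is injected as a callable so the sketch runs without spaCy; in the widget it would presumably be spaCy's downloader (`spacy.cli.download`).

```python
from typing import Callable, Dict, Iterable

# Hypothetical language -> spaCy pipeline name table (illustrative subset).
LANGUAGE_MODELS: Dict[str, str] = {
    "English": "en_core_web_sm",
    "German": "de_core_news_sm",
    "Chinese": "zh_core_web_sm",
    "Japanese": "ja_core_news_sm",
    "Russian": "ru_core_news_sm",
    "Ukrainian": "uk_core_news_sm",
}


def find_model(language: str) -> str:
    """Return the spaCy pipeline name for a language (hypothetical helper)."""
    return LANGUAGE_MODELS[language]


class ModelRegistry:
    """Maps language names to spaCy models, downloading each model at most once."""

    def __init__(self, installed: Iterable[str], download: Callable[[str], None]):
        self.installed_models = set(installed)
        self._download = download

    def __getitem__(self, language: str) -> str:
        model = find_model(language)
        if model not in self.installed_models:
            self._download(model)
            # The fix from the review: remember that the model is now
            # installed, otherwise every lookup downloads it again.
            self.installed_models.add(model)
        return model
```

With `download=downloads.append` and only `en_core_web_sm` preinstalled, looking up `"German"` twice and `"English"` once leaves `downloads == ["de_core_news_sm"]`.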

@ajdapretnar

Collaborator Author

So the downloaded models are indeed packages. We have to warn the user that selecting a given language will install additional dependencies into the Orange environment (the wording still needs some thought).
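Because the models are ordinary Python packages, the widget can check whether selecting a language would install anything before it does so, and only then show a warning. A minimal sketch, assuming each model is importable under its package name (as spaCy pipelines such as `en_core_web_sm` are); `install_warning` and its message are hypothetical placeholders, since the wording is still open.

```python
from importlib.util import find_spec
from typing import Iterable, Optional


def install_warning(language: str, model: str,
                    extra_packages: Iterable[str] = ()) -> Optional[str]:
    """Return a warning if using `language` would install packages, else None."""
    # Collect the model and any extra packages that are not importable yet.
    to_install = [name for name in (model, *extra_packages)
                  if find_spec(name) is None]
    if not to_install:
        return None
    # Placeholder wording; the final message is still to be decided.
    return (
        f"Using {language} requires installing the following packages "
        f"into the Orange environment: {', '.join(to_install)}."
    )
```

Returning `None` when everything is already installed lets the widget skip the warning entirely for languages that need no downloads.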

ajdapretnar marked this pull request as draft on August 29, 2024, 13:11.

2 participants