Open
Labels
bug: Something isn't working
Description
This is a continuation of issue 1374 from the TheWorldAvatar repo.
In summary, the interface with the knowledge graph has a known string-encoding issue: strings are stored incorrectly as ISO-8859-1 rather than UTF-8. This causes downstream problems for the literature extraction, because special characters extracted from the texts are stored incorrectly. It also means that a check for whether an item is already present in the knowledge graph finds no match, so duplicates are added.
As a fix for this:
- when pulling data from the knowledge graph (for example in retreive_synthesis.py), pass strings through your_string.encode('ISO-8859-1').decode('utf-8').
- when querying the knowledge graph (in kg_queries.py), do the inverse: your_string.encode('utf-8').decode('ISO-8859-1').
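
The two conversions above can be sketched as a pair of helpers. This is only an illustration: the names from_kg and to_kg are hypothetical and do not come from retreive_synthesis.py or kg_queries.py, and the sketch assumes every string read from the knowledge graph is affected by the encoding issue.

```python
def from_kg(text: str) -> str:
    """Repair a string read from the knowledge graph.

    Hypothetical helper: re-encodes the mis-decoded text as ISO-8859-1
    to recover the original bytes, then decodes those bytes as UTF-8.
    """
    return text.encode("ISO-8859-1").decode("utf-8")


def to_kg(text: str) -> str:
    """Convert a string to the form in which the knowledge graph stores it.

    Hypothetical helper: the inverse of from_kg, for use in query strings
    so that lookups match what is already stored.
    """
    return text.encode("utf-8").decode("ISO-8859-1")


original = "5 \u00b0C"        # "5 °C", as extracted from a text
stored = to_kg(original)      # the mis-encoded form found in the knowledge graph
restored = from_kg(stored)    # back to the original string
```

Note that from_kg raises UnicodeEncodeError or UnicodeDecodeError if the input was not mis-encoded in this way (for example, if it contains characters outside ISO-8859-1), so it may need a try/except if correctly encoded strings can also appear.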