Encoded characters not uploaded correctly #12

@pwvbutler

This is a continuation of issue 1374 from the TheWorldAvatar repo.

In summary, the interface to the knowledge graph has a known string-encoding issue: strings end up stored as ISO-8859-1 rather than UTF-8. This causes downstream problems for the literature extraction, because special characters extracted from the texts are stored incorrectly. It also breaks the check for whether an item is already present in the knowledge graph: the lookup finds no match, so duplicates are added.
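To make the failure mode concrete, here is a minimal sketch in plain Python with no knowledge graph involved. It assumes the corruption is equivalent to UTF-8 bytes being read back as ISO-8859-1, which is consistent with the workaround given below but is not confirmed against the interface code. The sample string is made up for illustration.

```python
# Hypothetical sample value; any string with a non-ASCII character behaves the same way.
original = "5 °C"

# Assumption: the round trip through the interface is equivalent to
# encoding as UTF-8 and then decoding those bytes as ISO-8859-1.
stored = original.encode("utf-8").decode("ISO-8859-1")

# The two-byte UTF-8 sequence for "°" is read back as two separate characters.
print(repr(stored))

# A naive "is this item already present?" comparison now fails,
# which is how duplicates get added.
print(original == stored)
```

Pure ASCII strings are unaffected, since ASCII characters have the same byte values in both encodings, so only items containing special characters should be duplicated.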

As a fix for this:

  • when pulling data from the knowledge graph (for example in retreive_synthesis.py), strings should be passed through your_string.encode('ISO-8859-1').decode('utf-8').
  • when querying the knowledge graph (in kg_queries.py) do the inverse: your_string.encode('utf-8').decode('iso-8859-1').
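
The two conversions above could be wrapped in a pair of small helpers, roughly as sketched here. The function names `from_kg` and `to_kg` are hypothetical and do not exist in the repo, and the fallback in `from_kg` (returning the value unchanged if it cannot be repaired) is an assumption on my part, not something the issue requires.

```python
def from_kg(value: str) -> str:
    """Repair a string pulled from the knowledge graph (hypothetical helper)."""
    try:
        return value.encode("ISO-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Assumption: if the value is not in the corrupted form
        # (already correct, or not representable in ISO-8859-1),
        # leave it unchanged rather than raise.
        return value


def to_kg(value: str) -> str:
    """Convert a string to the form stored in the knowledge graph,
    for use in queries (hypothetical helper)."""
    # Every byte value is a valid ISO-8859-1 character in Python,
    # so this decode cannot fail.
    return value.encode("utf-8").decode("ISO-8859-1")


if __name__ == "__main__":
    name = "café"
    query_form = to_kg(name)      # what a query would need to match against
    restored = from_kg(query_form)
    print(repr(query_form), repr(restored))
```

One caveat with the fallback: a string that legitimately contains a sequence such as "Ã©" would be "repaired" by `from_kg` as well, so the helpers are only safe on values that are known to have come through the affected interface.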

Labels: bug