Encoded characters not uploaded correctly #12

This is a continuation of the issue from TheWorldAvatar repo: [1374](https://github.com/cambridge-cares/TheWorldAvatar/issues/1374)

In summary, the interface with the knowledge graph has a known encoding issue: strings are stored incorrectly, as if their UTF-8 bytes were ISO-8859-1 (Latin-1) characters, instead of being stored as proper UTF-8 text. This causes downstream problems for the literature extraction, because special characters extracted from the texts (e.g. `°` or `é`) are stored in this corrupted form. It also breaks duplicate detection: when checking whether an item is already present in the knowledge graph, the original string does not match the stored one, so the item is added again as a duplicate.
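The corruption and the resulting missed match can be reproduced in plain Python. This is a minimal sketch that assumes the stored value is what you get when UTF-8 bytes are decoded as ISO-8859-1, which is consistent with the fix that follows; `simulate_store` is a hypothetical stand-in for the knowledge-graph round trip, not part of the codebase.

```python
def simulate_store(text: str) -> str:
    # Hypothetical stand-in for the knowledge-graph upload: the UTF-8 bytes
    # of the string are misread as ISO-8859-1 (Latin-1) characters.
    return text.encode("utf-8").decode("ISO-8859-1")


original = "heated at 120 °C"
stored = simulate_store(original)
print(stored)  # heated at 120 Â°C

# A membership check with the unmodified string misses the stored value,
# so the extraction pipeline would insert the same item a second time.
stored_values = {stored}
print(original in stored_values)  # False
```

Every non-ASCII character turns into two or more Latin-1 characters (`°` becomes `Â°`), while pure-ASCII strings are unaffected, which is why the problem only shows up for texts containing special characters.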

As a fix for this:
- when pulling data from the knowledge graph (for example in `retreive_synthesis.py`), pass strings through `your_string.encode('ISO-8859-1').decode('utf-8')` to recover the original text.
- when querying the knowledge graph (in `kg_queries.py`), apply the inverse, `your_string.encode('utf-8').decode('ISO-8859-1')`, so that the strings in the query match what is actually stored.
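The two conversions above could be wrapped in small helpers so they are applied consistently in both places. This is a sketch only: the names `from_kg` and `to_kg` are hypothetical, and the fallback in `from_kg` (returning the value unchanged when it cannot have been mis-encoded, e.g. it contains characters outside Latin-1 or is not valid UTF-8 once re-encoded) is an added assumption, not part of the fix described above.

```python
def from_kg(value: str) -> str:
    """Undo the mis-encoding on a string read from the knowledge graph.

    Assumes the stored string is UTF-8 bytes that were decoded as ISO-8859-1.
    """
    try:
        return value.encode("ISO-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Assumed fallback (hypothetical): the value does not look mis-encoded,
        # so return it unchanged instead of raising.
        return value


def to_kg(value: str) -> str:
    """Apply the inverse conversion to a string used in a query.

    Every byte is a valid ISO-8859-1 character, so this never raises.
    """
    return value.encode("utf-8").decode("ISO-8859-1")


# Round trip: a string prepared for a query decodes back to the original.
query_value = to_kg("café")        # 'cafÃ©', matching the stored form
assert from_kg(query_value) == "café"
```

The fallback keeps `from_kg` safe to call on values that were stored correctly, at the cost of silently accepting them; a rare string whose Latin-1 bytes happen to form valid UTF-8 (e.g. a literal `Ã©`) would still be converted, so dropping the `try`/`except` and failing loudly may be preferable if all stored data is known to be affected.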
