
[NER] More fine-grained set definition regarding locations #59

@proycon


Currently the NER module in Frog distinguishes persons, locations, events, products(?) and miscellaneous.

Since the module has been enhanced with gazetteers, I think we can do better than this coarse division. Many named entities are perfectly enumerable: countries, cities, street names, postal codes, rivers, forests, mountains... Gazetteers serve well here, and it would be a waste to lose this information by subsuming it all under "location". We already have a FoLiA set definition (https://github.com/proycon/folia/blob/master/setdefinitions/namedentities.foliaset.ttl) from a prior project that allows for a more fine-grained taxonomy of locations and is compatible with (i.e. a superset of) our current set.
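To illustrate why a superset is backwards compatible, here is a minimal sketch. It assumes a gazetteer lookup that refines a coarse location label into a finer subclass and falls back to the coarse label when there is no match. The label names (`loc`, `loc.city`, etc.), the toy gazetteer entries, and the functions `classify` and `to_coarse` are all hypothetical and are not taken from the actual FoLiA set definition or from Frog.

```python
from typing import Dict

# Hypothetical fine-grained location labels, each mapped back to the coarse
# "loc" class. The real FoLiA set definition may use different identifiers.
FINE_TO_COARSE: Dict[str, str] = {
    "loc.country": "loc",
    "loc.city": "loc",
    "loc.street": "loc",
    "loc.postalcode": "loc",
    "loc.river": "loc",
    "loc.forest": "loc",
    "loc.mountain": "loc",
}

# Toy gazetteer (illustrative entries only): lowercased surface form -> fine label.
GAZETTEER: Dict[str, str] = {
    "nederland": "loc.country",
    "nijmegen": "loc.city",
    "kalverstraat": "loc.street",
    "maas": "loc.river",
}


def classify(entity: str, coarse: str) -> str:
    """Refine a coarse location label via the gazetteer.

    Non-location labels are returned unchanged, and so are locations
    that the gazetteer does not know.
    """
    if coarse != "loc":
        return coarse
    return GAZETTEER.get(entity.lower(), coarse)


def to_coarse(label: str) -> str:
    """Collapse a fine-grained label back to the current coarse set."""
    return FINE_TO_COARSE.get(label, label)
```

Because every fine-grained label collapses to `loc`, consumers that only understand the current coarse set lose nothing. For example, `to_coarse(classify("Nijmegen", "loc"))` yields `loc` again.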

Databases such as Geonames also contain this information, and we currently don't make use of it. I propose we migrate to a more fine-grained set (and include a few more gazetteers where possible). What do you think @kosloot @antalvdb @Irishx ?
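As a rough sketch of how GeoNames data could feed such a gazetteer, the snippet below reads lines in the tab-separated GeoNames dump format and keys entries on the feature code. In that format, `name` is the second column and `feature code` is the eighth. The mapping from feature codes to fine-grained labels, the label names themselves, and the function `build_gazetteer` are hypothetical. The feature codes were chosen as plausible examples and should be checked against the GeoNames feature-code list before use.

```python
from typing import Dict, Iterable

# Hypothetical mapping from GeoNames feature codes to fine-grained labels.
FEATURE_CODE_TO_LABEL: Dict[str, str] = {
    "PCLI": "loc.country",  # independent political entity
    "PPL": "loc.city",      # populated place
    "PPLC": "loc.city",     # capital of a political entity
    "ST": "loc.street",     # street
    "STM": "loc.river",     # stream
    "FRST": "loc.forest",   # forest
    "MT": "loc.mountain",   # mountain
}


def build_gazetteer(lines: Iterable[str]) -> Dict[str, str]:
    """Build a lowercased name -> label gazetteer from GeoNames dump lines."""
    gazetteer: Dict[str, str] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # skip malformed or truncated lines
        name, feature_code = fields[1], fields[7]
        label = FEATURE_CODE_TO_LABEL.get(feature_code)
        if label is not None:
            # For ambiguous names, keep the first label seen.
            gazetteer.setdefault(name.lower(), label)
    return gazetteer
```

Keeping the first label for ambiguous names is only a placeholder policy. A real gazetteer would probably need population or context to pick between, say, a city and a river that share a name.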

Context: this is relevant for our 112-project (@HenkvdHeuvel), where we need to know whether a location is a street, a city, etc. I think we can include many of these gazetteer-based improvements in the Frog data itself, i.e. in the generic Dutch model (as it is not sensitive data).

(Technicality: this is more of a frogdata issue than a Frog issue as such, but I guess it's more visible here.)
