This cube should provide:
- the notion of NerSource (i.e. Named Entities Source), e.g. dbpedia or dbpedia-en (for
DBpedia in English).
- the notion of NerEntry, a token/word/entry that can be recognized.
It requires a "label" and a "cwuri"; optionally, an "unormalize_label"
can be given for quicker matching, a "weight" for disambiguation and
a "lang" for sorting. Each entry should be related to a NerSource.
- the notion of NerProcess, an entity type that stores the parameters
of a Named Entities Recognition process: a "name", a "host" (appid or URL of a SPARQL endpoint),
a request (RQL or SPARQL, in which the "token" key is substituted), a type ('rql' or 'sparql'
for now) and a lang (for sorting).
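To make the data model above concrete, here is a minimal sketch using plain Python dataclasses. It is not the cube's actual CubicWeb schema. The field names come from the list above; the `build_request` helper, the `%(token)s` placeholder syntax used for the "token" substitution, and the example endpoint and query are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NerSource:
    """A named-entities source (lexicon), e.g. 'dbpedia-en'."""
    name: str


@dataclass
class NerEntry:
    """A recognizable token/word/entry, related to a NerSource."""
    label: str
    cwuri: str
    source: NerSource
    unormalize_label: Optional[str] = None  # optional, for quicker matching
    weight: Optional[float] = None          # optional, for disambiguation
    lang: Optional[str] = None              # optional, for sorting


@dataclass
class NerProcess:
    """Parameters of one Named Entities Recognition process."""
    name: str
    host: str     # appid or URL of a SPARQL endpoint
    request: str  # RQL or SPARQL containing the "token" key
    type: str     # 'rql' or 'sparql' for now
    lang: Optional[str] = None

    def __post_init__(self) -> None:
        # Only the two request types mentioned in the description are accepted.
        if self.type not in ("rql", "sparql"):
            raise ValueError(f"unsupported process type: {self.type!r}")

    def build_request(self, token: str) -> str:
        # ASSUMPTION: the request uses %-style mapping substitution,
        # i.e. it contains %(token)s. No escaping is done here.
        return self.request % {"token": token}


# Hypothetical usage: endpoint and query are illustrative only.
process = NerProcess(
    name="dbpedia-en-labels",
    host="http://dbpedia.org/sparql",
    request='SELECT ?uri WHERE { ?uri rdfs:label "%(token)s"@en }',
    type="sparql",
    lang="en",
)
print(process.build_request("Paris"))
# SELECT ?uri WHERE { ?uri rdfs:label "Paris"@en }
```

A real implementation would need to escape the token before inserting it into a SPARQL or RQL query; RQL in CubicWeb also supports passing arguments separately from the query string, which avoids manual substitution.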
Basically, a lexicon (NerSource) can be defined that contains entries (NerEntry).
Processes (NerProcess) can then be defined in other applications to retrieve these entries
in some content.
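The workflow described above (a lexicon of entries, then a process that looks for those entries in some content) can be sketched as an in-memory lookup. This is a hypothetical illustration, not the cube's implementation. `Entry`, `normalize`, `build_index` and `recognize` are invented names. The accent-stripping, lowercasing normalization is an assumed stand-in for whatever "unormalize_label" actually holds. Only single-word labels are matched, and when two entries share a label, the one with the highest "weight" wins.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    # Hypothetical stand-in for a NerEntry of one NerSource (lexicon).
    label: str
    cwuri: str
    unormalize_label: Optional[str] = None  # assumed already normalized
    weight: float = 0.0


def normalize(text: str) -> str:
    # ASSUMED normalization: strip accents (NFKD + drop combining marks), lowercase.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def build_index(entries: List[Entry]) -> Dict[str, Entry]:
    index: Dict[str, Entry] = {}
    for entry in entries:
        # A precomputed unormalize_label skips the normalization step.
        key = entry.unormalize_label or normalize(entry.label)
        # Disambiguation: keep the highest-weight entry for each key.
        current = index.get(key)
        if current is None or entry.weight > current.weight:
            index[key] = entry
    return index


def recognize(content: str, index: Dict[str, Entry]) -> List[Tuple[str, str]]:
    # Return (token, cwuri) pairs for every single-word token found in the lexicon.
    results = []
    for token in re.findall(r"\w+", content):
        entry = index.get(normalize(token))
        if entry is not None:
            results.append((token, entry.cwuri))
    return results


index = build_index([
    Entry("Paris", "http://dbpedia.org/resource/Paris", weight=1.0),
    Entry("Paris", "http://dbpedia.org/resource/Paris_Hilton", weight=0.2),
    Entry("Orléans", "http://dbpedia.org/resource/Orléans"),
])
print(recognize("From paris to Orleans.", index))
```

Here "paris" resolves to the higher-weight Paris entry, and "Orleans" matches "Orléans" because both normalize to the same key.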