This cube should provide:

  • the notion of NerSource (i.e. Named Entities Source), e.g. dbpedia or dbpedia-en (for DBpedia in English).
  • the notion of NerEntry, a token/word/entry that can be recognized. It requires a "label" and a "cwuri"; optionally, an "unormalize_label" can be given for quicker matching, a "weight" for disambiguation, and a "lang" for sorting. Each NerEntry should be related to a NerSource.
  • the notion of NerProcess, an entity type that stores the parameters of a Named Entities Recognition process: a "name", a "host" (appid or URL of a SPARQL endpoint), a "request" (RQL or SPARQL, with the "token" key for substitution), a "type" (currently 'rql' or 'sparql'), and a "lang" (for sorting).
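
The three notions above can be sketched with plain Python dataclasses. This is only an illustration of the attributes listed, not the cube's actual schema code: the attribute types, the validation of "type", the build_request helper and the %(token)s substitution syntax are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NerSource:
    # Name of the lexicon, e.g. "dbpedia" or "dbpedia-en".
    name: str


@dataclass
class NerEntry:
    label: str                              # required: the token/word to recognize
    cwuri: str                              # required: URI identifying the entity
    source: NerSource                       # the lexicon this entry belongs to
    unormalize_label: Optional[str] = None  # optional: for quicker matching
    weight: Optional[float] = None          # optional: for disambiguation (type assumed)
    lang: Optional[str] = None              # optional: for sorting


@dataclass
class NerProcess:
    name: str
    host: str     # appid or URL of a SPARQL endpoint
    request: str  # RQL or SPARQL text containing the "token" key
    type: str     # 'rql' or 'sparql'
    lang: Optional[str] = None

    def __post_init__(self):
        # Assumption: reject anything other than the two documented types.
        if self.type not in ("rql", "sparql"):
            raise ValueError("type must be 'rql' or 'sparql'")

    def build_request(self, token: str) -> str:
        # Hypothetical helper: assumes the "token" key is written as
        # %(token)s in the request and filled with %-style formatting.
        return self.request % {"token": token}
```

For instance, a process whose request is 'SELECT ?uri WHERE { ?uri rdfs:label "%(token)s"@en }' would, under these assumptions, have the token inserted in place of %(token)s before being sent to the host.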

In short, a lexicon (NerSource) can be defined that contains entries (NerEntry). Processes (NerProcess) can then be defined in other applications to retrieve these entries in some content.
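
As a rough illustration of retrieving lexicon entries in some content, the sketch below matches tokens against an in-memory list of entries. It does not use the cube's API: the entry data, the lowercase normalization, the word-based tokenization and the rule that the highest "weight" wins are all assumptions chosen for the example.

```python
import re

# Illustrative lexicon: entries of one hypothetical NerSource, as plain dicts.
ENTRIES = [
    {"label": "Paris", "cwuri": "http://dbpedia.org/resource/Paris", "weight": 10},
    {"label": "Paris", "cwuri": "http://dbpedia.org/resource/Paris,_Texas", "weight": 1},
    {"label": "Berlin", "cwuri": "http://dbpedia.org/resource/Berlin", "weight": 5},
]


def normalize(label):
    # Assumed normalization: lowercase only.
    return label.lower()


def build_index(entries):
    # Map each normalized label to the list of entries sharing it.
    index = {}
    for entry in entries:
        index.setdefault(normalize(entry["label"]), []).append(entry)
    return index


def recognize(content, index):
    # Return (token, cwuri) pairs for every token found in the lexicon.
    found = []
    for token in re.findall(r"\w+", content):
        candidates = index.get(normalize(token))
        if candidates:
            # Assumed disambiguation rule: keep the entry with the highest weight.
            best = max(candidates, key=lambda e: e["weight"])
            found.append((token, best["cwuri"]))
    return found
```

Calling recognize("I flew from Paris to Berlin", build_index(ENTRIES)) yields one pair per recognized token, with the ambiguous label "Paris" resolved to the entry carrying the larger weight.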

