
Addition of more semantic features #9

@clarkenj

  1. Semantic diversity (semD)
  • A measure of the distance (dissimilarity) between the various contexts in which a word appears. Low values indicate that a word is used in very narrow contexts (e.g. spinach), while high values indicate usage in more diverse contexts (e.g. predicament, as well as function words). (Hoffman et al., 2014)
  • The implementation, based on latent semantic analysis, is reported in Hoffman et al. (2013), which provides SemD values for 31,741 English words.
  • Correlates with other psycholinguistic measures (e.g. frequency, imageability) but contributes independent variance.
  • Also reflects ambiguity: words with lower values are less ambiguous.
  • Few studies in AD so far, but SemD may increase as MMSE score decreases (poster: Nevler et al., 2020).
  2. Contextual diversity
  • Similar to semD, but without calculating the cosine similarity between the contexts in which the word appears, i.e. just the range (number of contexts). James et al. (2006)
  • Values are available for the SUBTLEXus corpus (the percentage of films in which the word appears), but these do not appear to be split by content and function words.
  3. Word Mover's Distance
  • A measure of the distance between two texts based on word embeddings (lower values mean more similar texts), formulated as a transportation (optimal transport) problem. Kusner et al. (2015)
  • I used it in my previous study to compare sentences and windows, and it was an important feature for classifying AD. I think it mostly reflects semantic coherence. Clarke et al. (2021)
  • Can be implemented for word2vec with Gensim, and there is also a library for using other types of embeddings.
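As a rough illustration of the semD calculation, the Python sketch below takes one vector per context containing a word and returns the negative log of their mean pairwise cosine similarity, which is how we understand Hoffman et al. (2013) to define SemD. The base-10 log, the function name `semantic_diversity`, and the assumption that context vectors (e.g. LSA document vectors) are already available are our own choices, not taken from the published implementation.

```python
import numpy as np


def semantic_diversity(context_vectors: np.ndarray) -> float:
    """Sketch of SemD for one word (hypothetical helper).

    Assumption: `context_vectors` is an (n_contexts, n_dims) array holding one
    vector per context (e.g. an LSA document vector) in which the word occurs.
    SemD is taken here as -log10 of the mean cosine similarity over all
    distinct pairs of contexts.
    """
    n = context_vectors.shape[0]
    if n < 2:
        raise ValueError("SemD needs at least two contexts")
    # Normalise rows to unit length so dot products are cosine similarities.
    unit = context_vectors / np.linalg.norm(context_vectors, axis=1, keepdims=True)
    sims = unit @ unit.T
    # Average over the upper triangle: each distinct pair once, no self-pairs.
    mean_cos = sims[np.triu_indices(n, k=1)].mean()
    if mean_cos <= 0:
        raise ValueError("mean cosine must be positive to take its log")
    return float(-np.log10(mean_cos))


# Toy demo: near-identical contexts (narrow usage) vs. varied contexts.
rng = np.random.default_rng(0)
narrow = np.ones((5, 10)) + 0.01 * rng.standard_normal((5, 10))
diverse = np.abs(rng.standard_normal((5, 10)))
print(semantic_diversity(narrow) < semantic_diversity(diverse))  # True
```

Near-identical contexts give a mean cosine close to 1 and therefore a SemD close to 0, matching the "narrow contexts" end of the scale (e.g. spinach).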
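Contextual diversity is simpler to compute: for each word, count the contexts it occurs in at least once. A minimal sketch, assuming pre-tokenised, lower-cased documents (e.g. one film's subtitles per document, mirroring the SUBTLEXus percentage-of-films values); `contextual_diversity` is a hypothetical helper, not part of any library.

```python
from collections import Counter
from typing import Dict, Iterable, List


def contextual_diversity(documents: Iterable[List[str]]) -> Dict[str, float]:
    """Percentage of documents (contexts) in which each word occurs at least once.

    Assumption: documents are already tokenised and lower-cased.
    """
    n_docs = 0
    doc_counts: Counter = Counter()
    for tokens in documents:
        n_docs += 1
        # set() so a word repeated within one context is counted only once
        doc_counts.update(set(tokens))
    if n_docs == 0:
        return {}
    return {word: 100.0 * count / n_docs for word, count in doc_counts.items()}


docs = [
    ["the", "spinach", "was", "green"],
    ["the", "plan", "failed"],
    ["the", "spinach", "salad"],
    ["a", "predicament"],
]
cd = contextual_diversity(docs)
print(cd["the"], cd["spinach"], cd["predicament"])  # 75.0 50.0 25.0
```

Because each document is reduced to a set of words, no similarity between contexts is computed, which is the difference from semD. Filtering the result against a function-word list would be one way to split it by content and function words.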
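Gensim exposes WMD for word2vec-style vectors as `KeyedVectors.wmdistance`. To make the underlying transportation problem explicit, the sketch below instead solves it directly with SciPy's `linprog`: each text becomes a normalised bag-of-words distribution, the cost of moving weight between two words is the Euclidean distance between their embeddings, and WMD is the minimum total cost (Kusner et al., 2015). The `embeddings` dict and the helper names are hypothetical; this is not Gensim's implementation and is only practical for short texts such as sentences or windows.

```python
from collections import Counter

import numpy as np
from scipy.optimize import linprog


def nbow(tokens, embeddings):
    """Normalised bag-of-words weights over in-vocabulary tokens."""
    counts = Counter(t for t in tokens if t in embeddings)
    words = sorted(counts)
    total = sum(counts.values())
    return words, np.array([counts[w] / total for w in words])


def word_movers_distance(doc1, doc2, embeddings):
    """Exact WMD between two tokenised texts, solved as a linear program.

    Assumption: `embeddings` maps word -> 1-D numpy vector (a toy stand-in
    for word2vec vectors). Out-of-vocabulary tokens are ignored.
    """
    w1, d1 = nbow(doc1, embeddings)
    w2, d2 = nbow(doc2, embeddings)
    if not w1 or not w2:
        raise ValueError("both texts need at least one in-vocabulary word")
    n, m = len(w1), len(w2)
    x1 = np.stack([embeddings[w] for w in w1])
    x2 = np.stack([embeddings[w] for w in w2])
    # Travel cost c_ij: Euclidean distance between word i and word j.
    cost = np.linalg.norm(x1[:, None, :] - x2[None, :, :], axis=2)
    # Flow T_ij is flattened row-major, so T_ij sits at index i * m + j.
    a_eq = np.zeros((n + m, n * m))
    for i in range(n):
        a_eq[i, i * m:(i + 1) * m] = 1.0  # all of word i's weight leaves it
    for j in range(m):
        a_eq[n + j, j::m] = 1.0  # word j receives exactly its weight
    b_eq = np.concatenate([d1, d2])
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.fun)


emb = {
    "a": np.array([0.0, 0.0]), "b": np.array([10.0, 0.0]),
    "c": np.array([0.0, 1.0]), "d": np.array([10.0, 1.0]),
}
print(word_movers_distance(["a", "b"], ["c", "d"], emb))  # approximately 1.0 (a -> c, b -> d)
```

For sentence- or window-level comparisons, `doc1` and `doc2` would be consecutive sentences or windows; lower values mean the two spans are semantically closer.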

Note: these are still lexico-semantic features.
