- Semantic diversity (semD)
- A measure of how much the contexts in which a word appears differ from one another. Low values indicate that a word is used in a narrow range of contexts (e.g. spinach), while high values indicate use in more diverse contexts (e.g. predicament, and function words) (Hoffman et al., 2014).
- Implementation based on latent semantic analysis reported in Hoffman et al. (2013). SemD values for 31,741 English words are provided.
- Correlates with other psycholinguistic measures (frequency, imageability), but contributes independent variance.
- Also reflects ambiguity: words with lower values are less ambiguous.
- Few studies in AD, but semD may increase as MMSE score decreases (poster: Nevler et al., 2020).
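
To make the semD calculation concrete, here is a minimal sketch of its final step only: the negative log of the mean pairwise cosine between the contexts containing a word. It assumes the context vectors have already been obtained from an LSA space. The function names, the toy 2-D vectors, and the natural-log base are illustrative assumptions, not details taken from Hoffman et al.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_diversity(context_vectors):
    """Negative log of the mean pairwise cosine between a word's contexts.

    context_vectors: one vector per context containing the word
    (assumed to come from an LSA space built beforehand).
    """
    pairs = list(combinations(context_vectors, 2))
    if not pairs:
        raise ValueError("need at least two contexts")
    mean_cos = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    if mean_cos <= 0:
        raise ValueError("mean cosine must be positive to take a log")
    # Log base is an assumption of this sketch (natural log).
    return -math.log(mean_cos)


# Hypothetical toy vectors: similar contexts vs. dissimilar contexts.
narrow_contexts = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05]]
diverse_contexts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

print(semantic_diversity(narrow_contexts))
print(semantic_diversity(diverse_contexts))
```

With these toy inputs the word with similar contexts gets the lower score, matching the narrow versus diverse contrast in the notes.
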
- Contextual diversity
- Similar to semD, but without the cosine calculation between the contexts in which the word appears, i.e. just the range of contexts. James et al. (2006)
- Values are available from the SUBTLEXus corpus (the percentage of films in which a word appears), but these do not appear to be split into content and function words.
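
As a rough illustration of contextual diversity as a document-percentage measure, the sketch below counts the share of documents in which each word appears at least once. Whitespace tokenization, lowercasing, and the toy documents are assumptions made for the example. This is not how the SUBTLEXus values were produced.

```python
def contextual_diversity(documents, lowercase=True):
    """Return {word: percentage of documents that contain the word}."""
    doc_counts = {}
    for doc in documents:
        tokens = doc.lower().split() if lowercase else doc.split()
        # set(): each document counts at most once per word
        for word in set(tokens):
            doc_counts[word] = doc_counts.get(word, 0) + 1
    n_docs = len(documents)
    return {w: 100.0 * c / n_docs for w, c in doc_counts.items()}


# Hypothetical toy "films" (one string per document).
films = [
    "the cat sat",
    "the dog ran",
    "a cat and the dog",
    "spinach salad",
]
cd = contextual_diversity(films)
print(cd["the"], cd["cat"], cd["spinach"])
```

Unlike word frequency, repeated uses of a word within one document do not raise its score.
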
- Word Mover's Distance
- A measure of the distance between texts based on word embeddings, i.e. of their (dis)similarity, formulated as a transportation optimization problem. Kusner et al. (2015)
- I used it in my previous study, computed between sentences and windows, and it was an important feature for classifying AD. I think it mostly reflects semantic coherence. Clarke et al. (2021)
- Can be implemented for word2vec with Gensim, and there is also a library to use other types of embeddings.
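
To show the idea behind the distance without external dependencies, the sketch below implements the relaxed variant (a lower bound on the exact distance, also described by Kusner et al.), in which every word sends all of its weight to its nearest word in the other text. It is not the exact transport solution that Gensim's `KeyedVectors.wmdistance` computes. The function names and the 2-D toy embeddings are hypothetical.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def _one_sided(src_tokens, dst_tokens, embeddings):
    """Cost when each source word moves all its weight to its nearest target word."""
    weights = Counter(src_tokens)
    total = sum(weights.values())
    dst_vocab = set(dst_tokens)
    cost = 0.0
    for word, count in weights.items():
        nearest = min(euclidean(embeddings[word], embeddings[t]) for t in dst_vocab)
        cost += (count / total) * nearest
    return cost


def relaxed_wmd(tokens_a, tokens_b, embeddings):
    """Relaxed Word Mover's Distance: a lower bound on the exact WMD."""
    # Out-of-vocabulary tokens are dropped (an assumption of this sketch).
    a = [t for t in tokens_a if t in embeddings]
    b = [t for t in tokens_b if t in embeddings]
    if not a or not b:
        raise ValueError("each text needs at least one in-vocabulary token")
    return max(_one_sided(a, b, embeddings), _one_sided(b, a, embeddings))


# Hypothetical 2-D embeddings; real ones would come from e.g. word2vec.
toy_embeddings = {
    "dog": [1.0, 0.0], "puppy": [0.9, 0.1],
    "barks": [0.8, 0.3], "yelps": [0.7, 0.4],
    "tax": [-1.0, 0.5], "form": [-0.9, 0.7],
}

print(relaxed_wmd(["dog", "barks"], ["puppy", "yelps"], toy_embeddings))
print(relaxed_wmd(["dog", "barks"], ["tax", "form"], toy_embeddings))
```

Applied to consecutive sentences or windows of a transcript, smaller distances would correspond to texts that stay closer together in embedding space.
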
Note: these are still lexico-semantic features.