Addition of more semantic features #9

1. Semantic diversity (semD)
- A measure of the distance between the various contexts in which a word appears. Low values indicate that a word is used in a narrow range of contexts (e.g. spinach), while high values indicate use in more diverse contexts (e.g. predicament, as well as function words). [(Hoffman et al., 2014)](https://www.sciencedirect.com/science/article/abs/pii/S001094521200322X?via%3Dihub)
- Implementation based on latent semantic analysis reported in [Hoffman et al. (2013)](https://link.springer.com/article/10.3758/s13428-012-0278-x). SemD values for 31,741 English words are provided.
- Correlates with other psycholinguistic measures (frequency, imageability), but contributes independent variance.
- Also reflects ambiguity: words with lower values are less ambiguous.
- Few studies in AD, but semD may increase as MMSE score decreases [(poster: Nevler et al., 2020)](https://alz-journals.onlinelibrary.wiley.com/doi/full/10.1002/alz.045300)
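
To make the definition concrete, here is a minimal sketch of a semD-style calculation. It assumes the measure is taken as the negative log of the mean pairwise cosine between the vectors of the contexts a word appears in. The log base, the function names, and the toy 2-d "context vectors" are all assumptions for illustration; in practice we would more likely look values up in the published list than recompute them.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_diversity(context_vectors):
    """Negative log10 of the mean pairwise cosine between context vectors.

    Assumption: base-10 log; contexts are already embedded (e.g. via LSA).
    """
    pairs = list(combinations(context_vectors, 2))
    if not pairs:
        raise ValueError("need at least two contexts")
    mean_cos = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    if mean_cos <= 0:
        raise ValueError("mean cosine must be positive to take the log")
    return -math.log10(mean_cos)


# Hypothetical toy vectors: similar contexts vs. dissimilar contexts.
narrow_contexts = [[1.0, 0.1], [1.0, 0.2], [0.9, 0.1]]
diverse_contexts = [[1.0, 0.0], [0.2, 1.0], [0.5, 0.5]]

print(semantic_diversity(narrow_contexts))
print(semantic_diversity(diverse_contexts))
```

A word whose contexts all point in roughly the same direction gets a value near zero, and the value grows as the contexts become less similar to each other.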

2. Contextual diversity
- Similar to semD, but without the cosine calculation over the contexts in which the word appears, i.e. it captures only the range of contexts. [James et al. (2006)](https://psycnet.apa.org/record/2006-12459-014)
- [Values are available](https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus#:~:text=Zipf%20values%20added%20to%20the%20SUBTLEX%2DUS%20frequencies&text=Zipf%20values%20range%20from%201,per%20million%20words%20and%20higher) for the SUBTLEX-US corpus, expressed as the percentage of films in which the word appears, but these do not appear to be split into content and function words.
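
As a rough illustration of the "percentage of films" idea, the sketch below counts the share of documents in which a word occurs at least once. The function name and the toy documents are hypothetical, and this is not the SUBTLEX-US pipeline itself.

```python
def contextual_diversity(word, documents):
    """Percentage of documents that contain `word` at least once.

    `documents` is a list of token lists; matching is case-insensitive.
    """
    if not documents:
        raise ValueError("need at least one document")
    word = word.lower()
    hits = sum(1 for doc in documents if word in {tok.lower() for tok in doc})
    return 100.0 * hits / len(documents)


# Hypothetical toy corpus: each inner list stands in for one film's subtitles.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ate", "spinach"],
    ["a", "cat", "and", "the", "dog"],
    ["the", "end"],
]

print(contextual_diversity("the", docs))      # 100.0
print(contextual_diversity("spinach", docs))  # 25.0
```

Because the count ignores how often a word occurs within a document, it separates breadth of use from raw frequency.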

3. Word Mover's Distance
- A measure of the distance (i.e. dissimilarity) between texts based on word embeddings, formulated as a transportation optimization problem. [Kusner et al. (2015)](https://proceedings.mlr.press/v37/kusnerb15.pdf)
- I used it in my previous study, computed between sentences and between windows, and it was an important feature for classifying AD. I think it mostly reflects semantic coherence. [Clarke et al. (2021)](https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2021.634360/full)
- Can be implemented for word2vec with [Gensim](https://radimrehurek.com/gensim/auto_examples/tutorials/run_wmd.html), and there is also a [library](https://pypi.org/project/word-mover-distance/) to use other types of embeddings.
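
To show the intuition without any dependencies, the sketch below implements the relaxed lower bound on WMD described in Kusner et al. (2015), not the exact transport solution: each word simply moves all of its mass to the nearest word in the other text. The toy 2-d embeddings and the function names are assumptions for illustration; for exact WMD with real word2vec vectors, Gensim's `KeyedVectors.wmdistance` would be the thing to use.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def _one_sided(src, dst, emb):
    # Every token in `src` carries equal mass 1/len(src) and is moved
    # entirely to its nearest neighbour in `dst`.
    return sum(min(euclidean(emb[s], emb[d]) for d in dst) for s in src) / len(src)


def relaxed_wmd(doc_a, doc_b, emb):
    """Relaxed WMD: max of the two one-sided nearest-neighbour costs.

    This is a lower bound on the exact Word Mover's Distance.
    Tokens without an embedding are dropped.
    """
    a = [tok for tok in doc_a if tok in emb]
    b = [tok for tok in doc_b if tok in emb]
    if not a or not b:
        raise ValueError("each text needs at least one token with an embedding")
    return max(_one_sided(a, b, emb), _one_sided(b, a, emb))


# Hypothetical toy embeddings, chosen so related words sit close together.
emb = {
    "obama": (1.0, 0.0),
    "president": (1.0, 0.1),
    "speaks": (0.0, 1.0),
    "greets": (0.1, 1.0),
    "banana": (5.0, 5.0),
}

close = relaxed_wmd(["obama", "speaks"], ["president", "greets"], emb)
far = relaxed_wmd(["obama", "speaks"], ["banana"], emb)
print(close, far)
```

Applied to consecutive sentences or sliding windows of a transcript, larger distances would suggest lower semantic coherence between neighbouring stretches of speech.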

Note: these are still lexico-semantic features.

