Ported to the latest gensim, line model dimensionality fixed, output format extended #9
luav wants to merge 9 commits into GTmac:master
GTmac
left a comment
This seems to be your clone of DeepWalk -- could you change this to the official DeepWalk repo? Thanks!
GTmac
left a comment
Thanks for adding this! I actually feel it would be better to set the default number of workers to a smaller value (for example, gensim word2vec uses 3: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/word2vec.py#L660). Sometimes it is not desirable to use up all the CPUs, especially when you are running a model on a shared server. Let me know your thoughts :-)
src/baseline.py
Outdated
@@ -1,7 +1,10 @@
from gensim.models import Word2Vec
from gensim.models.word2vec import Word2Vec
Nit: could you remove that extra space? Thanks!
GTmac
left a comment
This commit touches the model part, so have you tested this change in terms of classification performance? Do you still get similar classification F1 scores compared to the numbers in the paper / README file? Thanks!
Up to you. In all other functions I set the default number of workers to at least half of the available logical CPUs: int(cpu_num() / 2) + 1. A hardcoded value is undesirable because the host might have only 1 or 2 cores, in which case a hardcoded value of 3 may hurt performance, unlike a value that depends on cpu_num.
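To make the formula above concrete, here is a minimal sketch of such a default. It assumes that cpu_num behaves like Python's standard `multiprocessing.cpu_count()`; the helper name `default_workers` and its `cpu_total` parameter are hypothetical and are not taken from the pull request.

```python
import multiprocessing


def default_workers(cpu_total=None):
    """Return int(cpu_total / 2) + 1, i.e. half of the logical CPUs plus one.

    cpu_total is a hypothetical override for illustration; when omitted,
    the logical CPU count of the current host is used.
    """
    if cpu_total is None:
        # Assumption: cpu_num() in the discussion is equivalent to this call.
        cpu_total = multiprocessing.cpu_count()
    return int(cpu_total / 2) + 1
```

With this formula a single-core host gets 1 worker and a dual-core host gets 2, so the default never exceeds the available cores on small machines, while an 8-core host gets 5.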
I have not changed any model parameters or internals apart from adapting to the updated gensim API. I trained and evaluated the HARP + DeepWalk / LINE embeddings on my datasets and they look fine.
In the official DeepWalk, walk persistence expects text, not numbers (I made a pull request to the official repository). I am not sure whether the need for numerical values was caused by bugs that occurred and were fixed while porting HARP to the updated DeepWalk and gensim, or whether HARP requires numeric walk items from DeepWalk. Either way, HARP works fine with the extended version of DeepWalk in my repository (which accepts numerical walk items), but I have not tested whether it works with the official repository without that extension.
I just verified: the latest official DeepWalk lacks the support for numerical walk items that HARP needs, so the specified repository should be used until this DeepWalk pull request is merged.
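For illustration only, the mismatch between numeric walk items and a text-only persistence layer can be sketched as a simple round trip. This is not the code from either DeepWalk repository; the function names `walks_to_tokens` and `tokens_to_walks` are hypothetical, and the sketch assumes node identifiers are plain integers.

```python
def walks_to_tokens(walks):
    """Convert walks of integer node ids into walks of string tokens."""
    return [[str(node) for node in walk] for walk in walks]


def tokens_to_walks(token_walks):
    """Convert walks of string tokens back into walks of integer node ids."""
    return [[int(token) for token in walk] for walk in token_walks]


# Example: two short random walks over integer node ids.
numeric_walks = [[0, 1, 2], [2, 1]]
text_walks = walks_to_tokens(numeric_walks)
```

A conversion like this is one possible way to feed numeric walks to code that only handles text; whether it would be sufficient for HARP with the official repository has not been tested here.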
Workers number is set to 1 by default and to
Fix for #8 (ported to the latest gensim), output extended with the .mat format