==== Try 3: sum(vocab_sizes) trouble commit ====
Trouble with `num_embeddings=sum(vocab_sizes)` in the MultiHeadEmbedding module.
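
For reference, here is roughly the shape of module I am describing, as a minimal sketch (the class name, shapes, and offset scheme are my assumptions, not the actual implementation). The failure mode shows up when `vocab_sizes` is not a static Python tuple: `sum(vocab_sizes)` has to be a concrete int to size the parameter.

```python
import jax.numpy as jnp
from flax import nnx

class MultiHeadEmbedding(nnx.Module):
    """Sketch: one shared table of size sum(vocab_sizes); head h looks its
    ids up at offset sum(vocab_sizes[:h]). vocab_sizes must be a static
    Python tuple (e.g. from config) -- if it is derived from a traced batch
    value, sum(vocab_sizes) is not a concrete int and cannot size a param.
    """

    def __init__(self, vocab_sizes: tuple[int, ...], features: int, rngs: nnx.Rngs):
        offsets, acc = [], 0
        for v in vocab_sizes:          # static Python loop over concrete ints
            offsets.append(acc)
            acc += v
        self.offsets = jnp.asarray(offsets, dtype=jnp.int32)  # shape [heads]
        self.table = nnx.Embed(num_embeddings=sum(vocab_sizes),
                               features=features, rngs=rngs)

    def __call__(self, ids: jnp.ndarray) -> jnp.ndarray:
        # ids: [batch, seq, heads] with per-head local ids; offsets broadcast.
        return self.table(ids + self.offsets)
```
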
==== Try 2: Not working - pass ngram_layer_map to deepseek layer commit ====
I moved `generate_engram_map` from data_loader to decoders.py, and passed `layer_id` and `ngram_layer_map` to the deepseek.py decoder layer. However, I was not able to pass the layer_id directly, and hit the error below. I noticed that other models like llama4 and gpt-oss pass the layer_id to a function to get the attention_type. It seems that under JIT it is tricky to pass this layer index back into the decoder layer.
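
For what it's worth, my reading of the llama4 / gpt-oss pattern is that the layer index never enters the traced computation at all: it is a plain Python int consumed when the layer is constructed. A minimal sketch of that idea (all names here are hypothetical stand-ins, not the actual code in this repo):

```python
from flax import nnx

def is_engram_layer(layer_id: int, ngram_layer_map: dict[int, bool]) -> bool:
    # Mirrors the llama4 / gpt-oss trick: layer_id is a plain Python int,
    # so this lookup happens at construction / trace time, not under jit.
    return ngram_layer_map.get(layer_id, False)

class DecoderLayerSketch(nnx.Module):
    """Toy stand-in for the deepseek.py decoder layer (names are assumed)."""

    def __init__(self, layer_id: int, ngram_layer_map: dict[int, bool],
                 features: int, rngs: nnx.Rngs):
        # Resolved once at init into a static Python bool; no traced layer
        # index ever has to be passed back into the layer.
        self.use_engram = is_engram_layer(layer_id, ngram_layer_map)
        self.proj = nnx.Linear(features, features, rngs=rngs)

    def __call__(self, x, engram_out=None):
        y = self.proj(x)
        if self.use_engram:       # ordinary Python branch; safe under jit
            y = y + engram_out    # engram contribution computed upstream
        return y
```
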
==== Try 1 - Not working - Integrate Engram with DeepSeek custom model commit ====
I noticed an issue when putting NgramHashMapping into data_loader.py: it is not easy to initialize `self.engram = engram.Engram` inside the deepseek.py file. The tricky part is that in the current implementation, Engram needs `engram_vocab_sizes` to initialize inside the DeepSeekGenericLayer NNX module, and that value is data dependent, varying with each data batch. I think all dynamic data inputs should be passed via the call method instead of the init method; if so, we cannot easily initialize the Engram module this way. Alternatively, we could put NgramHashMapping only in models.py or decoders.py.
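
To make the init-vs-call split concrete, here is a sketch of the alternative I have in mind (hypothetical names; it assumes the vocab sizes can be made static config rather than batch-derived, with NgramHashMapping producing the ids per batch in decoders.py or models.py):

```python
import jax.numpy as jnp
from flax import nnx

class EngramSketch(nnx.Module):
    """Sketch of the split: static config at init, dynamic data at call."""

    def __init__(self, engram_vocab_sizes: tuple[int, ...], features: int,
                 rngs: nnx.Rngs):
        # Static at init: parameter shapes need concrete Python ints, so the
        # vocab sizes must come from config, not from a data batch.
        self.embed = nnx.Embed(num_embeddings=sum(engram_vocab_sizes),
                               features=features, rngs=rngs)

    def __call__(self, x: jnp.ndarray, engram_hash_ids: jnp.ndarray):
        # Dynamic at call: engram_hash_ids ([batch, seq]) is the
        # batch-dependent output of NgramHashMapping, computed outside
        # this module for each batch.
        return x + self.embed(engram_hash_ids)
```
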