==== Try 3: sum(vocab_sizes) trouble commit ====
Trouble with `num_embeddings=sum(vocab_sizes)` in the MultiHeadEmbedding module.
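
For reference, here is roughly the shape of module I am describing, as a minimal sketch (the class name, shapes, and offset scheme are my assumptions, not the actual implementation). The failure mode shows up when `vocab_sizes` is not a static Python tuple: `sum(vocab_sizes)` has to be a concrete int to size the parameter.

```python
import jax.numpy as jnp
from flax import nnx

class MultiHeadEmbedding(nnx.Module):
    """Sketch: one shared table of size sum(vocab_sizes); head h looks its
    ids up at offset sum(vocab_sizes[:h]). vocab_sizes must be a static
    Python tuple (e.g. from config) -- if it is derived from a traced batch
    value, sum(vocab_sizes) is not a concrete int and cannot size a param.
    """

    def __init__(self, vocab_sizes: tuple[int, ...], features: int, rngs: nnx.Rngs):
        offsets, acc = [], 0
        for v in vocab_sizes:          # static Python loop over concrete ints
            offsets.append(acc)
            acc += v
        self.offsets = jnp.asarray(offsets, dtype=jnp.int32)  # shape [heads]
        self.table = nnx.Embed(num_embeddings=sum(vocab_sizes),
                               features=features, rngs=rngs)

    def __call__(self, ids: jnp.ndarray) -> jnp.ndarray:
        # ids: [batch, seq, heads] with per-head local ids; offsets broadcast.
        return self.table(ids + self.offsets)
```
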
==== Try 2: Not working - pass ngram_layer_map to deepseek layer commit ====
I moved `generate_engram_map` from data_loader to decoders.py, and passed `layer_id` and `ngram_layer_map` to the deepseek.py decoder layer. However, I was not able to pass the layer_id directly, and hit the error below. I noticed that other models like llama4 and gpt-oss pass the layer_id to a function to get the attention_type. It seems that under JIT it is tricky to pass this layer index back into the decoder layer.
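
For what it's worth, my reading of the llama4 / gpt-oss pattern is that the layer index never enters the traced computation at all: it is a plain Python int consumed when the layer is constructed. A minimal sketch of that idea (all names here are hypothetical stand-ins, not the actual code in this repo):

```python
from flax import nnx

def is_engram_layer(layer_id: int, ngram_layer_map: dict[int, bool]) -> bool:
    # Mirrors the llama4 / gpt-oss trick: layer_id is a plain Python int,
    # so this lookup happens at construction / trace time, not under jit.
    return ngram_layer_map.get(layer_id, False)

class DecoderLayerSketch(nnx.Module):
    """Toy stand-in for the deepseek.py decoder layer (names are assumed)."""

    def __init__(self, layer_id: int, ngram_layer_map: dict[int, bool],
                 features: int, rngs: nnx.Rngs):
        # Resolved once at init into a static Python bool; no traced layer
        # index ever has to be passed back into the layer.
        self.use_engram = is_engram_layer(layer_id, ngram_layer_map)
        self.proj = nnx.Linear(features, features, rngs=rngs)

    def __call__(self, x, engram_out=None):
        y = self.proj(x)
        if self.use_engram:       # ordinary Python branch; safe under jit
            y = y + engram_out    # engram contribution computed upstream
        return y
```
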
==== Try 1 - Not working - Integrate Engram with DeepSeek custom model commit ====
I noticed an issue when putting NgramHashMapping into data_loader.py: it is not easy to initialize `self.engram = engram.Engram` inside the deepseek.py file. The tricky part is that in the current implementation, Engram needs `engram_vocab_sizes` to initialize inside the DeepSeekGenericLayer NNX module, and that value is data dependent, varying with each data batch. I think all dynamic data inputs should be passed via the call method instead of the init method; if so, we cannot easily initialize the Engram module this way. Alternatively, we could put NgramHashMapping only in models.py or decoders.py.
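
To make the init-vs-call split concrete, here is a sketch of the alternative I have in mind (hypothetical names; it assumes the vocab sizes can be made static config rather than batch-derived, with NgramHashMapping producing the ids per batch in decoders.py or models.py):

```python
import jax.numpy as jnp
from flax import nnx

class EngramSketch(nnx.Module):
    """Sketch of the split: static config at init, dynamic data at call."""

    def __init__(self, engram_vocab_sizes: tuple[int, ...], features: int,
                 rngs: nnx.Rngs):
        # Static at init: parameter shapes need concrete Python ints, so the
        # vocab sizes must come from config, not from a data batch.
        self.embed = nnx.Embed(num_embeddings=sum(engram_vocab_sizes),
                               features=features, rngs=rngs)

    def __call__(self, x: jnp.ndarray, engram_hash_ids: jnp.ndarray):
        # Dynamic at call: engram_hash_ids ([batch, seq]) is the
        # batch-dependent output of NgramHashMapping, computed outside
        # this module for each batch.
        return x + self.embed(engram_hash_ids)
```
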