Hello. First of all, thank you for sharing this code.
I want to apply MAGMA to BERT-large inference with a batch size of 256, but I could not work out how to do this from the paper or the code. I have a few related questions.
Q1. Each row of the traffic insts file represents a job, and the jobs in the same insts file have no dependencies on each other, right?
Q2. Even if the batched job is divided into "mini-batches that do not have dependencies on each other", the layers within the same mini-batch still depend on the layers before and after them. How do I write the insts file to reflect this dependency? Do I have to put the dependent layers in different insts files and run them separately?
Q3. What does num_microBatch mean?
Also, if you have a sample traffic insts file or other code used in the paper, could you please update or share it?
Thank you.