Hi! I was wondering: since an LLM predicts the *next* token, why do you use the first input token during token alignment to get the aligned logit and index? It seems that the first element of base_model_per_step_logit and blending_model_per_step_logit corresponds to the *second* token in input_tokens/input_ids.
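
To make the off-by-one I mean concrete, here is a minimal sketch (plain Python, hypothetical token ids, not your actual code) of how per-step logits line up in a standard causal LM: the logits emitted at step i score the token at position i + 1, so a per-step logit sequence should align with input_ids[1:], not input_ids[0:]:

```python
# Hypothetical token ids for one sequence (not real vocab ids).
input_ids = [101, 2023, 2003, 1037, 3231]

# In a causal LM, the model's output at step i is a distribution over
# the token at position i + 1. Stand-in for the per-step argmax ids:
per_step_pred = input_ids[1:]  # step i's logits target input_ids[i + 1]

for i, pred in enumerate(per_step_pred):
    # Aligning step i against input_ids[i] (the *first* token for i = 0)
    # would be shifted by one; the correct target is input_ids[i + 1].
    assert pred == input_ids[i + 1]

# There is one fewer per-step logit than there are input tokens.
print(len(input_ids), len(per_step_pred))
```

So my question is whether the alignment should start from input_ids[1] rather than input_ids[0], or whether the code shifts the logits somewhere I missed.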