Hello,
I've been experimenting with the stock fairseq examples/speech_recognition/infer.py and with this repo's recognize.py, to see whether it is possible to run inference with a model we made ourselves by fine-tuning a base model. We can get infer.py to work, but I've noticed that it needs to find the original base model on disk. That makes moving the checkpoint to a different machine cumbersome: the base model has to sit at the same path on the target machine.
I've spent almost a day studying how the model loading works, but I can't wrap my head around it. I think the checkpoint only needs some args from the original base model, but there is a lot of conversion going on between formats and names: cfg, w2v_args, OmegaConf, and Namespace.
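For reference, here is a small probe I've been using to see which of these formats a given checkpoint actually carries. The fallback through convert_namespace_to_omegaconf is my reading of how fairseq bridges the old Namespace format and the new OmegaConf one, so treat it as a sketch rather than gospel:

# Quick probe: which config format does a checkpoint store?
import torch
from fairseq.dataclass.utils import convert_namespace_to_omegaconf

state = torch.load("checkpoint_best.pt", map_location="cpu")
cfg = state.get("cfg")    # OmegaConf DictConfig in newer checkpoints
args = state.get("args")  # argparse.Namespace in older ones
if cfg is None and args is not None:
    cfg = convert_namespace_to_omegaconf(args)

# For a fine-tuned wav2vec 2.0 model I expect either an embedded base-model
# config (w2v_args) or only a pointer to the base model on disk (w2v_path).
print(cfg.model.get("w2v_args"))
print(cfg.model.get("w2v_path"))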
Both recognize.py and recognize.hydra.py break when loading a checkpoint file (they do work on the published fine-tuned models). It would help me if there were a way to produce a model file that works with recognize.py from the original base model plus a checkpoint. I have not been able to find such a tool. I suspect it is as simple as adding the correct .cfg.w2v_args info to the checkpoint, but I don't understand how.
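To make the question concrete, this is roughly what I imagine such a tool would do. It is an untested sketch; the key layout (cfg, then model, then w2v_args / w2v_path) is my reading of wav2vec2_asr.py, not a documented interface:

# Untested sketch: graft the base model's config into a fine-tuned
# checkpoint so inference no longer needs the base model on disk.
import torch
from fairseq.dataclass.utils import convert_namespace_to_omegaconf

def embed_w2v_args(finetuned_path, base_path, out_path):
    state = torch.load(finetuned_path, map_location="cpu")
    base = torch.load(base_path, map_location="cpu")

    # Newer checkpoints store an OmegaConf "cfg"; older ones a Namespace
    # "args" that fairseq upgrades on load.
    base_cfg = base.get("cfg")
    if base_cfg is None:
        base_cfg = convert_namespace_to_omegaconf(base["args"])

    # Wav2VecEncoder appears to read cfg.model.w2v_args when it is set, and
    # only falls back to loading cfg.model.w2v_path from disk when it is None.
    state["cfg"]["model"]["w2v_args"] = base_cfg
    state["cfg"]["model"]["w2v_path"] = None

    torch.save(state, out_path)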
I can get recognize.py to work with a checkpoint file using the patch below, but model loading then still refers to the original base model.
@@ -139,13 +162,24 @@ class Wav2VecPredictor:
         return feats

     def _load_model(self, model_path, target_dict):
-        w2v = torch.load(model_path)
-
-        # Without create a FairseqTask
-        args = base_architecture(w2v["args"])
-        model = Wav2VecCtc(args, Wav2VecEncoder(args, target_dict))
-        model.load_state_dict(w2v["model"], strict=True)
-        return model
+        # Let fairseq build the task and model from the checkpoint instead of
+        # constructing Wav2VecCtc by hand; this also copes with checkpoints
+        # whose "args" entry is None.
+        # Needs: from fairseq import utils
+        #        from fairseq.checkpoint_utils import load_model_ensemble_and_task
+        models, saved_cfg, task = load_model_ensemble_and_task(
+            utils.split_paths(model_path),
+            arg_overrides=None,  # or e.g. ast.literal_eval(args.model_overrides)
+            task=None,
+            suffix="",
+            strict=True,
+            num_shards=1,
+            state=None,
+        )
+        return models[0]
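As an aside: when the base model is present on the target machine but at a different path, the arg_overrides hook above (exposed as --model-overrides in infer.py) seems to be the intended escape hatch, since overwrite_args_by_name matches override keys recursively through the config. Something like this is what I have in mind, though I have not verified it end to end, and the path is just a placeholder:

# Assumed usage: rewrite the stored w2v_path before the task loads it.
models, saved_cfg, task = load_model_ensemble_and_task(
    utils.split_paths(model_path),
    arg_overrides={"w2v_path": "/path/on/this/machine/wav2vec_small.pt"},
)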