whisper-webui-translate

avans06 committed on Dec 14, 2023

Commit 1a74319 • 1 parent: 43d9f32

Fixed an issue where running ALMA on CPU raised the "addmm_impl_cpu_ not implemented for 'Half'" exception.

Files changed (2)

docs/translateModel.md CHANGED

@@ -73,6 +73,7 @@ The 'mt5-zh-ja-en-trimmed' model is finetuned from Google's 'mt5-base' model. Th
 ## ALMA
+ALMA is an excellent translation model, but it is strongly discouraged to operate it on CPU.
 ALMA is a many-to-many LLM-based translation model introduced by Haoran Xu and colleagues in September 2023. It is based on the fine-tuning of a large language model (LLaMA-2). The approach used for this model is referred to as Advanced Language Model-based trAnslator (ALMA). The paper is titled "`A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models`" ([arXiv:2309.11674](https://arxiv.org/abs/2309.11674)).
 The official support for ALMA currently includes 10 language directions: English↔German, English↔Czech, English↔Icelandic, English↔Chinese, and English↔Russian. However, the author hints that there might be surprises in other directions, so there are currently no restrictions on the languages that ALMA can be chosen for in the web UI.

src/translation/translationModel.py CHANGED

@@ -163,8 +163,10 @@ class TranslationModel:
                 self.transTokenizer = transformers.AutoTokenizer.from_pretrained(self.modelPath, use_fast=True)
                 transModelConfig = transformers.AutoConfig.from_pretrained(self.modelPath)
                 if self.device == "cpu":
+                    # ALMA is an excellent translation model, but it is strongly discouraged to operate it on CPU.
+                    # set torch_dtype=torch.float32 to prevent the occurrence of the exception "addmm_impl_cpu_ not implemented for 'Half'."
                     transModelConfig.quantization_config["use_exllama"] = False
-                    self.transModel = transformers.AutoModelForCausalLM.from_pretrained(self.modelPath, device_map="auto", low_cpu_mem_usage=True, trust_remote_code=False, revision=self.modelConfig.revision, config=transModelConfig)
+                    self.transModel = transformers.AutoModelForCausalLM.from_pretrained(self.modelPath, device_map="auto", low_cpu_mem_usage=True, trust_remote_code=False, revision=self.modelConfig.revision, config=transModelConfig, torch_dtype=torch.float32)
                 else:
                     # transModelConfig.quantization_config["exllama_config"] = {"version":2} # After configuring to use ExLlamaV2, VRAM cannot be effectively released, which may be an issue. Temporarily not adopting the V2 version.
                     self.transModel = transformers.AutoModelForCausalLM.from_pretrained(self.modelPath, device_map="auto", low_cpu_mem_usage=True, trust_remote_code=False, revision=self.modelConfig.revision)
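The CPU branch in this change comes down to two overrides: turn off ExLlama in the quantization config, and ask from_pretrained for float32 weights so CPU matrix multiplies do not go through the half-precision addmm path that raised the exception. The sketch below isolates that branching as a pure function. It is a minimal illustration, not code from the repository: build_load_args is a hypothetical helper, it assumes the quantization config is a plain dict (as the diff's indexing suggests), and the float32 dtype is passed in as a parameter (torch.float32 in real use) only so the sketch runs without PyTorch installed.

```python
from typing import Any, Dict, Optional, Tuple


def build_load_args(
    device: str,
    quantization_config: Optional[Dict[str, Any]],
    float32_dtype: Any,
) -> Tuple[Optional[Dict[str, Any]], Dict[str, Any]]:
    """Hypothetical helper: return (quantization_config, from_pretrained kwargs).

    float32_dtype stands in for torch.float32 so this sketch runs without PyTorch.
    """
    # Keyword arguments shared by both branches in the diff.
    kwargs: Dict[str, Any] = {
        "device_map": "auto",
        "low_cpu_mem_usage": True,
        "trust_remote_code": False,
    }
    if device != "cpu":
        # GPU path: keep the checkpoint's own dtype and quantization settings.
        return quantization_config, kwargs

    quant = None
    if quantization_config is not None:
        # Copy so the caller's dict is left untouched.
        quant = dict(quantization_config)
        # ExLlama kernels target CUDA GPUs, so disable them on CPU.
        quant["use_exllama"] = False
    # Request float32 weights so CPU matmuls avoid the Half (float16) addmm kernel.
    kwargs["torch_dtype"] = float32_dtype
    return quant, kwargs


# Possible usage (not executed here; needs torch and transformers):
#   quant, kwargs = build_load_args(device, config.quantization_config, torch.float32)
#   if quant is not None:
#       config.quantization_config = quant
#   model = transformers.AutoModelForCausalLM.from_pretrained(model_path, config=config, **kwargs)
```

Returning a modified copy rather than editing the config in place keeps the GPU and CPU paths independent, which matters if the same config object is reused to load the model more than once.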