avans06 committed on
Commit
1a74319
1 Parent(s): 43d9f32

Fixed the issue where running ALMA on CPU raised the exception "addmm_impl_cpu_ not implemented for 'Half'".

docs/translateModel.md CHANGED

```diff
@@ -73,6 +73,7 @@ The 'mt5-zh-ja-en-trimmed' model is finetuned from Google's 'mt5-base' model. Th
 
 ## ALMA
 
+ALMA is an excellent translation model, but running it on CPU is strongly discouraged.
 ALMA is a many-to-many LLM-based translation model introduced by Haoran Xu and colleagues in September 2023. It is based on fine-tuning a large language model (LLaMA-2). The approach used for this model is referred to as Advanced Language Model-based trAnslator (ALMA). The paper is titled "`A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models`" ([arXiv:2309.11674](https://arxiv.org/abs/2309.11674)).
 The official support for ALMA currently includes 10 language directions: English↔German, English↔Czech, English↔Icelandic, English↔Chinese, and English↔Russian. However, the author hints that there might be surprises in other directions, so the web UI currently places no restrictions on the languages that can be chosen for ALMA.
 
```
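The exception named in the commit comes from PyTorch itself, not from ALMA: on CPU-only builds that lack float16 (`Half`) matmul kernels, a half-precision `addmm`/`matmul` fails. A minimal sketch of the failure mode and the float32 workaround, assuming a PyTorch build where CPU float16 matmul is unavailable (newer builds may support it natively):

```python
import torch

x = torch.randn(2, 4, dtype=torch.float16)
w = torch.randn(4, 3, dtype=torch.float16)

# On affected CPU builds, `x @ w` raises:
#   RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
# Casting to float32 first always works on CPU, which is what the
# torch_dtype=torch.float32 fix does for the whole model.
y = x.float() @ w.float()
print(y.dtype, tuple(y.shape))
```

The cost is roughly double the memory per weight, which is another reason the doc discourages CPU inference for a 7B-class model.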
src/translation/translationModel.py CHANGED

```diff
@@ -163,8 +163,10 @@ class TranslationModel:
         self.transTokenizer = transformers.AutoTokenizer.from_pretrained(self.modelPath, use_fast=True)
         transModelConfig = transformers.AutoConfig.from_pretrained(self.modelPath)
         if self.device == "cpu":
+            # ALMA is an excellent translation model, but running it on CPU is strongly discouraged.
+            # Set torch_dtype=torch.float32 to prevent the exception "addmm_impl_cpu_ not implemented for 'Half'".
             transModelConfig.quantization_config["use_exllama"] = False
-            self.transModel = transformers.AutoModelForCausalLM.from_pretrained(self.modelPath, device_map="auto", low_cpu_mem_usage=True, trust_remote_code=False, revision=self.modelConfig.revision, config=transModelConfig)
+            self.transModel = transformers.AutoModelForCausalLM.from_pretrained(self.modelPath, device_map="auto", low_cpu_mem_usage=True, trust_remote_code=False, revision=self.modelConfig.revision, config=transModelConfig, torch_dtype=torch.float32)
         else:
             # transModelConfig.quantization_config["exllama_config"] = {"version":2}  # After configuring ExLlamaV2, VRAM cannot be released effectively, which may be an issue; not adopting the V2 version for now.
             self.transModel = transformers.AutoModelForCausalLM.from_pretrained(self.modelPath, device_map="auto", low_cpu_mem_usage=True, trust_remote_code=False, revision=self.modelConfig.revision)
```
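The patch hard-codes `torch.float32` only on the CPU branch. The same device-to-dtype rule can be expressed as a small standalone helper (hypothetical — `pick_dtype` is not part of this repository, just an illustration of the decision the diff encodes): use half precision only when targeting CUDA, where the kernels exist.

```python
import torch

def pick_dtype(device: str) -> torch.dtype:
    # Hypothetical helper: half-precision matmul kernels are reliably
    # available on CUDA; on CPU, fall back to float32 to avoid
    #   RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'.
    return torch.float16 if device.startswith("cuda") else torch.float32

print(pick_dtype("cpu"))   # torch.float32
print(pick_dtype("cuda"))  # torch.float16
```

The resulting dtype would then be passed as `torch_dtype=` to `from_pretrained`, matching what the commit does inline.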