Model Sources

Model Description

🔥 LLaMAX2-7B-MetaMath is fully fine-tuned on the MetaMathQA dataset based on the powerful multilingual model LLaMAX2-7B.

🔥 Compared with the MetaMath-7B, LLaMAX2-7B-MetaMath performs significantly better in mathematical reasoning in low-resource languages, improving the average accuracy of low-resource languages on MGSM dataset by up to 18.8%.

🔥 LLaMAX2-7B-MetaMath demonstrates good multilingual math reasoning capability in all languages, improving the average accuracy by 6.2% across all languages in MGSM dataset.

Experiments

We evaluated LLaMAX2-7B-MetaMath on the MGSM dataset. Compared with MetaMath-7B, LLaMAX-7B-MetaMath achieves a leading on both high-resource languages (Hrl.) and low-resource languages (Lrl.).

MGSM Avg. Lrl. Hrl. Bn Th Sw Ja Zh De Fr Ru Es En
MetaMath-7B (official) 38.32 6.9 51.8 6.8 7.2 6.8 36.4 38.4 55.2 54.4 52.0 57.2 68.8
MetaMath-7B (Reproduced) 38.08 6.8 51.5 6.0 10.0 4.4 36.4 42.8 52.8 56.0 48.8 58.8 64.8
LLaMAX2-7B-MetaMath 44.28 25.6 52.3 26.8 24.0 26.0 35.6 42.4 56.8 55.2 53.6 56.8 65.6

Model Usage

Prompt template:

def Prompt_template(query):
    prompt = (
         "Below is an instruction that describes a task. "
         "Write a response that appropriately completes the request.\n\n"
         f"### Instruction:\n{query}\n\n### Response: Let's think step by step."
    )
    return prompt

Code Example:

from transformers import AutoTokenizer, LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS)
tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZER)

query = "Bert fills out the daily crossword puzzle in the newspaper every day. He uses a pencil to fill out the puzzles every two weeks. On average, it takes him 1050 words to use up a pencil. How many words are in each crossword puzzle on average?"
prompt = Prompt_template(query)
inputs = tokenizer(prompt, return_tensors="pt")

generate_ids = model.generate(inputs.input_ids, max_length=30)
tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

# => "If Bert uses up a pencil to fill out the puzzles every two weeks and it takes him 1050
words to use up a pencil, then he must be filling out 1050 words of crossword puzzles every
two weeks. To find out how many words are in each daily crossword puzzle, we need to divide
the total number of words (1050) by the number of days in two weeks (14). So, there are
1050/14 = 75 words in each daily crossword puzzle on average. #### The answer is: 75“

Citation

if our model helps your work, please cite this paper:

@inproceedings{lu-etal-2024-llamax,
    title = "{LL}a{MAX}: Scaling Linguistic Horizons of {LLM} by Enhancing Translation Capabilities Beyond 100 Languages",
    author = "Lu, Yinquan  and
      Zhu, Wenhao  and
      Li, Lei  and
      Qiao, Yu  and
      Yuan, Fei",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-emnlp.631",
    doi = "10.18653/v1/2024.findings-emnlp.631",
    pages = "10748--10772",
    abstract = "Large Language Models (LLMs) demonstrate remarkable translation capabilities in high-resource language tasks, yet their performance in low-resource languages is hindered by insufficient multilingual data during pre-training. To address this, we conduct extensive multilingual continual pre-training on the LLaMA series models, enabling translation support across more than 100 languages. Through a comprehensive analysis of training strategies, such as vocabulary expansion and data augmentation, we develop LLaMAX. Remarkably, without sacrificing its generalization ability, LLaMAX achieves significantly higher translation performance compared to existing open-source LLMs (by more than 10 spBLEU points) and performs on-par with specialized translation model (M2M-100-12B) on the Flores-101 benchmark. Extensive experiments indicate that LLaMAX can serve as a robust multilingual foundation model. The code and the models are publicly available.",
}
Downloads last month
11
Safetensors
Model size
6.74B params
Tensor type
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for LLaMAX/LLaMAX2-7B-MetaMath

Quantizations
1 model