metadata
language:
  - en
  - zh
license: other
library_name: transformers
tags:
  - mistral
  - qwen
  - qwen1.5
  - qwen2
license_name: qwen
license_link: >-
  https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20LICENSE%20AGREEMENT
pipeline_tag: text-generation
inference: false
model-index:
  - name: Qwen1.5-7B-Chat_mistral
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 24.49
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Minami-su/Qwen1.5-7B-Chat_mistral
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 26.69
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Minami-su/Qwen1.5-7B-Chat_mistral
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 25.78
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Minami-su/Qwen1.5-7B-Chat_mistral
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 52.33
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Minami-su/Qwen1.5-7B-Chat_mistral
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 53.67
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Minami-su/Qwen1.5-7B-Chat_mistral
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 0
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Minami-su/Qwen1.5-7B-Chat_mistral
          name: Open LLM Leaderboard

This is the Mistral-format version of the Qwen1.5-7B-Chat model by Alibaba Cloud. The conversion is based on the original script at https://github.com/hiyouga/LLaMA-Factory/blob/main/tests/llamafy_qwen.py, which I modified to make it compatible with Qwen1.5. This model was converted with https://github.com/Minami-su/character_AI_open/blob/main/mistral_qwen2.py

special

1. Before using this model, you need to modify modeling_mistral.py in the transformers library.

2. Open the file in an editor (the path depends on your environment), for example: vim /root/anaconda3/envs/train/lib/python3.9/site-packages/transformers/models/mistral/modeling_mistral.py

3. Find the MistralAttention class.

4. For the q, k, v, and o projections, change bias=False to bias=config.attention_bias.

Before: [image] After: [image]
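
The edit in step 4 can also be sketched as a text substitution. This is only an illustration: the helper `patch_attention_bias` and the sample lines are hypothetical, and it assumes each projection is defined on a single line of the form `self.q_proj = nn.Linear(..., bias=False)`. The actual source layout differs between transformers versions, so check the file by hand before relying on anything like this.

```python
import re

# Projection names from step 4; the single-line layout is an assumption.
PROJECTIONS = ("q_proj", "k_proj", "v_proj", "o_proj")

def patch_attention_bias(source: str) -> str:
    """Swap bias=False for bias=config.attention_bias on q/k/v/o projection lines only."""
    pattern = re.compile(r"self\.(%s)\s*=\s*nn\.Linear\(" % "|".join(PROJECTIONS))
    patched = []
    for line in source.splitlines(keepends=True):
        if pattern.search(line):
            line = line.replace("bias=False", "bias=config.attention_bias")
        patched.append(line)
    return "".join(patched)

# Hypothetical excerpt standing in for the MistralAttention constructor.
example = (
    "self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=False)\n"
    "self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)\n"
    "self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)\n"
)
print(patch_attention_bias(example))
```

Lines that do not define one of the four attention projections, such as the MLP `gate_proj` in the sample, are left unchanged.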

Differences between qwen2 mistral and qwen2 llamafy

Compared to qwen2 llamafy, qwen2 mistral can use sliding window attention, runs faster, and handles long contexts better.
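
For readers unfamiliar with sliding window attention, the general idea can be sketched as a causal mask in which each position only attends to a fixed number of recent positions. The function below is a plain-Python illustration, not the transformers implementation; the name `sliding_window_mask` and the exact boundary convention (the window counts the current token) are assumptions.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

# Each row shows which earlier positions one token can see with a window of 3.
for row in sliding_window_mask(seq_len=5, window=3):
    print("".join("x" if allowed else "." for allowed in row))
```

With full causal attention every row would be filled up to the diagonal; here the number of attended positions per token is capped by the window size.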

Usage:


from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the tokenizer and model; device_map="auto" places the weights on the available device(s).
tokenizer = AutoTokenizer.from_pretrained("Minami-su/Qwen1.5-7B-Chat_mistral")
model = AutoModelForCausalLM.from_pretrained("Minami-su/Qwen1.5-7B-Chat_mistral", torch_dtype="auto", device_map="auto")
# Print generated text as it is produced, without echoing the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

messages = [
    {"role": "user", "content": "Who are you?"}
]
# Apply the chat template and move the token ids to the model's device.
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
inputs = inputs.to(model.device)
# max_length caps the prompt plus the generated tokens.
generate_ids = model.generate(inputs, max_length=32768, streamer=streamer)

Test

Loaded in 4-bit:

hf-causal (pretrained=Qwen1.5-7B-Chat), limit: None, provide_description: False, num_fewshot: 0, batch_size: 8
|    Task     |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.4155|±  |0.0144|
|             |       |acc_norm|0.4480|±  |0.0145|
|truthfulqa_mc|      1|mc1     |0.3513|±  |0.0167|
|             |       |mc2     |0.5165|±  |0.0159|
|winogrande   |      0|acc     |0.6330|±  |0.0135|

Loaded in 4-bit:

hf-causal (pretrained=Qwen1.5-7B-Chat_mistral), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16
|    Task     |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.4172|±  |0.0144|
|             |       |acc_norm|0.4480|±  |0.0145|
|truthfulqa_mc|      1|mc1     |0.3488|±  |0.0167|
|             |       |mc2     |0.5161|±  |0.0159|
|winogrande   |      0|acc     |0.6306|±  |0.0136|
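
To compare the two 4-bit runs, the values from both tables can be lined up and each difference checked against the reported standard error. This is a rough sketch using only the numbers above; treating "within one standard error" as the yardstick is an assumption made here for illustration, not a formal significance test.

```python
# (task, metric) -> (Qwen1.5-7B-Chat value, Qwen1.5-7B-Chat_mistral value, stderr of the mistral run)
results = {
    ("arc_challenge", "acc"): (0.4155, 0.4172, 0.0144),
    ("arc_challenge", "acc_norm"): (0.4480, 0.4480, 0.0145),
    ("truthfulqa_mc", "mc1"): (0.3513, 0.3488, 0.0167),
    ("truthfulqa_mc", "mc2"): (0.5165, 0.5161, 0.0159),
    ("winogrande", "acc"): (0.6330, 0.6306, 0.0136),
}

for (task, metric), (original, converted, stderr) in results.items():
    diff = converted - original
    within = abs(diff) <= stderr
    print(f"{task:<14}{metric:<9}{diff:+.4f}  within one stderr: {within}")
```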

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Minami-su__Qwen1.5-7B-Chat_mistral)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |30.49|
|AI2 Reasoning Challenge (25-Shot)|24.49|
|HellaSwag (10-Shot)              |26.69|
|MMLU (5-Shot)                    |25.78|
|TruthfulQA (0-shot)              |52.33|
|Winogrande (5-shot)              |53.67|
|GSM8k (5-shot)                   | 0.00|
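
The Avg. row is consistent with the arithmetic mean of the six benchmark scores, which can be checked with a few lines. That the leaderboard uses an unweighted mean is an assumption here.

```python
# Scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 24.49,
    "HellaSwag (10-Shot)": 26.69,
    "MMLU (5-Shot)": 25.78,
    "TruthfulQA (0-shot)": 52.33,
    "Winogrande (5-shot)": 53.67,
    "GSM8k (5-shot)": 0.00,
}

# Unweighted mean over all six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. = {average:.2f}")  # prints "Avg. = 30.49"
```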