license: llama2
model-index:
  - name: speechless-tools-7b
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 38.91
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/speechless-tools-7b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 57.69
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/speechless-tools-7b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 33.24
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/speechless-tools-7b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 44.08
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/speechless-tools-7b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 58.56
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/speechless-tools-7b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 7.51
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/speechless-tools-7b
          name: Open LLM Leaderboard

The speechless-tools-7b model is fine-tuned on speechless-coding-7b-16k-tora, following the guidance of the ToolLlama project. It aims to give open-source LLMs the ability to handle thousands of diverse real-world APIs.

Local Test

speechless-tools-7b-dfs vs chatgpt-cot

Dataset	Win Rate
G1_instruction	0.465
G1_category	0.495
G1_tool	0.505
G2_instruction	0.61
G2_category	0.585
G3_instruction	0.66

speechless-tools-7b-dfs vs toolllama-dfs

Dataset	Win Rate
G1_instruction	0.45
G1_category	0.45
G1_tool	0.51
G2_instruction	0.53
G2_category	0.575
G3_instruction	0.46
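
As a reading aid for the two tables above, the following minimal Python sketch lists the datasets on which speechless-tools-7b-dfs has a win rate above 0.5 against each baseline. It assumes that a win rate above 0.5 means the model's answer was preferred over the baseline's more often than not; the source does not define the metric, and the names `win_rates` and `datasets_above` are hypothetical.

```python
# Win rates copied from the two tables above.
win_rates = {
    "chatgpt-cot": {
        "G1_instruction": 0.465,
        "G1_category": 0.495,
        "G1_tool": 0.505,
        "G2_instruction": 0.61,
        "G2_category": 0.585,
        "G3_instruction": 0.66,
    },
    "toolllama-dfs": {
        "G1_instruction": 0.45,
        "G1_category": 0.45,
        "G1_tool": 0.51,
        "G2_instruction": 0.53,
        "G2_category": 0.575,
        "G3_instruction": 0.46,
    },
}


def datasets_above(rates, threshold=0.5):
    """Return the dataset names whose win rate is strictly above the threshold.

    Assumption: 0.5 is the break-even point against the baseline.
    """
    return [name for name, rate in rates.items() if rate > threshold]


for baseline, rates in win_rates.items():
    ahead = datasets_above(rates)
    print(f"vs {baseline}: above 0.5 on {len(ahead)} of {len(rates)} datasets: {ahead}")
```

Under that assumption, the model is ahead of chatgpt-cot on four of the six datasets and ahead of toolllama-dfs on three of the six.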

Open LLM Leaderboard Evaluation Results

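Detailed results can be found on the Open LLM Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/speechless-tools-7b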

Metric	Value
Avg.	40.00
AI2 Reasoning Challenge (25-Shot)	38.91
HellaSwag (10-Shot)	57.69
MMLU (5-Shot)	33.24
TruthfulQA (0-shot)	44.08
Winogrande (5-shot)	58.56
GSM8k (5-shot)	7.51
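
The Avg. row is consistent with an unweighted mean of the six benchmark scores. The minimal Python sketch below reproduces it, assuming each benchmark counts equally in the leaderboard average; the helper name `leaderboard_average` is hypothetical.

```python
# Benchmark scores (percentages) copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 38.91,
    "HellaSwag (10-Shot)": 57.69,
    "MMLU (5-Shot)": 33.24,
    "TruthfulQA (0-shot)": 44.08,
    "Winogrande (5-shot)": 58.56,
    "GSM8k (5-shot)": 7.51,
}


def leaderboard_average(values):
    """Unweighted mean of the scores, rounded to two decimals.

    Assumption: every benchmark carries equal weight in the average.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. {average:.2f}")
```

The six scores sum to 239.99, and dividing by six gives roughly 39.998, which rounds to the reported 40.00.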