--- license: other pipeline_tag: text-generation model-index: - name: internlm2_5-7b-chat results: - task: type: text-generation name: Text Generation dataset: name: IFEval (0-Shot) type: HuggingFaceH4/ifeval args: num_few_shot: 0 metrics: - type: inst_level_strict_acc and prompt_level_strict_acc value: 61.4 name: strict accuracy source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=internlm/internlm2_5-7b-chat name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: BBH (3-Shot) type: BBH args: num_few_shot: 3 metrics: - type: acc_norm value: 57.67 name: normalized accuracy source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=internlm/internlm2_5-7b-chat name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MATH Lvl 5 (4-Shot) type: hendrycks/competition_math args: num_few_shot: 4 metrics: - type: exact_match value: 8.31 name: exact match source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=internlm/internlm2_5-7b-chat name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: GPQA (0-shot) type: Idavidrein/gpqa args: num_few_shot: 0 metrics: - type: acc_norm value: 10.63 name: acc_norm source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=internlm/internlm2_5-7b-chat name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MuSR (0-shot) type: TAUR-Lab/MuSR args: num_few_shot: 0 metrics: - type: acc_norm value: 14.35 name: acc_norm source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=internlm/internlm2_5-7b-chat name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MMLU-PRO (5-shot) type: TIGER-Lab/MMLU-Pro config: main split: test args: num_few_shot: 5 metrics: - type: acc value: 30.42 name: accuracy source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=internlm/internlm2_5-7b-chat name: Open LLM Leaderboard --- # InternLM
👋 join us on Discord and WeChat
## Introduction InternLM2.5 has open-sourced a 7 billion parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics: - **Outstanding reasoning capability**: State-of-the-art performance on Math reasoning, surpassing models like Llama3 and Gemma2-9B. - **1M Context window**: Nearly perfect at finding needles in the haystack with 1M-long context, with leading performance on long-context tasks like LongBench. Try it with [LMDeploy](./chat/lmdeploy.md) for 1M-context inference. - **Stronger tool use**: InternLM2.5 supports gathering information from more than 100 web pages, corresponding implementation will be released in [Lagent](https://github.com/InternLM/lagent/tree/main) soon. InternLM2.5 has better tool utilization-related capabilities in instruction following, tool selection and reflection. See [examples](https://github.com/InternLM/InternLM/blob/main/agent/lagent.md). ## InternLM2.5-7B-Chat ### Performance Evaluation We conducted a comprehensive evaluation of InternLM using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). The evaluation covered five dimensions of capabilities: disciplinary competence, language competence, knowledge competence, inference competence, and comprehension competence. Here are some of the evaluation results, and you can visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more evaluation results. | Dataset\Models |Qwen2-7B-Instruct | Yi-1.5-9B-Chat | GLM-4-9B-Chat | Llama-3-8B-Instruct | Gemma2-9B-IT | InternLM2.5-7B-Chat | Llama-3-70B-Instruct | --- | --- | --- | --- | --- | --- | --- | --- | |MMLU | 70.8 | 71.0 | 71.4 | 68.4 | 70.9 | 72.8 | 80.5 |CMMLU | 80.9 | 74.5 | 74.5 | 53.3 | 60.3 | 78.0 | 70.1 |BBH |65 |69.6 |69.6 |65.4 |68.2 |71.6 |80.5 |MATH | 48.6 | 51.1 | 51.1 | 27.9 | 46.9 | 60.7 | 47.1 | GSM8K | 82.9 | 80.1 | 85.3 | 72.9 | 88.9 | 86.0 | 92.8 |GPQA | 38.4 | 37.9 | 36.9 | 26.3 | 33.8 | 38.4 | 38.9 - The evaluation results were obtained from [OpenCompass](https://github.com/internLM/OpenCompass/) (some data marked with *, which means come from the original papers), and evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/internLM/OpenCompass/). - The evaluation data may have numerical differences due to the version iteration of [OpenCompass](https://github.com/internLM/OpenCompass/), so please refer to the latest evaluation results of [OpenCompass](https://github.com/internLM/OpenCompass/). **Limitations:** Although we have made efforts to ensure the safety of the model during the training process and to encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information. ### Import from Transformers To load the InternLM2.5 7B Chat model using Transformers, use the following code: ```python import torch from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2_5-7b-chat", trust_remote_code=True) # Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and cause OOM Error. model = AutoModelForCausalLM.from_pretrained("internlm/internlm2_5-7b-chat", torch_dtype=torch.float16, trust_remote_code=True).cuda() model = model.eval() response, history = model.chat(tokenizer, "hello", history=[]) print(response) # Hello! How can I help you today? response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history) print(response) ``` The responses can be streamed using `stream_chat`: ```python import torch from transformers import AutoModelForCausalLM, AutoTokenizer model_path = "internlm/internlm2_5-7b-chat" model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).cuda() tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True) model = model.eval() length = 0 for response, history in model.stream_chat(tokenizer, "Hello", history=[]): print(response[length:], flush=True, end="") length = len(response) ``` ## Deployment ### LMDeploy LMDeploy is a toolkit for compressing, deploying, and serving LLM, developed by the MMRazor and MMDeploy teams. ```bash pip install lmdeploy ``` You can run batch inference locally with the following python code: ```python import lmdeploy pipe = lmdeploy.pipeline("internlm/internlm2_5-7b-chat") response = pipe(["Hi, pls intro yourself", "Shanghai is"]) print(response) ``` Or you can launch an OpenAI compatible server with the following command: ```bash lmdeploy serve api_server internlm/internlm2_5-7b-chat --model-name internlm2_5-7b-chat --server-port 23333 ``` Then you can send a chat request to the server: ```bash curl http://localhost:23333/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "internlm2_5-7b-chat", "messages": [ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Introduce deep learning to me."} ] }' ``` Find more details in the [LMDeploy documentation](https://lmdeploy.readthedocs.io/en/latest/) ### vLLM Launch OpenAI compatible server with `vLLM>=0.3.2`: ```bash pip install vllm ``` ```bash python -m vllm.entrypoints.openai.api_server --model internlm/internlm2_5-7b-chat --served-model-name internlm2_5-7b-chat --trust-remote-code ``` Then you can send a chat request to the server: ```bash curl http://localhost:8000/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "internlm2_5-7b-chat", "messages": [ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Introduce deep learning to me."} ] }' ``` Find more details in the [vLLM documentation](https://docs.vllm.ai/en/latest/index.html) ## Open Source License The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow **free** commercial usage. To apply for a commercial license, please fill in the [application form (English)](https://wj.qq.com/s2/12727483/5dba/)/[申请表(中文)](https://wj.qq.com/s2/12725412/f7c1/). For other questions or collaborations, please contact