omkarthawakar committed on
Commit
20b2991
1 Parent(s): f09f1db

Update README.md

Files changed (1)
  1. README.md +96 -1
README.md CHANGED
@@ -1,3 +1,98 @@
1
  ---
2
- license: apache-2.0
3
  ---
1
  ---
2
+ license: mit
3
+ license_link: https://huggingface.co/microsoft/phi-2/resolve/main/LICENSE
4
+ language:
5
+ - en
6
+ pipeline_tag: text-generation
7
+ tags:
8
+ - nlp
9
+ - code
10
+ datasets:
11
+ - LLM360/AmberDatasets
12
  ---
13
+ # MobiLlama-08B
14
+
15
+ <center><img src="MobileLLaMa.png" alt="mobillama logo" width="300"/></center>
16
+
17
+ MobiLlama-08B is a Small Language Model (SLM) with **0.8 billion** parameters. It was trained on the Amber data sources ([Amber-Dataset](https://huggingface.co/datasets/LLM360/AmberDatasets)).
18
+
19
+
20
+ ## Model Summary
21
+
22
+ "Bigger the better" has been the predominant trend in recent Large Language Model (LLM) development. However, LLMs are not well suited to scenarios that require on-device processing, energy efficiency, a low memory footprint, and response efficiency. These requirements are crucial for privacy, security, and sustainable deployment. This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource-constrained devices. Our primary contribution is an accurate and fully transparent open-source 0.5 billion (0.5B) parameter SLM, named MobiLlama, that caters to the specific needs of resource-constrained computing, with an emphasis on enhanced performance at reduced resource demands. MobiLlama is an SLM design that starts from a larger model and applies a careful parameter sharing scheme to reduce both the pre-training and the deployment cost. Our work strives not only to bridge the gap in open-source SLMs but also to ensure full transparency: the complete training data pipeline, training code, model weights, and over 300 checkpoints, along with evaluation code, are available on our [GitHub](https://github.com/mbzuai-oryx/MobiLlama).
23
+
24
+ [Arxiv Paper Link]('')
25
+
26
+ ## Model Description
27
+
28
+ - **Model type:** Small Language Model (SLM) built using the architecture design of LLaMA-7B
29
+ - **Language(s) (NLP):** English
30
+ - **License:** Apache 2.0
31
+ - **Resources for more information:**
32
+   - [Training Code](https://github.com/mbzuai-oryx/MobiLlama)
33
+   - [Data Preparation](https://github.com/LLM360/amber-data-prep)
34
+   - [Fully processed Amber pretraining data](https://huggingface.co/datasets/LLM360/AmberDatasets)
35
+
36
+
37
+ ## How to Use
38
+
39
+ ```python
40
+ from transformers import AutoModelForCausalLM, AutoTokenizer
41
+
42
+ tokenizer = AutoTokenizer.from_pretrained("MBZUAI/MobiLlama-08B", trust_remote_code=True)
43
+ model = AutoModelForCausalLM.from_pretrained("MBZUAI/MobiLlama-08B", trust_remote_code=True)
44
+
45
+ model.to('cuda')
46
+ text = "I was walking towards the river when "
47
+ input_ids = tokenizer(text, return_tensors="pt").to('cuda').input_ids
48
+ outputs = model.generate(input_ids, max_length=1000, repetition_penalty=1.2, pad_token_id=tokenizer.eos_token_id)
49
+ print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:], skip_special_tokens=True)[0].strip())
50
+
51
+ ```
52
+
53
+ ## Training Data Mix
54
+ | Subset | Tokens (Billion) |
55
+ | ----------- | ----------- |
56
+ | Arxiv | 30.00 |
57
+ | Book | 28.86 |
58
+ | C4 | 197.67 |
59
+ | Refined-Web | 665.01 |
60
+ | StarCoder | 291.92 |
61
+ | StackExchange | 21.75 |
62
+ | Wikipedia | 23.90 |
63
+ | Total | 1259.13 |
64
+
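As a quick sanity check on the table above, the following sketch computes each subset's share of the mix. The token counts are copied from the table; the variable names are illustrative only. Note that the listed subsets sum to 1259.11B, slightly below the reported total of 1259.13B, which may simply reflect rounding of the per-subset figures.

```python
# Token counts in billions, copied from the training data table above.
subset_tokens = {
    "Arxiv": 30.00,
    "Book": 28.86,
    "C4": 197.67,
    "Refined-Web": 665.01,
    "StarCoder": 291.92,
    "StackExchange": 21.75,
    "Wikipedia": 23.90,
}
reported_total = 1259.13  # "Total" row of the table

# Sum the subsets ourselves rather than trusting the reported total.
computed_total = sum(subset_tokens.values())

# Percentage share of each subset, relative to the computed total.
shares = {name: 100.0 * count / computed_total
          for name, count in subset_tokens.items()}

# Print subsets from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14} {share:5.1f}%")
print(f"computed total: {computed_total:.2f}B (reported: {reported_total:.2f}B)")
```

With these numbers, Refined-Web makes up a little over half of the mix and StarCoder a little under a quarter.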
65
+ ## Hyperparameters
66
+ | Hyperparameter | Value |
67
+ | ----------- | ----------- |
68
+ | Total Parameters | 0.8B |
69
+ | Hidden Size | 2560 |
70
+ | Intermediate Size (MLPs) | 10240 |
71
+ | Number of Attention Heads | 32 |
72
+ | Number of Hidden Layers | 22 |
73
+ | RMSNorm ɛ | 1e-5 |
74
+ | Max Seq Length | 2048 |
75
+ | Vocab Size | 32000 |
76
+
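The hyperparameters above can be cross-checked against the stated 0.8B total with a back-of-the-envelope count. The sketch below assumes a LLaMA-style block (four bias-free attention projections, a three-matrix gated MLP, two RMSNorm weight vectors per layer) with untied input and output embeddings; none of these details are stated in this README. It also assumes that the "parameter sharing scheme" mentioned in the Model Summary takes the form of a single MLP shared by all layers, which is an interpretation and not something this document confirms. The function name `count_params` is hypothetical.

```python
def count_params(hidden, intermediate, layers, vocab, share_mlp):
    """Rough parameter count for a LLaMA-style decoder (see assumptions above)."""
    embeddings = 2 * vocab * hidden          # input embedding + untied output head
    attention = 4 * hidden * hidden          # q, k, v, o projections, no biases
    mlp = 3 * hidden * intermediate          # gate, up, down projections
    norms = 2 * hidden * layers + hidden     # two RMSNorms per layer + final norm
    # ASSUMPTION: "sharing" means one MLP reused by every layer.
    mlp_copies = 1 if share_mlp else layers
    return embeddings + layers * attention + mlp_copies * mlp + norms


# Values taken from the Hyperparameters table.
config = dict(hidden=2560, intermediate=10240, layers=22, vocab=32000)

shared = count_params(**config, share_mlp=True)
unshared = count_params(**config, share_mlp=False)

print(f"shared MLP:    {shared / 1e9:.2f}B")
print(f"per-layer MLP: {unshared / 1e9:.2f}B")
```

Under these assumptions the shared-MLP count comes to about 0.82B, close to the stated 0.8B, whereas giving each layer its own MLP would come to about 2.47B.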
77
+
78
+ ## Evaluation
79
+
80
+ | Evaluation Benchmark | MobiLlama-0.5B | MobiLlama-0.8B | MobiLlama-1.2B |
81
+ | ----------- | ----------- | ----------- | ----------- |
82
+ | HellaSwag | 52.52 | 54.09 | 62.99 |
83
+ | MMLU | 26.45 | 26.92 | 24.23 |
84
+ | Arc Challenge | 29.52 | 30.20 | 34.55 |
85
+ | TruthfulQA | 38.05 | 38.48 | 35.57 |
86
+ | CrowsPairs | 64.03 | 64.82 | 68.12 |
87
+ | PIQA | 72.03 | 73.17 | 75.29 |
88
+ | Race | 33.68 | 33.37 | 35.31 |
89
+ | SIQA | 40.22 | 41.60 | 41.96 |
90
+ | Winogrande | 57.53 | 57.45 | 61.08 |
91
+
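To compare the three models at a glance, one option is an unweighted mean over the nine benchmarks. This is only a rough summary and an assumption of this sketch: the benchmarks use different metrics and scales, so the mean should not be over-interpreted. Scores are copied from the table above.

```python
# Scores copied from the Evaluation table, in row order:
# HellaSwag, MMLU, Arc Challenge, TruthfulQA, CrowsPairs, PIQA, Race, SIQA, Winogrande
scores = {
    "MobiLlama-0.5B": [52.52, 26.45, 29.52, 38.05, 64.03, 72.03, 33.68, 40.22, 57.53],
    "MobiLlama-0.8B": [54.09, 26.92, 30.20, 38.48, 64.82, 73.17, 33.37, 41.60, 57.45],
    "MobiLlama-1.2B": [62.99, 24.23, 34.55, 35.57, 68.12, 75.29, 35.31, 41.96, 61.08],
}

# Unweighted mean per model (an illustrative summary, not an official metric).
averages = {model: sum(vals) / len(vals) for model, vals in scores.items()}

for model, avg in averages.items():
    print(f"{model}: {avg:.2f}")
```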
92
+
93
+ ## Citation
94
+ **BibTeX:**
95
+
96
+ ```bibtex
97
+ coming soon
98
+ ```