---
datasets:
- nlpai-lab/kullm-v2
language:
- ko
- en
---

<br/>

# 🇰🇷🦙 KoLlama2-7b Repository 🦙🇰🇷

✅ The first release of KoLlama2 is a LoRA fine-tune on kullm-v2, the Korean instruction dataset released by Korea University's NLP & AI Lab and the HIAI Research Institute.
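
As a rough sketch of how such a LoRA adapter could be loaded on top of the Llama-2-7b base weights with `transformers` and `peft` (the adapter repository id below is a placeholder, not a confirmed name):

```
# Hedged sketch: load a LoRA adapter trained on kullm-v2 on top of Llama-2-7b.
# "psymon/KoLlama2-7b" is a placeholder adapter id; substitute the actual id of this repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"   # gated; requires an accepted license
adapter_id = "psymon/KoLlama2-7b"      # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "한국의 수도는 어디인가요?"  # "What is the capital of Korea?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```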


<br/>

[Read English](#kollama2--open-source-language-model-based-on-llama2-optimized-for-korean)

# KoLlama2: An Open-Source Language Model Based on Llama2, Optimized for Korean

KoLlama2 (Korean Large Language Model Meta AI 2) is an open-source project to improve the Korean performance of Llama2, an English-based LLM.

<br/>

## Motivation

From GPT-3 and BERT to Llama2, the remarkable progress of large language models has drawn everyone's attention. However, because LLMs are pre-trained on massive corpora, the vast majority of the training data is in English, while Korean accounts for only a very small fraction.

- Korean share of GPT-3's pre-training data: 0.01697%
<p align="center" style="color:gray">
  <img style="margin:20px 0 10px 0" src="https://github.com/psymon-dev/KoLlama2/assets/91517542/b50b9283-fb54-46b6-bc84-00bd363601c8" alt="image" width=482 />
  <br/>Source: https://github.com/openai/gpt-3/blob/master/dataset_statistics/languages_by_word_count.csv
</p> 

- Korean share of the Llama2 pre-training data: 0.06%
<p align="center" style="color:gray">
  <img style="margin:20px 0 10px 0" src="https://github.com/psymon-dev/KoLlama2/assets/91517542/79b72fee-3517-4a7e-a0a5-fda4c8f2a7ca" alt="image" width=482 />
  <br/>Source: Table 10, p. 22, Llama 2: Open Foundation and Fine-Tuned Chat Models, Hugo Touvron et al., July 18, 2023.
</p> 

These figures are far below the share of Korean speakers (81.7M) in the world's population (7.888 billion), which is about 1.035%. This stems from several factors, including the isolated nature of Korean and the lack of well-prepared Korean corpora, but the result is that Korean users are severely limited in experiencing the full capabilities of LLMs.

<br/>

## Prior Attempts
### Pre-training a Korean-based LLM
One of the best solutions is to build an in-house language model pre-trained on Korean data. Such efforts are being led by large companies with the capital to fund them.

* Naver's HyperCLOVA X : https://clova.ai/hyperclova
* Kakao's KoGPT : https://github.com/kakaobrain/kogpt
* EleutherAI's polyglot-ko : https://github.com/EleutherAI/polyglot

This approach is the surest way to address the LLM's lack of Korean ability. The problem, however, is that LLMs are changing too quickly: only five months passed between the release of the LLaMA model and the release of Llama2. With new techniques announced every week, it is impossible to predict future directions accurately or to retrain a large language model for every new change.

Therefore, we need a lighter and faster approach that can run in parallel with training our own language models.

### Fine-tuning a Foreign-Language-Based LLM
Fine-tuning a foreign-language-based LLM for Korean is a good solution to this problem. The following attempts have been made based on the LLaMA model.

* KoAlpaca : https://github.com/Beomi/KoAlpaca
* KULLM : https://github.com/nlpai-lab/KULLM
* KoVicuna : https://github.com/melodysdreamj/KoVicuna
* KORani : https://github.com/krafton-ai/KORani

These attempts increased interest in open-source LLMs and helped the community understand various fine-tuning methods, but their limitations were also clear.

1. Because Korean was excluded from the LLaMA pre-training data, no method (full fine-tuning, LoRA, QLoRA, etc.) produced satisfactory Korean performance.
2. There was no unified way to evaluate Korean language performance, making it difficult to judge which training method was most effective.
3. Each project was carried out sporadically by individual groups, so duplicate attempts were repeated.

## KoLlama2 Project Proposal
KoLlama2 is a project to find the best way to fine-tune a foreign-language-based LLM for Korean, building on the experience gained from the LLaMA model. The following efforts are needed.

1. Try various methodologies such as QLoRA, LoRA, and full fine-tuning to see how much the 0.06% of Korean ability included in Llama2 can be improved (a rough QLoRA sketch follows this list).
2. Apply various datasets such as Alpaca and Vicuna to determine which type of dataset is most effective at improving Korean ability.
3. Try new techniques, such as curriculum learning that starts with simple Korean-English translation and gradually increases difficulty, adding a pre-training step on a large Korean corpus, and the vocabulary expansion used in Chinese-LLaMA.
4. Devise a reasonable evaluation method to assess each methodology.
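
As a rough illustration of the first item, the sketch below shows what a QLoRA setup on the kullm-v2 dataset could look like with `transformers` and `peft`; all hyperparameters and the choice of target modules are illustrative assumptions, not the settings used for this model:

```
# Hedged QLoRA sketch (illustrative hyperparameters, not this model's training recipe).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "meta-llama/Llama-2-7b-hf"
dataset = load_dataset("nlpai-lab/kullm-v2", split="train")

# Load the base model in 4-bit so the LoRA adapters can be trained on modest hardware.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb, device_map="auto")
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(
    r=8, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative choice of attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# ...training loop (e.g., transformers Trainer or trl's SFTTrainer) omitted...
```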

## Benchmarks

## References


<br/>

---

# KoLlama2 : Open source language model based on Llama2 optimized for Korean
KoLlama2 (Korean Large Language Model Meta AI 2) is an open-source project to improve the Korean performance of Llama2, an English-based LLM. 


## Problem

From GPT-3 to BERT to Llama2, the amazing advances in large-scale language models have captured everyone's attention. However, because LLMs are pre-trained on large corpora, the vast majority of the training data is in English, with Korean representing a very small percentage.

- Percentage of Korean in GPT-3's pre-training data: 0.01697%

<p align="center" style="color:gray">
  <img style="margin:20px 0 10px 0" src="https://github.com/psymon-dev/KoLlama2/assets/91517542/b50b9283-fb54-46b6-bc84-00bd363601c8" alt="image" width=482 />
  <br/>https://github.com/openai/gpt-3/blob/master/dataset_statistics/languages_by_word_count.csv
</p> 

- Percentage of Korean in the Llama2 model's pre-training data: 0.06%.

<p align="center" style="color:gray">
  <img style="margin:20px 0 10px 0" src="https://github.com/psymon-dev/KoLlama2/assets/91517542/79b72fee-3517-4a7e-a0a5-fda4c8f2a7ca" alt="image" width=482 />
  <br/>Table 10, p. 22, Llama 2: Open Foundation and Fine-Tuned Chat Models, Hugo Touvron et al., July 18, 2023.
</p> 

This is far below the share of Korean speakers (81.7M) in the world's population (7.888 billion), which is about 1.035%. Several factors contribute, including the isolated nature of Korean and the lack of well-prepared Korean corpora, but the end result is that Korean speakers are severely limited in experiencing the richness of LLMs.

## Existing Approaches
### Pre-training a Korean-based LLM

One of the best solutions is to create your own language model, pre-trained with Korean data. This is being done by large, well-funded companies.

* Naver HyperCLOVA X : https://clova.ai/hyperclova
* Kakao KoGPT : https://github.com/kakaobrain/kogpt
* EleutherAI polyglot-ko : https://github.com/EleutherAI/polyglot

This approach would most certainly address the LLM's lack of Korean language skills. The problem is that LLMs are changing very fast: it took only five months between the release of the LLaMA model and the release of Llama2. With new techniques being released every week, it is impossible to accurately predict future developments or to retrain a large language model for each new change.

Therefore, we need a lighter and faster method that can be used in parallel with training our own language models.

### Fine-tuning an English-based LLM
Fine-tuning a foreign-language-based LLM for Korean is a good solution to this problem. The following attempts have been made based on the LLaMA model.


* KoAlpaca : https://github.com/Beomi/KoAlpaca
* KULLM : https://github.com/nlpai-lab/KULLM
* KoVicuna : https://github.com/melodysdreamj/KoVicuna
* KORani : https://github.com/krafton-ai/KORani

While these attempts have increased interest in open-source LLMs and helped us understand the various ways to fine-tune them, the limitations are clear.

1. For the LLaMA model, Korean was excluded from the pre-training data, so no method, including Full-Finetuning, LoRA, and QLoRA, could produce satisfactory Korean performance.

2. There was no unified method for evaluating Korean language learning, making it difficult to determine which learning method was most effective.

3. Each project was developed sporadically by individual entities, resulting in redundant attempts.  

## KoLlama2 Project Proposal
KoLlama2 is a project to find the best way to fine-tune an English-based LLM for Korean, based on the experience gained from the LLaMA model. To achieve this, the following attempts are required.

1. Try different methodologies such as QLoRA, LoRA, and full fine-tuning to see how much the 0.06% Korean proficiency included in Llama2's pre-training data improves.

2. Apply various datasets such as Alpaca and Vicuna to see which type of dataset is most effective for improving Korean proficiency.

3. Try new techniques, such as curriculum learning that starts with simple Korean-English translation and gradually increases difficulty, an additional pre-training step on a large Korean corpus, and the vocabulary expansion used in Chinese-LLaMA (see the sketch after this list).

4. Devise a reasonable evaluation method to assess each methodology.
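
As a rough illustration of the vocabulary-expansion idea in item 3, the sketch below adds a few Korean tokens to the Llama-2 tokenizer and resizes the embedding matrix; a real expansion would instead learn new subword pieces from a large Korean corpus (e.g., with SentencePiece) and merge them, as Chinese-LLaMA did:

```
# Hedged sketch of vocabulary expansion: add Korean-specific tokens and resize embeddings.
# The added tokens are illustrative only; real expansion would train new subword pieces.
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

new_tokens = ["안녕하세요", "감사합니다"]          # illustrative Korean pieces
num_added = tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))  # new embedding rows start randomly initialized
print(f"added {num_added} tokens; vocab size is now {len(tokenizer)}")
```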

## Benchmarks

## References

# Llama 2

We are unlocking the power of large language models. Our latest version of Llama is now accessible to individuals, creators, researchers and businesses of all sizes so that they can experiment, innovate and scale their ideas responsibly. 

This release includes model weights and starting code for pretrained and fine-tuned Llama language models, ranging from 7B to 70B parameters.

This repository is intended as a minimal example to load [Llama 2](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/) models and run inference. For more detailed examples leveraging HuggingFace, see [llama-recipes](https://github.com/facebookresearch/llama-recipes/).

## Download

โš ๏ธ **7/18: We're aware of people encountering a number of download issues today. Anyone still encountering issues should remove all local files, re-clone the repository, and [request a new download link](https://ai.meta.com/resources/models-and-libraries/llama-downloads/). It's critical to do all of these in case you have local corrupt files. When you receive the email, copy *only* the link text - it should begin with https://download.llamameta.net and not with https://l.facebook.com, which will give errors.**



In order to download the model weights and tokenizer, please visit the [Meta AI website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and accept our License.

Once your request is approved, you will receive a signed URL over email. Then run the download.sh script, passing the URL provided when prompted to start the download. Make sure that you copy the URL text itself, **do not use the 'Copy link address' option** when you right click the URL. If the copied URL text starts with: https://download.llamameta.net, you copied it correctly. If the copied URL text starts with: https://l.facebook.com, you copied it the wrong way.

Pre-requisites: make sure you have `wget` and `md5sum` installed. Then to run the script: `./download.sh`.

Keep in mind that the links expire after 24 hours and a certain amount of downloads. If you start seeing errors such as `403: Forbidden`, you can always re-request a link.

### Access on Hugging Face

We are also providing downloads on [Hugging Face](https://huggingface.co/meta-llama). You must first request a download from the Meta AI website using the same email address as your Hugging Face account. After doing so, you can request access to any of the models on Hugging Face and within 1-2 days your account will be granted access to all versions.

## Setup

In a conda env with PyTorch / CUDA available, clone the repo and run in the top-level directory:

```
pip install -e .
```

## Inference

Different models require different model-parallel (MP) values:

|  Model | MP |
|--------|----|
| 7B     | 1  |
| 13B    | 2  |
| 70B    | 8  |

All models support sequence length up to 4096 tokens, but we pre-allocate the cache according to `max_seq_len` and `max_batch_size` values. So set those according to your hardware.

### Pretrained Models

These models are not finetuned for chat or Q&A. They should be prompted so that the expected answer is the natural continuation of the prompt.

See `example_text_completion.py` for some examples. To illustrate, see the command below to run it with the llama-2-7b model (`nproc_per_node` needs to be set to the `MP` value):

```
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
```
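
The same call can also be made from a short Python script that mirrors `example_text_completion.py`; the sketch below is illustrative (prompt text and sampling parameters are placeholders) and still needs to be launched with `torchrun` so the model-parallel setup is initialized:

```
# Hedged sketch mirroring example_text_completion.py.
# Launch with: torchrun --nproc_per_node 1 <script>.py
from llama import Llama

generator = Llama.build(
    ckpt_dir="llama-2-7b/",
    tokenizer_path="tokenizer.model",
    max_seq_len=128,    # the KV cache is pre-allocated from these two values,
    max_batch_size=4,   # so size them to your hardware
)

prompts = ["Simply put, the theory of relativity states that "]  # illustrative prompt
results = generator.text_completion(prompts, max_gen_len=64, temperature=0.6, top_p=0.9)
for prompt, result in zip(prompts, results):
    print(prompt + result["generation"])
```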

### Fine-tuned Chat Models

The fine-tuned models were trained for dialogue applications. To get the expected features and performance for them, a specific formatting defined in [`chat_completion`](https://github.com/facebookresearch/llama/blob/main/llama/generation.py#L212)
needs to be followed, including the `INST` and `<<SYS>>` tags, `BOS` and `EOS` tokens, and the whitespace and line breaks in between (we recommend calling `strip()` on inputs to avoid double-spaces).

You can also deploy additional classifiers for filtering out inputs and outputs that are deemed unsafe. See the llama-recipes repo for [an example](https://github.com/facebookresearch/llama-recipes/blob/main/inference/inference.py) of how to add a safety checker to the inputs and outputs of your inference code.

Examples using llama-2-7b-chat:

```
torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir llama-2-7b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 4
```
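
For reference, here is a minimal Python sketch of the same flow, mirroring `example_chat_completion.py`; `chat_completion` applies the `INST`/`<<SYS>>` formatting described above to each dialog (the dialog contents are illustrative):

```
# Hedged sketch mirroring example_chat_completion.py.
# Launch with: torchrun --nproc_per_node 1 <script>.py
from llama import Llama

generator = Llama.build(
    ckpt_dir="llama-2-7b-chat/",
    tokenizer_path="tokenizer.model",
    max_seq_len=512,
    max_batch_size=4,
)

dialogs = [[
    {"role": "system", "content": "Always answer in Korean."},    # illustrative system prompt
    {"role": "user", "content": "What is the capital of Korea?"}, # illustrative user turn
]]
results = generator.chat_completion(dialogs, max_gen_len=256, temperature=0.6, top_p=0.9)
for result in results:
    print(result["generation"]["content"])
```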

Llama 2 is a new technology that carries potential risks with use. Testing conducted to date has not, and could not, cover all scenarios.
In order to help developers address these risks, we have created the [Responsible Use Guide](Responsible-Use-Guide.pdf). More details can be found in our research paper as well.

## Issues

Please report any software "bug" or other problems with the models through one of the following means:
- Reporting issues with the model: [github.com/facebookresearch/llama](http://github.com/facebookresearch/llama)
- Reporting risky content generated by the model: [developers.facebook.com/llama_output_feedback](http://developers.facebook.com/llama_output_feedback)
- Reporting bugs and security concerns: [facebook.com/whitehat/info](http://facebook.com/whitehat/info)

## Model Card
See [MODEL_CARD.md](MODEL_CARD.md).

## License

Our model and weights are licensed for both researchers and commercial entities, upholding the principles of openness. Our mission is to empower individuals and industry through this opportunity, while fostering an environment of discovery and ethical AI advancements.

See the [LICENSE](LICENSE) file, as well as our accompanying [Acceptable Use Policy](USE_POLICY.md).

## References

1. [Research Paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)
2. [Llama 2 technical overview](https://ai.meta.com/resources/models-and-libraries/llama)
3. [Open Innovation AI Research Community](https://ai.meta.com/llama/open-innovation-ai-research-community/)

## Original LLaMA
The repo for the original llama release is in the [`llama_v1`](https://github.com/facebookresearch/llama/tree/llama_v1) branch.