Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


llama-3-Korean-Bllossom-8B - GGUF
- Model creator: https://huggingface.co/MLP-KTLim/
- Original model: https://huggingface.co/MLP-KTLim/llama-3-Korean-Bllossom-8B/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [llama-3-Korean-Bllossom-8B.Q2_K.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.Q2_K.gguf) | Q2_K | 2.96GB |
| [llama-3-Korean-Bllossom-8B.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.IQ3_XS.gguf) | IQ3_XS | 3.28GB |
| [llama-3-Korean-Bllossom-8B.IQ3_S.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.IQ3_S.gguf) | IQ3_S | 3.43GB |
| [llama-3-Korean-Bllossom-8B.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.Q3_K_S.gguf) | Q3_K_S | 3.41GB |
| [llama-3-Korean-Bllossom-8B.IQ3_M.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.IQ3_M.gguf) | IQ3_M | 3.52GB |
| [llama-3-Korean-Bllossom-8B.Q3_K.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.Q3_K.gguf) | Q3_K | 3.74GB |
| [llama-3-Korean-Bllossom-8B.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.Q3_K_M.gguf) | Q3_K_M | 3.74GB |
| [llama-3-Korean-Bllossom-8B.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.Q3_K_L.gguf) | Q3_K_L | 4.03GB |
| [llama-3-Korean-Bllossom-8B.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.IQ4_XS.gguf) | IQ4_XS | 4.18GB |
| [llama-3-Korean-Bllossom-8B.Q4_0.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.Q4_0.gguf) | Q4_0 | 4.34GB |
| [llama-3-Korean-Bllossom-8B.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.IQ4_NL.gguf) | IQ4_NL | 4.38GB |
| [llama-3-Korean-Bllossom-8B.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.Q4_K_S.gguf) | Q4_K_S | 4.37GB |
| [llama-3-Korean-Bllossom-8B.Q4_K.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.Q4_K.gguf) | Q4_K | 4.58GB |
| [llama-3-Korean-Bllossom-8B.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.Q4_K_M.gguf) | Q4_K_M | 4.58GB |
| [llama-3-Korean-Bllossom-8B.Q4_1.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.Q4_1.gguf) | Q4_1 | 4.78GB |
| [llama-3-Korean-Bllossom-8B.Q5_0.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.Q5_0.gguf) | Q5_0 | 5.21GB |
| [llama-3-Korean-Bllossom-8B.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.Q5_K_S.gguf) | Q5_K_S | 5.21GB |
| [llama-3-Korean-Bllossom-8B.Q5_K.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.Q5_K.gguf) | Q5_K | 5.34GB |
| [llama-3-Korean-Bllossom-8B.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.Q5_K_M.gguf) | Q5_K_M | 5.34GB |
| [llama-3-Korean-Bllossom-8B.Q5_1.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.Q5_1.gguf) | Q5_1 | 5.65GB |
| [llama-3-Korean-Bllossom-8B.Q6_K.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.Q6_K.gguf) | Q6_K | 6.14GB |
| [llama-3-Korean-Bllossom-8B.Q8_0.gguf](https://huggingface.co/RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf/blob/main/llama-3-Korean-Bllossom-8B.Q8_0.gguf) | Q8_0 | 7.95GB |
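
To try one of these files end to end, the sketch below (an illustrative example, not part of the original card) downloads the Q4_K_M quant with `huggingface_hub` and chats with it through the `llama-cpp-python` bindings; it assumes `pip install huggingface_hub llama-cpp-python` and enough RAM for a ~4.6GB model.

```python
# Hedged sketch: download one GGUF quant and chat with it via llama-cpp-python.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch the Q4_K_M file from this repo (~4.58GB).
model_path = hf_hub_download(
    repo_id="RichardErkhov/MLP-KTLim_-_llama-3-Korean-Bllossom-8B-gguf",
    filename="llama-3-Korean-Bllossom-8B.Q4_K_M.gguf",
)

# Recent llama-cpp-python versions pick up the chat template embedded in the GGUF.
llm = Llama(model_path=model_path, n_ctx=4096)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "μ„œμšΈμ˜ 유λͺ…ν•œ κ΄€κ΄‘ μ½”μŠ€λ₯Ό λ§Œλ“€μ–΄μ€„λž˜?"},
    ],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```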


Original model description:
---
base_model:
- meta-llama/Meta-Llama-3-8B
language:
- en
- ko
library_name: transformers
license: llama3
---

<a href="https://github.com/MLP-Lab/Bllossom">
  <img src="https://github.com/teddysum/bllossom/blob/main//bllossom_icon.png?raw=true" width="40%" height="50%">
</a>


# Update!
* ~~[2024.08.09] We updated the model to a Bllossom-8B based on Llama 3.1, with an average performance gain of about 5% over the previous Llama 3 based Bllossom.~~ (currently under revision)
* [2024.06.18] Updated to the Bllossom ELO model, with the pre-training data increased to **250GB**. Vocabulary expansion was not applied this time; if you would like to use the previous vocabulary-expanded long-context model, please contact us directly!
* [2024.06.18] The Bllossom ELO model was newly trained with our in-house ELO pre-training method. On the [LogicKor](https://github.com/StableFluffy/LogicKor) benchmark it achieved the SOTA score among existing Korean models under 10B.

LogicKor benchmark results:
| Model | Math | Reasoning | Writing | Coding | Understanding | Grammar | Single ALL | Multi ALL | Overall |
|:---------:|:-----:|:------:|:-----:|:-----:|:----:|:-----:|:-----:|:-----:|:----:|
| gpt-3.5-turbo-0125 | 7.14 | 7.71 | 8.28 | 5.85 | 9.71 | 6.28 | 7.50 | 7.95 | 7.72 |
| gemini-1.5-pro-preview-0215 | 8.00 | 7.85 | 8.14 | 7.71 | 8.42 | 7.28 | 7.90 | 6.26 | 7.08 |
| llama-3-Korean-Bllossom-8B | 5.43 | 8.29 | 9.00 | 4.43 | 7.57 | 6.86 | 6.93 | 6.93 | 6.93 |


# Bllossom | [Demo]() | [Homepage](https://www.bllossom.ai/) | [Github](https://github.com/MLP-Lab/Bllossom) |

<!-- [Colab code example for GPU](https://colab.research.google.com/drive/1fBOzUVZ6NRKk_ugeoTbAOokWKqSN47IG?usp=sharing) | -->
<!-- [Colab code example for the quantized CPU model](https://colab.research.google.com/drive/129ZNVg5R2NPghUEFHKF0BRdxsZxinQcJ?usp=drive_link) -->

Our Bllossom team is excited to release Bllossom, a Korean-English bilingual language model!
With the support of the Seoultech supercomputing center, the entire model was fully fine-tuned on more than 100GB of Korean data, yielding a Korean-enhanced bilingual model.
Looking for a model that is strong at Korean?
- A first for Korean: vocabulary expansion with more than 30,000 Korean tokens
- Handles Korean context roughly 25% longer than Llama 3
- Korean-English knowledge linking via a Korean-English parallel corpus (pre-training)
- Fine-tuning on data crafted by linguists with Korean culture and language in mind
- Reinforcement learning

All of this is combined in Bllossom, which is available for commercial use. Build your own model with it!
It can even be trained on a free Colab GPU, or run on CPU with the quantized model: [quantized model](https://huggingface.co/MLP-KTLim/llama-3-Korean-Bllossom-8B-4bit)

1. Bllossom-8B is a practicality-driven language model built in collaboration with linguists from Seoultech, Teddysum, and the Yonsei University language resources lab! We will maintain it through continuous updates, so please make good use of it πŸ™‚
2. We also have the far more powerful Advanced-Bllossom 8B and 70B models, as well as vision-language models! (Contact us individually if you are curious!!)
3. Bllossom was accepted for presentation at NAACL 2024 and LREC-COLING 2024 (oral).
4. We will keep updating with better language models!! Anyone interested in joint research to strengthen Korean (especially papers) is always welcome!! Teams that can lend even a small number of GPUs are especially welcome to contact us anytime; we will help you build what you want.
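
For reference, loading the original HF checkpoint in 4-bit on the fly can look like the hedged sketch below; this uses bitsandbytes quantization rather than the team's pre-quantized 4-bit repo linked above, and the settings are illustrative assumptions, not the authors' recipe.

```python
# Hedged sketch: 4-bit on-the-fly quantization with bitsandbytes (illustrative settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumes a bf16-capable GPU
)

tokenizer = AutoTokenizer.from_pretrained("MLP-KTLim/llama-3-Korean-Bllossom-8B")
model = AutoModelForCausalLM.from_pretrained(
    "MLP-KTLim/llama-3-Korean-Bllossom-8B",
    quantization_config=quant_config,  # quantizes weights at load time
    device_map="auto",
)
```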

The Bllossom language model is a Korean-English bilingual language model based on the open-source Llama 3. It strengthens the connection between Korean and English knowledge, and has the following features:

* **Knowledge Linking**: Linking Korean and English knowledge through additional training
* **Vocabulary Expansion**: Expansion of the Korean vocabulary to enhance Korean expressiveness
* **Instruction Tuning**: Tuning with custom-made instruction-following data specialized for the Korean language and Korean culture
* **Human Feedback**: DPO has been applied
* **Vision-Language Alignment**: Aligning the vision transformer with this language model

**This model was developed by [MLPLab at Seoultech](http://mlp.seoultech.ac.kr), [Teddysum](http://teddysum.ai/) and [Yonsei Univ](https://sites.google.com/view/hansaemkim/hansaem-kim)**

## Demo Video

<div style="display: flex; justify-content: space-between;">
  <!-- First column -->
  <div style="width: 49%;">
    <a>
      <img src="https://github.com/lhsstn/lhsstn/blob/main/x-llava_dem.gif?raw=true" style="width: 100%; height: auto;">
    </a>
    <p style="text-align: center;">Bllossom-V Demo</p>
  </div>

  <!-- Second column (if needed) -->
  <div style="width: 49%;">
    <a>
      <img src="https://github.com/lhsstn/lhsstn/blob/main/bllossom_demo_kakao.gif?raw=true" style="width: 70%; height: auto;">
    </a>
    <p style="text-align: center;">Bllossom Demo (Kakao)</p>
  </div>
</div>


# NEWS
* [2024.06.18] We have reverted to the non-vocab-expansion model. However, we have significantly increased the amount of pre-training data, to 250GB.
* [2024.05.08] Vocab expansion model update
* [2024.04.25] We released Bllossom v2.0, based on Llama 3

## Example code

### Colab Tutorial
- [Inference-Code-Link](https://colab.research.google.com/drive/1fBOzUVZ6NRKk_ugeoTbAOokWKqSN47IG?usp=sharing)

### Install Dependencies
```bash
pip install torch transformers==4.40.0 accelerate
```

### Python code with Pipeline
```python
import transformers
import torch

model_id = "MLP-KTLim/llama-3-Korean-Bllossom-8B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

pipeline.model.eval()

PROMPT = '''You are a helpful AI assistant. Please answer the user's questions kindly. 당신은 유λŠ₯ν•œ AI μ–΄μ‹œμŠ€ν„΄νŠΈ μž…λ‹ˆλ‹€. μ‚¬μš©μžμ˜ μ§ˆλ¬Έμ— λŒ€ν•΄ μΉœμ ˆν•˜κ²Œ λ‹΅λ³€ν•΄μ£Όμ„Έμš”.'''
instruction = "μ„œμšΈμ˜ 유λͺ…ν•œ κ΄€κ΄‘ μ½”μŠ€λ₯Ό λ§Œλ“€μ–΄μ€„λž˜?"

messages = [
    {"role": "system", "content": f"{PROMPT}"},
    {"role": "user", "content": f"{instruction}"}
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9
)

print(outputs[0]["generated_text"][len(prompt):])
```
```
# 물둠이죠! μ„œμšΈμ€ λ‹€μ–‘ν•œ 문화와 역사, μžμ—°μ„ κ²ΈλΉ„ν•œ λ„μ‹œλ‘œ, λ§Žμ€ κ΄€κ΄‘ λͺ…μ†Œλ₯Ό μžλž‘ν•©λ‹ˆλ‹€. μ—¬κΈ° μ„œμšΈμ˜ 유λͺ…ν•œ κ΄€κ΄‘ μ½”μŠ€λ₯Ό μ†Œκ°œν•΄ λ“œλ¦΄κ²Œμš”.

### μ½”μŠ€ 1: 역사와 λ¬Έν™” 탐방

1. **경볡ꢁ**
   - μ„œμšΈμ˜ λŒ€ν‘œμ μΈ ꢁꢐ둜, μ‘°μ„  μ™•μ‘°μ˜ 역사와 λ¬Έν™”λ₯Ό μ²΄ν—˜ν•  수 μžˆλŠ” κ³³μž…λ‹ˆλ‹€.

2. **뢁촌 ν•œμ˜₯λ§ˆμ„**
   - 전톡 ν•œμ˜₯이 잘 보쑴된 λ§ˆμ„λ‘œ, μ‘°μ„ μ‹œλŒ€μ˜ μƒν™œμƒμ„ λŠλ‚„ 수 μžˆμŠ΅λ‹ˆλ‹€.

3. **인사동**
   - 전톡 문화와 ν˜„λŒ€ 예술이 κ³΅μ‘΄ν•˜λŠ” 거리둜, λ‹€μ–‘ν•œ κ°€λŸ¬λ¦¬μ™€ 전톡 μŒμ‹μ μ΄ μžˆμŠ΅λ‹ˆλ‹€.

4. **μ²­κ³„μ²œ**
   - μ„œμšΈμ˜ 쀑심에 μœ„μΉ˜ν•œ 천문으둜, μ‘°κΉ…κ³Ό 산책을 즐길 수 μžˆλŠ” κ³³μž…λ‹ˆλ‹€.

### μ½”μŠ€ 2: μžμ—°κ³Ό μ‡Όν•‘

1. **남산 μ„œμšΈνƒ€μ›Œ**
   - μ„œμšΈμ˜ 전경을 ν•œλˆˆμ— λ³Ό 수 μžˆλŠ” 곳으둜, 특히 저녁 μ‹œκ°„λŒ€μ— 일λͺ°μ„ κ°μƒν•˜λŠ” 것이 μ’‹μŠ΅λ‹ˆλ‹€.

2. **λͺ…동**
   - μ‡Όν•‘κ³Ό μŒμ‹μ μ΄ μ¦λΉ„ν•œ μ§€μ—­μœΌλ‘œ, λ‹€μ–‘ν•œ λΈŒλžœλ“œμ™€ 전톡 μŒμ‹μ„ 맛볼 수 μžˆμŠ΅λ‹ˆλ‹€.

3. **ν•œκ°•κ³΅μ›**
   - μ„œμšΈμ˜ μ£Όμš” 곡원 쀑 ν•˜λ‚˜λ‘œ, μ‘°κΉ…, μžμ „κ±° 타기, λ°°λ‚­ 여행을 즐길 수 μžˆμŠ΅λ‹ˆλ‹€.

4. **ν™λŒ€**
   - μ Šμ€μ΄λ“€μ΄ 즐겨 μ°ΎλŠ” μ§€μ—­μœΌλ‘œ, λ‹€μ–‘ν•œ 카페, λ ˆμŠ€ν† λž‘, 클럽이 μžˆμŠ΅λ‹ˆλ‹€.

### μ½”μŠ€ 3: ν˜„λŒ€μ™€ μ „ν†΅μ˜ μ‘°ν™”

1. **λ™λŒ€λ¬Έ λ””μžμΈ ν”ŒλΌμž (DDP)**
   - ν˜„λŒ€μ μΈ κ±΄μΆ•λ¬Όλ‘œ, λ‹€μ–‘ν•œ μ „μ‹œμ™€ μ΄λ²€νŠΈκ°€ μ—΄λ¦¬λŠ” κ³³μž…λ‹ˆλ‹€.

2. **μ΄νƒœμ›**
   - λ‹€μ–‘ν•œ ꡭ제 μŒμ‹κ³Ό μΉ΄νŽ˜κ°€ μžˆλŠ” μ§€μ—­μœΌλ‘œ, λ‹€μ–‘ν•œ λ¬Έν™”λ₯Ό κ²½ν—˜ν•  수 μžˆμŠ΅λ‹ˆλ‹€.

3. **κ΄‘ν™”λ¬Έ**
   - μ„œμšΈμ˜ 쀑심에 μœ„μΉ˜ν•œ κ΄‘μž₯으둜, λ‹€μ–‘ν•œ 곡연과 행사가 μ—΄λ¦½λ‹ˆλ‹€.

4. **μ„œμšΈλžœλ“œ**
   - μ„œμšΈ 외곽에 μœ„μΉ˜ν•œ ν…Œλ§ˆνŒŒν¬λ‘œ, κ°€μ‘±λ‹¨μœ„ κ΄€κ΄‘κ°λ“€μ—κ²Œ 인기 μžˆλŠ” κ³³μž…λ‹ˆλ‹€.

이 μ½”μŠ€λ“€μ€ μ„œμšΈμ˜ λ‹€μ–‘ν•œ λ©΄λͺ¨λ₯Ό κ²½ν—˜ν•  수 μžˆλ„λ‘ κ΅¬μ„±λ˜μ–΄ μžˆμŠ΅λ‹ˆλ‹€. 각 μ½”μŠ€λ§ˆλ‹€ μ‹œκ°„μ„ μ‘°μ ˆν•˜κ³ , 개인의 관심사에 맞게 μ„ νƒν•˜μ—¬ λ°©λ¬Έν•˜λ©΄ 쒋을 것 κ°™μŠ΅λ‹ˆλ‹€. 즐거운 μ—¬ν–‰ λ˜μ„Έμš”!
```

### Python code with AutoModel
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = 'MLP-KTLim/llama-3-Korean-Bllossom-8B'

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

model.eval()

PROMPT = '''You are a helpful AI assistant. Please answer the user's questions kindly. 당신은 유λŠ₯ν•œ AI μ–΄μ‹œμŠ€ν„΄νŠΈ μž…λ‹ˆλ‹€. μ‚¬μš©μžμ˜ μ§ˆλ¬Έμ— λŒ€ν•΄ μΉœμ ˆν•˜κ²Œ λ‹΅λ³€ν•΄μ£Όμ„Έμš”.'''
instruction = "μ„œμšΈμ˜ 유λͺ…ν•œ κ΄€κ΄‘ μ½”μŠ€λ₯Ό λ§Œλ“€μ–΄μ€„λž˜?"

messages = [
    {"role": "system", "content": f"{PROMPT}"},
    {"role": "user", "content": f"{instruction}"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9
)

print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
```
# (Sample output identical to the pipeline example above.)
```
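
If you prefer to see tokens printed as they are generated, a minimal variation of the AutoModel example can use transformers' `TextStreamer`. This is an illustrative addition, not part of the original card, and it assumes the `model`, `tokenizer`, `input_ids`, and `terminators` variables from the example above.

```python
# Hedged sketch: stream decoded tokens to stdout during generation.
# Assumes model, tokenizer, input_ids, terminators from the AutoModel example above.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    input_ids,
    streamer=streamer,          # prints each decoded chunk as it arrives
    max_new_tokens=2048,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
```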



## Citation
**Language Model**
```text
@misc{bllossom,
  author = {ChangSu Choi, Yongbin Jeong, Seoyoon Park, InHo Won, HyeonSeok Lim, SangMin Kim, Yejee Kang, Chanhyuk Yoon, Jaewan Park, Yiseul Lee, HyeJin Lee, Younggyun Hahm, Hansaem Kim, KyungTae Lim},
  title = {Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean},
  year = {2024},
  journal = {LREC-COLING 2024},
  paperLink = {\url{https://arxiv.org/pdf/2403.10882}}
}
```

**Vision-Language Model**
```text
@misc{bllossom-V,
  author = {Dongjae Shin, Hyunseok Lim, Inho Won, Changsu Choi, Minjun Kim, Seungwoo Song, Hangyeol Yoo, Sangmin Kim, Kyungtae Lim},
  title = {X-LLaVA: Optimizing Bilingual Large Vision-Language Alignment},
  year = {2024},
  publisher = {GitHub},
  journal = {NAACL 2024 findings},
  paperLink = {\url{https://arxiv.org/pdf/2403.11399}}
}
```

## Contact
- μž„κ²½νƒœ (KyungTae Lim), Professor at Seoultech. `ktlim@seoultech.ac.kr`
- ν•¨μ˜κ· (Younggyun Hahm), CEO of Teddysum. `hahmyg@teddysum.ai`
- κΉ€ν•œμƒ˜ (Hansaem Kim), Professor at Yonsei. `khss@yonsei.ac.kr`

## Contributor
- 졜창수 (Chansu Choi), choics2623@seoultech.ac.kr
- 김상민 (Sangmin Kim), sangmin9708@naver.com
- μ›μΈν˜Έ (Inho Won), wih1226@seoultech.ac.kr
- κΉ€λ―Όμ€€ (Minjun Kim), mjkmain@seoultech.ac.kr
- μ†‘μŠΉμš° (Seungwoo Song), sswoo@seoultech.ac.kr
- μ‹ λ™μž¬ (Dongjae Shin), dylan1998@seoultech.ac.kr
- μž„ν˜„μ„ (Hyeonseok Lim), gustjrantk@seoultech.ac.kr
- μœ‘μ •ν›ˆ (Jeonghun Yuk), usually670@gmail.com
- μœ ν•œκ²° (Hangyeol Yoo), 21102372@seoultech.ac.kr
- μ†‘μ„œν˜„ (Seohyun Song), alexalex225225@gmail.com