---
license: cc-by-nc-sa-4.0
language:
- ko
- en
tags:
- moe
---

# The license is cc-by-nc-sa-4.0.

- Commercial use is not allowed.

![mark1](Markr_AI.png)

---

# This model is not based on the Synatra model; we pre-trained and fully fine-tuned Mixtralx2 to enhance its Korean abilities.

# Developers

Seungyoo Lee (DopeorNope), Kyujin Han (kyujinpy)

---

# Dataset

- Continued pre-training was performed on the AI Hub corpus, and instruction tuning was applied using AI Hub datasets.

- In a self-supervised manner, we converted the raw corpus into instruction-tuning data.

- We used text-mining techniques to create the training data.

- Here are some examples:

- **Mask prediction task**

```python
# Mask prediction
# English gloss: "Intelligence (智能) refers to a human's <MASK> ability."
text = '지능(智能) 또는 인텔리전스(intelligence)는 인간의 <MASK> 능력을 말한다.'

# English gloss: "intellectual"
response = '지적'

# The input sentence with the mask filled in by the response
complete_text = '지능(智能) 또는 인텔리전스(intelligence)는 인간의 지적 능력을 말한다.'
```
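The card does not describe how the masked samples were generated. As a minimal sketch, assuming one whitespace-delimited word is picked at random and replaced by `<MASK>`, a sample shaped like the one above could be built as follows. The names `make_mask_sample` and `MASK_TOKEN` are hypothetical and are not taken from the authors' pipeline.

```python
import random

MASK_TOKEN = "<MASK>"  # same placeholder string as in the example above


def make_mask_sample(sentence, rng):
    """Hide one whitespace-delimited word; return (masked_text, answer)."""
    words = sentence.split()
    if not words:
        raise ValueError("sentence must contain at least one word")
    idx = rng.randrange(len(words))  # assumption: uniform random choice
    answer = words[idx]
    masked = " ".join(words[:idx] + [MASK_TOKEN] + words[idx + 1:])
    return masked, answer


# Shortened form of the example sentence
# ("Intelligence refers to a human's intellectual ability.")
sentence = "지능 또는 인텔리전스는 인간의 지적 능력을 말한다."
text, response = make_mask_sample(sentence, random.Random(0))

# Filling the mask with the answer restores the original sentence.
complete_text = text.replace(MASK_TOKEN, response, 1)
```

Masking whole words keeps the sketch simple; a real pipeline might instead select spans with a morphological analyzer, which the card does not specify.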

- **Text alignment task**

```python |
|
|
|
#Text-allign Task |
|
|
|
text_list=['๋ณต์๋ช
๋ น-๋ณต์์๋ฃ(MIMD,Multiple Instruction, Multiple Data)์ ์ ์ฐ์์ ๋ณ๋ ฌํ์ ํ ๊ธฐ๋ฒ์ด๋ค.', |
|
'๋ถ์ฐ ๋ฉ๋ชจ๋ฆฌ์ ์๋ MPP(massively parallel processors)์ COW (Clusters of Workstations)์ด๋ค.', |
|
'MIMD๊ธฐ๊ณ๋ ๊ณต์ ๋ฉ๋ชจ๋ฆฌ์ด๊ฑฐ๋ ๋ถ์ฐ ๋ฉ๋ชจ๋ฆฌ์ด๋ฉฐ ์ด๋ฌํ ๋ถ๋ฅ๋ MIMD๊ฐ ์ด๋ป๊ฒ ๋ฉ๋ชจ๋ฆฌ๋ฅผ ์ด์ฉํ๋๋์ ๋ฐ๋ผ ๋๋๋ค.'] |
|
|
|
|
|
|
|
response='๋ณต์๋ช
๋ น-๋ณต์์๋ฃ(MIMD,Multiple Instruction, Multiple Data)์ ์ ์ฐ์์ ๋ณ๋ ฌํ์ ํ ๊ธฐ๋ฒ์ด๋ค. \ |
|
MIMD๊ธฐ๊ณ๋ ๊ณต์ ๋ฉ๋ชจ๋ฆฌ์ด๊ฑฐ๋ ๋ถ์ฐ ๋ฉ๋ชจ๋ฆฌ์ด๋ฉฐ ์ด๋ฌํ ๋ถ๋ฅ๋ MIMD๊ฐ ์ด๋ป๊ฒ ๋ฉ๋ชจ๋ฆฌ๋ฅผ ์ด์ฉํ๋๋์ ๋ฐ๋ผ ๋๋๋ค. \ |
|
๋ถ์ฐ ๋ฉ๋ชจ๋ฆฌ์ ์๋ MPP(massively parallel processors)์ COW (Clusters of Workstations)์ด๋ค.' |
|
|
|
``` |
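One plausible way to derive such a sample from raw text, sketched here under the assumption that the input is simply a shuffled copy of consecutive sentences and the response is the original order joined by spaces. The helper name `make_alignment_sample` and the placeholder sentences are hypothetical; the card does not document the actual procedure.

```python
import random


def make_alignment_sample(sentences, rng):
    """Return (shuffled_sentences, response) with response in source order."""
    original = list(sentences)
    shuffled = list(original)
    # Reshuffle until the order actually changes; skipped when all
    # sentences are identical, since no distinct order exists then.
    if len(set(original)) > 1:
        while shuffled == original:
            rng.shuffle(shuffled)
    return shuffled, " ".join(original)


# Placeholder sentences: "This is the first/second/third sentence."
ordered = ["첫 번째 문장이다.", "두 번째 문장이다.", "세 번째 문장이다."]
text_list, response = make_alignment_sample(ordered, random.Random(0))
```

The shuffled list plays the role of `text_list` in the example, and `response` is the target the model is asked to produce.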

- **Text completion task**

```python |
|
|
|
#Text Completion |
|
|
|
text= '๊ทธ๋ฆฐ๋ธ๋ผ์ฐ์ (GreenBrowser)๋ ์ธํฐ๋ท ์ต์คํ๋ก๋ฌ์์ ์ฌ์ฉํ๋ ํธ๋ผ์ด๋ํธ ๋ ์ด์์ ์์ง์ ๋ฐํ์ผ๋ก ํ๋ฉฐ ์ค๊ตญ์ ๊ธฐ๋ฐ์ ๋ ์ํํธ์จ์ด ํ์ฌ์ธ ๋ชจ์ดํต(morequick)์์ ๋ง๋ ๋ฌด๋ฃ ์น ๋ธ๋ผ์ฐ์ ๋ค. ๊ฐ์ฒด์ ์ค๊ตญ์ด๊ฐ ์น ๋ธ๋ผ์ฐ์ ์ ๋ด์ฅ๋์ด ์๋ค. |
|
๋งฅ์คํค ์น ๋ธ๋ผ์ฐ์ ์ ๋น์ทํ์ฌ MyIE์ ๋ฐ์ ํ๊ฒ ๊ด๋ จ๋์ด ์๋ค. ๋งฅ์คํค์ฉ์ ์ผ๋ถ ํ๋ฌ๊ทธ์ธ์ด ๊ทธ๋ฆฐ๋ธ๋ผ์ฐ์ ์์๋ ์๋ํ ๊ฒ์ด๋ค.' |
|
|
|
|
|
|
|
response= '์๋ ์คํฌ๋กค, ์๋ ๋ฆฌํ๋ ์, ์๋ ์ ์ฅ, ์๋ ํผ ์ฑ์ฐ๊ธฐ์ ๊ฐ์ ๋ง์ ์๋ํ ๊ธฐ๋ฅ์ด ์๋ค.' |
|
|
|
``` |
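A completion pair like the one above can be cut from a raw passage by holding out its final sentence. The following is only a sketch: it assumes sentences end in `.`, `!`, or `?` followed by whitespace, and the helper `make_completion_sample` and its `n_response` parameter are hypothetical names. The card does not say how the split point was actually chosen.

```python
import re


def make_completion_sample(passage, n_response=1):
    """Split a passage into (text, response), holding out the last sentences."""
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", passage.strip()) if s]
    if len(sentences) <= n_response:
        raise ValueError("passage is too short to split")
    text = " ".join(sentences[:-n_response])
    response = " ".join(sentences[-n_response:])
    return text, response


# Shortened passage: "GreenBrowser is a free web browser. Simplified Chinese
# is built into the browser. It has many automation features."
passage = (
    "그린브라우저는 무료 웹 브라우저다. "
    "간체자 중국어가 웹 브라우저에 내장되어 있다. "
    "많은 자동화 기능이 있다."
)
text, response = make_completion_sample(passage)
```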

---

# Acknowledgement

Markr AI is in constant communication with numerous open-source developers and researchers. We would also like to express our gratitude to **Beomi** and **Maywell**, who provided many insights through extensive discussions during the development of the model.