DopeorNope's picture
Update README.md
8915c4e verified
---
license: cc-by-nc-sa-4.0
language:
- ko
- en
tags:
- moe
---
# The license is cc-by-nc-sa-4.0.
- Commercializing is not allowed.
![mark1](Markr_AI.png)
---
# Not based on Synatra model, we pre-train and full-finetuning Mixtralx2 to enhance Korean abilities.
# Developer
Seungyoo Lee (DopeorNope), Kyujin Han(kyujinpy)
---
# DATASET.
- Continuous pre-train was performed using AI hub corpus, and we applied instruct-tune using AI hub datasets.
- Using a Self-supervised learning manner, we converted raw corpus to instruct tuned data.
- We used text-mining techniques to create the train data.
- Here is some examples...
- **Mask prediction Task**
```python
#Mask prediction
text='์ง€๋Šฅ(ๆ™บ่ƒฝ) ๋˜๋Š” ์ธํ…”๋ฆฌ์ „์Šค(intelligence)๋Š” ์ธ๊ฐ„์˜ <MASK> ๋Šฅ๋ ฅ์„ ๋งํ•œ๋‹ค.'
response='์ง€์ '
complete_text='์ง€๋Šฅ(ๆ™บ่ƒฝ) ๋˜๋Š” ์ธํ…”๋ฆฌ์ „์Šค(intelligence)๋Š” ์ธ๊ฐ„์˜ ์ง€์  ๋Šฅ๋ ฅ์„ ๋งํ•œ๋‹ค.'
```
- **Text allign Task**
```python
#Text-allign Task
text_list=['๋ณต์ˆ˜๋ช…๋ น-๋ณต์ˆ˜์ž๋ฃŒ(MIMD,Multiple Instruction, Multiple Data)์€ ์ „์‚ฐ์—์„œ ๋ณ‘๋ ฌํ™”์˜ ํ•œ ๊ธฐ๋ฒ•์ด๋‹ค.',
'๋ถ„์‚ฐ ๋ฉ”๋ชจ๋ฆฌ์˜ ์˜ˆ๋Š” MPP(massively parallel processors)์™€ COW (Clusters of Workstations)์ด๋‹ค.',
'MIMD๊ธฐ๊ณ„๋Š” ๊ณต์œ  ๋ฉ”๋ชจ๋ฆฌ์ด๊ฑฐ๋‚˜ ๋ถ„์‚ฐ ๋ฉ”๋ชจ๋ฆฌ์ด๋ฉฐ ์ด๋Ÿฌํ•œ ๋ถ„๋ฅ˜๋Š” MIMD๊ฐ€ ์–ด๋–ป๊ฒŒ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์ด์šฉํ•˜๋Š๋ƒ์— ๋”ฐ๋ผ ๋‚˜๋‰œ๋‹ค.']
response='๋ณต์ˆ˜๋ช…๋ น-๋ณต์ˆ˜์ž๋ฃŒ(MIMD,Multiple Instruction, Multiple Data)์€ ์ „์‚ฐ์—์„œ ๋ณ‘๋ ฌํ™”์˜ ํ•œ ๊ธฐ๋ฒ•์ด๋‹ค. \
MIMD๊ธฐ๊ณ„๋Š” ๊ณต์œ  ๋ฉ”๋ชจ๋ฆฌ์ด๊ฑฐ๋‚˜ ๋ถ„์‚ฐ ๋ฉ”๋ชจ๋ฆฌ์ด๋ฉฐ ์ด๋Ÿฌํ•œ ๋ถ„๋ฅ˜๋Š” MIMD๊ฐ€ ์–ด๋–ป๊ฒŒ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์ด์šฉํ•˜๋Š๋ƒ์— ๋”ฐ๋ผ ๋‚˜๋‰œ๋‹ค. \
๋ถ„์‚ฐ ๋ฉ”๋ชจ๋ฆฌ์˜ ์˜ˆ๋Š” MPP(massively parallel processors)์™€ COW (Clusters of Workstations)์ด๋‹ค.'
```
- **Text completion Task**
```python
#Text Completion
text= '๊ทธ๋ฆฐ๋ธŒ๋ผ์šฐ์ €(GreenBrowser)๋Š” ์ธํ„ฐ๋„ท ์ต์Šคํ”Œ๋กœ๋Ÿฌ์—์„œ ์‚ฌ์šฉํ•˜๋Š” ํŠธ๋ผ์ด๋˜ํŠธ ๋ ˆ์ด์•„์›ƒ ์—”์ง„์„ ๋ฐ”ํƒ•์œผ๋กœ ํ•˜๋ฉฐ ์ค‘๊ตญ์— ๊ธฐ๋ฐ˜์„ ๋‘” ์†Œํ”„ํŠธ์›จ์–ด ํšŒ์‚ฌ์ธ ๋ชจ์–ดํ€ต(morequick)์—์„œ ๋งŒ๋“  ๋ฌด๋ฃŒ ์›น ๋ธŒ๋ผ์šฐ์ €๋‹ค. ๊ฐ„์ฒด์ž ์ค‘๊ตญ์–ด๊ฐ€ ์›น ๋ธŒ๋ผ์šฐ์ €์— ๋‚ด์žฅ๋˜์–ด ์žˆ๋‹ค.
๋งฅ์Šคํ†ค ์›น ๋ธŒ๋ผ์šฐ์ €์™€ ๋น„์Šทํ•˜์—ฌ MyIE์™€ ๋ฐ€์ ‘ํ•˜๊ฒŒ ๊ด€๋ จ๋˜์–ด ์žˆ๋‹ค. ๋งฅ์Šคํ†ค์šฉ์˜ ์ผ๋ถ€ ํ”Œ๋Ÿฌ๊ทธ์ธ์ด ๊ทธ๋ฆฐ๋ธŒ๋ผ์šฐ์ €์—์„œ๋„ ์ž‘๋™ํ•  ๊ฒƒ์ด๋‹ค.'
response= '์ž๋™ ์Šคํฌ๋กค, ์ž๋™ ๋ฆฌํ”„๋ ˆ์‹œ, ์ž๋™ ์ €์žฅ, ์ž๋™ ํผ ์ฑ„์šฐ๊ธฐ์™€ ๊ฐ™์€ ๋งŽ์€ ์ž๋™ํ™” ๊ธฐ๋Šฅ์ด ์žˆ๋‹ค.'
```
---
# Acknoledgement
Markr AI is in constant communication with numerous open-source developers and researchers. We would also like to express our gratitude to **Beomi** and **Maywell**, who have provided many insights through extensive discussions in the development of the model.