DopeorNope's picture
Update README.md
8915c4e verified
metadata
license: cc-by-nc-sa-4.0
language:
  - ko
  - en
tags:
  - moe

The license is cc-by-nc-sa-4.0.

  • Commercializing is not allowed.

mark1


Not based on Synatra model, we pre-train and full-finetuning Mixtralx2 to enhance Korean abilities.

Developer

Seungyoo Lee (DopeorNope), Kyujin Han(kyujinpy)


DATASET.

  • Continuous pre-train was performed using AI hub corpus, and we applied instruct-tune using AI hub datasets.

  • Using a Self-supervised learning manner, we converted raw corpus to instruct tuned data.

  • We used text-mining techniques to create the train data.

  • Here is some examples...

  • Mask prediction Task


#Mask prediction

text='์ง€๋Šฅ(ๆ™บ่ƒฝ) ๋˜๋Š” ์ธํ…”๋ฆฌ์ „์Šค(intelligence)๋Š” ์ธ๊ฐ„์˜ <MASK> ๋Šฅ๋ ฅ์„ ๋งํ•œ๋‹ค.'

response='์ง€์ '

complete_text='์ง€๋Šฅ(ๆ™บ่ƒฝ) ๋˜๋Š” ์ธํ…”๋ฆฌ์ „์Šค(intelligence)๋Š” ์ธ๊ฐ„์˜ ์ง€์  ๋Šฅ๋ ฅ์„ ๋งํ•œ๋‹ค.'
  • Text allign Task

#Text-allign Task

text_list=['๋ณต์ˆ˜๋ช…๋ น-๋ณต์ˆ˜์ž๋ฃŒ(MIMD,Multiple Instruction, Multiple Data)์€ ์ „์‚ฐ์—์„œ ๋ณ‘๋ ฌํ™”์˜ ํ•œ ๊ธฐ๋ฒ•์ด๋‹ค.',
           '๋ถ„์‚ฐ ๋ฉ”๋ชจ๋ฆฌ์˜ ์˜ˆ๋Š” MPP(massively parallel processors)์™€ COW (Clusters of Workstations)์ด๋‹ค.',
           'MIMD๊ธฐ๊ณ„๋Š” ๊ณต์œ  ๋ฉ”๋ชจ๋ฆฌ์ด๊ฑฐ๋‚˜ ๋ถ„์‚ฐ ๋ฉ”๋ชจ๋ฆฌ์ด๋ฉฐ ์ด๋Ÿฌํ•œ ๋ถ„๋ฅ˜๋Š” MIMD๊ฐ€ ์–ด๋–ป๊ฒŒ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์ด์šฉํ•˜๋Š๋ƒ์— ๋”ฐ๋ผ ๋‚˜๋‰œ๋‹ค.']



response='๋ณต์ˆ˜๋ช…๋ น-๋ณต์ˆ˜์ž๋ฃŒ(MIMD,Multiple Instruction, Multiple Data)์€ ์ „์‚ฐ์—์„œ ๋ณ‘๋ ฌํ™”์˜ ํ•œ ๊ธฐ๋ฒ•์ด๋‹ค. \
          MIMD๊ธฐ๊ณ„๋Š” ๊ณต์œ  ๋ฉ”๋ชจ๋ฆฌ์ด๊ฑฐ๋‚˜ ๋ถ„์‚ฐ ๋ฉ”๋ชจ๋ฆฌ์ด๋ฉฐ ์ด๋Ÿฌํ•œ ๋ถ„๋ฅ˜๋Š” MIMD๊ฐ€ ์–ด๋–ป๊ฒŒ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์ด์šฉํ•˜๋Š๋ƒ์— ๋”ฐ๋ผ ๋‚˜๋‰œ๋‹ค. \
          ๋ถ„์‚ฐ ๋ฉ”๋ชจ๋ฆฌ์˜ ์˜ˆ๋Š” MPP(massively parallel processors)์™€ COW (Clusters of Workstations)์ด๋‹ค.'
  • Text completion Task

#Text Completion

text= '๊ทธ๋ฆฐ๋ธŒ๋ผ์šฐ์ €(GreenBrowser)๋Š” ์ธํ„ฐ๋„ท ์ต์Šคํ”Œ๋กœ๋Ÿฌ์—์„œ ์‚ฌ์šฉํ•˜๋Š” ํŠธ๋ผ์ด๋˜ํŠธ ๋ ˆ์ด์•„์›ƒ ์—”์ง„์„ ๋ฐ”ํƒ•์œผ๋กœ ํ•˜๋ฉฐ ์ค‘๊ตญ์— ๊ธฐ๋ฐ˜์„ ๋‘” ์†Œํ”„ํŠธ์›จ์–ด ํšŒ์‚ฌ์ธ ๋ชจ์–ดํ€ต(morequick)์—์„œ ๋งŒ๋“  ๋ฌด๋ฃŒ ์›น ๋ธŒ๋ผ์šฐ์ €๋‹ค. ๊ฐ„์ฒด์ž ์ค‘๊ตญ์–ด๊ฐ€ ์›น ๋ธŒ๋ผ์šฐ์ €์— ๋‚ด์žฅ๋˜์–ด ์žˆ๋‹ค.
      ๋งฅ์Šคํ†ค ์›น ๋ธŒ๋ผ์šฐ์ €์™€ ๋น„์Šทํ•˜์—ฌ MyIE์™€ ๋ฐ€์ ‘ํ•˜๊ฒŒ ๊ด€๋ จ๋˜์–ด ์žˆ๋‹ค. ๋งฅ์Šคํ†ค์šฉ์˜ ์ผ๋ถ€ ํ”Œ๋Ÿฌ๊ทธ์ธ์ด ๊ทธ๋ฆฐ๋ธŒ๋ผ์šฐ์ €์—์„œ๋„ ์ž‘๋™ํ•  ๊ฒƒ์ด๋‹ค.'



response= '์ž๋™ ์Šคํฌ๋กค, ์ž๋™ ๋ฆฌํ”„๋ ˆ์‹œ, ์ž๋™ ์ €์žฅ, ์ž๋™ ํผ ์ฑ„์šฐ๊ธฐ์™€ ๊ฐ™์€ ๋งŽ์€ ์ž๋™ํ™” ๊ธฐ๋Šฅ์ด ์žˆ๋‹ค.'

Acknoledgement

Markr AI is in constant communication with numerous open-source developers and researchers. We would also like to express our gratitude to Beomi and Maywell, who have provided many insights through extensive discussions in the development of the model.