metadata

license: cc-by-nc-sa-4.0
language:
  - ko
  - en
tags:
  - moe

The license is cc-by-nc-sa-4.0.

Commercial use is not allowed.

This model is not based on the Synatra model. We pre-trained and fully fine-tuned Mixtralx2 to enhance its Korean abilities.

Developers

Seungyoo Lee (DopeorNope), Kyujin Han (kyujinpy)

Dataset

Continuous pre-training was performed using the AI Hub corpus, and we then applied instruction tuning using AI Hub datasets.
In a self-supervised manner, we converted the raw corpus into instruction-tuning data.
We used text-mining techniques to create the training data.
Here are some examples:
Mask prediction Task


#Mask prediction

text='지능(智能) 또는 인텔리전스(intelligence)는 인간의 <MASK> 능력을 말한다.'

response='지적'

complete_text='지능(智能) 또는 인텔리전스(intelligence)는 인간의 지적 능력을 말한다.'
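
As a rough illustration of how a record like the one above could be derived from a raw sentence, here is a minimal sketch. The helper name `make_mask_example`, the whitespace-based word split, and the random choice of which word to hide are all assumptions made for illustration; the model card does not describe the actual masking procedure.

```python
import random

MASK_TOKEN = "<MASK>"  # same placeholder as in the example above


def make_mask_example(sentence, rng=None):
    """Hide one space-delimited word of `sentence`.

    Hypothetical helper: returns (text, response, complete_text), where
    `text` contains MASK_TOKEN in place of the hidden word.
    """
    rng = rng or random.Random()
    words = sentence.split(" ")
    idx = rng.randrange(len(words))  # assumption: the hidden word is picked at random
    response = words[idx]
    masked = words[:idx] + [MASK_TOKEN] + words[idx + 1:]
    return " ".join(masked), response, sentence


text, response, complete_text = make_mask_example(
    "지능(智能) 또는 인텔리전스(intelligence)는 인간의 지적 능력을 말한다.",
    rng=random.Random(0),
)
# Putting the response back into the masked text recovers the original sentence.
assert text.replace(MASK_TOKEN, response, 1) == complete_text
```

Because Korean attaches particles to words, a whitespace split hides whole eojeol units; a morpheme-level tokenizer would likely be needed to mask only a stem such as '지적'.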

Text alignment task


# Text alignment task

text_list=['복수명령-복수자료(MIMD,Multiple Instruction, Multiple Data)은 전산에서 병렬화의 한 기법이다.',
           '분산 메모리의 예는 MPP(massively parallel processors)와 COW (Clusters of Workstations)이다.',
           'MIMD기계는 공유 메모리이거나 분산 메모리이며 이러한 분류는 MIMD가 어떻게 메모리를 이용하느냐에 따라 나뉜다.']



# Adjacent string literals are concatenated, so no indentation leaks into the value.
response=('복수명령-복수자료(MIMD,Multiple Instruction, Multiple Data)은 전산에서 병렬화의 한 기법이다. '
          'MIMD기계는 공유 메모리이거나 분산 메모리이며 이러한 분류는 MIMD가 어떻게 메모리를 이용하느냐에 따라 나뉜다. '
          '분산 메모리의 예는 MPP(massively parallel processors)와 COW (Clusters of Workstations)이다.')
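
A record of this kind could plausibly be produced by shuffling the sentences of a passage and keeping the original order as the target. The sketch below assumes exactly that; `make_align_example` is a hypothetical helper, and the assumption that sentences are already split and are rejoined with a single space is ours, not something stated in the model card.

```python
import random


def make_align_example(sentences, rng=None):
    """Build a sentence-ordering record from sentences in their original order.

    Hypothetical helper: returns (text_list, response), where `text_list`
    is a shuffled copy and `response` is the passage in its original order.
    """
    rng = rng or random.Random()
    text_list = list(sentences)  # copy so the caller's list is left untouched
    rng.shuffle(text_list)
    response = " ".join(sentences)
    return text_list, response


ordered = [
    "첫 번째 문장이다.",
    "두 번째 문장이다.",
    "세 번째 문장이다.",
]
text_list, response = make_align_example(ordered, rng=random.Random(0))
# The shuffled list contains the same sentences as the original passage.
assert sorted(text_list) == sorted(ordered)
```

A shuffle can occasionally return the original order, so a real pipeline might reshuffle or drop such records.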

Text completion Task


#Text Completion

# Adjacent string literals are concatenated into one prompt string.
text= ('그린브라우저(GreenBrowser)는 인터넷 익스플로러에서 사용하는 트라이던트 레이아웃 엔진을 바탕으로 하며 중국에 기반을 둔 소프트웨어 회사인 모어퀵(morequick)에서 만든 무료 웹 브라우저다. 간체자 중국어가 웹 브라우저에 내장되어 있다. '
       '맥스톤 웹 브라우저와 비슷하여 MyIE와 밀접하게 관련되어 있다. 맥스톤용의 일부 플러그인이 그린브라우저에서도 작동할 것이다.')



response= '자동 스크롤, 자동 리프레시, 자동 저장, 자동 폼 채우기와 같은 많은 자동화 기능이 있다.'
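
One way to read the example above is that the leading sentences of a passage serve as the prompt and the sentence that follows serves as the target. The sketch below follows that reading; `make_completion_example` and its `n_context` parameter are hypothetical names, and how the actual pipeline chooses the split point is not stated in the model card.

```python
def make_completion_example(sentences, n_context):
    """Split a passage into a prompt and the sentence that follows it.

    Hypothetical helper: `sentences` is the passage already split into
    sentences, and `n_context` is how many leading sentences form the prompt.
    """
    if not 0 < n_context < len(sentences):
        raise ValueError("n_context must leave at least one sentence as the response")
    text = " ".join(sentences[:n_context])
    response = sentences[n_context]
    return text, response


passage = [
    "첫 번째 문장이다.",
    "두 번째 문장이다.",
    "세 번째 문장이다.",
]
text, response = make_completion_example(passage, n_context=2)
# The prompt holds the first two sentences; the response is the third.
assert response == passage[2]
```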

Acknowledgement

Markr AI is in constant communication with numerous open-source developers and researchers. We would also like to express our gratitude to Beomi and Maywell, who have provided many insights through extensive discussions in the development of the model.