---
language:
- en
- ko
pipeline_tag: text-generation
inference: false
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
- llama-2-chat
license: apache-2.0
---
# komt-llama-2-7b
The komt-llama-2-7b model was fine-tuned with a multi-task instruction technique aimed at improving Korean-language performance. For more details, see the GitHub repository listed under Model Details.

## Model Details

* **Model Developers** : davidkim (Changyeon Kim)
* **Repository** : https://github.com/davidkim205/komt
* **Model Architecture** : komt-llama-2-7b is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning with multi-task instructions.

## Dataset
Korean multi-task instruction dataset

## Prompt Template
```
### instruction: {prompt}

### Response: 
```
Example:
```
### instruction: 자동차 종합(정기)검사 의무기간은 얼마인가요?

### Response:

```
Response:
``` 
### instruction: 자동차 종합(정기)검사 의무기간은 얼마인가요?

### Response:자동차 종합(정기)검사 의무기간은 최초 등록일 또는 최초 등록일 이후 12개월 뒤부터 3년간 적용됩니다. 그러나 이 기간은 최초 등록일 이후 12개월 뒤 매년 12개월씩 3년간 적용됩니다. 이 기간 동안 차량의 기본 점검이 필요하며, 점검을 받지 않으면 과태료가 부과됩니다. 따라서 자동차 종합(정기)검사를 받을 때는 반드시 등록 당일에 점검을 받아야 합니다.
```
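
The template can be filled in programmatically. The following is a minimal sketch, not code from the komt repository: the helper name `build_prompt` is hypothetical, and it assumes the same spacing as the prompt string used in the Usage section (a blank line between the two markers, no trailing space after `### Response:`).

```python
def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the komt prompt template.

    Hypothetical helper for illustration; the layout mirrors the
    prompt string shown in the Usage section of this card.
    """
    return f"### instruction: {instruction}\n\n### Response:"


if __name__ == "__main__":
    # Prints the two-marker prompt for a sample Korean question.
    print(build_prompt("자동차 종합(정기)검사 의무기간은 얼마인가요?"))
```

The returned string can be passed directly to the tokenizer in place of a hand-written prompt.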

## Usage
```python
import torch

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import StoppingCriteria, StoppingCriteriaList
from transformers import TextStreamer, GenerationConfig


class LocalStoppingCriteria(StoppingCriteria):
    """Stop generation once the decoded output ends with one of the stop words."""

    def __init__(self, tokenizer, stop_words=None):
        super().__init__()
        # Copy into a fresh list to avoid sharing a mutable default argument.
        self.stop_words = list(stop_words or [])
        self.tokenizer = tokenizer
        # Token-id form of each stop word (1-D tensors), used by _compare_token.
        self.stops = [
            tokenizer(stop_word, return_tensors='pt', add_special_tokens=False)['input_ids'][0]
            for stop_word in self.stop_words
        ]

    def _compare_token(self, input_ids):
        # Token-level check: does the sequence end with the ids of any stop word?
        for stop in self.stops:
            stop_len = len(stop)
            if stop_len == 0 or input_ids.shape[-1] < stop_len:
                continue
            # Compare on whatever device the generated ids live on.
            if torch.equal(stop.to(input_ids.device), input_ids[0][-stop_len:]):
                return True
        return False

    def _compare_decode(self, input_ids):
        # String-level check: decode the sequence and test its suffix.
        input_str = self.tokenizer.decode(input_ids[0])
        return any(input_str.endswith(stop_word) for stop_word in self.stop_words)

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs):
        # The string-level check is the one used during generation.
        return self._compare_decode(input_ids)

# Configuration
model_name = 'davidkim205/komt-Llama-2-7b-chat-hf'
instruction_prefix = "### instruction: "
input_prefix = "### input: "
answer_prefix = "### Response: "
endoftext = "<|end|>"
stop_words = [endoftext, '<s>', '###']
# Sampling settings (early_stopping applies only to beam search, so it is not set here).
generation_config = GenerationConfig(
    temperature=0.9,
    top_p=0.7,
    top_k=100,
    max_new_tokens=2048,
    do_sample=True,
)
#
# create model
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
stopping_criteria = StoppingCriteriaList([LocalStoppingCriteria(tokenizer=tokenizer, stop_words=stop_words)])
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.eval()

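# Generate
# Prompt (in Korean): "Briefly explain NLP."
prompt = "### instruction: nlp에 대해서 간단하게 설명해줘.\n\n### Response:"
inputs = tokenizer(
    prompt,
    return_tensors='pt',
    return_token_type_ids=False
).to(model.device)  # follow the model's device instead of hard-coding 'cuda'
gened = model.generate(
    **inputs,
    generation_config=generation_config,
    eos_token_id=model.config.eos_token_id,
    stopping_criteria=stopping_criteria,
    streamer=streamer
)
# The decoded text contains the prompt followed by the generated answer.
output_text = tokenizer.decode(gened[0], skip_special_tokens=True)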

print('--------------------')
print(output_text)

```
Response:
``` 
NLP는 자연어 처리의 약자로, 컴퓨터와 인간 언어 간의 상호 작용을 다루는 인공 지능의 한 분야입니다. 다음은 몇 가지 예시입니다:

1. 언어 번역: 번역 분야에서 번역가는 원본 문서를 원하는 언어로 번역하는 작업을 담당합니다. 이 작업은 번역가가 원본 문서를 철저히 분석하여 번역하는 것이 필수적이기 때문에 어려운 작업입니다.
2. 감정 분석: 감정 분석은 텍스트 데이터에서 감성을 분석하는 것을 포함합니다. 이 작업은 감성을 긍정 또는 부정으로 분류하는 것이 필수적이기 때문에 어려운 작업입니다.
3. 텍스트 분류: 텍스트 분류는 텍스트 데이터를 특정 카테고리로 분류하는 작업을 포함합니다. 이 작업은 텍스트를 분류하는 데 사용되는 머신 러닝 알고리즘의 발전으로 인해 더 쉬워졌습니다.
4. 정보 검색: 정보 검색은 정보를 검색하는 작업을 포함합니다. 이 작업은 데이터베이스, 온라인 데이터베이스, 웹 검색 엔진 등 다양한 기술을 사용하여 처리할 수 있습니다.
5. 컴퓨터 지원 번역: 컴퓨터 지원 번역은 기업이 다양한 언어로 제품 및 서비스를 번역할 수 있도록 지원하는 작업을 포함합니다. 이 작업은 번역 프로세스를 자동화하는 데 사용되는 머신 러닝 알고리즘의 발전으로 인해 더 쉬워졌습니다.
```
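
The streamer prints only the new tokens, but `output_text` in the script above also contains the prompt and may end with a stop word. A minimal post-processing sketch follows; the helper name `extract_response` is hypothetical and the default marker and stop words simply mirror the values in the Usage script.

```python
def extract_response(output_text,
                     answer_prefix="### Response:",
                     stop_words=("<|end|>", "<s>", "###")):
    """Return only the answer portion of a decoded generation.

    Hypothetical helper: keeps the text after the last answer marker
    and cuts it at the first stop word, if any.
    """
    # rpartition leaves the whole text in `answer` when the marker is absent.
    _, _, answer = output_text.rpartition(answer_prefix)
    for stop in stop_words:
        idx = answer.find(stop)
        if idx != -1:
            answer = answer[:idx]
    return answer.strip()


if __name__ == "__main__":
    sample = "### instruction: hi\n\n### Response: hello there###"
    print(extract_response(sample))
```

Cutting at the first stop word is a design choice for this sketch; an answer that legitimately contains `###` would be truncated.
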
## Hardware and Software
- NVIDIA driver: 535.54.03
- CUDA version: 12.2

## Training
See the GitHub repository: https://github.com/davidkim205/komt

## Model Benchmark
- Evaluated with EleutherAI's lm-evaluation-harness (polyglot branch): https://github.com/EleutherAI/lm-evaluation-harness/tree/polyglot
 
### Korean Semantic Textual Similarity
| model name                    | Version | Metric |      Value |      | Stderr |
|-------------------------------| ------: | ------ |-----------:| ---- | -----: |
| meta-llama/Llama-2-7b-chat-hf |       0 | acc    |     0.4817 | ±    | 0.0220 |
| beomi/llama-2-ko-7b           |       0 | acc    |     0.5029 | ±    | 0.0220 |
| kfkas/Llama-2-ko-7b-Chat      |       0 | acc    |     0.5318 | ±    | 0.0219 |
| **komt-llama-2-7b (ours)**    |       0 | acc    | **0.5530** | ±    | 0.0218 |
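
When reading the table, it helps to compare the accuracy gaps with the reported standard errors. The sketch below is a rough illustration only: it treats the two errors as independent, which is a simplifying assumption because all models are scored on the same test set, and the helper name `z_score` is hypothetical.

```python
import math

# (accuracy, standard error) pairs copied from the table above.
results = {
    "meta-llama/Llama-2-7b-chat-hf": (0.4817, 0.0220),
    "beomi/llama-2-ko-7b": (0.5029, 0.0220),
    "kfkas/Llama-2-ko-7b-Chat": (0.5318, 0.0219),
    "komt-llama-2-7b": (0.5530, 0.0218),
}


def z_score(a, b):
    """Accuracy gap between models a and b in units of the combined standard error.

    Assumes independent errors (a simplification, see the note above).
    """
    (acc_a, se_a), (acc_b, se_b) = results[a], results[b]
    return (acc_a - acc_b) / math.sqrt(se_a ** 2 + se_b ** 2)


if __name__ == "__main__":
    ours = "komt-llama-2-7b"
    for name in results:
        if name != ours:
            print(f"{ours} vs {name}: z = {z_score(ours, name):.2f}")
```

Under this approximation the gap to meta-llama/Llama-2-7b-chat-hf is a little over two combined standard errors, while the gaps to the two Korean baselines are smaller.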

### Example Output
Per-model performance results are available on GitHub:
https://github.com/davidkim205/komt/tree/main/eval_results

| Question / Answer (질문/답변)                                |
| ------------------------------------------------------------ |
| 누전차단기가 내려가는 이유는 무엇입니까?                     |
| 누전차단기가 내려가는 이유는 다음과 같습니다:1. 플러그를 코드에 연결하는 코드 표시기 또는 핀에 압력이 분산되어 있을 수 있습니다. 예를 들어, 코드 표시기의 위치가 누전차단기의 위치와 같지 않거나 코드 표시기의 구성 요소가 작동하지 않는 경우 누전차단기가 내려가지 않을 수 있습니다.2. 플러그를 코드에 연결하는 코드 표시기 또는 핀에 금전이 충전되어 있을 수 있습니다. 예를 들어, 코드 표시기의 위치가 누전차단기의 위치와 같지 않거나 코드 표시기의 구성 요소가 작동하지 않는 경우 누전차단기가 내려가지 않을 수 있습니다.3. 코드 표시기 또는 핀에 압력이 작동하는 방식에 문제가 있을 수 있습니다. 예를 들어, 코드 표시기의 위치가 누전차단기의 위치와 같지 않거나 코드 표시기의 구성 요소가 작동하지 않는 경우 누전차단기가 내려가지 않을 수 있습니다.4. 플러그를 코드에 연결하는 코드 표시기 또는 핀에 금전이 충전되어 있을 수 있습니다. 예를 들어, 코드 표시기의 위치가 누전차단기의 위치와 같지 않거나 코드 표시기의 구성 요소가 작동하지 않는 경우 누전차단기가 내려가지 않을 수 있습니다.5. 코드 표시기 또는 핀에 금전이 충전되어 있을 수 있습니다. 예를 들어, 코드 표시기의 위치가 누전차단기의 위치와 같지 않거나 코드 표시기의 구성 요소가 작동하지 않는 경우 누전차단기가 내려가지 않을 수 있습니다. |
| 협력공인중개사란 무엇인가요?                                 |
| 협력공인중개사는 부동산 중개업 업무에 대해 공인중개사 또는 부동산 중개사와 거래하는 일종의 파트너입니다. 협력공인중개사는 공인중개사와 거래를 처리하고 중개 수수료를 갚을 책임이 있습니다. 협력공인중개사는 공인중개사의 전문성과 업무 능력을 활용하여 공인중개사와 함께 부동산을 판매, 구매 또는 임대하는 업무를 담당합니다.협력공인중개사와 협력하면 부동산 중개업 업무에 대한 전문성을 발휘하는 동시에 다른 중개업 업무에 대해서도 편리함과 지원을 받을 수 있습니다. 협력공인중개사는 부동산 중개업체 관계에 익숙하고 분석, 마케팅 또는 고객 서비스와 같은 추가 서비스를 제공할 수 있는 능력을 갖추고 있을 수 있습니다.협력공인중개사는 공인중개사와 협력하여 부동산을 판매, 구매 또는 임대하는 업무를 담당하며, 판매 또는 구매를 원하는 고객에게 다음과 같은 몇 가지 이점을 제공할 수 있습니다. |
| 수출물품에도 원산지표시를 하여야 하나요?                     |
| 수출물품에 대해서는 원산지표시 의무가 없습니다. 그러나 수출 물품에 대해서는 표시자율 원산지표시 규정에 따라 원산지표시를 하는 것이 좋습니다. 표시자율 원산지표시 규정에 따라 원산지 표시가 가능하다면 원산지 표시가 가능한 제품은 표시할 의무가 있습니다. 원산지 표시가 가능하지 않은 제품은 원산지 표시를 하지 않는 것이 좋습니다. 이 때, 원산지 표시 의무가 없는 제품은 국내에서 판매되는 제품이며, 원산지 표시 의무가 없는 제품은 국내에서 판매되지 않는 제품입니다. |
| 자동차 공기압 관리가 왜 중요하나요?                          |
| 자동차 공기압 관리는 안전과 편안함을 위해 중요합니다. 공기압이 높아지면 파편성 및 충격 충격으로 인해 차체에 부정적인 영향을 미칠 수 있습니다. 또한 공기압이 낮으면 충격을 흡수하여 안전을 훼손시킬 수 있습니다. 또한 공기압이 낮으면 더 높은 압력을 견딜 수 있는 차체 재료가 필요하기 때문에 비용이 더 많이 들 수 있습니다. 또한 공기압이 낮으면 공기압을 유지하는 데 필요한 연료가 더 많이 필요하기 때문에 연비가 떨어질 수 있습니다. 이는 연비가 떨어지면 배기가스 배출량이 증가하여 배기가스 배출량을 줄이는 데 도움이 될 수 있습니다. 이러한 이유로 인해 안전하고 편안한 운행을 위해 자동차 공기압 관리가 중요하다는 것을 알 수 있습니다. |
| 주차위반으로 과태료가 나왔는데 행정심판을 청구할 수 있나요?  |
| 주차위반으로 과태료가 나온 경우 행정심판을 청구할 수 있습니다. 이 과정은 행정심판법에 따라 이루어지며, 이 법은 공무원의 과실이나 과태료 부과 결정에 대한 소송 절차를 정하고 있습니다.<br><br>행정심판을 청구하려면 먼저 행정심판위원회에 신청서를 제출해야 합니다. 행정심판위원회는 신청서를 검토하고 행정심판위원회 위원 중 3명 이상의 찬성을 얻어 행정심판법에 따라 해당 법에 따른 결정을 지지할지, 위반한 사실을 충분히 증명할지 여부를 판단합니다.<br><br>예를 들어, 주차위반으로 과태료가 부과된 경우 행정심판위원회는 위반한 사실을 충분히 증명할 경우 해당 위반 사실에 대해 청구된 과태료를 지지할 수 있습니다. 그러나 위반이 충분하지 않다고 판단할 경우, 행정심판위원회는 위반 사실을 충분히 증명하지 못할 경우 해당 위반 사실에 대해 부과된 과태료를 취소할 수 있습니다.<br><br>행정심판위원회가 청구된 과태료를 지지하면 행정심판위원회는 행정심판법에 따라 청구인에게 행정심판위원회의 판단을 제출할 것을 요구할 수 있습니다. 청구인은 행정심판위원회 판단을 준수하거나 행정심판법에 따라 청구된 과태료를 납부할 수 있습니다.<br><br>행정심판법에 따라 행정심판위원회가 해당 위반 사실을 충분히 증명할 수 없는 경우, 해당 위반 사실을 충분히 증명할 |
| 프리는 어느 나라의 애니메이션이야                            |
| 일본<br>애니메이션 《프리》(プリマインズ, Pretty Maidens)는 일본의 애니메이션이다. 제작은 일본의 애니메이션 제작사인 스튜디오 지브리(Studio Ghibli)에서 이루어졌으며, 감독은 아노 하카토(猿野ハカト)이다. 극장판 애니메이션으로 일본에서 개봉한 이후, ㈜얼리버드픽쳐스가 수입하여 케이블 텔레비전을 통해 한국어 더빙으로 방영되었다. 《프리》는 스튜디오 지브리의 창작 배경이 되는 도시인 도쿄 도를 무대로 하고 있으며, 극장판 애니메이션 작품 중에서 가장 많은 관객수를 기록하고 있는 작품 중의 하나이다. 《프리》는 일본에서 2001년 개봉되어 일본영화 최다 관객수 기록을 기록하고 있으며, 《모노노케 히메》 이후 스튜디오 지브리가 제작한 일본의 애니메이션 작품 중에서 가장 많은 관객수를 기록하고 있다. 《프리》는 일본에서 애니메이션 작품 중에서 가장 많은 관객수를 기록하고 있으며, 《모노노케 히메》 이후 스튜디오 지브리가 제작한 일본의 애니메이션 작품 중에서 가장 많은 관객수를 기록하고 있다. 《프리》는 일본에서 2001년 개봉되어 일본영화 최다 관객수 기록을 기록하고 있으며, 《모노노케 히메》 이후 스튜디오 지브리가 제작한 일본의 애니메이션 작품 중에서 가장 많은 관객수를 기록하고 있다. |
| 영화 해리포터 시리즈 배급사가 어디야                         |
| 워너 브라더스<br>해리포터 시리즈는 콘텐츠로는 크게 세 부분으로 나누고, 배급사인 워너 브라더스는 네 부분으로 나누어 판매하고 있다. 그 중 《해리포터와 혼혈 왕자》(Harry Potter and the Half-Blood Prince) 같은 일부는 한 번에 한 챕터씩 발매하기도 한다. 이처럼 분할발매는 2007년 《해리포터와 혼혈 왕자》(Harry Potter and the Half-Blood Prince) 이후로 시작되었다. 《해리포터 시리즈》(Harry Potter Series) 또는 《해리 포터 시리즈》(Harry Potter Series) 라고도 부른다. 시리즈에 대한 논란에도 불구하고, 해리 포터 시리즈는 전 세계적으로 유명한 책 시리즈이다. 《해리포터와 죽음의 성물》(Harry Potter and the Deathly Hallows – Part 1)은 2010년 11월 26일 개봉하였고, 2011년 1월 25일 《해리포터와 죽음의 성물》(Harry Potter and the Deathly Hallows – Part 2)이 개봉한다. |
| 왜 세계시간의 기준이 영국의 GMT인가요? 그 이유는 무엇일까요? |
| 영국 항해사들이 세계시간의 기준을 삼았던 것은 당시 세계 시간을 측정하기 위해 육로에 사용되었던 메일 주소 표기 방식에서 비롯되었습니다. 그러나 당시에는 세계시간이 왜 북유럽 표준시인가에 대한 기준이 없어 전 세계 표준시로 인정받지 못하였습니다. 또한, 당시 런던에서 거리를 측정하는 방법으로 1마일 = 1/1000야드 이라는 것이 확정되어 있었기 때문에 이 방법을 세계시간의 기준으로 삼았던 것으로 알려져 있습니다. |



------------------------------------------------
# Original model card: Meta's Llama 2 7B-chat

Meta developed and released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular closed-source models like ChatGPT and PaLM.

**Model Developers** Meta

**Variations** Llama 2 comes in a range of parameter sizes — 7B, 13B, and 70B — as well as pretrained and fine-tuned variations.

**Input** Models input text only.

**Output** Models generate text only.

**Model Architecture** Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety.

||Training Data|Params|Context Length|GQA|Tokens|LR|
|---|---|---|---|---|---|---|
|Llama 2|*A new mix of publicly available online data*|7B|4k|&#10007;|2.0T|3.0 x 10<sup>-4</sup>|
|Llama 2|*A new mix of publicly available online data*|13B|4k|&#10007;|2.0T|3.0 x 10<sup>-4</sup>|
|Llama 2|*A new mix of publicly available online data*|70B|4k|&#10004;|2.0T|1.5 x 10<sup>-4</sup>|

**Llama 2 family of models.** Token counts refer to pretraining data only. All models are trained with a global batch-size of 4M tokens. The 70B version uses Grouped-Query Attention (GQA) for improved inference scalability.

**Model Dates** Llama 2 was trained between January 2023 and July 2023.

**Status** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback.

**License** A custom commercial license is available at: [https://ai.meta.com/resources/models-and-libraries/llama-downloads/](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)

**Research Paper** More information can be found in the paper "Llama 2: Open Foundation and Fine-Tuned Chat Models", available at https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/.

**Where to send questions or comments about the model** Instructions on how to provide feedback or comments on the model can be found in the model [README](README.md).

# **Intended Use**
**Intended Use Cases** Llama 2 is intended for commercial and research use in English. Tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks.

**Out-of-scope Uses** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Llama 2.

# **Hardware and Software**
**Training Factors** We used custom training libraries, Meta's Research Super Cluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute.

**Carbon Footprint** Pretraining utilized a cumulative 3.3M GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W). Estimated total emissions were 539 tCO2eq, 100% of which were offset by Meta’s sustainability program.

||Time (GPU hours)|Power Consumption (W)|Carbon Emitted (tCO<sub>2</sub>eq)|
|---|---|---|---|
|Llama 2 7B|184320|400|31.22|
|Llama 2 13B|368640|400|62.44|
|Llama 2 70B|1720320|400|291.42|
|Total|3311616||539.00|

**CO<sub>2</sub> emissions during pretraining.** Time: total GPU time required for training each model. Power Consumption: peak power capacity per GPU device for the GPUs used adjusted for power usage efficiency. 100% of the emissions are directly offset by Meta's sustainability program, and because we are openly releasing these models, the pretraining costs do not need to be incurred by others.
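
As a rough sanity check on the table, the per-model rows can be converted into an implied carbon intensity. The sketch below is illustrative only and is not Meta's published methodology: it assumes energy is simply GPU hours multiplied by the listed peak power, and the helper name `implied_intensity` is hypothetical.

```python
# (GPU hours, peak power in W, reported emissions in tCO2eq) copied from the table above.
runs = {
    "Llama 2 7B": (184320, 400, 31.22),
    "Llama 2 13B": (368640, 400, 62.44),
    "Llama 2 70B": (1720320, 400, 291.42),
}


def implied_intensity(gpu_hours, watts, tco2eq):
    """Return the implied carbon intensity in kgCO2eq per kWh.

    Assumes energy = GPU hours x peak power (a simplification).
    """
    energy_kwh = gpu_hours * watts / 1000.0
    return tco2eq * 1000.0 / energy_kwh


if __name__ == "__main__":
    for name, row in runs.items():
        print(f"{name}: {implied_intensity(*row):.3f} kgCO2eq/kWh")
```

Under this assumption all three rows imply nearly the same intensity, which suggests the reported emissions scale linearly with GPU hours.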

# **Training Data**
**Overview** Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over one million new human-annotated examples. Neither the pretraining nor the fine-tuning datasets include Meta user data.

**Data Freshness** The pretraining data has a cutoff of September 2022, but some tuning data is more recent, up to July 2023.

# **Evaluation Results**

In this section, we report the results for the Llama 1 and Llama 2 models on standard academic benchmarks.
For all the evaluations, we use our internal evaluations library.

|Model|Size|Code|Commonsense Reasoning|World Knowledge|Reading Comprehension|Math|MMLU|BBH|AGI Eval|
|---|---|---|---|---|---|---|---|---|---|
|Llama 1|7B|14.1|60.8|46.2|58.5|6.95|35.1|30.3|23.9|
|Llama 1|13B|18.9|66.1|52.6|62.3|10.9|46.9|37.0|33.9|
|Llama 1|33B|26.0|70.0|58.4|67.6|21.4|57.8|39.8|41.7|
|Llama 1|65B|30.7|70.7|60.5|68.6|30.8|63.4|43.5|47.6|
|Llama 2|7B|16.8|63.9|48.9|61.3|14.6|45.3|32.6|29.3|
|Llama 2|13B|24.5|66.9|55.4|65.8|28.7|54.8|39.4|39.1|
|Llama 2|70B|**37.5**|**71.9**|**63.6**|**69.4**|**35.2**|**68.9**|**51.2**|**54.2**|

**Overall performance on grouped academic benchmarks.** *Code:* We report the average pass@1 scores of our models on HumanEval and MBPP. *Commonsense Reasoning:* We report the average of PIQA, SIQA, HellaSwag, WinoGrande, ARC easy and challenge, OpenBookQA, and CommonsenseQA. We report 7-shot results for CommonsenseQA and 0-shot results for all other benchmarks. *World Knowledge:* We evaluate the 5-shot performance on NaturalQuestions and TriviaQA and report the average. *Reading Comprehension:* For reading comprehension, we report the 0-shot average on SQuAD, QuAC, and BoolQ. *Math:* We report the average of the GSM8K (8 shot) and MATH (4 shot) benchmarks at top 1.

|Model|Size|TruthfulQA|ToxiGen|
|---|---|---|---|
|Llama 1|7B|27.42|23.00|
|Llama 1|13B|41.74|23.08|
|Llama 1|33B|44.19|22.57|
|Llama 1|65B|48.71|21.77|
|Llama 2|7B|33.29|**21.25**|
|Llama 2|13B|41.86|26.10|
|Llama 2|70B|**50.18**|24.60|

**Evaluation of pretrained LLMs on automatic safety benchmarks.** For TruthfulQA, we present the percentage of generations that are both truthful and informative (the higher the better). For ToxiGen, we present the percentage of toxic generations (the smaller the better).


|Model|Size|TruthfulQA|ToxiGen|
|---|---|---|---|
|Llama-2-Chat|7B|57.04|**0.00**|
|Llama-2-Chat|13B|62.18|**0.00**|
|Llama-2-Chat|70B|**64.14**|0.01|

**Evaluation of fine-tuned LLMs on different safety datasets.** Same metric definitions as above.

# **Ethical Considerations and Limitations**
Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased, or otherwise objectionable responses to user prompts. Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their specific applications of the model.

Please see the Responsible Use Guide available at [https://ai.meta.com/llama/responsible-use-guide/](https://ai.meta.com/llama/responsible-use-guide/)