---
language:
- en
- ko
library_name: peft
tags:
- translation
- gemma
base_model: google/gemma-7b
---

# Model Card for brildev7/gemma-7b-translation-enko-sft-qlora
## Model Details
### Model Description
- **Developed by:** Kang Seok Ju
- **Contact:** brildev7@gmail.com

## Training Details
### Training Data
- **Dataset:** [traintogpb/aihub-koen-translation-integrated-tiny-100k](https://huggingface.co/datasets/traintogpb/aihub-koen-translation-integrated-tiny-100k)

# Inference Examples
```python
import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

model_id = "google/gemma-7b"
peft_model_id = "brildev7/gemma-7b-translation-enko-sft-qlora"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    quantization_config=quantization_config, 
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    token=os.environ['HF_TOKEN'],
    device_map="auto"
)
model = PeftModel.from_pretrained(model, peft_model_id)

tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
tokenizer.pad_token_id = tokenizer.eos_token_id

# example
prompt_template = """Translate the following sentences into Korean language:
{}

translation:
"""
sentences = "Apple is facing a crisis in one of its key markets, China, as it is being challenged by local smartphone manufacturers. In a bid to counter the threat, Apple CEO Tim Cook is reportedly planning to visit China to meet with local smartphone manufacturers and discuss a joint investment. Apple is also reportedly considering installing an AI model from Baidu, the Chinese search giant, on its iPhone. The move comes as Apple is facing a price war in China, with local smartphone manufacturers offering steep discounts on their products."
texts = prompt_template.format(sentences)
inputs = tokenizer(texts, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: 애플은 국산 스마트폰 제조사들의 도발에 중국에서 하나의 핵심 시장에 위기를 맞고 있다. 이 위협을 타개하기 위해 애플의 최고 경영자인 팀 쿡은 중국을 방문해 현지 스마트폰 제조사들과 접촉해 공동 투자를 논의하는 것으로 알려졌다. 애플은 또한 중국 최대 검색사 바이두(Baidu)의 인공 지능(AI) 모델을 아이폰에 탑재하는 것을 검토 중인 것으로 전해졌다. 애플은 국내 스마트폰 제조사들이 자신들의 제품에 급한 할인을 내놓으면서 중국에서 가격전쟁에 직면해 있는 것이다.

# example
sentences = "Is it safe to drink milk and eat chicken?"
texts = prompt_template.format(sentences)
inputs = tokenizer(texts, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: 우유와 닭고기는 안전한가요?

# example
sentences = "What precautions to take during the bird flu outbreak"
texts = prompt_template.format(sentences)
inputs = tokenizer(texts, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: 조류 독감 유행 시 어떠한 주의 사항을 해야 하는지

```
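
The three examples above repeat the same format, tokenize, generate, and decode steps, and `tokenizer.decode(outputs[0], ...)` returns the prompt together with the generated continuation. The sketch below is one possible way to wrap those steps so that only the translation is returned. The names `build_prompt`, `extract_translation`, and `translate` are hypothetical helpers introduced here for illustration, not part of the model or of any library. It assumes the decoded text begins with the exact prompt string; if it does not, the decoded text is returned unchanged apart from whitespace trimming.

```python
# Prompt template copied from the inference examples in this card.
PROMPT_TEMPLATE = """Translate the following sentences into Korean language:
{}

translation:
"""


def build_prompt(sentences: str) -> str:
    """Fill the card's prompt template with the English source text."""
    return PROMPT_TEMPLATE.format(sentences)


def extract_translation(decoded: str, prompt: str) -> str:
    """Return only the text that follows the prompt in the decoded output.

    Assumption: the decoded string starts with the prompt verbatim.
    If it does not, nothing is cut and the whole string is returned.
    """
    if decoded.startswith(prompt):
        decoded = decoded[len(prompt):]
    return decoded.strip()


def translate(sentences: str, model, tokenizer, max_new_tokens: int = 1024) -> str:
    """Hypothetical wrapper around the generate/decode steps shown above.

    `model` and `tokenizer` are the objects loaded in the inference example.
    """
    prompt = build_prompt(sentences)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return extract_translation(decoded, prompt)
```

With the model and tokenizer loaded as in the example above, a call such as `print(translate("Is it safe to drink milk and eat chicken?", model, tokenizer))` would print the translation without the prompt text, provided the prefix assumption holds.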