---
license: mit
datasets:
- Calvin-Xu/FLFL-Aozora-Speech-Train
language:
- ja
metrics:
- sacrebleu
pipeline_tag: text2text-generation
---
# FLFL ใƒ•ใƒชใƒ•ใƒช
A furigana (ruby) generation model for Japanese text.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Prefer bfloat16 on GPUs that support it, otherwise fall back to float16
torch_dtype = torch.bfloat16 if torch.cuda.is_available() and hasattr(torch.cuda, "is_bf16_supported") and torch.cuda.is_bf16_supported() else torch.float16

model = AutoModelForCausalLM.from_pretrained("Calvin-Xu/FLFL", device_map="auto", torch_dtype=torch_dtype)
tokenizer = AutoTokenizer.from_pretrained("Calvin-Xu/FLFL")

prompt_template = """[INST] {instruction}\n{input}\n[/INST]\n"""
sentence = "国境の長いトンネルを抜けると雪国であった"
inputs = tokenizer(prompt_template.format(instruction="次の文に正確に振り仮名を付けてください", input=sentence), return_tensors="pt").to(model.device)

with torch.no_grad():
    tokens = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens; generate() echoes the prompt otherwise
output = tokenizer.decode(tokens[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False)
print(output)
# <ruby>ๅ›ฝๅขƒ<rt>ใใซใ–ใ‹ใ„</rt></ruby>ใฎ<ruby>้•ท<rt>ใชใŒ</rt></ruby>ใ„ใƒˆใƒณใƒใƒซใ‚’<ruby>ๆŠœ<rt>ใฌ</rt></ruby>ใ‘ใ‚‹ใจ<ruby>้›ชๅ›ฝ<rt>ใ‚†ใใใซ</rt></ruby>ใงใ‚ใฃใŸ<|endoftext|>
```
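The model emits HTML-style ruby markup, as in the comment above. If you prefer structured (base text, reading) pairs, a minimal post-processing sketch is shown below; the `parse_ruby` helper is illustrative and not part of the model.

```python
import re

def parse_ruby(annotated: str) -> list[tuple[str, str]]:
    """Extract (base text, reading) pairs from <ruby>...<rt>...</rt></ruby> spans."""
    annotated = annotated.replace("<|endoftext|>", "")
    return re.findall(r"<ruby>(.*?)<rt>(.*?)</rt></ruby>", annotated)

example = "<ruby>国境<rt>くにざかい</rt></ruby>の<ruby>長<rt>なが</rt></ruby>いトンネルを<ruby>抜<rt>ぬ</rt></ruby>けると<ruby>雪国<rt>ゆきぐに</rt></ruby>であった"
print(parse_ruby(example))
# [('国境', 'くにざかい'), ('長', 'なが'), ('抜', 'ぬ'), ('雪国', 'ゆきぐに')]
```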
### Finetuned from
[stockmark/gpt-neox-japanese-1.4b](https://huggingface.co/stockmark/gpt-neox-japanese-1.4b)
### Training Dataset
Trained for slightly over one epoch on [Calvin-Xu/FLFL-Aozora-Speech-Train](https://huggingface.co/datasets/Calvin-Xu/FLFL-Aozora-Speech-Train).
### Training Settings
Hugging Face Trainer with PEFT (r=64, alpha=128)

Control tokens added: `[INST]`, ` [/INST]`, `<ruby>`, `</ruby>`, `<rt>`, `</rt>`
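For reference, a rough sketch of a comparable PEFT setup follows. It only reflects the hyperparameters and tokens stated above; the target modules, how the tokens were registered, and the rest of the training loop are assumptions, not the original training code.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "stockmark/gpt-neox-japanese-1.4b"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Register the control tokens used in the prompt/output format and grow the embeddings.
# Whether they were added as regular or special tokens is not stated in this card.
tokenizer.add_tokens(["[INST]", " [/INST]", "<ruby>", "</ruby>", "<rt>", "</rt>"])
model.resize_token_embeddings(len(tokenizer))

# LoRA with the r/alpha values stated above; the target module name is an assumption
# based on the GPT-NeoX attention layout.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```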
### Output Examples
```
[INST] ๆฌกใฎๆ–‡ใซๆญฃ็ขบใซๆŒฏใ‚Šไปฎๅใ‚’ไป˜ใ‘ใฆใใ ใ•ใ„
ๅ›ฝๅขƒใฎ้•ทใ„ใƒˆใƒณใƒใƒซใ‚’ๆŠœใ‘ใ‚‹ใจ้›ชๅ›ฝใงใ‚ใฃใŸ
[/INST]
<ruby>ๅ›ฝๅขƒ<rt>ใใซใ–ใ‹ใ„</rt></ruby>ใฎ<ruby>้•ท<rt>ใชใŒ</rt></ruby>ใ„ใƒˆใƒณใƒใƒซใ‚’<ruby>ๆŠœ<rt>ใฌ</rt></ruby>ใ‘ใ‚‹ใจ<ruby>้›ชๅ›ฝ<rt>ใ‚†ใใใซ</rt></ruby>ใงใ‚ใฃใŸ<|endoftext|>
```
- <ruby>้ฐค<rt>ใถใ‚Š</rt></ruby>ใฎ<ruby>็…ง<rt>ใฆ</rt></ruby>ใ‚Š<ruby>็„ผ<rt>ใ‚„</rt></ruby>ใใ€<ruby>ๅ…ซๅฎ่œ<rt>ใฏใฃใฝใ†ใ•ใ„</rt></ruby>ใ€ใƒใƒณใƒใƒผใ‚ฐใ€‚<|endoftext|>
- <ruby>ไธป่œ<rt>ใ—ใ‚…ใ•ใ„</rt></ruby><ruby>้–ข้€ฃ<rt>ใ‹ใ‚“ใ‚Œใ‚“</rt></ruby>ใฏใ€<ruby>่ฆ‹ไบ‹<rt>ใฟใ”ใจ</rt></ruby>ใชใพใงใฎ<ruby>ๅ’Œๆด‹<rt>ใ‚ใ‚ˆใ†</rt></ruby><ruby>ไธญ<rt>ใกใ‚…ใ†</rt></ruby><ruby>ๆŠ˜่กท<rt>ใ›ใฃใกใ‚…ใ†</rt></ruby>ใ€‚<|endoftext|>
- <ruby>ๅˆฅ<rt>ในใค</rt></ruby>ใฎ<ruby>่€…<rt>ใ‚‚ใฎ</rt></ruby>ใฎ<ruby>็›ฎ<rt>ใ‚</rt></ruby>ใ‚’<ruby>้€š<rt>ใคใ†</rt></ruby>ใ˜ใฆ<ruby>ๆญดๅฒ<rt>ใ‚Œใใ—</rt></ruby>ใ‚’<ruby>ๅžฃ้–“่ฆ‹<rt>ใ‹ใ„ใพใฟ</rt></ruby>ใ‚‰ใ‚Œใ‚‹ใจใฏใ€<ruby>ๆƒณๅƒ<rt>ใใ†ใžใ†</rt></ruby>ใ‚’<ruby>่ถ…<rt>ใ“</rt></ruby>ใˆใ‚‹<ruby>ไฝ“้จ“<rt>ใŸใ„ใ‘ใ‚“</rt></ruby>ใซ<ruby>้•<rt>ใกใŒ</rt></ruby>ใ„ใชใ„!<|endoftext|>
- <ruby>ๆญข<rt>ใจ</rt></ruby>ใ‚ใ‚‹ใชใ‚‰ใ€ใใฎ<ruby>ๅคงๆœฌ<rt>ใŠใŠใ‚‚ใจ</rt></ruby>ใ‚’<ruby>ๆ น็ตถ<rt>ใญใ </rt></ruby>ใ‚„ใ—ใซใ—ใชใ„ใจ<ruby>ๅŠนๆžœ<rt>ใ“ใ†ใ‹</rt></ruby>ใŒใชใ„ใ‚<|endoftext|>
- <ruby>ไธไบบๆฐ—<rt>ใตใซใ‚“ใ</rt></ruby><ruby>้Š˜ๆŸ„<rt>ใ‹ใถ</rt></ruby>ใงใ“ใ‚Œ<ruby>ไปฅไธŠ<rt>ใ„ใ˜ใ‚‡ใ†</rt></ruby><ruby>ไพกๅ€ค<rt>ใ‹ใก</rt></ruby>ใŒ<ruby>ไธ‹<rt>ใ•</rt></ruby>ใŒใ‚Šใ‚ˆใ†ใชใ„ใ‹ใ‚‰ใ€ใปใจใ‚“ใฉ<ruby>ๅบ•ๅ€ค<rt>ใใ“ใญ</rt></ruby>ใ <|endoftext|>
- <ruby>ๆ™‚้–“<rt>ใ˜ใ‹ใ‚“</rt></ruby>ใฎ<ruby>ๆพฑ<rt>ใŠใ‚Š</rt></ruby>ใฎ<ruby>ไธญ<rt>ใชใ‹</rt></ruby>ใซ<ruby>ๆฒˆๆฎฟ<rt>ใกใ‚“ใŸใ„</rt></ruby>ใ—ใฆใ„ใŸใ‚ˆใ†ใ ใ€‚<|endoftext|>