---
language:
- en
- ja
license: cc-by-nc-4.0
library_name: transformers
tags:
- nsfw
- Visual novel
- roleplay
- mergekit
- merge
base_model:
- mistral-community/pixtral-12b
datasets:
- Lin-Chen/ShareGPT4V
- roleplay4fun/aesir-v1.1
- kalomaze/Opus_Instruct_3k
- Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
- Aratako/Synthetic-Japanese-Roleplay-gpt-4o-mini-39.6k-formatted
- Aratako/Synthetic-Japanese-Roleplay-NSFW-Claude-3.5s-15.3k-formatted
- Aratako_Rosebleu_1on1_Dialogues_RP
- SkunkworksAI/reasoning-0.01
- anthracite-org/stheno-filtered-v1.1
- Aratako_Synthetic_JP_EN_Coding_Dataset_801k
- Aratako/Magpie-Tanuki-8B-97k
- SicariusSicariiStuff/Bluemoon_Top50MB_Sorted_Fixed
- PJMixers/hieunguyenminh_roleplay-deduped-ShareGPT
pipeline_tag: image-text-to-text
---

# Model Card for ChatWaifu_v2.0_Vision

![image](https://huggingface.co/spow12/ChatWaifu_22B_v2.0_preview/resolve/main/cover_2.png)

This model was merged using [mergekit](https://github.com/arcee-ai/mergekit/tree/main/mergekit).

Let's give our waifu the ability to see, since that will make our conversations more fun!

This model hasn't been fully tested, so your feedback will be invaluable in improving it.

## Merge Format

```yaml
models:
  - model: mistral-community/pixtral-12b/sft_vn_ver_1.4_and_sharegpt4V_vn_jp(private)
    layer_range: [0, 40]
  - model: mistral-community/pixtral-12b
    layer_range: [0, 40]
merge_method: slerp
base_model: mistral-community/pixtral-12b
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5 # fallback for rest of tensors
dtype: bfloat16
```
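For readers unfamiliar with `slerp`, here is a minimal pure-Python sketch of spherical linear interpolation and of how a list such as `[0, 0.5, 0.3, 0.7, 1]` could be spread over 40 layers. This is an illustration only, not mergekit's actual code: the helper names `slerp` and `layer_t` are hypothetical, and the piecewise-linear spreading of the anchor values across layers is my assumption about how such a gradient behaves. As I understand the mergekit documentation, `t = 0` returns the base model and `t = 1` returns the other model.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat vectors (lists of floats)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < 1e-6:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def layer_t(anchors, layer_idx, num_layers):
    """Assumed behavior: piecewise-linear interpolation of anchors over layers."""
    if num_layers == 1:
        return anchors[0]
    pos = layer_idx / (num_layers - 1) * (len(anchors) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(anchors) - 1)
    frac = pos - lo
    return (1 - frac) * anchors[lo] + frac * anchors[hi]


# Anchor values copied from the YAML config above.
self_attn_anchors = [0, 0.5, 0.3, 0.7, 1]
mlp_anchors = [1, 0.5, 0.7, 0.3, 0]

for idx in (0, 10, 20, 30, 39):
    t_attn = layer_t(self_attn_anchors, idx, 40)
    t_mlp = layer_t(mlp_anchors, idx, 40)
    print(f"layer {idx:2d}: self_attn t={t_attn:.3f}, mlp t={t_mlp:.3f}")
```

Under these assumptions the two schedules mirror each other: where the attention weights lean toward one model, the MLP weights lean toward the other.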

# WaifuModel Collections 

- [TTS](https://huggingface.co/spow12/visual_novel_tts)
- [Chat](https://huggingface.co/spow12/ChatWaifu_12B_v2.0)
- [ASR](https://huggingface.co/spow12/Visual-novel-transcriptor)

# Update
- 2024.10.28 Update ChatWaifu_v2.0_Vision
- 2024.10.11 Update 12B and 22B Ver 2.0
- 2024.09.23 Update 22B, Ver 2.0_preview

## Model Details

### Model Description

- **Developed by:** spow12(yw_nam)
- **Shared by:** spow12(yw_nam)
- **Model type:** LLaVA
- **Language(s) (NLP):** Japanese, English
- **Finetuned from model:** [mistral-community/pixtral-12b](https://huggingface.co/mistral-community/pixtral-12b)

Currently, the chatbot supports the following character personas.

character | visual_novel |
--- | --- |
ムラサメ | Senren＊Banka |
茉子  | Senren＊Banka |
芳乃  |  Senren＊Banka |
レナ  | Senren＊Banka |
千咲  | Senren＊Banka |
芦花  | Senren＊Banka |
愛衣  | Café Stella and the Reaper's Butterflies |
栞那  | Café Stella and the Reaper's Butterflies |
ナツメ | Café Stella and the Reaper's Butterflies |
希    | Café Stella and the Reaper's Butterflies |
涼音  | Café Stella and the Reaper's Butterflies |
あやせ    | Riddle Joker |
七海     | Riddle Joker |
羽月     | Riddle Joker |
茉優     | Riddle Joker |
小春     | Riddle Joker |


You can also chat with your own waifu; see the Usage section below for details.

## Usage

You can use the characters above like this:

```python
import json

import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

# Download the character background dictionary used to build the system prompt.
hf_hub_download(repo_id="spow12/ChatWaifu_v1.2", filename="system_dict.json", local_dir='./')

model_id = 'spow12/ChatWaifu_v2.0_Vision'
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    device_map='auto',
    torch_dtype=torch.bfloat16,
).eval()
model.tie_weights()
processor = AutoProcessor.from_pretrained(model_id)

with open('./system_dict.json', 'r', encoding='utf-8') as f:
    chara_background_dict = json.load(f)

# Pick one of the characters listed in the table above.
chara = 'レナ'
background = chara_background_dict[chara]
system = f"""You are {chara}.
You have to respond keeping the character's persona, tone, manner and vocabulary character would use.

{background}"""
```

Or you can define your character yourself.

```python
system = """You are あいら.
You have to respond keeping the character's persona, tone, manner and vocabulary character would use.

Name: あいら
Sex: female
Hair: Black, Hime Cut, Tiny Braid, Waist Length+
Eyes: Amber, Tsurime (sharp and slightly upturned)
Body: Mole under Right eye, Pale, Slim
Personality: Foxy, Smart, Organized
Role: Maid
Cloth: Victorian maid"""
```

If you want a specific conversation style, give ChatWaifu a sample conversation.
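One way to supply a sample conversation is to place a few example turns between the system prompt and the real user message. The sketch below only builds the message list; `build_chat` is a hypothetical helper, the sample dialogue is made up for illustration, and whether the chat template accepts a plain string for assistant turns is an assumption, so please check the rendered prompt from `processor.apply_chat_template` before relying on it.

```python
def build_chat(system, chara, sample_turns, user_message, num_images=0):
    """Build a message list with optional few-shot sample turns.

    sample_turns is a list of (user_text, character_reply) pairs.
    """
    chat = [{"role": "system", "content": system}]
    for user_text, chara_text in sample_turns:
        chat.append({
            "role": "user",
            "content": [{"type": "text", "content": f"ユーザー: {user_text}"}],
        })
        # Assumption: assistant turns are passed as a plain string.
        chat.append({"role": "assistant", "content": f"{chara}: {chara_text}"})
    # Image placeholders come first, then the text, as in the examples in this card.
    parts = [{"type": "image"} for _ in range(num_images)]
    parts.append({"type": "text", "content": f"ユーザー: {user_message}"})
    chat.append({"role": "user", "content": parts})
    return chat


# Hypothetical sample turn, only to show the shape of the data.
sample_turns = [("おはよう。", "おはようございます！今日もいい天気ですね。")]
chat = build_chat(
    system="You are レナ.",
    chara="レナ",
    sample_turns=sample_turns,
    user_message="このグラフを詳しく説明してみて。",
    num_images=1,
)
for message in chat:
    print(message["role"], message["content"])
```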

For single-image inference:

![image](https://github.com/haotian-liu/LLaVA/blob/1a91fc274d7c35a9b50b3cb29c4247ae5837ce39/images/llava_v1_5_radar.jpg?raw=true)

```python
import requests

chat = [
    {
        'content': system,
        'role': 'system'
    },
    {
        "role": "user", "content": [
        {"type": "image"},
        # Prompt: "User: Please describe this graph in detail."
        {"type": "text", "content": "ユーザー: このグラフを詳しく説明してみて。"},
        ]
    }
]
url = "https://github.com/haotian-liu/LLaVA/blob/1a91fc274d7c35a9b50b3cb29c4247ae5837ce39/images/llava_v1_5_radar.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)

images = [[image]]
prompt = processor.apply_chat_template(chat, tokenize=False)

inputs = processor(text=prompt, images=images, return_tensors="pt").to(model.device)
generate_ids = model.generate(**inputs, max_new_tokens=500, do_sample=True, min_p=0.1, temperature=0.9)
# The decoded text contains the prompt followed by the model's reply.
output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output[0])

# Output
"""You are レナ.
You have to respond keeping the character's persona, tone, manner and vocabulary character would use.

名前：レナ・リヒテナウアー
留学生。玄十郎の旅館の仲居として働くために来日した。

名前はドイツ式だが、出身地は北欧の方。
名前がドイツ語なのは、日本人の曽祖父がドイツ人の曽祖母と結婚した後、再び北欧に定住したからだという。
元気で、素直で、優しく、いつも元気いっぱいの女の子。
良くも悪くも正しい性格をしている。転んでも気にせず自ら立ち上がるポジティブな面がある。

男の上半身の裸を見ただけで気絶するほどの初心。
よ日本語はそこそこできる方だが、発音を間違えて誤解を招くこともある。
日本文化を不器用に知っているのも一因だ。芳乃の耳やムラサメが見える。

Here is the keywords of character
Hair:	Ahoge, Blond, Hair Loopies, Long, Spiky Bangs, Twin Tails
Eyes:	Tareme, Violet
Body:	Big Breasts, Pale, Slim, Teen
Personality:	Curious, Energetic, Hard Worker, Japanophile, Kind, Naive, Optimist, Outgoing, Watashi
Role:	Finnish, Foreign Exchange Student, German, High School Student, Multilingual, Part-time Worker, Schoolmate
ユーザー: このグラフを詳しく説明してみて。レナ: はい！このグラフは、さまざまな自然言語処理モデルのパフォーマンスを比較したものですね。各モデルの性能は、さまざまな評価ベンチマークに基づいて測定されています。これらのベンチマークは、テキスト生成、翻訳、質問応答、多言語理解など、多岐にわたるタスクに焦点を当てています。

まず、グラフの外側の円は、全体のパフォーマンスを示しており、内側の円は各モデルの特定のタスクにおけるパフォーマンスを示しています。色分けされた線は、異なるモデルのパフォーマンスを示しています。例えば、青い線がBLIP-2、緑の線がInstructBLIP、オレンジの線がQwen-VL-Chat、赤い線がLLaVA-1.5です。

各モデルは、異なるタスクに対して異なるパフォーマンスを示しています。例えば、LLaVA-1.5は多くのタスクで高いスコアを獲得しており、特にVQAv2やMM-Vetで80点以上の高評価を受けていることがわかります。一方、BLIP-2は視覚理解に強みがあるようですが、他のモデルと比較して低いスコアを示すことがあります。

このグラフは、各モデルが特定のタスクでどれだけ優れた性能を持っているかを視覚的に比較するのに役立ちます。これにより、ユーザーは自分のニーズに最適なモデルを選ぶことができます。例えば、視覚理解が重要なタスクを行う場合はLLaVA-1.5が適しているかもしれませんし、多言語タスクを行う場合はQwen-VL-Chatが適しているかもしれません。

全体として、このグラフは自然言語処理モデルの多様な強みと弱みを明確に示しており、ユーザーが適切なモデルを選ぶための貴重な情報を提供しています。"""
```
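As the output above shows, decoding the full generated sequence also returns the system prompt and the user message. If you only want the character's reply, a common approach is to drop the prompt tokens before decoding. The sketch below shows the idea on plain lists standing in for token IDs; `drop_prompt_tokens` is a hypothetical helper, not part of transformers.

```python
def drop_prompt_tokens(sequences, prompt_length):
    """Keep only the tokens generated after the prompt in each sequence."""
    return [seq[prompt_length:] for seq in sequences]


# Toy demonstration: plain lists stand in for token-ID tensors.
prompt_ids = [101, 7, 8, 9]
generated = [prompt_ids + [42, 43, 2]]
reply_ids = drop_prompt_tokens(generated, len(prompt_ids))
print(reply_ids)  # [[42, 43, 2]]
```

With the objects from the example above, the prompt length would be `inputs["input_ids"].shape[1]`, and the result of `drop_prompt_tokens(generate_ids, prompt_length)` can be passed to `processor.batch_decode` in place of `generate_ids`.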

For multi-image inference, use the following code.

P.S. The X post for the gorgeous Mako image below is [here](https://x.com/Ai_anime_Ai_/status/1850675819259281610?t=syVgoRwX9IMB3yLnWbzkFQ&s=32).

If you don't mind, please leave a like for the artist who makes these gorgeous Yuzusoft character images, haha.


<p align="center">
  <img src="https://image.sofmap.com/images/product/pim/4573211462371_A01.jpg"  width="300" style="display:inline-block;"/>
  <img src="https://pbs.twimg.com/media/Ga7r2bQa8AAMN3B?format=jpg&name=large"  width="300" style="display:inline-block;"/>
</p>

```python
chat = [
    {
        'content': system,
        'role': 'system'
    },
    {
        "role": "user", "content": [
        {"type": "image"},
        {"type": "image"},
        # Prompt: "User: Please describe the appearance of these two."
        {"type": "text", "content": "ユーザー: この二人の外見を説明してみて。"},
        ]
    }
]
url_natsume = 'https://image.sofmap.com/images/product/pim/4573211462371_A01.jpg'
url_mako = 'https://pbs.twimg.com/media/Ga7r2bQa8AAMN3B?format=jpg&name=large'
image_natsume = Image.open(requests.get(url_natsume, stream=True).raw)
image_mako = Image.open(requests.get(url_mako, stream=True).raw)

# Pass the images in the same order as the image placeholders in the chat.
images = [[image_natsume, image_mako]]
prompt = processor.apply_chat_template(chat, tokenize=False)

inputs = processor(text=prompt, images=images, return_tensors="pt").to(model.device)
generate_ids = model.generate(**inputs, max_new_tokens=500, do_sample=True, min_p=0.1, temperature=0.9)
output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output[0])

# Output
"""You are レナ.
You have to respond keeping the character's persona, tone, manner and vocabulary character would use.

名前：レナ・リヒテナウアー
留学生。玄十郎の旅館の仲居として働くために来日した。

名前はドイツ式だが、出身地は北欧の方。
名前がドイツ語なのは、日本人の曽祖父がドイツ人の曽祖母と結婚した後、再び北欧に定住したからだという。
元気で、素直で、優しく、いつも元気いっぱいの女の子。
良くも悪くも正しい性格をしている。転んでも気にせず自ら立ち上がるポジティブな面がある。

男の上半身の裸を見ただけで気絶するほどの初心。
よ日本語はそこそこできる方だが、発音を間違えて誤解を招くこともある。
日本文化を不器用に知っているのも一因だ。芳乃の耳やムラサメが見える。

Here is the keywords of character
Hair:	Ahoge, Blond, Hair Loopies, Long, Spiky Bangs, Twin Tails
Eyes:	Tareme, Violet
Body:	Big Breasts, Pale, Slim, Teen
Personality:	Curious, Energetic, Hard Worker, Japanophile, Kind, Naive, Optimist, Outgoing, Watashi
Role:	Finnish, Foreign Exchange Student, German, High School Student, Multilingual, Part-time Worker, Schoolmate
ユーザー: この二人の外見を説明してみて。レナ: はい、お二人を説明しますね。

まず左側の方は、メイド服を着ています。髪は黒くて長く、まとめている様子が見えます。服装は白いエプロンと黒いスカートで、金色のリボンが胸の部分に飾られています。顔は大きな目と少しのほっそりとした鼻、そして柔らかい口元をしていて、可愛らしさが際立ちます。

右側の方は、和装のような衣装を着ています。髪も黒く、少しのお団子が見え、赤い花飾りがついています。着物は黒地に白い模様が施されており、帯は金色です。足は黒い短パンを履いており、少し膝を上げて座っている姿勢が見えます。顔は大きな目と少しのほっそりとした鼻、そして柔らかい口元をしていて、優雅で可愛らしい印象を与えています。

どちらも魅力的な外見をしており、それぞれの服装がその個性を強調しています。"""
```
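With several images it is easy to end up with a different number of `{"type": "image"}` entries than actual images. A small sanity check before calling the processor can catch that early. This is a sketch only; `count_image_slots` and `check_images` are hypothetical helpers, and the strings below stand in for real `PIL.Image` objects.

```python
def count_image_slots(chat):
    """Count the {"type": "image"} entries across all messages."""
    slots = 0
    for message in chat:
        content = message["content"]
        if isinstance(content, str):
            continue  # plain-text messages carry no image placeholders
        slots += sum(1 for part in content if part.get("type") == "image")
    return slots


def check_images(chat, images):
    """Raise ValueError if placeholders and supplied images do not match."""
    expected = count_image_slots(chat)
    supplied = sum(len(group) for group in images)
    if expected != supplied:
        raise ValueError(
            f"chat has {expected} image placeholder(s) but {supplied} image(s) were supplied"
        )
    return expected


# Same message shape as the multi-image example above.
demo_chat = [
    {"role": "system", "content": "You are レナ."},
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "image"},
        {"type": "text", "content": "ユーザー: この二人の外見を説明してみて。"},
    ]},
]
demo_images = [["natsume-image", "mako-image"]]
print(check_images(demo_chat, demo_images))  # 2
```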

## Dataset

SFT (about 370K)

- Riddle Joker (Private)
- Café Stella and the Reaper's Butterflies (Private)
- Senren＊Banka (Private)
- Lin-Chen/ShareGPT4V (Private; translated into Japanese using ChatWaifu to mimic the target characters' conversation style)
- roleplay4fun/aesir-v1.1
- kalomaze/Opus_Instruct_3k
- Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
- Aratako/Synthetic-Japanese-Roleplay-gpt-4o-mini-39.6k-formatted
- Aratako/Synthetic-Japanese-Roleplay-NSFW-Claude-3.5s-15.3k-formatted
- Aratako_Rosebleu_1on1_Dialogues_RP
- SkunkworksAI/reasoning-0.01
- anthracite-org/stheno-filtered-v1.1
- Aratako_Synthetic_JP_EN_Coding_Dataset_801k (only 50,000 samples used)
- Aratako/Magpie-Tanuki-8B-97k
- SicariusSicariiStuff/Bluemoon_Top50MB_Sorted_Fixed
- PJMixers/hieunguyenminh_roleplay-deduped-ShareGPT

## Bias, Risks, and Limitations

This model was trained on Japanese datasets, including visual novels that contain NSFW content.

As a result, the model may generate NSFW content.

## Use & Credit

This model is currently available for non-commercial and research purposes only. Also, since I'm not well versed in licensing, I hope you use it responsibly.

By sharing this model, I hope to contribute to the research efforts of our community (the open-source community and Waifu Lovers).


## Citation

```bibtex
@misc {ChatWaifu_v2.0_Vision,
    author       = { YoungWoo Nam },
    title        = { spow12/ChatWaifu_v2.0_Vision },
    year         = 2024,
    url          = { https://huggingface.co/spow12/ChatWaifu_v2.0_Vision },
    publisher    = { Hugging Face }
}
```