
Hermes-3-Llama-3.1-8B-lorablated-exl2

Model: Hermes-3-Llama-3.1-8B-lorablated
Created by: mlabonne
Based on: Hermes-3-Llama-3.1-8B

Quants

4bpw h6
4.5bpw h6
5bpw h6
6bpw h6
8bpw h8
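
Quants like these are usually stored as separate branches of the repo. A minimal download sketch with huggingface-cli, assuming the branch names match the labels above (check the repo's actual branch list before running):

# Hypothetical branch name "4bpw-h6"; verify against the repo's branches
huggingface-cli download cgus/Hermes-3-Llama-3.1-8B-lorablated-exl2 --revision 4bpw-h6 --local-dir ./Hermes-3-lorablated-exl2-4bpw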

Quantization notes

Made with Exllamav2 0.1.8 using the default calibration dataset.
I'm not sure how well it works with Text-Generation-WebUI, since this model uses some unusual RoPE mechanics and I don't know how TGW handles them.
For some reason the model ran extremely slowly with my TGW install but worked perfectly fine with TabbyAPI.
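
For reference, an EXL2 quant like these is produced with Exllamav2's convert.py. Here is a minimal sketch for the 4.5bpw h6 variant with placeholder paths; leaving out -c keeps the default calibration dataset:

# -b sets target bits per weight, -hb sets head-layer bits;
# input/working/output paths below are placeholders
python convert.py -i ./Hermes-3-Llama-3.1-8B-lorablated -o ./exl2-work -cf ./Hermes-3-lorablated-exl2-4.5bpw -b 4.5 -hb 6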

How to run

I recommend using TabbyAPI for this model. It requires a decent Nvidia RTX card on Windows/Linux or a decent AMD GPU on Linux.
The model has to be loaded entirely into VRAM to work, so if your GPU doesn't have enough VRAM, use a GGUF version instead.
If you have an Nvidia GTX card, you should also use GGUF instead.
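
A rough sketch of the TabbyAPI route, assuming default settings (port 5000, API keys generated into api_tokens.yml on first start; consult TabbyAPI's docs for the current config keys): point model_dir/model_name in config.yml at the unpacked quant, launch the server, and query the OpenAI-compatible endpoint:

# Launch from the TabbyAPI checkout after editing config.yml
python main.py

# Query the OpenAI-compatible completions endpoint (key from api_tokens.yml)
curl http://localhost:5000/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"prompt": "Hello", "max_tokens": 32}'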

Original model card

Hermes-3-Llama-3.1-8B-lorablated


70B version: mlabonne/Hermes-3-Llama-3.1-70B-lorablated

This is an uncensored version of NousResearch/Hermes-3-Llama-3.1-8B using lorablation.

You can see in the following example how Hermes 3 refuses to answer a legitimate question while the abliterated model complies:

(Screenshot: Hermes 3 refuses the question; the lorablated model answers it.)

The recipe is based on @grimjim's grimjim/Llama-3.1-8B-Instruct-abliterated_via_adapter (special thanks):

  1. Extraction: We extract a LoRA adapter by comparing two models: a censored Llama 3.1 (meta-llama/Meta-Llama-3.1-8B-Instruct) and an abliterated Llama 3.1 (mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated).
  2. Merge: We merge this new LoRA adapter into the censored NousResearch/Hermes-3-Llama-3.1-8B using task arithmetic to abliterate it.


See this article to learn more about abliteration.

🧩 Configuration

This model was merged using the task arithmetic merge method, with NousResearch/Hermes-3-Llama-3.1-8B + Llama-3.1-8B-Instruct-abliterated-LORA as the base.

The following YAML configuration was used to produce this model:

base_model: NousResearch/Hermes-3-Llama-3.1-8B+Llama-3.1-8B-Instruct-abliterated-LORA
dtype: bfloat16
merge_method: task_arithmetic
parameters:
  normalize: false
slices:
- sources:
  - layer_range: [0, 32]
    model: NousResearch/Hermes-3-Llama-3.1-8B+Llama-3.1-8B-Instruct-abliterated-LORA
    parameters:
      weight: 1.0

You can reproduce this model using the following commands:

# Setup
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit && pip install -e .
pip install bitsandbytes

# Extraction
mergekit-extract-lora mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated meta-llama/Meta-Llama-3.1-8B-Instruct Llama-3.1-8B-Instruct-abliterated-LORA --rank=64

# Merge using previous config
mergekit-yaml config.yaml Hermes-3-Llama-3.1-8B-lorablated --allow-crimes --lora-merge-cache=./cache
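
After the merge finishes, a quick sanity check is to load the merged checkpoint once before quantizing; this one-liner assumes transformers is available (mergekit installs it as a dependency):

# Smoke test: confirm the merged checkpoint loads
python -c "from transformers import AutoModelForCausalLM; AutoModelForCausalLM.from_pretrained('Hermes-3-Llama-3.1-8B-lorablated')"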