---
language:
- en
- id
license: cc-by-nc-4.0
library_name: peft
tags:
- qlora
- wizardlm
- uncensored
- instruct
- alpaca
datasets:
- MBZUAI/Bactrian-X
pipeline_tag: text-generation
base_model: nferroukhi/WizardLM-Uncensored-Falcon-7b-sharded-bf16
---
# DukunLM - Indonesian Language Model 🧙‍♂️
Welcome to the DukunLM repository! DukunLM is an open-source, 7-billion-parameter language model fine-tuned to generate Indonesian text. Its name is an Indonesian take on "WizardLM": *dukun* roughly translates to "wizard" or "shaman".
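A 7-billion-parameter model is large, but 4-bit loading (used in the examples below) shrinks it considerably. The sketch below is a back-of-the-envelope estimate that assumes exactly 7e9 parameters. It counts **weights only** and ignores activations, the KV cache, NF4's small per-block quantization constants, the LoRA adapter, and CUDA overhead, so real GPU usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    params = 7e9  # assumption: the "7 billion parameters" stated above, taken literally
    for label, bits in [("float32", 32), ("float16/bfloat16", 16), ("4-bit (NF4)", 4)]:
        print(f"{label:>18}: ~{weight_memory_gb(params, bits):.1f} GB")
```

At 4 bits per weight this gives about 3.5 GB, compared with about 14 GB in 16-bit precision. That difference is why QLoRA-style 4-bit loading fits on a single consumer or Colab GPU.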
## Model Details
[![Open in Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1WYhhfvFzQukGzEqWHu3gKmigStJTjWxV?usp=sharing)
- Model: [nferroukhi/WizardLM-Uncensored-Falcon-7b-sharded-bf16](https://huggingface.co/nferroukhi/WizardLM-Uncensored-Falcon-7b-sharded-bf16)
- Base Model: [ehartford/WizardLM-Uncensored-Falcon-7b](https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-7b)
- Fine-tuned with: [MBZUAI/Bactrian-X (Indonesian subset)](https://huggingface.co/datasets/MBZUAI/Bactrian-X/viewer/id/train)
- Prompt Format: [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
- Fine-tuning method: [QLoRA](https://github.com/artidoro/qlora)
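The Alpaca prompt format has two variants: one with an `### Input:` section and one without. Below is a minimal, hypothetical helper (`build_alpaca_prompt` is not part of any library) that builds either variant. The template wording follows the Stanford Alpaca repository. The trailing newline after `### Response:` is an assumption that matches the usage examples in this card.

```python
# Template text follows the Stanford Alpaca repository; the trailing "\n" after
# "### Response:" matches this card's examples (assumption).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Hypothetical helper: return an Alpaca-style prompt, adding the Input section only if needed."""
    if input_text.strip():
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)


if __name__ == "__main__":
    print(build_alpaca_prompt("Jelaskan mengapa air penting bagi kehidupan manusia."))
```

A whitespace-only input is treated as "no input", so the model never sees an empty `### Input:` section.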
⚠️ **Warning**: DukunLM is an uncensored model without filters or alignment. Please use it responsibly: its output may contain errors, cultural biases, and potentially offensive content. ⚠️
## Installation
To use DukunLM, make sure PyTorch is installed and that you have an NVIDIA GPU (or use Google Colab). Then install the required dependencies:
```bash
pip install -U git+https://github.com/huggingface/transformers.git
pip install -U git+https://github.com/huggingface/peft.git
pip install -U bitsandbytes==0.39.0
pip install -U einops==0.6.1
```
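To confirm the environment before downloading the model, you can run a quick check like the sketch below. It is an illustrative script, not part of the project. It uses only the standard library's `importlib.metadata` to report installed versions, compares them against the pins in the install commands above, and, if PyTorch is present, calls `torch.cuda.is_available()` to check for a GPU.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the install commands above; pinned versions copied from them.
REQUIRED = ["torch", "transformers", "peft", "bitsandbytes", "einops"]
PINNED = {"bitsandbytes": "0.39.0", "einops": "0.6.1"}


def installed_versions(packages):
    """Map each distribution name to its installed version string, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        note = ""
        if ver is None:
            note = "  <- not installed"
        elif name in PINNED and ver != PINNED[name]:
            note = f"  <- README pins {PINNED[name]}"
        print(f"{name:<13}{ver or '-'}{note}")
    try:
        import torch  # only imported if available
        print("CUDA GPU available:", torch.cuda.is_available())
    except ImportError:
        print("PyTorch is not installed")
```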
## How to Use
### Stream Output
```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, BitsAndBytesConfig, TextStreamer

# Load the QLoRA adapter and its base model in 4-bit NF4. The 4-bit settings go
# only in `quantization_config`; recent transformers versions raise an error if
# `load_in_4bit=True` is also passed as a separate keyword argument.
model = AutoPeftModelForCausalLM.from_pretrained(
    "azale-ai/DukunLM-Uncensored-7B",
    torch_dtype=torch.float32,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    ),
)
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-Uncensored-7B")
streamer = TextStreamer(tokenizer)  # prints tokens to stdout as they are generated

# "Explain why water is important for human life."
instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
input_prompt = ""

# Build the prompt in Alpaca format; the "### Input:" section is used only when an input is given.
if input_prompt == "":
    text = f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction_prompt}

### Response:
"""
else:
    text = f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction_prompt}

### Input:
{input_prompt}

### Response:
"""

inputs = tokenizer(text, return_tensors="pt").to("cuda")
_ = model.generate(
    inputs=inputs.input_ids,
    streamer=streamer,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048,  # total length, prompt tokens included
    temperature=0.7,
    do_sample=True,
    top_k=4,
    top_p=0.95,
)
```
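The sampling arguments `temperature=0.7`, `top_k=4`, and `top_p=0.95` work together to narrow down which tokens the model may pick at each step. The pure-Python sketch below shows the idea on a toy list of logits. It is a simplified illustration, **not** the transformers implementation (which operates on tensors and differs in details such as tie handling). The function name `sample_filter` is hypothetical.

```python
import math


def sample_filter(logits, temperature=0.7, top_k=4, top_p=0.95):
    """Return {token_id: probability} for tokens that survive temperature,
    top-k and top-p (nucleus) filtering, renormalized to sum to 1.
    Simplified illustration; temperature must be > 0."""
    # 1. Temperature: divide logits before softmax (lower = sharper distribution).
    scaled = [x / temperature for x in logits]
    # 2. Top-k: keep only the k highest-scoring token ids, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    m = max(scaled[i] for i in ranked)
    exps = {i: math.exp(scaled[i] - m) for i in ranked}
    total = sum(exps.values())
    probs = {i: e / total for i, e in exps.items()}
    # 3. Top-p: keep the smallest best-first prefix whose cumulative probability reaches top_p.
    kept, cumulative = {}, 0.0
    for i in ranked:
        kept[i] = probs[i]
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    norm = sum(kept.values())
    return {i: p / norm for i, p in kept.items()}


if __name__ == "__main__":
    print(sample_filter([2.0, 1.0, 0.5, 0.1, -1.0, -3.0]))  # keeps only ids 0, 1 and 2
```

With `top_k=4`, at most four candidates are ever considered, which makes output fairly conservative. Raising `top_k` or `top_p` lets the model pick less likely tokens.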
### No Stream Output
```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, BitsAndBytesConfig

# Load the QLoRA adapter and its base model in 4-bit NF4. The 4-bit settings go
# only in `quantization_config`; recent transformers versions raise an error if
# `load_in_4bit=True` is also passed as a separate keyword argument.
model = AutoPeftModelForCausalLM.from_pretrained(
    "azale-ai/DukunLM-Uncensored-7B",
    torch_dtype=torch.float32,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    ),
)
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-Uncensored-7B")

# "Build a customer-service chatbot dialogue that helps a customer order a specific product."
instruction_prompt = "Bangun dialog chatbot untuk layanan pelanggan yang ingin membantu pelanggan memesan produk tertentu."
# "Product: Nike Air Max shoes"
input_prompt = "Produk: Sepatu Nike Air Max"

# Build the prompt in Alpaca format; the "### Input:" section is used only when an input is given.
if input_prompt == "":
    text = f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction_prompt}

### Response:
"""
else:
    text = f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction_prompt}

### Input:
{input_prompt}

### Response:
"""

inputs = tokenizer(text, return_tensors="pt").to("cuda")
# Keep the generated token ids so they can be decoded below.
outputs = model.generate(
    inputs=inputs.input_ids,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048,  # total length, prompt tokens included
    temperature=0.7,
    do_sample=True,
    top_k=4,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
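For a decoder-only model, the decoded output contains the prompt followed by the generated answer. The sketch below is a hypothetical post-processing helper (`extract_response` is not part of transformers or PEFT). It keeps only the text after the first `### Response:` marker and cuts it off at any new `### ` section header the model might go on to generate. It assumes the Alpaca layout used in the examples above.

```python
def extract_response(decoded: str, marker: str = "### Response:") -> str:
    """Hypothetical helper: return the answer after the first Alpaca response marker.

    Falls back to the whole string if the marker is missing."""
    _, sep, tail = decoded.partition(marker)
    answer = tail if sep else decoded
    # Drop any extra section (e.g. a hallucinated "### Instruction:") after the answer.
    answer = answer.split("\n### ", 1)[0]
    return answer.strip()


if __name__ == "__main__":
    sample = "### Instruction:\nX\n\n### Response:\nJawaban di sini.\n\n### Instruction:\nextra"
    print(extract_response(sample))
```

For example, you could pass it the string decoded from `outputs[0]` in the non-streaming example instead of printing the raw decoded text.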
## Limitations
- The base model was trained primarily on English and only fine-tuned on Indonesian, so Indonesian output quality may be uneven
- Output may reflect cultural and contextual biases
## License
DukunLM is licensed under the [Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license](https://creativecommons.org/licenses/by-nc/4.0/legalcode).
## Contributing
We welcome contributions to enhance and improve DukunLM. If you have any suggestions or find any issues, please feel free to open an issue or submit a pull request.
## Contact Us
[contact@azale.ai](mailto:contact@azale.ai)