---
license: apache-2.0
language:
- ar
tags:
- alpaca
- llama3
- arabic
---

# 🚀 al-baka-llama3-8b

Al Baka is an experimental fine-tuned model based on the newly released LLaMA-3-8B, trained on the Arabic version of the Stanford Alpaca dataset, [Yasbok/Alpaca_arabic_instruct](https://huggingface.co/datasets/Yasbok/Alpaca_arabic_instruct).

## Model Summary

- **Model Type:** Causal decoder-only
- **Language(s):** Arabic
- **Base Model:** [LLaMA-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)
- **Dataset:** [Yasbok/Alpaca_arabic_instruct](https://huggingface.co/datasets/Yasbok/Alpaca_arabic_instruct)

## Model Details

- The model was fine-tuned in 4-bit precision using [unsloth](https://github.com/unslothai/unsloth).

- The run was performed for only 1,000 steps on a single Google Colab NVIDIA T4 GPU with 15 GB of available memory (a sketch of a comparable training setup is shown below).

<span style="color:red">This model is an experimental fine-tune, intended to assess how LLaMA-3 responds to Arabic after only a brief period of fine-tuning. Larger and more sophisticated models will be introduced soon.</span>

## How to Get Started with the Model

### Setup
```python
%%capture
# Install packages (run in a Google Colab cell; %%capture must be the first line)
import torch
major_version, minor_version = torch.cuda.get_device_capability()
!pip install accelerate bitsandbytes
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
if major_version >= 8:
    # Use this for new GPUs like Ampere, Hopper GPUs (RTX 30xx, RTX 40xx, A100, H100, L40)
    !pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes
else:
    # Use this for older GPUs (V100, Tesla T4, RTX 20xx)
    !pip install --no-deps xformers trl peft accelerate bitsandbytes
pass
```
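
If the install succeeds, a quick check like the following (not part of the original setup) should confirm that the GPU is visible and that unsloth imports cleanly:

```python
import torch
import unsloth  # raises ImportError if the install above failed

print(torch.cuda.get_device_name(0))        # e.g. "Tesla T4" on the free Colab tier
print(torch.cuda.get_device_capability(0))  # (7, 5) for T4; (8, 0) or higher for Ampere+
```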

### First, Load the Model
```python
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Omartificial-Intelligence-Space/al-baka-16bit-llama3-8b",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
```

### Second, Try the Model
```python
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

FastLanguageModel.for_inference(model)  # Enable native 2x faster inference
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "استخدم البيانات المعطاة لحساب الوسيط.",  # instruction ("Use the given data to calculate the median.")
            "[2 ، 3 ، 7 ، 8 ، 10]",  # input
            "",  # output - leave this blank for generation!
        )
    ], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)
```
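
For interactive use you can also stream tokens as they are generated. This uses the standard `transformers` `TextStreamer` and is an addition to the original example:

```python
from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)
```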

### Recommendations

- Use [unsloth](https://github.com/unslothai/unsloth) for fine-tuning models. Fine-tuning is up to 2x faster, and the resulting model can be exported to other formats or uploaded to Hugging Face, as sketched below.
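
As a rough sketch of that export step (the helpers below are unsloth's documented saving methods; directory names and quantization choices are illustrative assumptions):

```python
# Illustrative export options; directory names and quantization are assumptions
# Merge the LoRA adapters into 16-bit weights for standard Hugging Face loading
model.save_pretrained_merged("al-baka-merged-16bit", tokenizer, save_method = "merged_16bit")

# Or export a GGUF file for llama.cpp-compatible runtimes
model.save_pretrained_gguf("al-baka-gguf", tokenizer, quantization_method = "q4_k_m")
```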