npc0 committed
Commit 2b3c4db
1 Parent(s): fadde25

Update README.md

Files changed (1)
  1. README.md +23 -4
README.md CHANGED
@@ -26,9 +26,9 @@ and outperform many of the available open source and closed chat models on commo
 
 This repository stores an experimental IQ_1S quantized GGUF Llama 3.1 instruction-tuned 70B model.
 
-** Model developer **: Meta
+**Model developer**: Meta
 
-** Model Architecture **: Llama 3.1 is an auto-regressive language model
+**Model Architecture**: Llama 3.1 is an auto-regressive language model
 that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT)
 and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness
 and safety.
@@ -37,7 +37,7 @@ and safety.
 |---------------------|--------------------------------------------|------|-----------------|--------------------------|--------------|---|-----------|----------------|
 |Llama 3.1 (text only)|A new mix of publicly available online data.|70B |Multilingual Text|Multilingual Text and code|128k |Yes|15T+ |December 2023 |
 
-** Supported languages **: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
+**Supported languages**: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
 
 # Quantization Information
 |Weight Quantization| PPL |
@@ -47,4 +47,23 @@ and safety.
 
 Dataset used for re-calibration: Mix of [standard_cal_data](https://github.com/turboderp/exllamav2/tree/master/exllamav2/conversion/standard_cal_data)
 
-The generated `imatrix` can be downloaded from [imatrix.dat](https://huggingface.co/npc0/Meta-Llama-3.1-70B-Instruct-IQ_1S/resolve/main/imatrix.dat)
+The generated `imatrix` can be downloaded from [imatrix.dat](https://huggingface.co/npc0/Meta-Llama-3.1-70B-Instruct-IQ_1S/resolve/main/imatrix.dat)
+
+**Usage**: with `llama-cpp-python`
+```python
+from llama_cpp import Llama
+
+llm = Llama.from_pretrained(
+    repo_id="npc0/Meta-Llama-3.1-70B-Instruct-IQ_1S",
+    filename="GGUF_FILE",
+)
+
+llm.create_chat_completion(
+    messages = [
+        {
+            "role": "user",
+            "content": "What is the capital of France?"
+        }
+    ]
+)
+```
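
A note on the usage snippet added above: `filename="GGUF_FILE"` is a placeholder for the actual `.gguf` file shipped in this repository, and `create_chat_completion` returns an OpenAI-compatible response dictionary, with the generated reply under `choices[0]["message"]["content"]`.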
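
The calibration artifact referenced above can also be fetched programmatically. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; the repository ID and filename come from the `imatrix.dat` link in the README.

```python
# Minimal sketch (assumes `pip install huggingface_hub`):
# download the imatrix.dat referenced in the README into the local HF cache.
from huggingface_hub import hf_hub_download

imatrix_path = hf_hub_download(
    repo_id="npc0/Meta-Llama-3.1-70B-Instruct-IQ_1S",
    filename="imatrix.dat",
)
print(imatrix_path)  # local filesystem path to the cached imatrix.dat
```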