Upload 21 files

Files changed (21)

Merak-7B-v2.ggmlv3.q2_K.bin +3 -0
Merak-7B-v2.ggmlv3.q3_K.bin +3 -0
Merak-7B-v2.ggmlv3.q3_K_L.bin +3 -0
Merak-7B-v2.ggmlv3.q3_K_M.bin +3 -0
Merak-7B-v2.ggmlv3.q3_K_S.bin +3 -0
Merak-7B-v2.ggmlv3.q4_0.bin +3 -0
Merak-7B-v2.ggmlv3.q4_1.bin +3 -0
Merak-7B-v2.ggmlv3.q4_K.bin +3 -0
Merak-7B-v2.ggmlv3.q4_K_M.bin +3 -0
Merak-7B-v2.ggmlv3.q4_K_S.bin +3 -0
Merak-7B-v2.ggmlv3.q5_0.bin +3 -0
Merak-7B-v2.ggmlv3.q5_1.bin +3 -0
Merak-7B-v2.ggmlv3.q5_K.bin +3 -0
Merak-7B-v2.ggmlv3.q5_K_M.bin +3 -0
Merak-7B-v2.ggmlv3.q5_K_S.bin +3 -0
Merak-7B-v2.ggmlv3.q6_K.bin +3 -0
Merak-7B-v2.ggmlv3.q8_0.bin +3 -0
Notice +1 -0
README.md +226 -0
USE_POLICY.md +50 -0
config.json +25 -0

Merak-7B-v2.ggmlv3.q2_K.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:61df1704d3931a41c39f4cae1fd61b098c2e743b199457b39543a953282e6276
+size 2866807424

Merak-7B-v2.ggmlv3.q3_K.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dd685a94e7eb2c675a2cc7dd551e9a27f19972a34cf4edea5102d847296c3b93
+size 3282248320

Merak-7B-v2.ggmlv3.q3_K_L.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c7e34d9e2a45eb437c2dcffaf8b2e9f083de7f3df990c381eff3972008b8dfdb
+size 3596821120

Merak-7B-v2.ggmlv3.q3_K_M.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dd685a94e7eb2c675a2cc7dd551e9a27f19972a34cf4edea5102d847296c3b93
+size 3282248320

Merak-7B-v2.ggmlv3.q3_K_S.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ddc5dbd282f3421010d2ae25bb7592690d0d652fa58152a3fd610937ffd343ed
+size 2948014720

Merak-7B-v2.ggmlv3.q4_0.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ef369ae99188fa31efa8ce8064dfb1f2569bc2f0b7d88df007954006e23d4c73
+size 3825517184

Merak-7B-v2.ggmlv3.q4_1.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2302c2d294ff540f8478c03b7dd3fe89ef2ff92c2dcab2b3be1225c9bafe90fc
+size 4238459520

Merak-7B-v2.ggmlv3.q4_K.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a6f7833e2105ff45600090c6e1b151e12f31edf4e661690e69d2465920aaa3c7
+size 4080714368

Merak-7B-v2.ggmlv3.q4_K_M.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a6f7833e2105ff45600090c6e1b151e12f31edf4e661690e69d2465920aaa3c7
+size 4080714368

Merak-7B-v2.ggmlv3.q4_K_S.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:302fa7eb5f309f29e518749a3f212a9899c2718b5989e1806c1bd5b4922882b2
+size 3825517184

Merak-7B-v2.ggmlv3.q5_0.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:30235f60a776ab67bab45bed8f4b59ac245060ba458a4a1c208636023764cdd7
+size 4651401856

Merak-7B-v2.ggmlv3.q5_1.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a6da242005ae7e49fb1b673440d7487b06250872e9e39aa8f59502d2a19ebba6
+size 5064344192

Merak-7B-v2.ggmlv3.q5_K.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d385dbd36822e630d2a283c8a45e18657cfa8cb689fda83173798c3bc1bc51a0
+size 4782867072

Merak-7B-v2.ggmlv3.q5_K_M.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d385dbd36822e630d2a283c8a45e18657cfa8cb689fda83173798c3bc1bc51a0
+size 4782867072

Merak-7B-v2.ggmlv3.q5_K_S.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:efd2ccf1ec630b9697d1d9cec399856f21653803f01954e7fead471a82ef2f8f
+size 4651401856

Merak-7B-v2.ggmlv3.q6_K.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:daf9bd82304f1aad497c94b21ef18c5ff1b3e78060183fddbc5e56f0473d0c04
+size 5528904320

Merak-7B-v2.ggmlv3.q8_0.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:be25a29674d376ec78da75ffad692920bd13f1d213b5e9ebba41ef7e366329ad
+size 7129055872
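Each `.bin` entry above is a Git LFS pointer rather than the model itself: three lines giving the spec version, the SHA-256 object ID, and the size in bytes. As a minimal sketch, a downloaded file could be checked against such a pointer as follows. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names for illustration, not part of any Git LFS tooling, and the sketch assumes the `oid` always uses `sha256`.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so multi-gigabyte files are never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer text copied from the q2_K entry above.
Q2_K_POINTER = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:61df1704d3931a41c39f4cae1fd61b098c2e743b199457b39543a953282e6276\n"
    "size 2866807424\n"
)
```

Calling `matches_pointer("Merak-7B-v2.ggmlv3.q2_K.bin", Q2_K_POINTER)` on a local copy would then report whether the download is complete and uncorrupted.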

Notice ADDED

	@@ -0,0 +1 @@


+Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.

README.md CHANGED

@@ -1,3 +1,229 @@
 ---
 license: llama2
+model_type: llama
+inference: false
+datasets:
+- wikipedia
+language:
+- id
+- en
+pipeline_tag: text-generation
+tags:
+- facebook
+- meta
+- pytorch
+- llama
+- llama-2
 ---
+# MERAK-7B-V2 GGML
+README adapted from [TheBloke](https://huggingface.co/TheBloke).
+These files are GGML format model files for [MERAK-7B-V2](https://huggingface.co/Ichsan2895/Merak-7B-v2).
+GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/ggerganov/llama.cpp) and libraries and UIs which support this format, such as:
+* [KoboldCpp](https://github.com/LostRuins/koboldcpp), a powerful GGML web UI with full GPU acceleration out of the box. Especially good for storytelling.
+* [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), a great web UI with GPU acceleration via the ctransformers backend.
+* [LM Studio](https://lmstudio.ai/), a fully featured local GUI. Supports full GPU accel on macOS. Also supports Windows, without GPU accel.
+* [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most popular web UI. Requires extra steps to enable GPU accel via llama.cpp backend.
+* [ctransformers](https://github.com/marella/ctransformers), a Python library with LangChain support and OpenAI-compatible API server.
+* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with OpenAI-compatible API server.
+<!-- compatibility_ggml start -->
+## Compatibility
+### Original llama.cpp quant methods: `q4_0, q4_1, q5_0, q5_1, q8_0`
+These are guaranteed to be compatible with any UIs, tools and libraries released since late May. They may be phased out soon, as they are largely superseded by the new k-quant methods.
+### New k-quant methods: `q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q5_K_M, q6_K`
+These new quantisation methods are compatible with llama.cpp as of June 6th, commit `2d43387`.
+They are now also compatible with recent releases of text-generation-webui, KoboldCpp, llama-cpp-python, ctransformers, rustformers and most others. For compatibility with other tools and libraries, please check their documentation.
+## Explanation of the new k-quant methods
+<details>
+  <summary>Click to see details</summary>
+The new methods available are:
+* GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
+* GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
+* GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
+* GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K resulting in 5.5 bpw
+* GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw
+* GGML_TYPE_Q8_K - "type-0" 8-bit quantization. Only used for quantizing intermediate results. The difference to the existing Q8_0 is that the block size is 256. All 2-6 bit dot products are implemented for this quantization type.
+Refer to the Provided Files table below to see what files use which methods, and how.
+</details>
+<!-- compatibility_ggml end -->
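The bits-per-weight figures quoted for the k-quant types can be reproduced with a little arithmetic. The sketch below is illustrative only: the block counts and per-block scale widths come from the list above, while the 16-bit (fp16) super-block scale and min fields are an assumption drawn from the llama.cpp k-quant layouts, not something the text states. `bits_per_weight` is a hypothetical helper. Q2_K is left out because its quoted figure depends on how the super-block fields are counted, which the text does not specify.

```python
SUPER_BLOCK_WEIGHTS = 256  # 16 blocks x 16 weights, or 8 blocks x 32 weights
FP16_BITS = 16  # assumed width of each super-block-level scale or min


def bits_per_weight(weight_bits, blocks, meta_bits_per_block, fp16_fields):
    """Effective bits per weight for one 256-weight super-block.

    weight_bits: bits stored per quantized weight
    blocks: number of blocks in the super-block
    meta_bits_per_block: bits of quantized scale (plus min, for "type-1") per block
    fp16_fields: assumed number of fp16 super-block values (scale, or scale + min)
    """
    total_bits = (
        weight_bits * SUPER_BLOCK_WEIGHTS
        + meta_bits_per_block * blocks
        + FP16_BITS * fp16_fields
    )
    return total_bits / SUPER_BLOCK_WEIGHTS


q3_k = bits_per_weight(3, blocks=16, meta_bits_per_block=6, fp16_fields=1)
q4_k = bits_per_weight(4, blocks=8, meta_bits_per_block=6 + 6, fp16_fields=2)
q5_k = bits_per_weight(5, blocks=8, meta_bits_per_block=6 + 6, fp16_fields=2)
q6_k = bits_per_weight(6, blocks=16, meta_bits_per_block=8, fp16_fields=1)
print(q3_k, q4_k, q5_k, q6_k)  # 3.4375 4.5 5.5 6.5625
```

Under these assumptions the results match the 3.4375, 4.5, 5.5, and 6.5625 bpw values given for Q3_K, Q4_K, Q5_K, and Q6_K.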
+## Provided files
+| Name | Quant method | Bits | Use case |
+| ---- | ---- | ---- | ----- |
+| Merak-7B-v2.ggmlv3.q2_K.bin | q2_K | 2 | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. |
+| Merak-7B-v2.ggmlv3.q3_K_L.bin | q3_K_L | 3 | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K |
+| Merak-7B-v2.ggmlv3.q3_K_M.bin | q3_K_M | 3 | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K |
+| Merak-7B-v2.ggmlv3.q3_K_S.bin | q3_K_S | 3 | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors |
+| Merak-7B-v2.ggmlv3.q4_0.bin | q4_0 | 4 | Original quant method, 4-bit. |
+| Merak-7B-v2.ggmlv3.q4_1.bin | q4_1 | 4 | Original quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. |
+| Merak-7B-v2.ggmlv3.q4_K_M.bin | q4_K_M | 4 | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K |
+| Merak-7B-v2.ggmlv3.q4_K_S.bin | q4_K_S | 4 | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors |
+| Merak-7B-v2.ggmlv3.q5_0.bin | q5_0 | 5 | Original quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. |
+| Merak-7B-v2.ggmlv3.q5_1.bin | q5_1 | 5 | Original quant method, 5-bit. Even higher accuracy, resource usage and slower inference. |
+| Merak-7B-v2.ggmlv3.q5_K_M.bin | q5_K_M | 5 | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K |
+| Merak-7B-v2.ggmlv3.q5_K_S.bin | q5_K_S | 5 | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors |
+| Merak-7B-v2.ggmlv3.q6_K.bin | q6_K | 6 | New k-quant method. Uses GGML_TYPE_Q6_K for all tensors - 6-bit quantization |
+| Merak-7B-v2.ggmlv3.q8_0.bin | q8_0 | 8 | Original quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. |
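One way to choose among the files in this table is by memory budget. The sketch below uses the byte sizes recorded in this commit's Git LFS pointers and returns the largest file that fits. `largest_quant_within` and its `overhead_bytes` allowance (for context buffers and the like) are hypothetical, and file size is only a rough proxy for output quality.

```python
# File sizes in bytes, copied from the Git LFS pointers in this commit.
FILE_SIZES = {
    "q2_K": 2866807424,
    "q3_K_S": 2948014720,
    "q3_K_M": 3282248320,
    "q3_K_L": 3596821120,
    "q4_0": 3825517184,
    "q4_K_S": 3825517184,
    "q4_K_M": 4080714368,
    "q4_1": 4238459520,
    "q5_0": 4651401856,
    "q5_K_S": 4651401856,
    "q5_K_M": 4782867072,
    "q5_1": 5064344192,
    "q6_K": 5528904320,
    "q8_0": 7129055872,
}


def largest_quant_within(budget_bytes, overhead_bytes=0):
    """Return the quant method of the largest file that fits, or None."""
    fitting = {
        name: size
        for name, size in FILE_SIZES.items()
        if size + overhead_bytes <= budget_bytes
    }
    if not fitting:
        return None
    # On equal sizes, max() keeps whichever entry was listed first.
    return max(fitting, key=fitting.get)


GIB = 2**30
print(largest_quant_within(6 * GIB, overhead_bytes=1 * GIB))  # q5_1
```

With 6 GiB available and 1 GiB reserved, q6_K (about 5.15 GiB) no longer fits, so the helper falls back to q5_1.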
+## How to run in `text-generation-webui`
+Further instructions here: [text-generation-webui/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md).
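For scripted use rather than a web UI, the same files can be loaded through llama-cpp-python, one of the libraries listed above. This is a rough sketch, not a tested recipe: the `<|prompt|>`/`<|answer|>` template and the sampling values are taken from the original model card in this README, while `build_prompt`, `ask`, the `n_ctx` value, and the stop string are illustrative assumptions. It also assumes an installed llama-cpp-python release old enough to still read GGML v3 files.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the <|prompt|>/<|answer|> template from the model card."""
    return f"<|prompt|>{question}\n<|answer|>"


def ask(model_path: str, question: str, max_tokens: int = 200) -> str:
    """Run one completion with llama-cpp-python and return the answer text."""
    # Imported lazily so build_prompt stays usable without the library installed.
    from llama_cpp import Llama

    # n_gpu_layers=0 keeps everything on the CPU; raise it to offload layers.
    llm = Llama(model_path=model_path, n_ctx=2048, n_gpu_layers=0)
    result = llm(
        build_prompt(question),
        max_tokens=max_tokens,
        temperature=0.3,
        repeat_penalty=1.2,
        stop=["<|prompt|>"],  # assumed stop string: halt if a new turn begins
    )
    return result["choices"][0]["text"].strip()
```

A call such as `ask("Merak-7B-v2.ggmlv3.q4_K_M.bin", "Siapa penulis naskah proklamasi kemerdekaan Indonesia?")` would then return only the generated answer, since the completion text excludes the prompt by default.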
+# Original model card: 6TH PROTOTYPE OF MERAK-7B-V2!
+Merak-7B is a Large Language Model for the Indonesian language.
+This model is based on Meta Llama-2-7B-Chat-HF and fine-tuned on a selection of Indonesian Wikipedia articles that I cleaned beforehand.
+Leveraging QLoRA (QLoRA: Efficient Finetuning of Quantized LLMs), Merak-7B is able to run with 16 GB VRAM.
+Licensed under Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA 4.0), Merak-7B empowers AI enthusiasts and researchers alike.
+Big thanks to all my friends and communities that helped build our first model. Feel free to ask me about the model, and please share the news on your social media.
+## HOW TO USE
+### Installation
+Please make sure you have the CUDA driver, Python 3.10, and PyTorch 2 installed on your system. Then install these libraries from a terminal:
+```
+pip install bitsandbytes==0.39.1
+pip install transformers==4.31.0
+pip install peft==0.4.0
+pip install accelerate==0.20.3
+pip install einops==0.6.1 scipy sentencepiece datasets
+```
+### Using bitsandbytes (runs on a GPU with >= 10 GB VRAM)
+[![Open in Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Cl1tO1QIYNWHR8K-nQe6xIaUvaLwxXCq?usp=sharing)
+```
+import torch
+from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig, LlamaTokenizer
+from peft import PeftModel, PeftConfig
+model_id = "Ichsan2895/Merak-7B-v2"
+config = AutoConfig.from_pretrained(model_id)
+BNB_CONFIG = BitsAndBytesConfig(load_in_4bit=True,
+                                bnb_4bit_compute_dtype=torch.bfloat16,
+                                bnb_4bit_use_double_quant=True,
+                                bnb_4bit_quant_type="nf4",
+    )
+model = AutoModelForCausalLM.from_pretrained(model_id,
+                                             quantization_config=BNB_CONFIG,
+                                             device_map="auto",
+                                             trust_remote_code=True)
+tokenizer = LlamaTokenizer.from_pretrained(model_id)
+def generate_response(question: str) -> str:
+  prompt = f"<|prompt|>{question}\n<|answer|>".strip()
+  encoding = tokenizer(prompt, return_tensors='pt').to("cuda")
+  with torch.inference_mode():
+    outputs = model.generate(input_ids=encoding.input_ids,
+                             attention_mask=encoding.attention_mask,
+                             eos_token_id=tokenizer.pad_token_id,
+                             do_sample=False,
+                             num_beams=2,
+                             temperature=0.3,
+                             repetition_penalty=1.2,
+                             max_length=200)
+    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+  # Keep only the text that follows the answer marker.
+  assistant_start = "<|answer|>"
+  response_start = response.find(assistant_start)
+  return response[response_start + len(assistant_start) :].strip()
+prompt = "Siapa penulis naskah proklamasi kemerdekaan Indonesia?"
+print(generate_response(prompt))
+```
+### In my experience, answers are better without bitsandbytes 4-bit quantization, at the cost of higher VRAM usage
+[![Open in Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1uUaeI4-Zzuk0m9Xjg1Dw45YZs402EgWz?usp=sharing)
+```
+import torch
+from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig, LlamaTokenizer
+from peft import PeftModel, PeftConfig
+model_id = "Ichsan2895/Merak-7B-v2"
+config = AutoConfig.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id,
+                                             device_map="auto",
+                                             trust_remote_code=True)
+tokenizer = LlamaTokenizer.from_pretrained(model_id)
+def generate_response(question: str) -> str:
+  prompt = f"<|prompt|>{question}\n<|answer|>".strip()
+  encoding = tokenizer(prompt, return_tensors='pt').to("cuda")
+  with torch.inference_mode():
+    outputs = model.generate(input_ids=encoding.input_ids,
+                             attention_mask=encoding.attention_mask,
+                             eos_token_id=tokenizer.pad_token_id,
+                             do_sample=False,
+                             num_beams=2,
+                             temperature=0.3,
+                             repetition_penalty=1.2,
+                             max_length=200)
+    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+  # Keep only the text that follows the answer marker.
+  assistant_start = "<|answer|>"
+  response_start = response.find(assistant_start)
+  return response[response_start + len(assistant_start) :].strip()
+prompt = "Siapa penulis naskah proklamasi kemerdekaan Indonesia?"
+print(generate_response(prompt))
+```
+## CHANGELOG
+**v1** = The first Merak-7B model. We selected and cleaned about 200k Indonesian Wikipedia articles.
+**v2** = A fine-tuned version of the first Merak-7B model. We fine-tuned it again on the same Indonesian Wikipedia articles, but with a changed prompt style in the questions.
+## CITATION
+```
+@Paper{arXiv,
+  author  = {Touvron, et al},
+  title   = {Llama 2: Open Foundation and Fine-Tuned Chat Models},
+  journal = {arXiv preprint arXiv:2307.09288},
+  year    = {2023}
+}
+@ONLINE{wikidump,
+    author = "Wikimedia Foundation",
+    title  = "Wikimedia Downloads",
+    url    = "https://dumps.wikimedia.org"
+}
+@inproceedings{wolf-etal-2020-transformers,
+    title = "Transformers: State-of-the-Art Natural Language Processing",
+    author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
+    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
+    month = oct,
+    year = "2020",
+    address = "Online",
+    publisher = "Association for Computational Linguistics",
+    url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
+    pages = "38--45"
+}
+@article{dettmers2023qlora,
+  title   = {QLoRA: Efficient Finetuning of Quantized LLMs},
+  author  = {Dettmers, Tim and Pagnoni, Artidoro and Holtzman, Ari and Zettlemoyer, Luke},
+  journal = {arXiv preprint arXiv:2305.14314},
+  year    = {2023}
+}
+```

USE_POLICY.md ADDED

	@@ -0,0 +1,50 @@

+# Llama 2 Acceptable Use Policy
+Meta is committed to promoting safe and fair use of its tools and features, including Llama 2. If you access or use Llama 2, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at [ai.meta.com/llama/use-policy](http://ai.meta.com/llama/use-policy).
+## Prohibited Uses
+We want everyone to use Llama 2 safely and responsibly. You agree you will not use, or allow others to use, Llama 2 to:
+1. Violate the law or others’ rights, including to:
+    1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:
+        1. Violence or terrorism
+        2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material
+        3. Human trafficking, exploitation, and sexual violence
+        4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.
+        5. Sexual solicitation
+        6. Any other criminal activity
+    2. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals
+    3. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services
+    4. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices
+    5. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws
+    6. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama 2 Materials
+    7. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system
+2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 2 related to the following:
+    1. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State
+    2. Guns and illegal weapons (including weapon development)
+    3. Illegal drugs and regulated/controlled substances
+    4. Operation of critical infrastructure, transportation technologies, or heavy machinery
+    5. Self-harm or harm to others, including suicide, cutting, and eating disorders
+    6. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual
+3. Intentionally deceive or mislead others, including use of Llama 2 related to the following:
+    1. Generating, promoting, or furthering fraud or the creation or promotion of disinformation
+    2. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content
+    3. Generating, promoting, or further distributing spam
+    4. Impersonating another individual without consent, authorization, or legal right
+    5. Representing that the use of Llama 2 or outputs are human-generated
+    6. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement
+4. Fail to appropriately disclose to end users any known dangers of your AI system
+Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:
+* Reporting issues with the model: [github.com/facebookresearch/llama](http://github.com/facebookresearch/llama)
+* Reporting risky content generated by the model: [developers.facebook.com/llama_output_feedback](http://developers.facebook.com/llama_output_feedback)
+* Reporting bugs and security concerns: [facebook.com/whitehat/info](http://facebook.com/whitehat/info)
+* Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: [LlamaUseReport@meta.com](mailto:LlamaUseReport@meta.com)

config.json ADDED

	@@ -0,0 +1,25 @@

+{
+  "_name_or_path": "meta-llama/Llama-2-7b-chat-hf",
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 11008,
+  "max_position_embeddings": 4096,
+  "model_type": "llama",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 32,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "tie_word_embeddings": false,
+  "torch_dtype": "float16",
+  "transformers_version": "4.32.0.dev0",
+  "use_cache": true,
+  "vocab_size": 32000
+}
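The shape fields in this config are enough to estimate the model's parameter count and its float16 weight footprint. The sketch below assumes the usual Llama layout (bias-free q/k/v/o projections, a three-matrix gated MLP, two RMSNorm vectors per layer, and one final norm); `count_parameters` is a hypothetical helper, not a transformers API.

```python
# Shape-related fields copied from the config.json above.
CONFIG = {
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 32,
    "tie_word_embeddings": False,
    "vocab_size": 32000,
}


def count_parameters(cfg: dict) -> int:
    """Estimate the parameter count of a Llama-style decoder from its config."""
    hidden = cfg["hidden_size"]
    inter = cfg["intermediate_size"]
    vocab = cfg["vocab_size"]
    head_dim = hidden // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]

    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o and k, v
    mlp = 3 * hidden * inter  # gate, up, and down projections
    norms = 2 * hidden  # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms

    embeddings = vocab * hidden
    lm_head = 0 if cfg["tie_word_embeddings"] else vocab * hidden
    final_norm = hidden
    return embeddings + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


params = count_parameters(CONFIG)
print(params)  # 6738415616
print(f"float16 weights: {params * 2 / 2**30:.2f} GiB")
```

At two bytes per parameter this comes to roughly 12.5 GiB for the weights alone, before activations and cache.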