license: apache-2.0
Buddhi 7B
Model Description
Buddhi is a general-purpose chat model fine-tuned from Mistral 7B Instruct and optimised for an extended context length of up to 128K (131,072) tokens using the YaRN (Yet another RoPE extensioN) technique. The longer context lets Buddhi keep track of information across long documents or conversations, which suits tasks that depend on extensive context retention, such as comprehensive document summarization, detailed narrative generation, and question answering over long inputs.
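For readers curious how YaRN stretches the context window, the sketch below follows the frequency-interpolation ("NTK-by-parts") rule and the attention-temperature formula described in the YaRN paper. It is an illustration, not Buddhi's actual configuration: the RoPE base of 1,000,000, the original 32,768-token window, the scale factor of 4 (131,072 / 32,768), the head dimension of 128, and the ramp bounds alpha = 1 and beta = 32 are all assumptions, and the helpers `yarn_inv_freq` and `yarn_attention_scale` are hypothetical names.

```python
import math


def yarn_inv_freq(dim: int, base: float, scale: float, orig_ctx: int,
                  alpha: float = 1.0, beta: float = 32.0) -> list[float]:
    """Per-pair RoPE frequencies after YaRN's NTK-by-parts interpolation (sketch)."""
    freqs = []
    for i in range(0, dim, 2):
        theta = base ** (-i / dim)        # original RoPE frequency for this dimension pair
        wavelength = 2 * math.pi / theta  # tokens needed for one full rotation
        r = orig_ctx / wavelength         # full rotations that fit in the original window
        if r < alpha:
            gamma = 0.0                   # low-frequency pairs: fully interpolate
        elif r > beta:
            gamma = 1.0                   # high-frequency pairs: leave unchanged
        else:
            gamma = (r - alpha) / (beta - alpha)  # linear ramp in between
        freqs.append((1 - gamma) * theta / scale + gamma * theta)
    return freqs


def yarn_attention_scale(scale: float) -> float:
    """sqrt(1/t) = 0.1 * ln(s) + 1; applied to both queries and keys,
    so attention logits are multiplied by the square of this value."""
    return 0.1 * math.log(scale) + 1.0


# Hypothetical usage with the assumed values above
freqs = yarn_inv_freq(dim=128, base=1e6, scale=131072 / 32768, orig_ctx=32768)
print(len(freqs), f"{yarn_attention_scale(4.0):.4f}")  # 64 1.1386
```

High-frequency pairs, which already rotate many times within the original window, are kept as-is, while low-frequency pairs are divided by the scale factor; that selective treatment is what separates YaRN from plain position interpolation.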
Architecture
Hardware requirements:
- 128K context length: 80 GB VRAM (A100 preferred)
- 32K context length: 40 GB VRAM (A100 preferred)
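One rough way to see why long contexts need so much memory is to size the key/value cache. The sketch below assumes Buddhi keeps Mistral 7B's attention layout (32 layers, 8 key/value heads via grouped-query attention, head dimension 128) and a 16-bit cache; `kv_cache_bytes` is a hypothetical helper. It ignores the model weights (roughly 14 GB in bf16), activations, and framework overhead, so it only accounts for part of the figures above.

```python
def kv_cache_bytes(seq_len: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Bytes for the K and V caches: 2 tensors per layer,
    each of shape [batch, n_kv_heads, seq_len, head_dim]."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len * batch


for ctx in (32_768, 131_072):
    # prints 4.0 GiB for 32,768 tokens and 16.0 GiB for 131,072 tokens
    print(f"{ctx:>7,} tokens -> {kv_cache_bytes(ctx) / 2**30:.1f} GiB KV cache (bf16)")
```

The cache grows linearly with context length, so quadrupling the window from 32K to 128K quadruples this term.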
vLLM - For Faster Inference
Installation
```bash
pip install vllm
pip install flash_attn  # only if your system supports Flash Attention 2
```
In a notebook such as Colab, prefix each command with `!`. See the Flash Attention 2 GitHub repository for detailed installation instructions.
Implementation:
```python
from vllm import LLM, SamplingParams

# Load the model with the full 128K (131,072-token) context window
llm = LLM(
    model="aiplanet/Buddhi-128K-Chat",
    gpu_memory_utilization=0.99,  # fraction of GPU memory vLLM may reserve
    max_model_len=131072,
)

# Prompts use the Mistral instruction format: <s> [INST] ... [/INST]
prompts = [
    """<s> [INST] Please tell me a joke. [/INST] """,
    """<s> [INST] What is Machine Learning? [/INST] """,
]

sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=1000,
)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}")
    print(generated_text)
    print("\n\n")
```
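To make `temperature` and `top_p` in `SamplingParams` concrete, here is a conceptual, pure-Python sketch of temperature scaling followed by nucleus (top-p) sampling over a single vector of logits. It is not vLLM's implementation, which runs batched on the GPU; `sample_top_p` is a hypothetical helper.

```python
import math
import random


def sample_top_p(logits, temperature=0.8, top_p=0.95, rng=random):
    """Pick a token index using temperature scaling plus nucleus (top-p) filtering."""
    # 1. Temperature: divide logits before the softmax (lower = sharper distribution)
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # 2. Nucleus: keep the smallest set of most likely tokens whose mass reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # 3. Sample among the kept tokens in proportion to their probabilities
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


print(sample_top_p([2.0, 1.0, 0.1, -3.0], rng=random.Random(0)))
```

Lowering `temperature` concentrates probability on the top tokens, and lowering `top_p` trims the low-probability tail before sampling; with a very small `top_p` the procedure reduces to greedy decoding.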
Transformers - Basic Implementation
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization (bitsandbytes) to reduce GPU memory use
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "aiplanet/Buddhi-128K-Chat"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="sequential",
    trust_remote_code=True,
)
# Load the tokenizer from the model ID, not from the loaded model object
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)

prompt = "<s> [INST] Please tell me a small joke. [/INST] "
# The prompt already starts with <s>, so don't let the tokenizer add a second BOS token
tokens = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(
    **tokens,
    max_new_tokens=100,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)

# Decode only the newly generated tokens, skipping the prompt tokens
new_tokens = outputs[0][tokens["input_ids"].shape[1]:]
decoded_output = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(f"Output:\n{decoded_output}")
```
Output
Output:
Why don't scientists trust atoms?
Because they make up everything.
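The 4-bit NF4 configuration above is mainly a memory saving. A back-of-the-envelope sketch of weight memory alone, assuming roughly 7.24 billion parameters (an approximate Mistral 7B count, not a figure from this card); the helper `weight_gib` is hypothetical, and the estimate ignores quantization constants, any layers bitsandbytes keeps in higher precision, the KV cache, and activations.

```python
def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 7.24e9  # assumption: approximate Mistral 7B parameter count

for label, bits in (("bf16", 16), ("4-bit NF4", 4)):
    print(f"{label:>10}: ~{weight_gib(N_PARAMS, bits):.1f} GiB")
```

Moving from 16 to 4 bits per weight cuts this term by about a factor of four, which is why the quantized model can fit on GPUs that could not hold the bf16 weights.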
Prompt Template for Buddhi 7B
To leverage instruction fine-tuning, wrap each instruction in [INST] and [/INST] tokens. Only the very first instruction should begin with the beginning-of-sentence (BOS) token <s>; later instructions should not. Each assistant reply ends with the end-of-sentence (EOS) token </s>.
"<s>[INST] What is your favourite condiment? [/INST]"
"Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
"[INST] Do you have mayonnaise recipes? [/INST]"
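The template can also be assembled programmatically. The hypothetical `build_prompt` helper below is a sketch that reproduces the three example strings above exactly: `<s>` appears once at the start, each user message is wrapped in `[INST] ... [/INST]`, and every completed assistant reply is closed with `</s>`. It does not use the tokenizer's chat template, so compare its output with that template if the tokenizer provides one.

```python
from typing import Optional, Sequence, Tuple


def build_prompt(turns: Sequence[Tuple[str, Optional[str]]]) -> str:
    """Format (user, assistant) turns; the last assistant reply may be None."""
    if not turns:
        raise ValueError("need at least one user turn")
    parts = ["<s>"]  # BOS only once, at the very start
    for idx, (user, assistant) in enumerate(turns):
        parts.append(f"[INST] {user} [/INST]")
        if assistant is not None:
            parts.append(f"{assistant}</s> ")  # EOS closes each completed reply
        elif idx != len(turns) - 1:
            raise ValueError("only the final turn may lack an assistant reply")
    return "".join(parts)


prompt = build_prompt([
    ("What is your favourite condiment?",
     "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just "
     "the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"),
    ("Do you have mayonnaise recipes?", None),
])
print(prompt)
```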
Key Features:
- Precision and Efficiency: The model is tuned for accuracy, aiming to produce code that is not just functional but also efficient.
- Unleash Creativity: Whether you're a novice or an expert coder, Panda-Coder is here to support your coding journey, offering creative solutions to your programming challenges.
- Evol Instruct Code: It's built on the Evol Instruct Code 80k-v1 dataset, which targets high-quality code generation.
- What's Next?: We believe in continuous improvement and are excited to announce that in our next release, Panda-Coder will be enhanced with a custom dataset. This dataset will not only expand the language support but also include languages used for hardware programming, such as MATLAB, Embedded C, and Verilog.
Get in Touch
You can schedule a 1:1 meeting with our DevRel & Community Team to get started with AI Planet Open Source LLMs and GenAI Stack. Schedule the call here: https://calendly.com/jaintarun
Stay tuned for more updates and be a part of the coding evolution. Join us on this exciting journey as we make AI accessible to all at AI Planet!
Framework versions
- Transformers 4.39.2
- PyTorch 2.2.1+cu121
- Datasets 2.18.0
- Accelerate 0.27.2
- flash_attn 2.5.6
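To compare a local environment with the versions listed above, here is a small sketch using the standard library's `importlib.metadata`. The PyPI distribution names (`torch` for PyTorch, `flash-attn` for flash_attn) are assumptions about how the packages were installed, and `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by (assumed) PyPI distribution name
TESTED = {
    "transformers": "4.39.2",
    "torch": "2.2.1+cu121",
    "datasets": "2.18.0",
    "accelerate": "0.27.2",
    "flash-attn": "2.5.6",
}


def compare_versions(expected=TESTED):
    """Map each package to (version in the card, installed version or None)."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None
        report[pkg] = (want, have)
    return report


for pkg, (want, have) in compare_versions().items():
    if have is None:
        status = "missing"
    elif have == want:
        status = "match"
    else:
        status = "differs"
    print(f"{pkg:<13} card={want:<12} installed={have or '-':<12} {status}")
```

A "differs" result is not necessarily a problem; it only flags where your setup departs from the versions listed in this card.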
Citation
```bibtex
@misc{Chaitanya890,
  author    = { {Chaitanya Singhal} },
  title     = { Buddhi-128k-Chat by AI Planet },
  year      = 2024,
  url       = { https://huggingface.co/aiplanet/Buddhi-128K-Chat },
  publisher = { Hugging Face }
}
```