---
tags:
- merge
- mergekit
- lazymergekit
- OpenPipe/mistral-ft-optimized-1218
- HuggingFaceH4/zephyr-7b-beta
base_model:
- OpenPipe/mistral-ft-optimized-1218
- HuggingFaceH4/zephyr-7b-beta
---

# AeolusBlend-7B-slerp

AeolusBlend-7B-slerp is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [OpenPipe/mistral-ft-optimized-1218](https://huggingface.co/OpenPipe/mistral-ft-optimized-1218)
* [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)

## 🧩 Configuration

```yaml
slices:
  - sources:
      - model: OpenPipe/mistral-ft-optimized-1218
        layer_range: [0, 32]
      - model: HuggingFaceH4/zephyr-7b-beta
        layer_range: [0, 32]
merge_method: slerp
base_model: OpenPipe/mistral-ft-optimized-1218
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```
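For intuition: `slerp` (spherical linear interpolation) blends each pair of weight tensors along the arc between them rather than along a straight line, which tends to preserve each tensor's scale. The `t` lists above are interpolation schedules stretched across the 32 layers, where `t=0` keeps the base model's weights and `t=1` takes zephyr's: early self-attention blocks stay close to the base model while early MLPs lean toward zephyr, the pattern reverses in later layers, and everything else uses a flat `t` of 0.5. The snippet below is a minimal sketch of the idea only (mergekit's actual implementation handles more edge cases), and the `slerp` helper here is illustrative rather than part of any library.

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors.

    t=0 returns v0 (the base model's tensor), t=1 returns v1.
    """
    v0_flat, v1_flat = v0.flatten().float(), v1.flatten().float()

    # Measure the angle between the two weight directions on the unit sphere
    v0_unit = v0_flat / (v0_flat.norm() + eps)
    v1_unit = v1_flat / (v1_flat.norm() + eps)
    dot = torch.clamp(v0_unit @ v1_unit, -1.0, 1.0)
    omega = torch.acos(dot)

    if omega < 1e-4:
        # Nearly parallel tensors: fall back to plain linear interpolation
        out = (1 - t) * v0_flat + t * v1_flat
    else:
        sin_omega = torch.sin(omega)
        out = (torch.sin((1 - t) * omega) / sin_omega) * v0_flat \
            + (torch.sin(t * omega) / sin_omega) * v1_flat

    return out.reshape(v0.shape).to(v0.dtype)

# Toy example: blend a small stand-in weight matrix halfway between two models
merged = slerp(0.5, torch.randn(8, 8), torch.randn(8, 8))
```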
## 💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "lxyuan/AeolusBlend-7B-slerp"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the chat messages with the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])

>>> [INST] What is a large language model? [/INST] A large language model is a type of artificial intelligence (AI) system that has been trained on vast amounts of text data to understand and generate human-like language. These models are typically made up of neural networks, which are a type of machine learning algorithm that can learn to recognize patterns in data and make predictions based on those patterns. The term "large" in this context refers to the size of the model, which is measured by the number of parameters or connections in the neural network. Large language models can have billions or even trillions of parameters, making them capable of processing and generating extremely complex text. Some examples of large language models include Google's BERT, OpenAI's GPT-3, and Facebook's Transformer-XL. These models have been trained on huge datasets such as books, articles, and web pages, allowing them to understand the nuances of language and generate text that is not only grammatically correct but also natural and fluent. Large language models have a wide range of potential applications, including natural language processing (NLP) tasks such as text generation, translation, and summarization, as well as chatbots and virtual assistants that can
```

```python
messages = [{"role": "user", "content": "Who is Lee Kuan Yew? Summarize your answer in point form format"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])

>>> [INST] Who is Lee Kuan Yew? Summarize your answer in point form format [/INST] Lee Kuan Yew was a Singaporean statesman who served as the first Prime Minister of Singapore from 1959 to 1990. Here are some key points about him:

- Born in 1923 in Singapore, Lee was educated in England and was a lawyer by profession.
- He was a founding member of the People's Action Party (PAP), which he led for many years.
- Under Lee's leadership, Singapore achieved independence from Malaysia in 1965 and became a sovereign nation.
- Lee was known for his pragmatic and authoritarian style of governance, emphasizing economic growth, law and order, and meritocracy.
- He played a significant role in Singapore's rapid development, transforming the country from a poor and undeveloped nation into a modern and prosperous city-state.
- Lee passed away in 2015, at the age of 91.
- He was widely regarded as one of the most influential leaders of the 20th century and a key figure in the history of Singapore.
```

### 4-bit Inference Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import transformers
import torch

#!nvidia-smi
"""
Wed Feb  7 12:51:07 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB           On  | 00000000:00:1E.0 Off |                    0 |
| N/A   41C    P0              44W / 300W |   4950MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
"""

model_id = "lxyuan/AeolusBlend-7B-slerp"

# NF4 double quantization keeps the 7B model under ~5 GB of GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
)

messages = [{"role": "user", "content": "What is a large language model?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = pipeline(prompt, max_new_tokens=2048, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])

>>> [INST] What is a large language model? [/INST] A large language model is a type of artificial intelligence system that has been trained on vast amounts of text data, enabling it to generate human-like responses to a wide range of written prompts. These models are designed to learn the patterns and rules of language, and as a result, they can perform various natural language processing tasks, such as translation, summarization, and question-answering, with a high degree of accuracy. Large language models are typically powered by deep learning algorithms and can have billions or trillions of parameters, making them capable of processing and understanding complex language structures and nuances. Some well-known examples of large language models include GPT-3, BERT, and T5.
```

- The 4-bit inference example notebook can be found [here](https://github.com/LxYuan0420/nlp/blob/main/notebooks/Inference_4bit_AeolusBlend.ipynb)
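### Reproducing the Merge

To reproduce the merge itself rather than just run the result, the YAML from the Configuration section can be fed to mergekit directly. The commands below are a sketch only: they assume the config has been saved as `config.yaml`, and the output path `./AeolusBlend-7B-slerp` is a placeholder.

```python
# Sketch of reproducing the merge locally (assumes enough disk and RAM to
# hold both 7B source models). config.yaml is the YAML from the Configuration
# section saved to disk; the output directory name is arbitrary.
!pip install -qU git+https://github.com/arcee-ai/mergekit.git

!mergekit-yaml config.yaml ./AeolusBlend-7B-slerp --copy-tokenizer --lazy-unpickle
```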