---
library_name: transformers
datasets:
- Intel/orca_dpo_pairs
language:
- en
tags:
- mistral-7b
- mistral
- dpo
- neuralhermes
- instruct
- rlhf
- notebook
- endtoend
license: apache-2.0
---

- Base model: `teknium/OpenHermes-2.5-Mistral-7B`
- Fine-tuned with Direct Preference Optimization (DPO) on the `Intel/orca_dpo_pairs` dataset.

## Uses

### Direct Use

Way 1 (see `Way 2` below for faster inference)

```python
import transformers
from transformers import AutoTokenizer

new_model = "abdullahalzubaer/NeuralHermes-2.5-Mistral-7B"

# Format prompt
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])
```

Sample output from `abdullahalzubaer/NeuralHermes-2.5-Mistral-7B`

```
<|im_start|>system
You are a helpful assistant chatbot.<|im_end|>
<|im_start|>user
What is a Large Language Model?<|im_end|>
<|im_start|>assistant
A large language model is an artificial intelligence system designed to process and understand large amounts of natural language data. It's a type of machine learning model, typically built using neural networks, that is trained on vast datasets of text to learn patterns and relationships within the language. These models can then generate human-like text, predict the next word in a sequence, perform language translation, and answer questions, among other tasks. The "large" in the term refers to the size of the model, which includes the number of parameters, the complexity of the architecture, and the amount of training data it processes.
As a result, large language models are capable of generating more complex and coherent responses compared to smaller models.
```

Sample output from `mlabonne/NeuralHermes-2.5-Mistral-7B` (as given in the [tutorial](https://mlabonne.github.io/blog/posts/Fine_tune_Mistral_7b_with_DPO.html))

```
<|im_start|>system
You are a helpful assistant chatbot.<|im_end|>
<|im_start|>user
What is a Large Language Model?<|im_end|>
<|im_start|>assistant
A large language model is a type of artificial intelligence (AI) system that has been trained on vast amounts of text data. These models are designed to understand and generate human language, allowing them to perform various natural language processing tasks, such as text generation, language translation, and question answering. Large language models typically use deep learning techniques, like recurrent neural networks (RNNs) or transformers, to learn patterns and relationships in the data, enabling them to generate coherent and contextually relevant responses. The size of these models, in terms of the number of parameters and the volume of data they are trained on, plays a significant role in their ability to comprehend and produce complex language structures.
```

So the fine-tuning worked: the output is maybe not as good as the original model's, but it is still close (possibly because of the lower `max_length` I had to use in `DPOTrainer`?).

Way 2 (not sure why, but it is significantly faster than Way 1 above, so I recommend this one.
Taken directly from the [Mistral model card](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), with the model name replaced by mine)

```python
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import trl  # only imported to record the version used for training

print(torch.__version__)
print(transformers.__version__)
print(trl.__version__)

'''
1.13.0+cu117
4.38.2
0.7.11
'''

model_tokenizer = "abdullahalzubaer/NeuralHermes-2.5-Mistral-7B"  # let's try my model
# model_tokenizer = "mistralai/Mistral-7B-Instruct-v0.2"
# model_tokenizer = "mistralai/Mixtral-8x7B-Instruct-v0.1"

model = AutoModelForCausalLM.from_pretrained(model_tokenizer)
tokenizer = AutoTokenizer.from_pretrained(model_tokenizer)

print(f"Loaded Model = {model.config._name_or_path}")
print(f"Loaded Tokenizer = {tokenizer.name_or_path}")

# Check available GPUs and print their names
gpu_count = torch.cuda.device_count()
print("Available GPUs:", gpu_count)
for i in range(gpu_count):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")

# Choose a specific GPU (I used GPU 3; set this to 0 on a single-GPU machine)
device_id = 3  # Change this to select a different GPU
device = f"cuda:{device_id}" if torch.cuda.is_available() else "cpu"

print(f"Using device: {device}")

your_prompt = """What is a Large Language Model?"""

messages = [
    {"role": "user", "content": your_prompt},
]

# Note: unlike Way 1, this call does not pass add_generation_prompt=True,
# so no "<|im_start|>assistant" header is appended before generation.
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(f"\nComplete I/O:\n{decoded[0]}")
# print(f"Using device: {device}")
# The split below only applies to the [INST] format of the Mistral instruct models
# print(f"\nModel Reply:\n{decoded[0].split('[/INST]')[1]}")

'''
Complete I/O:
<|im_start|> user
What is a Large Language Model? Elaborate. <|im_end|>
A Large Language Model is a type of artificial intelligence algorithm designed to generate human-like text or respond to natural language input.
It is typically trained on vast amounts of text data, enabling it to understand and generate language with a high level of complexity.<|im_end|>
'''
```

# Loss

| Step | Training Loss |
|-----|---------|
| 1 | 0.693300|
| 2 | 0.693200|
| 3 | 0.692500|
| 4 | 0.691300|
| 5 | 0.689400|
| ... | ... |
| 45 | 0.633700|
| 46 | 0.629000|
| 47 | 0.591300|
| 48 | 0.558100|
| 49 | 0.585800|
| 50 | 0.558900|

# Hyperparameters:

All hyperparameters are the same as [here](https://mlabonne.github.io/blog/posts/Fine_tune_Mistral_7b_with_DPO.html), except the following:

```python
# for TrainingArguments()
dataloader_num_workers=1,  # had to add this #CHANGED_HERE#
dataloader_prefetch_factor=1,

# for DPOTrainer()
# ref_model omitted: when I passed a reference model, an error said it is not required
# (not sure why, needs further investigation)
max_prompt_length=256,  # had to lower this to 256 #CHANGED_HERE# or else CUDA out of memory
max_length=256,  # had to lower this to 256 #CHANGED_HERE# or else CUDA out of memory
```

# Reference

Thanks! https://mlabonne.github.io/blog/posts/Fine_tune_Mistral_7b_with_DPO.html
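
# Note on the starting loss

A quick sanity check on the loss table, written as a minimal sketch and not as the trainer's actual code. It assumes the standard sigmoid DPO loss, `-log(sigmoid(beta * margin))`, where the margin compares how much the policy and the reference model each prefer the chosen answer over the rejected one. The helper name `dpo_loss`, the log-probability values and `beta=0.1` are all illustrative assumptions. When the policy still equals the reference model the margin is zero, so the loss is `ln(2) ≈ 0.6931` regardless of `beta`, which is close to the 0.693300 logged at step 1.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one preference pair (hypothetical helper, not TRL's API)."""
    # Implicit reward margin: how much more the policy prefers the chosen
    # answer over the rejected one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(x)) is the same as log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))


# Before any update the policy equals the reference, so the margin is 0
# whatever the (made-up) sequence log-probabilities are.
initial = dpo_loss(-42.0, -57.0, -42.0, -57.0)
print(round(initial, 4))  # 0.6931, i.e. ln(2)

# Once the policy favours the chosen answer more than the reference does,
# the margin turns positive and the loss drops below ln(2).
later = dpo_loss(-40.0, -60.0, -42.0, -57.0)
print(later < initial)  # True
```

Read this way, the values below 0.6931 later in the table suggest the policy has started to separate chosen from rejected answers relative to the reference model.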