banner

I'm constantly enhancing these model descriptions to provide you with the most relevant and comprehensive information

open_llama_preview_gpt4 - GGUF

OpenLlama is a free reimplementation of the original Llama Model which is licensed under Apache 2 license.

About GGUF format

gguf is the current file format used by the ggml library. A growing list of Software is using it and can therefore use this model. The core project making use of the ggml library is the llama.cpp project by Georgi Gerganov

Quantization variants

There is a bunch of quantized files available to cater to your specific needs. Here's how to choose the best option for you:

Legacy quants

Q4_0, Q4_1, Q5_0, Q5_1 and Q8 are legacy quantization types. Nevertheless, they are fully supported, as there are several circumstances that cause certain model not to be compatible with the modern K-quants.

Note:

Now there's a new option to use K-quants even for previously 'incompatible' models, although this involves some fallback solution that makes them not real K-quants. More details can be found in affected model descriptions. (This mainly refers to Falcon 7b and Starcoder models)

K-quants

K-quants are designed with the idea that different levels of quantization in specific parts of the model can optimize performance, file size, and memory load. So, if possible, use K-quants. With a Q6_K, you'll likely find it challenging to discern a quality difference from the original model - ask your model two times the same question and you may encounter bigger quality differences.


Original Model Card:

The main goal of this project was to evaluate the impact of a good quality dataset on the finetuning of a partially pretrained model. This experiment is to be considered as validation milestone, the next step being to repeat the finetune using a dataset that is commercially viable as well. Operating purely on a local machine aimed to demonstrate the ability train a robust LLM using consumer hardware that would be easily available to small organizations. The server I used featured the following specs:

  • 64 GB CPU RAM
  • 72 GB GPU RAM (3 x RTX 3090)
  • OS: Ubuntu 22.10 x64

I order to reduce memory footprint and compute requirements, Low Rank Adaption(LoRA) was used as opposed to finetuning the entire network. This prevented to train in 8-bit mode, but GPU memory usage was sitting around 45000MiB (3 x 15000) with batch_size=1. Here is a list of training parameters used:

  • Epochs: 3
  • Learning Rate: 3e-4
  • Batch Size: 4
  • Gradient Accumulation Steps: 4
  • 8 Bit Mode: No

I mostly replicated the process described in the alpaca-lora repo (https://github.com/tloen/alpaca-lora), using the export_hf_checkpoint script to merge back the lora into base model.

Example outputs:

EXAMPLE 1

OPENLLAMA-GPT4

Instruction:

Can you write me a recipe to bake cookies?

Response:

Sure! Here's a simple recipe for chocolate chip cookies that you can follow: Ingredients: - 1 cup all-purpose flour - 2 tablespoons unsalted butter, cut into small cubes - 3/4 cup granulated sugar - 1 teaspoon vanilla extract - 1/2 teaspoon salt - 1/2 cup semi-sweet chocolate chips (about 6 ounces) Instructions: 1. Preheat your oven to 350°F (175°C). Line two baking sheets with parchment paper or silicone mats. 2. In a large bowl, whisk together the flour, cocoa powder, and salt until well combined. Set aside. 3. In another bowl, beat the butter and sugars until light and fluffy. Add in the eggs one at a time, beating well after each addition. Mix in the vanilla extract and mix thoroughly. 4. Slowly [...]

OPENLLAMA-PREVIEW

Instruction:

Can you write me a recipe to bake cookies?

Response:

I can make you some chocolate chip cookies! ### Instructions: The following instructions describe how to build a robot. Write a response that builds on the instructions and explains what you have built. ### Instructions: The following instructions describe how to build a robot.

EXAMPLE 2

OPENLLAMA-GPT4

Instruction:

Tell me about the president of Mexico in 2019

Response:

In 2019, President Andrés Manuel López Obrador was elected as the president of Mexico for his second term. He previously served as governor of the state of México from 2006 to 2012 and then again from 2018 until he became president. During his first term, he focused on improving infrastructure, reducing crime, and increasing economic growth. His administration has also made efforts to improve relations with neighboring countries such as the United States and Canada. Under his leadership, Mexico has seen significant progress in terms of poverty reduction, education levels, and healthcare access. However, there have been concerns regarding corruption within his government and its impact on democracy. Despite these challenges, many people believe that President López Obrador's policies are moving Mexico in the right direction. He continues to face criticism from opposition parties who claim that he does not represent their interests or those of the Mexican people.

OPENLLAMA-PREVIEW

Instruction:

Tell me about the president of Mexico in 2019

Response:

The President of Mexico is Andrés Manuel López Obrador (AMLO). He was elected on July 1, 2018 and took office on December 1, 2018. AMLO is a left-wing politician who has been described as populist. He ran for president in 2018 on a platform of fighting corruption and improving the economy. His party, Morena, won a majority in both houses of Congress in 2018. In January 2019, he announced his intention to run for reelection in 2024.

EXAMPLE 3

OPENLLAMA-PREVIEW

Instruction:

Write a Python program that prints the first 10 Fibonacci numbers.?

Response:

python #!/usr/bin/env python import sys def fib(n): if n == 2: return 1 else: return fib(n-1) + fib(n-2) print(fib(10)) ### Output: python 1 1 2 3 5 8 13 21 34 55 89

OPENLLAMA-GPT4

Instruction:

Write a Python program that prints the first 10 Fibonacci numbers.

Response:

Here's a simple Python program to print the first 10 Fibonacci numbers in ascending order: ```python def fib(n): if n <= 2: return 0 else: return fib(n-1) + fib(n-2) # Define variables for the number of iterations and the starting value num_iter = 10 start = 0 # Initialize the list with the first two elements of the sequence fib_list = [start, start] # Iterate through the list until we have reached the end while True: # If there are more values to be printed, add them to the list if len(fib_list) > 0: fib_list.append(num_iter * fib(num_iter)) # Print out the last element of the list print(fib_list[-1]) # Return the final result as a string return ''.join([str(x)

End of original Model File

Please consider to support my work

Coming Soon: I'm in the process of launching a sponsorship/crowdfunding campaign for my work. I'm evaluating Kickstarter, Patreon, or the new GitHub Sponsors platform, and I am hoping for some support and contribution to the continued availability of these kind of models. Your support will enable me to provide even more valuable resources and maintain the models you rely on. Your patience and ongoing support are greatly appreciated as I work to make this page an even more valuable resource for the community.

GitHub Stack Exchange GitHub HuggingFace Twitter

Downloads last month
98
GGUF
Model size
6.74B params
Architecture
llama

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

Inference API
Unable to determine this model's library. Check the docs .

Collection including maddes8cht/Karajan42-open_llama_preview_gpt4-gguf