SPAHE/Meltemi-7B-Instruct-v1-GGUF

ibalampanis committed on Mar 29

Commit 7e3cc96 • 1 Parent(s): 97805a3

Update README.md

Files changed (1)

README.md +47 -59

@@ -5,102 +5,90 @@ model_name: Meltemi-7B-Instruct-v1
 pipeline_tag: text-generation
 quantized_by: SPAHE
 tags:
-- finetuned
 ---
 <!-- markdownlint-disable MD041 -->
 # Meltemi 7B Instruct v1 - GGUF
 - Original model: [Meltemi 7B Instruct v1](https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1)
 <!-- description start -->
 ## Description
-This repo contains GGUF format model files for [ilsp's Meltemi 7B Instruct v1](https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1).
 <!-- description end -->
-<!-- README_GGUF.md-about-gguf start -->
-### About GGUF
-GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.
-Here is an incomplete list of clients and libraries that are known to support GGUF:
-* [llama.cpp](https://github.com/ggerganov/llama.cpp). The source project for GGUF. Offers a CLI and a server option.
-* [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration.
-* [KoboldCpp](https://github.com/LostRuins/koboldcpp), a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling.
-* [GPT4All](https://gpt4all.io/index.html), a free and open source local running GUI, supporting Windows, Linux and macOS with full GPU accel.
-* [LM Studio](https://lmstudio.ai/), an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023.
-* [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), a great web UI with many interesting and unique features, including a full model library for easy model selection.
-* [Faraday.dev](https://faraday.dev/), an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration.
-* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with GPU accel, LangChain support, and OpenAI-compatible API server.
-* [candle](https://github.com/huggingface/candle), a Rust ML framework with a focus on performance, including GPU support, and ease of use.
-* [ctransformers](https://github.com/marella/ctransformers), a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. Note, as of time of writing (November 27th 2023), ctransformers has not been updated in a long time and does not support many recent models.
-<!-- compatibility_gguf start -->
-## Compatibility
-These quantised GGUFv2 files are compatible with llama.cpp from August 27th onwards, as of commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221)
 <!-- README_GGUF.md-provided-files start -->
 ## Provided files
-| Name | Quant method | Bits/Floats | Size | Max RAM required | Use case |
-| ---- | ---- | ---- | ---- | ---- | ----- |
-| [meltemi-7B-instruct-v1_q8_0.gguf](https://huggingface.co/SPAHE/Meltemi-7B-Instruct-v1-GGUF/blob/main/meltemi-7B-instruct-v1_q8_0.gguf) | Q8_0 | 5 | 7.40 GB| 7.30 GB | very low quality loss - recommended |
-| [meltemi-7B-instruct-v1_f16.gguf](https://huggingface.co/SPAHE/Meltemi-7B-Instruct-v1-GGUF/blob/main/meltemi-7B-instruct-v1_f16.gguf) | F16 | 16 | 13.90 GB| 14.20 GB | very large, extremely low quality loss |
-| [meltemi-7B-instruct-v1_f32.gguf](https://huggingface.co/SPAHE/Meltemi-7B-Instruct-v1-GGUF/blob/main/meltemi-7B-instruct-v1_f32.gguf) | F32 | 32 | 27.80 GB| 29.30 GB | very large, extremely low quality loss - not recommended |
-**Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
 <!-- README_GGUF.md-provided-files end -->
 <!-- README_GGUF.md-how-to-download start -->
-## How to download GGUF files
-**Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file.
-The following clients/libraries will automatically download models for you, providing a list of available models to choose from:
-* LM Studio
-* LoLLMS Web UI
-* Faraday.dev
-### On the command line, including multiple files at once
-I recommend using the `huggingface-hub` Python library:
 ```shell
-pip3 install huggingface-hub
 ```
-Then you can download any individual model file to the current directory, at high speed, with a command like this:
 ```shell
-huggingface-cli download SPAHE/Meltemi-7B-Instruct-v1-GGUF meltemi-7B-instruct-v1_q8_0.gguf --local-dir . --local-dir-use-symlinks False
 ```
 <!-- original-model-card start -->
 # Original model card: ilsp's Meltemi 7B Instruct v1
 # Meltemi Instruct Large Language Model for the Greek language
 We present Meltemi-7B-Instruct-v1 Large Language Model (LLM), an instruct fine-tuned version of [Meltemi-7B-v1](https://huggingface.co/ilsp/Meltemi-7B-v1).
 # Model Information
 - Vocabulary extension of the Mistral-7b tokenizer with Greek tokens
 - 8192 context length
 - Fine-tuned with 100k Greek machine translated instructions extracted from:
-  * [Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus) (only subsets with permissive licenses)
-  * [Evol-Instruct](https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_V2_196k)
-  * [Capybara](https://huggingface.co/datasets/LDJnr/Capybara)
-  * A hand-crafted Greek dataset with multi-turn examples steering the instruction-tuned model towards safe and harmless responses
 - Our SFT procedure is based on the [Hugging Face finetuning recipes](https://github.com/huggingface/alignment-handbook)
 # Instruction format
 The prompt format is the same as the [Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) format and can be
 utilized through the tokenizer's [chat template](https://huggingface.co/docs/transformers/main/chat_templating) functionality as follows:
@@ -164,25 +152,25 @@ print(tokenizer.batch_decode(outputs)[0])
 The evaluation suite we created includes 6 test sets. The suite is integrated with [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness).
-Our evaluation suite includes:
-* Four machine-translated versions ([ARC Greek](https://huggingface.co/datasets/ilsp/arc_greek), [Truthful QA Greek](https://huggingface.co/datasets/ilsp/truthful_qa_greek), [HellaSwag Greek](https://huggingface.co/datasets/ilsp/hellaswag_greek), [MMLU Greek](https://huggingface.co/datasets/ilsp/mmlu_greek)) of established English benchmarks for language understanding and reasoning ([ARC Challenge](https://arxiv.org/abs/1803.05457), [Truthful QA](https://arxiv.org/abs/2109.07958), [Hellaswag](https://arxiv.org/abs/1905.07830), [MMLU](https://arxiv.org/abs/2009.03300)).
-* An existing benchmark for question answering in Greek ([Belebele](https://arxiv.org/abs/2308.16884))
-* A novel benchmark created by the ILSP team for medical question answering based on the medical exams of [DOATAP](https://www.doatap.gr) ([Medical MCQA](https://huggingface.co/datasets/ilsp/medical_mcqa_greek)).
-Our evaluation for Meltemi-7b is performed in a few-shot setting, consistent with the settings in the [Open LLM leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard). We can see that our training enhances performance across all Greek test sets by a **+14.9%** average improvement. The results for the Greek test sets are shown in the following table:
-|                | Medical MCQA EL (15-shot) | Belebele EL (5-shot) | HellaSwag EL (10-shot) | ARC-Challenge EL (25-shot) | TruthfulQA MC2 EL (0-shot) | MMLU EL (5-shot) | Average |
-|----------------|----------------|-------------|--------------|------------------|-------------------|---------|---------|
-| Mistral 7B     | 29.8%          | 45.0%       | 36.5%        | 27.1%            | 45.8%             | 35%     | 36.5%   |
-| Meltemi 7B     | 41.0%          | 63.6%       | 61.6%        | 43.2%            | 52.1%             | 47%     | 51.4%   |
 # Ethical Considerations
 This model has not been aligned with human preferences, and therefore might generate misleading, harmful, and toxic content.
 # Acknowledgements
-The ILSP team utilized Amazon’s cloud computing services, which were made available via GRNET under the [OCRE Cloud framework](https://www.ocre-project.eu/), providing Amazon Web Services for the Greek Academic and Research Community.
 <!-- original-model-card end -->

 pipeline_tag: text-generation
 quantized_by: SPAHE
 tags:
+  - finetuned
 ---
 <!-- markdownlint-disable MD041 -->
 # Meltemi 7B Instruct v1 - GGUF
 - Original model: [Meltemi 7B Instruct v1](https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1)
 <!-- description start -->
 ## Description
+This repository contains GGUF format model files for [ilsp's Meltemi 7B Instruct v1](https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1), optimized for different performance and storage requirements. Each model variant has been carefully quantized or preserved in floating-point format to suit varying demands for quality, speed, and memory usage.
 <!-- description end -->
 <!-- README_GGUF.md-provided-files start -->
 ## Provided files
+| Name                                                                                                                                    | Quantization Method | Precision (Bits) | File Size | Max RAM Required | Use Case                                                      |
+| --------------------------------------------------------------------------------------------------------------------------------------- | ------------------- | ---------------- | --------- | ---------------- | ------------------------------------------------------------- |
+| [meltemi-7b-instruct-v1_q8_0.gguf](https://huggingface.co/SPAHE/Meltemi-7B-Instruct-v1-GGUF/blob/main/meltemi-7b-instruct-v1_q8_0.gguf) | Q8_0                | 8                | 7.40 GB   | 7.30 GB          | Low quality loss - recommended                                |
+| [meltemi-7b-instruct-v1_f16.gguf](https://huggingface.co/SPAHE/Meltemi-7B-Instruct-v1-GGUF/blob/main/meltemi-7b-instruct-v1_f16.gguf)   | F16                 | 16               | 13.90 GB  | 14.20 GB         | Very large, extremely low quality loss - recommended          |
+| [meltemi-7b-instruct-v1_f32.gguf](https://huggingface.co/SPAHE/Meltemi-7B-Instruct-v1-GGUF/blob/main/meltemi-7b-instruct-v1_f32.gguf)   | F32                 | 32               | 27.80 GB  | 29.30 GB         | Extremely large, extremely low quality loss - not recommended |
+**Note**: The above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
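As a rough cross-check of the file sizes in the table, size scales with the number of bits stored per weight. The sketch below is an illustration, not part of the original card: it assumes the Q8_0 block layout used by llama.cpp (32 int8 weights plus one 16-bit scale per block, about 8.5 bits per weight) and ignores metadata and any tensors kept at a different precision. The helper name `estimate_size_gb` is hypothetical.

```python
# Rough size estimate from bits per weight, using the F16 file as the reference.
F16_SIZE_GB = 13.90  # F16 file size taken from the table above

# Assumed effective bits per weight. Q8_0 packs 32 int8 values plus one
# 16-bit scale per block, i.e. (32 * 8 + 16) / 32 = 8.5 bits per weight.
BITS_PER_WEIGHT = {"Q8_0": (32 * 8 + 16) / 32, "F16": 16.0, "F32": 32.0}


def estimate_size_gb(quant: str) -> float:
    """Scale the F16 file size by the ratio of bits per weight."""
    return F16_SIZE_GB * BITS_PER_WEIGHT[quant] / 16.0


for name in BITS_PER_WEIGHT:
    print(f"{name}: ~{estimate_size_gb(name):.2f} GB")
```

Under these assumptions the estimates come out close to the listed 7.40 GB, 13.90 GB, and 27.80 GB, which suggests the table sizes are consistent with each other.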
 <!-- README_GGUF.md-provided-files end -->
 <!-- README_GGUF.md-how-to-download start -->
+## How to Download GGUF Files
+### For Manual Downloaders
+It is recommended not to clone the entire repository due to the large file sizes and multiple quantization formats available. Most users will benefit from selecting and downloading a single, specific model file that best suits their requirements.
+### Automated Download via Client Libraries
+For convenience, the following clients and libraries can automate the download process and offer a selection of available models:
+- **LM Studio**: Provides an integrated environment for downloading and utilizing models directly.
+### Downloading with Command Line
+The `huggingface-hub` Python library simplifies the process of downloading specific model files. Install the library with:
 ```shell
+pip install huggingface-hub
 ```
+To download a model file directly to your current directory, execute:
 ```shell
+huggingface-cli download SPAHE/Meltemi-7B-Instruct-v1-GGUF meltemi-7b-instruct-v1_q8_0.gguf --local-dir .
 ```
+This command ensures a high-speed download of the specific GGUF file you need without unnecessary data.
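The same single-file download can also be done from Python. This is a minimal sketch, assuming the `huggingface_hub` package is installed and that the file name matches the one listed in the table; the wrapper function `download_gguf` is a hypothetical helper, while `hf_hub_download` is the library call it relies on.

```python
# Sketch: fetch one GGUF file with the huggingface_hub Python API.
REPO_ID = "SPAHE/Meltemi-7B-Instruct-v1-GGUF"
FILENAME = "meltemi-7b-instruct-v1_q8_0.gguf"  # file name as listed in the table


def download_gguf(local_dir: str = ".") -> str:
    """Download a single file from the repo and return its local path."""
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
```

Calling `download_gguf()` fetches only the requested file into the current directory, so the larger F16 and F32 files are never transferred.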
+<!-- README_GGUF.md-how-to-download end -->
 <!-- original-model-card start -->
 # Original model card: ilsp's Meltemi 7B Instruct v1
 # Meltemi Instruct Large Language Model for the Greek language
 We present Meltemi-7B-Instruct-v1 Large Language Model (LLM), an instruct fine-tuned version of [Meltemi-7B-v1](https://huggingface.co/ilsp/Meltemi-7B-v1).
 # Model Information
 - Vocabulary extension of the Mistral-7b tokenizer with Greek tokens
 - 8192 context length
 - Fine-tuned with 100k Greek machine translated instructions extracted from:
+  - [Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus) (only subsets with permissive licenses)
+  - [Evol-Instruct](https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_V2_196k)
+  - [Capybara](https://huggingface.co/datasets/LDJnr/Capybara)
+  - A hand-crafted Greek dataset with multi-turn examples steering the instruction-tuned model towards safe and harmless responses
 - Our SFT procedure is based on the [Hugging Face finetuning recipes](https://github.com/huggingface/alignment-handbook)
 # Instruction format
 The prompt format is the same as the [Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) format and can be
 utilized through the tokenizer's [chat template](https://huggingface.co/docs/transformers/main/chat_templating) functionality as follows:
 The evaluation suite we created includes 6 test sets. The suite is integrated with [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness).
+Our evaluation suite includes:
+- Four machine-translated versions ([ARC Greek](https://huggingface.co/datasets/ilsp/arc_greek), [Truthful QA Greek](https://huggingface.co/datasets/ilsp/truthful_qa_greek), [HellaSwag Greek](https://huggingface.co/datasets/ilsp/hellaswag_greek), [MMLU Greek](https://huggingface.co/datasets/ilsp/mmlu_greek)) of established English benchmarks for language understanding and reasoning ([ARC Challenge](https://arxiv.org/abs/1803.05457), [Truthful QA](https://arxiv.org/abs/2109.07958), [Hellaswag](https://arxiv.org/abs/1905.07830), [MMLU](https://arxiv.org/abs/2009.03300)).
+- An existing benchmark for question answering in Greek ([Belebele](https://arxiv.org/abs/2308.16884))
+- A novel benchmark created by the ILSP team for medical question answering based on the medical exams of [DOATAP](https://www.doatap.gr) ([Medical MCQA](https://huggingface.co/datasets/ilsp/medical_mcqa_greek)).
+Our evaluation for Meltemi-7b is performed in a few-shot setting, consistent with the settings in the [Open LLM leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard). We can see that our training enhances performance across all Greek test sets by a **+14.9%** average improvement. The results for the Greek test sets are shown in the following table:
+|            | Medical MCQA EL (15-shot) | Belebele EL (5-shot) | HellaSwag EL (10-shot) | ARC-Challenge EL (25-shot) | TruthfulQA MC2 EL (0-shot) | MMLU EL (5-shot) | Average |
+| ---------- | ------------------------- | -------------------- | ---------------------- | -------------------------- | -------------------------- | ---------------- | ------- |
+| Mistral 7B | 29.8%                     | 45.0%                | 36.5%                  | 27.1%                      | 45.8%                      | 35%              | 36.5%   |
+| Meltemi 7B | 41.0%                     | 63.6%                | 61.6%                  | 43.2%                      | 52.1%                      | 47%              | 51.4%   |
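The Average column and the quoted +14.9 gain can be reproduced from the per-benchmark scores. The sketch below assumes an unweighted mean over the six test sets, which is an inference from the numbers rather than something the card states.

```python
# Scores (in %) copied from the table above, in column order.
scores = {
    "Mistral 7B": [29.8, 45.0, 36.5, 27.1, 45.8, 35.0],
    "Meltemi 7B": [41.0, 63.6, 61.6, 43.2, 52.1, 47.0],
}

# Unweighted mean over the six test sets.
averages = {model: sum(vals) / len(vals) for model, vals in scores.items()}

# The gain is a difference between two percentages, i.e. percentage points.
gain_points = averages["Meltemi 7B"] - averages["Mistral 7B"]

for model, avg in averages.items():
    print(f"{model}: {avg:.1f}%")
print(f"Gain: {gain_points:+.1f} percentage points")
```

Read this way, the reported improvement is a difference of about 14.9 percentage points between the two averages, not a relative increase of 14.9%.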
 # Ethical Considerations
 This model has not been aligned with human preferences, and therefore might generate misleading, harmful, and toxic content.
 # Acknowledgements
+The ILSP team utilized Amazon’s cloud computing services, which were made available via GRNET under the [OCRE Cloud framework](https://www.ocre-project.eu/), providing Amazon Web Services for the Greek Academic and Research Community.
 <!-- original-model-card end -->