Aleph-Alpha/Pharia-1-LLM-7B-control

ArturBaranowskiAA committed on Aug 28

Commit fd1d732 (parent: 5fc77c4)

Update README.md with references to safetensors-conversions.

Files changed (1)

README.md +53 -13

README.md CHANGED

@@ -10,11 +10,18 @@ This model card provides an overview of the **Pharia-1-LLM-7B** model family, wh
 Pharia-1-LLM-7B comes in two distinct variants, `Pharia-1-LLM-7B-control` and [`Pharia-1-LLM-7B-control-aligned`](https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control-aligned). Because both models were trained on a multilingual corpus, they are culturally and linguistically optimized for German, French, and Spanish. The Pharia-1-LLM-7B models were trained on carefully curated data in compliance with applicable EU and national regulations, including copyright and data privacy laws. With improved token efficiency, the Pharia-1-LLM-7B-control models excel in domain-specific applications, particularly in the automotive and engineering industries. As such, they serve as a valuable complement to the community's selection of weight-available foundation models. `Pharia-1-LLM-7B-control` is engineered to deliver concise, length-controlled responses that match the performance of leading open-source models in the 7B to 8B parameter range. `Pharia-1-LLM-7B-control` can be aligned to user preferences, making it suitable for critical applications without the risk of shutdown behavior. `Pharia-1-LLM-7B-control-aligned` has received additional alignment training to mitigate the risks associated with using the model.
 # Model Overview
-*   **Developed by:** Aleph Alpha Research
-*   **Model type/architecture:** Autoregressive (causal, decoder-only) transformer large language models with rotary position embeddings, trained on the next-token prediction task. Both `Pharia-1-LLM-7B-control` and `Pharia-1-LLM-7B-control-aligned` are standalone transformer foundation models intended to be integrated into broader AI applications (systems).
 *   **Language(s):** Trained in English, German, French, Spanish, Italian, Portuguese, and Dutch. Tested in English, German, Spanish, and French.
@@ -31,12 +38,12 @@ We provide access to our models through the channels listed below.
 *   **Intelligence Layer SDK**: Once the account is approved, the models can be accessed through the [Intelligence Layer SDK](https://github.com/Aleph-Alpha/intelligence-layer-sdk). It is a source-available library that lets users easily interact with any model in the Pharia-1-LLM-7B model family, as well as supported third-party models, and build evaluation pipelines to ensure every application delivers the expected results in production.
-*   **On-premise installation:** Our customers are supplied with our full LLM stack, including model weights and inference runtime. Contact us for options to deploy Pharia-1-LLM-7B models in any cloud or on-premise environment. We provide our customers with open access to our full model checkpoint including weights and code for commercial use.
 *   **Hugging Face:** The model’s weights are available on Hugging Face under the [Open Aleph License](https://github.com/Aleph-Alpha/.github/blob/main/oal.pdf), which limits the usage to educational and research purposes.
-Please refer to the [changelog](https://docs.aleph-alpha.com/changelog/) for updates to the models served. We do not deprecate officially released versions of old model generations when we release newer versions, so users can continue to have access to available models.
 No prompt data is stored when using our systems, which means that we do not collect PII (personally identifiable information) for any of our public API users as detailed in our Terms & Conditions. We do not log user inputs to the models. We do not train on user data.
@@ -54,7 +61,7 @@ The Pharia-1-LLM-7B models are not to be used for illegal or unlawful actions of
 Although we do not inspect the requests sent to our API, we regularly review and monitor potential violations that may be related to our models and, depending on the circumstances of the specific case, take legal action against them. This includes, but is not limited to, enforcement to remove published model content, requesting compensation for damages caused, and account termination or removal of credits.
-For non-anonymous reports, we also provide an appeals mechanism for usage policy violations via our dedicated contact address [violations@aleph-alpha.com](mailto:violations@aleph-alpha.com) to communicate with us.
 Customers and partners can use our [ticketing system](https://servicedesk.aleph-alpha.de/external) for appeals, claims, and feedback.
@@ -62,7 +69,37 @@ Customers and partners are enabled to use our [ticketing system](https://service
 ### Inference
-To perform inference with the model, you’ll first need to [install the Scaling library](https://github.com/Aleph-Alpha/scaling). Follow the installation instructions provided in the repository's README file. After installation, download the model weights and use the Scaling inference module to load the checkpoint, vocabulary, and configuration files.
 ```python
 from pathlib import Path
@@ -73,10 +110,13 @@ inference_model = TransformerInferenceModule.from_checkpoint(
     checkpoint_dir=Path("path/to/Pharia-1-LLM-7B-control-aligned"),
 )
-input_text = """<|start_header_id|>user<|end_header_id|>
 When was Rome founded?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
 """
 generation = inference_model.generate(max_tokens=100, input_text=input_text)
@@ -424,11 +464,11 @@ The following table shows the training setup, efficiency and duration for all Ph
 | Hardware Type | Hardware Amount | Avg. measured step duration | Avg. measured MFU | Avg. measured TFLOPS | Iterations (number of update steps) | Training tokens | GPU hours | Total FLOPs |
 | A100 (80GB), H100 | Up to 256 GPUs | 8.6 s (A100), 3.6 s (H100) | 0.66 (A100), 0.5 (H100) | 215 (A100)<br><br>520 (H100) | 582000 + 350000 | ~4.7T + 3T | 356k on A100 + 96k on H100 | 2.75×10<sup>23</sup> + 1.68×10<sup>23</sup> |
-The total compute budget is reported in FLOPs, following the [Bloom implementation](https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/e52bdabbde3c6895aceb76c1bced295c2646121f/megatron/training.py#L759), for comparability with the [related paper](https://arxiv.org/pdf/2211.05100.pdf).
 ### Environmental Impact
-Our data center runs on 100% renewable energy, so **no CO2 emissions are incurred for any inference job** executed through the API. Furthermore, the data center operates with a net-zero water footprint.
 To estimate CO2 emissions, we base our calculations on the following assumptions:
@@ -442,11 +482,11 @@ To estimate CO2 emissions, we base our calculations on the following assumptions
 | Carbon emitted | Carbon emitted accounting for PUE | Power consumption | Note |
 | A100: 0 | A100: 0 | A100: max 400W per GPU<br><br>H100: max 700W per GPU | A100: 100% water-powered energy |
-For context, see [Estimating the carbon footprint of BLOOM, a 176B parameter language model](https://arxiv.org/pdf/2211.02001.pdf).
 # Risks and Limitations
-**Note:** Language models are **not agents** and are not optimized for prescriptive actions. Using language models in high-stakes environments, for critical decisions, or to support a user's wellbeing should only be done with additional guardrails in place.
 While `Pharia-1-LLM-7B-control-aligned` has received extra training to mitigate risks associated with harmful outputs and biases, it may still produce undesirable completions in some circumstances.
@@ -469,7 +509,7 @@ Large language models can sometimes generate undesired outputs that are unsuitab
 *   Employing a fine-tuned model designed to maintain an appropriate tone and style, including avoiding offensive language.
-*   Implementing [explainability](https://docs.aleph-alpha.com/docs/tasks/explain/) checks to create an audit trail at the application level.
 *   Conducting additional validations at the application level to ensure output quality and appropriateness.
@@ -507,7 +547,7 @@ Risks may be mitigated by:
 *   Performing validations on the application layer (e.g., classifying the output).
-*   Using the repetition penalty (especially when outputs become repetitive) or other parameters available in the API (see [documentation](https://docs.aleph-alpha.com/api/complete/)).
 *   Avoiding use cases targeted at retrieving personally identifiable information.

 Pharia-1-LLM-7B comes in two distinct variants, `Pharia-1-LLM-7B-control` and [`Pharia-1-LLM-7B-control-aligned`](https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control-aligned). Because both models were trained on a multilingual corpus, they are culturally and linguistically optimized for German, French, and Spanish. The Pharia-1-LLM-7B models were trained on carefully curated data in compliance with applicable EU and national regulations, including copyright and data privacy laws. With improved token efficiency, the Pharia-1-LLM-7B-control models excel in domain-specific applications, particularly in the automotive and engineering industries. As such, they serve as a valuable complement to the community's selection of weight-available foundation models. `Pharia-1-LLM-7B-control` is engineered to deliver concise, length-controlled responses that match the performance of leading open-source models in the 7B to 8B parameter range. `Pharia-1-LLM-7B-control` can be aligned to user preferences, making it suitable for critical applications without the risk of shutdown behavior. `Pharia-1-LLM-7B-control-aligned` has received additional alignment training to mitigate the risks associated with using the model.
+You can find all model weights and their corresponding safetensors conversions at the following links:
+- [`Pharia-1-LLM-7B-control`](https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control)
+- [`Pharia-1-LLM-7B-control-hf`](https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control-hf) (Safetensors)
+- [`Pharia-1-LLM-7B-control-aligned`](https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control-aligned)
+- [`Pharia-1-LLM-7B-control-aligned-hf`](https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control-aligned-hf) (Safetensors)
 # Model Overview
+*   **Developed by:** Aleph Alpha Research
+*   **Model type/architecture:** Autoregressive (causal, decoder-only) transformer large language models with rotary position embeddings, trained on the next-token prediction task. Both `Pharia-1-LLM-7B-control` and `Pharia-1-LLM-7B-control-aligned` are standalone transformer foundation models intended to be integrated into broader AI applications (systems).
 *   **Language(s):** Trained in English, German, French, Spanish, Italian, Portuguese, and Dutch. Tested in English, German, Spanish, and French.
 *   **Intelligence Layer SDK**: Once the account is approved, the models can be accessed through the [Intelligence Layer SDK](https://github.com/Aleph-Alpha/intelligence-layer-sdk). It is a source-available library that lets users easily interact with any model in the Pharia-1-LLM-7B model family, as well as supported third-party models, and build evaluation pipelines to ensure every application delivers the expected results in production.
+*   **On-premise installation:** Our customers are supplied with our full LLM stack, including model weights and inference runtime. Contact us for options to deploy Pharia-1-LLM-7B models in any cloud or on-premise environment. We provide our customers with open access to our full model checkpoint including weights and code for commercial use.
 *   **Hugging Face:** The model’s weights are available on Hugging Face under the [Open Aleph License](https://github.com/Aleph-Alpha/.github/blob/main/oal.pdf), which limits the usage to educational and research purposes.
+Please refer to the [changelog](https://docs.aleph-alpha.com/changelog/) for updates to the models served. We do not deprecate officially released versions of old model generations when we release newer versions, so users can continue to have access to available models.
 No prompt data is stored when using our systems, which means that we do not collect PII (personally identifiable information) for any of our public API users as detailed in our Terms & Conditions. We do not log user inputs to the models. We do not train on user data.
 Although we do not inspect the requests sent to our API, we regularly review and monitor potential violations that may be related to our models and, depending on the circumstances of the specific case, take legal action against them. This includes, but is not limited to, enforcement to remove published model content, requesting compensation for damages caused, and account termination or removal of credits.
+For non-anonymous reports, we also provide an appeals mechanism for usage policy violations via our dedicated contact address [violations@aleph-alpha.com](mailto:violations@aleph-alpha.com) to communicate with us.
 Customers and partners can use our [ticketing system](https://servicedesk.aleph-alpha.de/external) for appeals, claims, and feedback.
 ### Inference
+You can load the model and tokenizer with the Hugging Face Transformers library, using our safetensors conversions [`Pharia-1-LLM-7B-control-hf`](https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control-hf) and [`Pharia-1-LLM-7B-control-aligned-hf`](https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control-aligned-hf).
+```python
+import torch
+from transformers import AutoModelForCausalLM, PreTrainedTokenizerFast
+INPUT = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
+You are a helpful assistant. You give engaging, well-structured answers to user inquiries.<|eot_id|><|start_header_id|>user<|end_header_id|>
+When was Rome founded?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+"""
+MODEL_ID = "Aleph-Alpha/Pharia-1-LLM-7B-control-hf"
+tokenizer = PreTrainedTokenizerFast.from_pretrained(MODEL_ID)
+model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True, torch_dtype=torch.bfloat16)
+device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
+model = model.to(device)
+inputs = tokenizer(INPUT, return_token_type_ids=False, return_tensors="pt").to(device)
+outputs = model.generate(**inputs, max_new_tokens=50)
+generated_text = tokenizer.decode(outputs[0])
+print(generated_text)
+```
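The `INPUT` string above follows a fixed turn layout. It begins with `<|begin_of_text|>`. Each turn then consists of a `<|start_header_id|>role<|end_header_id|>` header, a newline, the content, and `<|eot_id|>`. The string ends with an open assistant header. The sketch below is a minimal illustration that assumes this layout. The layout is inferred from the example prompts, not from a published template. The hypothetical helpers `build_prompt` and `extract_reply` assemble such a prompt from (role, content) pairs and pull the assistant's answer out of the text returned by `tokenizer.decode(outputs[0])`. They are plain string functions and are not part of Transformers or Scaling.

```python
# Hypothetical helpers (not part of Transformers or Scaling) that rebuild the
# turn layout of the example prompt and extract the assistant's reply.
BEGIN_OF_TEXT = "<|begin_of_text|>"
END_OF_TURN = "<|eot_id|>"


def role_header(role: str) -> str:
    """Header that opens a turn; the example prompts follow it with one newline."""
    return f"<|start_header_id|>{role}<|end_header_id|>"


def build_prompt(messages: list[tuple[str, str]]) -> str:
    """Render (role, content) turns and leave an assistant turn open."""
    parts = [BEGIN_OF_TEXT]
    for role, content in messages:
        parts.append(f"{role_header(role)}\n{content}{END_OF_TURN}")
    # The model is expected to continue right after this header.
    parts.append(f"{role_header('assistant')}\n")
    return "".join(parts)


def extract_reply(decoded: str) -> str:
    """Text after the last assistant header, cut at the first end-of-turn token."""
    reply = decoded.rsplit(role_header("assistant"), 1)[-1]
    return reply.split(END_OF_TURN, 1)[0].strip()


prompt = build_prompt([
    ("system", "You are a helpful assistant. You give engaging, well-structured answers to user inquiries."),
    ("user", "When was Rome founded?"),
])
print(prompt)

# Placeholder standing in for tokenizer.decode(outputs[0]); not real model output.
decoded = prompt + "(generated answer)" + END_OF_TURN
print(extract_reply(decoded))  # (generated answer)
```

With these messages, `build_prompt` reproduces the hand-written `INPUT` string exactly. `tokenizer.decode` keeps special tokens unless `skip_special_tokens=True` is passed, so `extract_reply` can split on them. If generation stops at `max_new_tokens` before emitting `<|eot_id|>`, the helper simply returns the truncated tail.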
+To perform inference with the original model files, you’ll first need to [install the Scaling library](https://github.com/Aleph-Alpha/scaling). Follow the installation instructions provided in the repository's README file. After installation, download the model weights and use the Scaling inference module to load the checkpoint, vocabulary, and configuration files.
 ```python
 from pathlib import Path
     checkpoint_dir=Path("path/to/Pharia-1-LLM-7B-control-aligned"),
 )
+input_text = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
+You are a helpful assistant. You give engaging, well-structured answers to user inquiries.<|eot_id|><|start_header_id|>user<|end_header_id|>
 When was Rome founded?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
 """
 generation = inference_model.generate(max_tokens=100, input_text=input_text)
 | Hardware Type | Hardware Amount | Avg. measured step duration | Avg. measured MFU | Avg. measured TFLOPS | Iterations (number of update steps) | Training tokens | GPU hours | Total FLOPs |
 | A100 (80GB), H100 | Up to 256 GPUs | 8.6 s (A100), 3.6 s (H100) | 0.66 (A100), 0.5 (H100) | 215 (A100)<br><br>520 (H100) | 582000 + 350000 | ~4.7T + 3T | 356k on A100 + 96k on H100 | 2.75×10<sup>23</sup> + 1.68×10<sup>23</sup> |
+The total compute budget is reported in FLOPs, following the [Bloom implementation](https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/e52bdabbde3c6895aceb76c1bced295c2646121f/megatron/training.py#L759), for comparability with the [related paper](https://arxiv.org/pdf/2211.05100.pdf).
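The table lists both GPU hours and the average measured TFLOPS per GPU. This means its total-FLOPs column can be sanity-checked with one multiplication per hardware type: GPU hours × 3,600 s × achieved FLOP/s. The sketch below uses only numbers from the table. It is a rough cross-check, not the Bloom-style accounting referred to above. Run as written, the A100 estimate (about 2.76 × 10^23) lands within 0.2% of the reported value. The H100 estimate (about 1.80 × 10^23) comes out roughly 7% above the reported 1.68 × 10^23, so treat the check as approximate for that row.

```python
# Cross-check of the training-compute table: total FLOPs should be roughly
# GPU hours x 3600 s/h x average achieved FLOP/s per GPU.
# All inputs are copied from the table above.
SECONDS_PER_HOUR = 3600

# hardware: (GPU hours, avg. measured TFLOPS per GPU, reported total FLOPs)
TABLE = {
    "A100": (356_000, 215, 2.75e23),
    "H100": (96_000, 520, 1.68e23),
}


def estimated_flops(gpu_hours: float, tflops_per_gpu: float) -> float:
    """Total floating-point operations implied by hours and measured throughput."""
    return gpu_hours * SECONDS_PER_HOUR * tflops_per_gpu * 1e12


for hardware, (hours, tflops, reported) in TABLE.items():
    estimate = estimated_flops(hours, tflops)
    print(f"{hardware}: estimated {estimate:.2e} vs reported {reported:.2e} "
          f"(ratio {estimate / reported:.2f})")

total_reported = sum(reported for _, _, reported in TABLE.values())
print(f"reported total: {total_reported:.2e} FLOPs")
```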
 ### Environmental Impact
+Our data center runs on 100% renewable energy, so **no CO2 emissions are incurred for any inference job** executed through the API. Furthermore, the data center operates with a net-zero water footprint.
 To estimate CO2 emissions, we base our calculations on the following assumptions:
 | Carbon emitted | Carbon emitted accounting for PUE | Power consumption | Note |
 | A100: 0 | A100: 0 | A100: max 400W per GPU<br><br>H100: max 700W per GPU | A100: 100% water-powered energy |
+For context, see [Estimating the carbon footprint of BLOOM, a 176B parameter language model](https://arxiv.org/pdf/2211.02001.pdf).
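The power column also allows a simple upper bound on the GPUs' own energy use during training: GPU hours multiplied by maximum draw. The sketch below takes the GPU hours from the training table and the 400 W / 700 W maxima from this table. It assumes every GPU-hour ran at full rated power and leaves out PUE, CPUs, networking, and storage. The result is therefore an illustration, not a figure reported in this card. Multiplying such an energy total by a grid carbon intensity gives emissions. With the 100% water-powered supply noted for A100, that product is zero.

```python
# Upper bound on GPU energy for training, assuming each GPU-hour drew the
# maximum power listed in the table. Data center overhead (PUE) is excluded.
GPU_HOURS = {"A100": 356_000, "H100": 96_000}  # from the training table
MAX_POWER_W = {"A100": 400, "H100": 700}  # max power per GPU


def max_energy_kwh(hardware: str) -> float:
    """GPU hours x watts / 1000 gives kilowatt-hours."""
    return GPU_HOURS[hardware] * MAX_POWER_W[hardware] / 1000


for hardware in GPU_HOURS:
    print(f"{hardware}: at most {max_energy_kwh(hardware):,.0f} kWh")

total_kwh = sum(max_energy_kwh(h) for h in GPU_HOURS)
print(f"total: at most {total_kwh:,.0f} kWh")
```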
 # Risks and Limitations
+**Note:** Language models are **not agents** and are not optimized for prescriptive actions. Using language models in high-stakes environments, for critical decisions, or to support a user's wellbeing should only be done with additional guardrails in place.
 While `Pharia-1-LLM-7B-control-aligned` has received extra training to mitigate risks associated with harmful outputs and biases, it may still produce undesirable completions in some circumstances.
 *   Employing a fine-tuned model designed to maintain an appropriate tone and style, including avoiding offensive language.
+*   Implementing [explainability](https://docs.aleph-alpha.com/docs/tasks/explain/) checks to create an audit trail at the application level.
 *   Conducting additional validations at the application level to ensure output quality and appropriateness.
 *   Performing validations on the application layer (e.g., classifying the output).
+*   Using the repetition penalty (especially when outputs become repetitive) or other parameters available in the API (see [documentation](https://docs.aleph-alpha.com/api/complete/)).
 *   Avoiding use cases targeted at retrieving personally identifiable information.
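The repetition-penalty mitigation above refers to a sampling parameter. The function below is a minimal sketch of how such a penalty commonly works, using the variant of the CTRL penalty implemented by Hugging Face's `RepetitionPenaltyLogitsProcessor`. For every token that has already appeared, a positive logit is divided by the penalty and a negative one is multiplied by it, so values above 1 make repeats less likely. Aleph Alpha's API parameter may be implemented differently. `apply_repetition_penalty` is a hypothetical name, and the example operates on a plain Python list rather than a tensor.

```python
# Minimal sketch of a CTRL-style repetition penalty on raw logits, following
# the rule used by Hugging Face's RepetitionPenaltyLogitsProcessor.
# Not necessarily how Aleph Alpha's API implements its parameter.
def apply_repetition_penalty(
    logits: list[float], previous_ids: list[int], penalty: float
) -> list[float]:
    """Return a copy of `logits` with already-seen tokens down-weighted."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    adjusted = list(logits)
    for token_id in set(previous_ids):
        score = adjusted[token_id]
        # Dividing a positive logit or multiplying a negative one by a
        # penalty > 1 both move the score down; zero stays zero.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


# Toy vocabulary of three tokens; tokens 0 and 1 were already generated.
print(apply_repetition_penalty([2.0, -1.0, 0.5], previous_ids=[0, 1, 0], penalty=2.0))
# [1.0, -2.0, 0.5]
```

When running the `-hf` checkpoints with Transformers, a penalty of this kind can be requested through `model.generate(**inputs, max_new_tokens=50, repetition_penalty=1.2)`. The value 1.2 is only an example, not a recommendation from this card.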