|
--- |
|
base_model: aixonlab/Aether-12b |
|
language: |
|
- en |
|
license: apache-2.0 |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- mistral |
|
--- |
|
|
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/66dcee3321f901b049f48002/jWXtbknuetFdz5fkFn-ey.png" width="800"/> |
|
|
|
# Grey-12b |
|
|
|
Grey-12b is a merged language model created by combining two constituent models with the della_linear merge method, using aixonlab/Aether-12b as the base model.
|
|
|
## Model Details
|
- Developed by: AIXON Lab |
|
- Model type: Merged Causal Language Model |
|
- Language(s): English (primarily); other languages may be partially supported via the constituent models
|
- License: apache-2.0 |
|
- Repository: https://huggingface.co/aixonlab/Grey-12b |
|
|
|
## Model Architecture
|
- Base model: aixonlab/Aether-12b |
|
- Parameter count: ~12 billion |
|
- Architecture specifics: Decoder-only Transformer (Mistral-Nemo family)

- Merge method: della_linear (sketched below)
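
Conceptually (our reading of mergekit's della_linear, not a statement from the model authors): each constituent's parameter delta from the base is sparsified by magnitude-aware random dropping at its density, rescaled to preserve expected magnitude, and the surviving deltas are combined as a weighted sum:

$$
\theta_{\text{merged}} = \theta_{\text{base}} + \lambda \sum_i w_i\, \hat{\Delta}_i,
\qquad
\hat{\Delta}_i = \operatorname{rescale}\big(\operatorname{drop}(\theta_i - \theta_{\text{base}};\ d_i, \varepsilon)\big)
$$

where $w_i$ and $d_i$ are the per-model weights and densities listed below, and $\lambda$ and $\varepsilon$ come from the merge parameters in the Technical Specifications section.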
|
|
|
### Merged Models |
|
1. VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct

   - Weight: 0.33

   - Density: 0.4

2. cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b

   - Weight: 0.77

   - Density: 0.8
|
|
|
## Technical Specifications |
|
- Dtype: float16 |
|
- Tokenizer source: base (aixonlab/Aether-12b) |
|
- Merge parameters:

  - Epsilon: 0.05

  - Lambda: 1
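
Putting these values together, the merge corresponds to a mergekit-style configuration. The snippet below is a hypothetical reconstruction from the values in this card, not the authors' published config; the file name and output path are placeholders.

```python
# Hypothetical reconstruction of the mergekit config implied by the values
# in this card (not the authors' published file). Requires: pip install pyyaml
import yaml

merge_config = {
    "merge_method": "della_linear",
    "base_model": "aixonlab/Aether-12b",
    "models": [
        {"model": "VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct",
         "parameters": {"weight": 0.33, "density": 0.4}},
        {"model": "cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b",
         "parameters": {"weight": 0.77, "density": 0.8}},
    ],
    "parameters": {"epsilon": 0.05, "lambda": 1},
    "dtype": "float16",
    "tokenizer_source": "base",
}

# Write the config so it can be passed to mergekit, e.g.:
#   mergekit-yaml grey-12b-merge.yml ./Grey-12b
with open("grey-12b-merge.yml", "w") as f:
    yaml.safe_dump(merge_config, f, sort_keys=False)
```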
|
|
|
## Intended Use
|
Grey-12b is intended to serve as a general-purpose language model for natural language processing tasks, including but not limited to text generation, question answering, and analysis.
|
|
|
## Ethical Considerations
|
As a merged model based on multiple sources, Grey-12b may inherit biases and limitations from its constituent models. Users should be aware of potential biases in generated content and use the model responsibly. |
|
|
|
## Performance and Evaluation |
|
Performance metrics and evaluation results for Grey-12b are yet to be determined. Users are encouraged to contribute their findings and benchmarks. |
|
|
|
## Limitations and Biases |
|
The model may exhibit biases present in its training data and constituent models. It's crucial to critically evaluate the model's outputs and use them in conjunction with human judgment. |
|
|
|
## Additional Information |
|
For more details on the base model and constituent models, please refer to their respective model cards and documentation. |
|
|
|
## Acknowledgments
|
We acknowledge the contributions of: |
|
- VAGOsolutions for the SauerkrautLM-Nemo-12b-Instruct model |
|
- Cognitive Computations for the dolphin-2.9.3-mistral-nemo-12b model |
|
|
|
## How to Use |
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load in float16 (the dtype the model was merged in); device_map="auto"
# requires the `accelerate` package and places the ~12B parameters on GPU.
model = AutoModelForCausalLM.from_pretrained(
    "aixonlab/Grey-12b",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("aixonlab/Grey-12b")

prompt = "Once upon a time"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# generate() returns a batch of sequences; decode the first (and only) one.
generated_ids = model.generate(input_ids, max_new_tokens=100)
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(generated_text)
```
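
Both constituent models are instruction-tuned, so the inherited tokenizer may define a chat template. If `tokenizer.chat_template` is set, chat-style prompting can look like the sketch below (continuing from the snippet above; the message content is a placeholder):

```python
# Sketch of chat-style prompting, assuming the inherited tokenizer defines a
# chat template; if tokenizer.chat_template is None, use plain prompts instead.
messages = [
    {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."},
]
chat_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(chat_ids, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output_ids[0][chat_ids.shape[-1]:], skip_special_tokens=True))
```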