---
language:
- en
tags:
- pytorch
- causal-lm
license: bigscience-openrail-m
---

[GeoV](https://github.com/geov-ai/geov)-9B-r2 is a 9 billion parameter causal language model.

It is still being trained and has the same architecture as the [GeoV-9b](https://huggingface.co/GeoV/GeoV-9b) model, but its training data is sampled without replacement (the GeoV-9b model's training data was sampled with replacement).

The GeoV model was designed by Georges Harik and uses [Rotary Positional Embeddings with Relative distances (RoPER)](https://research.labml.ai/RoPER.html) by [Georges Harik](https://twitter.com/gharik) and [Varuna Jayasiri](https://twitter.com/vpj).

[RoPER](https://research.labml.ai/RoPER.html), in addition to using relative positions in the attention score calculation via RoPE embeddings, adds relative positional information explicitly to the value embeddings. Specifically, it incorporates the relative positions of the tokens that are attended to. RoPER has given better performance on some algorithmic tasks and appears comparable to RoPE in language modeling.

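As an illustration, here is a minimal single-head sketch of the idea in PyTorch. It is not GeoV's actual implementation; the function names, the simplified unbatched shapes, and the frequency schedule are illustrative assumptions. Queries and keys are rotated as in standard RoPE; values are additionally rotated by their own positions, and the attention output is rotated back by the query position, so only relative offsets remain in the values.

```python
import torch

def rope_rotate(x, positions, theta):
    # Rotate consecutive feature pairs of x by position-dependent angles,
    # as in standard RoPE. x: (seq, d), positions: (seq,), theta: (d/2,).
    angles = positions[:, None] * theta[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def roper_attention(q, k, v, theta):
    seq, d = q.shape
    pos = torch.arange(seq, dtype=torch.float32)
    # RoPE: rotating q and k makes the score depend only on relative distance.
    scores = rope_rotate(q, pos, theta) @ rope_rotate(k, pos, theta).T / d ** 0.5
    causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
    attn = scores.masked_fill(causal, float("-inf")).softmax(dim=-1)
    # RoPER: rotate values by their own positions, attend, then rotate the
    # output back by the query position, leaving only relative offsets.
    out = attn @ rope_rotate(v, pos, theta)
    return rope_rotate(out, -pos, theta)

# Toy usage with random activations.
d, seq = 64, 16
theta = 10000.0 ** (-torch.arange(0, d, 2).float() / d)
q, k, v = torch.randn(3, seq, d).unbind(0)
print(roper_attention(q, k, v, theta).shape)  # torch.Size([16, 64])
```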
## Model details

- Developed by: [Georges Harik](http://twitter.com/gharik)
- Model type: Transformer-based Language Model
- Language: English

<figure style="width:30em">

| Hyperparameter         | Value |
| ---------------------- | ----- |
| n<sub>parameters</sub> | 9B    |
| n<sub>layers</sub>     | 32    |
| d<sub>model</sub>      | 5120  |
| n<sub>heads</sub>      | 40    |
| d<sub>head</sub>       | 128   |
| n<sub>vocab</sub>      | 65500 |
| Sequence Length        | 2048  |

</figure>

The currently released weights were trained on ~39 billion tokens.
We plan to continue training up to 300 billion tokens.
This training run is monolingual and uses the C4 (English) and English Wikipedia datasets.
## Test results

These are the results from the [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) at the 81-billion-token checkpoint.

| Task         |Version| Metric | Value | |Stderr|
|--------------|------:|--------|------:|---|-----:|
|anli_r1       | 0|acc | 0.3260|± |0.0148|
|anli_r2       | 0|acc | 0.3380|± |0.0150|
|anli_r3       | 0|acc | 0.3583|± |0.0138|
|hellaswag     | 0|acc | 0.4666|± |0.0050|
|              |  |acc_norm| 0.6157|± |0.0049|
|lambada_openai| 0|ppl |10.0153|± |0.3145|
|              |  |acc | 0.5403|± |0.0069|
|mathqa        | 0|acc | 0.2332|± |0.0077|
|              |  |acc_norm| 0.2348|± |0.0078|
|piqa          | 0|acc | 0.7503|± |0.0101|
|              |  |acc_norm| 0.7503|± |0.0101|
|winogrande    | 0|acc | 0.5872|± |0.0138|
|wsc           | 0|acc | 0.5673|± |0.0488|
## Installation

```shell
pip install geov
```
## Generation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/geov-ai/geov/blob/master/notebooks/generate.ipynb)

```python
from geov import GeoVForCausalLM, GeoVTokenizer

# Load the pretrained weights and the matching tokenizer from the Hugging Face Hub.
model = GeoVForCausalLM.from_pretrained("GeoV/GeoV-9b-r2")
tokenizer = GeoVTokenizer.from_pretrained("GeoV/GeoV-9b-r2")

prompt = "In mathematics, topology is the study of"

# Tokenize the prompt into a batch of input IDs.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Sample a continuation of up to 100 tokens.
gen_tokens = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.9,
    max_length=100,
)
gen_text = tokenizer.batch_decode(gen_tokens)[0]
print(gen_text)
```
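The example above runs on CPU by default. If a GPU is available, the continuation below should work; it is a sketch that reuses `model`, `tokenizer`, and `input_ids` from the previous block and assumes `GeoVForCausalLM` follows the standard Hugging Face `PreTrainedModel`/PyTorch conventions already used above (`.to(device)` and `generate`).

```python
import torch

# Optional continuation of the example above: move the model and inputs to a
# GPU when available (assumes the standard transformers/PyTorch interface).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

gen_tokens = model.generate(
    input_ids.to(device),
    do_sample=True,
    temperature=0.9,
    max_length=100,
)
print(tokenizer.batch_decode(gen_tokens)[0])
```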