---
language:
- en
tags:
- pytorch
- causal-lm
license: bigscience-openrail-m
---

[GeoV](https://github.com/geov-ai/geov)-9B-r2 is a 9 billion parameter causal language model.

It is still being trained and has the same architecture as the [GeoV-9b](https://huggingface.co/GeoV/GeoV-9b) model, but its training data is sampled without replacement (the GeoV-9b model's training data was sampled with replacement).

The GeoV model was designed by Georges Harik and uses [Rotary Positional Embeddings with Relative distances (RoPER)](https://research.labml.ai/RoPER.html) by [Georges Harik](https://twitter.com/gharik) and [Varuna Jayasiri](https://twitter.com/vpj).

[RoPER](https://research.labml.ai/RoPER.html), in addition to using relative positions in the attention score calculation via RoPE embeddings, adds relative positional information explicitly to the value embeddings. Specifically, it incorporates the relative positions of the tokens that are attended to. RoPER has given better performance on some algorithmic tasks and appears comparable to RoPE in language modeling.

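As an illustration, here is a minimal single-head sketch of the idea in PyTorch. It is not GeoV's actual implementation; the function names, the simplified unbatched shapes, and the frequency schedule are illustrative assumptions. Queries and keys are rotated as in standard RoPE; values are additionally rotated by their own positions, and the attention output is rotated back by the query position, so only relative offsets remain in the values.

```python
import torch

def rope_rotate(x, positions, theta):
    # Rotate consecutive feature pairs of x by position-dependent angles,
    # as in standard RoPE. x: (seq, d), positions: (seq,), theta: (d/2,).
    angles = positions[:, None] * theta[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def roper_attention(q, k, v, theta):
    seq, d = q.shape
    pos = torch.arange(seq, dtype=torch.float32)
    # RoPE: rotating q and k makes the score depend only on relative distance.
    scores = rope_rotate(q, pos, theta) @ rope_rotate(k, pos, theta).T / d ** 0.5
    causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
    attn = scores.masked_fill(causal, float("-inf")).softmax(dim=-1)
    # RoPER: rotate values by their own positions, attend, then rotate the
    # output back by the query position, leaving only relative offsets.
    out = attn @ rope_rotate(v, pos, theta)
    return rope_rotate(out, -pos, theta)

# Toy usage with random activations.
d, seq = 64, 16
theta = 10000.0 ** (-torch.arange(0, d, 2).float() / d)
q, k, v = torch.randn(3, seq, d).unbind(0)
print(roper_attention(q, k, v, theta).shape)  # torch.Size([16, 64])
```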
## Model details

- Developed by: [Georges Harik](http://twitter.com/gharik)
- Model type: Transformer-based Language Model
- Language: English

<figure style="width:30em">

| Hyperparameter         | Value |
| ---------------------- | ----- |
| n<sub>parameters</sub> | 9B    |
| n<sub>layers</sub>     | 32    |
| d<sub>model</sub>      | 5120  |
| n<sub>heads</sub>      | 40    |
| d<sub>head</sub>       | 128   |
| n<sub>vocab</sub>      | 65500 |
| Sequence Length        | 2048  |

</figure>

The currently released weights were trained on ~39 billion tokens.
We plan to continue training up to 300 billion tokens.
This training run is monolingual and uses the C4 (English) and English Wikipedia datasets.
## Test results

These are the results from the [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) at the 81-billion-token checkpoint.

| Task         |Version| Metric | Value | |Stderr|
|--------------|------:|--------|------:|---|-----:|
|anli_r1       | 0|acc | 0.3260|± |0.0148|
|anli_r2       | 0|acc | 0.3380|± |0.0150|
|anli_r3       | 0|acc | 0.3583|± |0.0138|
|hellaswag     | 0|acc | 0.4666|± |0.0050|
|              |  |acc_norm| 0.6157|± |0.0049|
|lambada_openai| 0|ppl |10.0153|± |0.3145|
|              |  |acc | 0.5403|± |0.0069|
|mathqa        | 0|acc | 0.2332|± |0.0077|
|              |  |acc_norm| 0.2348|± |0.0078|
|piqa          | 0|acc | 0.7503|± |0.0101|
|              |  |acc_norm| 0.7503|± |0.0101|
|winogrande    | 0|acc | 0.5872|± |0.0138|
|wsc           | 0|acc | 0.5673|± |0.0488|
## Installation

```shell
pip install geov
```
## Generation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/geov-ai/geov/blob/master/notebooks/generate.ipynb)

```python
from geov import GeoVForCausalLM, GeoVTokenizer

# Load the pretrained weights and the matching tokenizer from the Hugging Face Hub.
model = GeoVForCausalLM.from_pretrained("GeoV/GeoV-9b-r2")
tokenizer = GeoVTokenizer.from_pretrained("GeoV/GeoV-9b-r2")

prompt = "In mathematics, topology is the study of"

# Tokenize the prompt into a batch of input IDs.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Sample a continuation of up to 100 tokens.
gen_tokens = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.9,
    max_length=100,
)
gen_text = tokenizer.batch_decode(gen_tokens)[0]
print(gen_text)
```
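The example above runs on CPU by default. If a GPU is available, the continuation below should work; it is a sketch that reuses `model`, `tokenizer`, and `input_ids` from the previous block and assumes `GeoVForCausalLM` follows the standard Hugging Face `PreTrainedModel`/PyTorch conventions already used above (`.to(device)` and `generate`).

```python
import torch

# Optional continuation of the example above: move the model and inputs to a
# GPU when available (assumes the standard transformers/PyTorch interface).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

gen_tokens = model.generate(
    input_ids.to(device),
    do_sample=True,
    temperature=0.9,
    max_length=100,
)
print(tokenizer.batch_decode(gen_tokens)[0])
```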