---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-1T-Sample
language:
- en
tags:
- llama
- llama 2
---
# TinyLlama-1.1B-32k

#### NOTE: This is a fork of the original model at https://huggingface.co/Doctor-Shotgun/TinyLlama-1.1B-32k, with the safetensors metadata fixed using the following code:

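```python
import safetensors
from safetensors.torch import save_file

# Path to the checkpoint to fix; "model.safetensors" is only an example value.
safetensors_path = "model.safetensors"

# Load every tensor from the existing file into memory.
tensors = dict()
with safetensors.safe_open(safetensors_path, framework="pt") as f:
    for key in f.keys():
        tensors[key] = f.get_tensor(key)

# Rewrite the file in place, this time with the "format" metadata key set.
save_file(tensors, safetensors_path, metadata={'format': 'pt'})
```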
(from https://huggingface.co/SeaLLMs/SeaLLM-7B-Hybrid/discussions/2#65752144412ee70185d49ff5)
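
To check whether a checkpoint already carries the `format` key, the header can be read without loading any tensors. The following is a minimal sketch that relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header with an optional `__metadata__` entry); the helper name `read_safetensors_metadata` is made up for this example and is not part of the safetensors library.

```python
import json
import struct


def read_safetensors_metadata(path):
    """Return the optional __metadata__ dict stored in a .safetensors header."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is optional; files saved without it yield an empty dict.
    return header.get("__metadata__", {})
```

Calling `read_safetensors_metadata(safetensors_path).get("format")` on a file rewritten with the snippet above should give `"pt"`; on a file saved without metadata it gives `None`.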

## Original model card:

A 32k-context finetune of TinyLlama-1.1B that uses an increased RoPE theta (rope frequency base). It is meant to serve as a long-context speculative decoding model.
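
As a rough illustration of what raising the rope frequency base does, the sketch below computes the wavelength of each rotary frequency pair for two bases. The head size of 64, the default base of 10,000, and the larger base of 1,000,000 are assumptions chosen for the example; none of them are taken from this model's config.

```python
import math


def rope_wavelengths(theta, head_dim):
    """Wavelength (in tokens) of each RoPE frequency pair for a given base."""
    # Pair i rotates by theta ** (-2i / head_dim) radians per token position.
    inv_freqs = [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    return [2 * math.pi / f for f in inv_freqs]


HEAD_DIM = 64              # assumption: 2048 hidden size / 32 attention heads
DEFAULT_THETA = 10_000     # assumption: the usual Llama default base
LARGER_THETA = 1_000_000   # hypothetical value, not read from this model

for theta in (DEFAULT_THETA, LARGER_THETA):
    longest = rope_wavelengths(theta, HEAD_DIM)[-1]
    print(f"theta={theta}: longest wavelength ~ {longest:.0f} tokens")
```

Every pair except the first rotates more slowly under the larger base, so the positional wavelengths stretch out; the further pretraining at long context then lets the model adapt to the new frequencies.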

Created from [TinyLlama-1.1B](https://huggingface.co/TinyLlama/tinyLlama-intermediate-checkpoints-after-1T-token) by further pretraining at a context length of 32768 on [togethercomputer/RedPajama-Data-1T-Sample](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample).

Of note, the base checkpoint came from the commit "final model" (fad4f1a5cd0563ac41349b8fec2e6e51156568a0), which was subsequently reverted; it is not the current main-branch 3T checkpoint of TinyLlama-1.1B.

### Wikitext (wikitext-2-raw-v1_train) Perplexity (64 rows) as evaluated via [exllamav2](https://github.com/turboderp/exllamav2):

| Model                  | 2048       | 4096       | 8192       | 16384      | 32768      |
| ---------------------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| TinyLlama-1.1B         | 8.5633     | 208.3586   | 863.7507   | 1600.5021  | 6981.9021  |
| TinyLlama-1.1B-32k     | 8.6548     | 7.8339     | 7.4904     | 7.3674     | 7.1338     |
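
For readers unfamiliar with the metric, perplexity is the exponential of the mean negative log-likelihood per token, so lower is better. The sketch below shows only that definition; it is not exllamav2's evaluation code, and the input `token_log_probs` is a hypothetical list of natural-log probabilities that a model assigned to the correct next tokens.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities of the true tokens."""
    # Mean negative log-likelihood over all scored tokens.
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)
```

A model that assigns probability 1/8 to every correct token has a perplexity of about 8, which matches the intuition of choosing uniformly among eight options at each step.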

### Evaluation on HumanEval by [turboderp](https://huggingface.co/turboderp):

| Model                                  | Pass@1          | Pass@10     |
| -------------------------------------- | --------------- | ----------- |
| TinyLlama-1.1B                         | 0.0841          | 0.1524      |
| TinyLlama-1.1B (NTK alpha=7.7)         | 0.0598          | 0.1098      |
| TinyLlama-1.1B-32k-ckpt-554            | 0.0732          | 0.1402      |
| TinyLlama-1.1B-32k                     | 0.0829          | 0.1524      |
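
The card does not say how these scores were computed. For orientation, a common choice is the unbiased pass@k estimator introduced with HumanEval, sketched below under that assumption: `n` samples are generated per problem, `c` of them pass the tests, and the per-problem estimates are then averaged over the benchmark.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    # With fewer than k failing samples, every size-k draw contains a pass.
    if n - c < k:
        return 1.0
    # Otherwise: 1 - P(all k drawn samples fail).
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 10 samples of which 1 passes, `pass_at_k(10, 1, 1)` is roughly 0.1, while `pass_at_k(10, 1, 10)` is 1.0 because any draw of all 10 samples includes the passing one.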