---
license: bigscience-bloom-rail-1.0
language:
- en
- sv
---
# Model Summary
This is a base causal model extended from [bigscience/bloomz-3b](https://huggingface.co/bigscience/bloomz-3b).
* Model size: 3.02B (~20M more than the base model)
* The tokenizer is extended to support Swedish: 8,068 additional tokens, trained on the Swedish Wikipedia and OSCAR corpora, have been added, and the embedding layer is extended accordingly.
* The embedding layer and the self-attention query_key_value layers are re-trained on mixed English and Swedish corpora (see the tokenizer sketch below).
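As a quick sanity check, the extended vocabulary can be compared against the base model's tokenizer. A minimal sketch, assuming this model's hub id is `dandanw/bloom-3b-sv` (adjust to the actual repo path):

```python
from transformers import AutoTokenizer

# Hub ids assumed for illustration; adjust if the actual paths differ.
base_tok = AutoTokenizer.from_pretrained("bigscience/bloomz-3b")
ext_tok = AutoTokenizer.from_pretrained("dandanw/bloom-3b-sv")

# The extended tokenizer should contain 8,068 more entries than the base.
print(len(ext_tok) - len(base_tok))  # expected: 8068

# Swedish text should now segment into fewer, longer tokens.
print(ext_tok.tokenize("Stockholm är Sveriges huvudstad."))
```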
# Intended Use
This model was created to bring **Swedish and English support to LLMs** for public research and business use cases. It is intended for language generation or for use as a pretrained base model.
**It needs to be further fine-tuned for specific tasks.**
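For raw generation, the model can be used like any other causal LM in `transformers`. A minimal sketch, again assuming the hub id `dandanw/bloom-3b-sv`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dandanw/bloom-3b-sv"  # assumed hub id; adjust if needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Plain text continuation, not instruction following (see Notes below).
inputs = tokenizer("Sverige är känt för", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```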
The model inherits the bigscience-bloom-rail-1.0 license from the base model and shall **NOT** be used for harmful purposes. For use restrictions, please see Appendix A of the [RAIL License, Use Restrictions](https://huggingface.co/spaces/bigscience/license).
# Training Corpora:
The model is re-trained on ~800M Swedish tokens and ~260M English tokens, drawn from:
* [olm/wikipedia](https://huggingface.co/datasets/olm/wikipedia)
* [oscar](https://huggingface.co/datasets/oscar)
* [sbx/superlim-2](https://huggingface.co/datasets/sbx/superlim-2)
* [Gabriel/xsum_swe](https://huggingface.co/datasets/Gabriel/xsum_swe)
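Since only the embedding and self-attention query_key_value layers were re-trained (per the Model Summary), a selective-freezing setup along these lines could reproduce that trainable-parameter split. This is an illustrative sketch, not the actual training script:

```python
from transformers import AutoModelForCausalLM

# Sketch only: freeze every parameter except the ones this card says were
# re-trained, i.e. the word embeddings and the self-attention
# query_key_value projections (substrings of BLOOM's parameter names).
model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-3b")

# After extending the tokenizer, the embedding matrix would also be
# resized, e.g. model.resize_token_embeddings(len(extended_tokenizer)).
for name, param in model.named_parameters():
    param.requires_grad = (
        "word_embeddings" in name or "query_key_value" in name
    )

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable / 1e6:.0f}M of {total / 1e6:.0f}M parameters")
```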
# Notes:
* Since the model is re-trained only on Swedish and English, only Swedish and English capabilities appear to be retained. To re-enable another language, you would need to re-train the model on data in that language.
* Although the base model is bloomz-3b, an instruction fine-tuned version of BLOOM, the model appears to lose its instruction-following capability after this re-training. It is now simply a base causal model that can speak Swedish.