CataLlama

non-profit

Activity Feed Request to join this org

AI & ML interests

LLM, RAG, Agents

Organization Card

Community About org cards

CataLlama

CataLlama is a fine-tune of Llama-3 8B on the Catalan language.

CataLlama-v0.1 was trained on roughly 445 million new tokens in three separate stages:

Language enhancement with raw text - we could also call this "continued pre-training" at a very small scale.
Supervised fine-tuning on instructions consisting of 70% Catalan Language and 30% English Language.
DPO fine-tuning on preferences consisting of 70% Catalan language and 30% English Language.

CataLlama-v0.2 was trained on roughly 620 million new tokens in a very similar manner to v0.1, except for the base model which is obtained via a merge.

Note: This model is not intended to beat benchmarks, but to demonstrate techniques for augmenting LLMs on new languages and preserve rare languages as part of our world heritage.

Three models and three respective datasets have been released.

Model Author

Laurentiu Petrea

Model Inheritance

Collections 4

spaces 1

CataLlama

models 8

catallama/CataLlama-v0.1-Instruct-DPO

Text Generation • Updated Jul 16, 2024 • 12 • 3

catallama/CataLlama-v0.1-Instruct-SFT

Text Generation • Updated Jul 16, 2024 • 5 • 2

catallama/CataLlama-v0.2-Instruct-SFT-DPO-Merged-GGUF

Text Generation • Updated Jul 15, 2024 • 3 • 1

catallama/CataLlama-v0.2-Instruct-DPO

Text Generation • Updated Jul 15, 2024 • 2

catallama/CataLlama-v0.2-Instruct-SFT

Text Generation • Updated Jul 15, 2024 • 173

catallama/CataLlama-v0.2-Instruct-SFT-DPO-Merged

Text Generation • Updated Jul 14, 2024 • 4

catallama/CataLlama-v0.2-Base

Text Generation • Updated Jul 13, 2024 • 2

catallama/CataLlama-v0.1-Base

Text Generation • Updated May 26, 2024 • 9 • 1

datasets 5

catallama/Catalan-DPO-V2

Viewer • Updated Jul 16, 2024 • 23.5k • 36 • 2

catallama/Catalan-Instruct-V2

Viewer • Updated Jul 14, 2024 • 710k • 37 • 2

catallama/Catalan-DPO

Preview • Updated May 26, 2024 • 37

catallama/Catalan-Instruct

Viewer • Updated May 26, 2024 • 328k • 43 • 1

catallama/Catalan-Raw-Text

Viewer • Updated May 25, 2024 • 409k • 36