TeenyTinyLlama

nicholasKluge's Collections

Updated May 31

TeenyTinyLlama is a pair of compact language models, based on the Llama 2 architecture and trained on a Brazilian Portuguese corpus.

TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese

Paper • arXiv 2401.16640 • Published Jan 30, 2024

TeenyTinyLlama-Chat

Space • Generate chatbot responses based on user input

nicholasKluge/TeenyTinyLlama-460m

Text Generation • 0.5B params • Updated Jan 15 • 1.14k downloads • 11 likes
Note 460-million-parameter version of TeenyTinyLlama.
nicholasKluge/TeenyTinyLlama-460m-awq

Text Generation • 0.1B params • Updated Jan 15 • 14 downloads • 1 like

Note 460-million-parameter version of TeenyTinyLlama, quantized to 4 bits with AWQ.
nicholasKluge/TeenyTinyLlama-460m-Chat

Text Generation • 0.5B params • Updated Jan 15 • 46 downloads • 3 likes

Note 460-million-parameter version of TeenyTinyLlama, fine-tuned on version 2.0 of the Instruct-Aira Dataset.
nicholasKluge/TeenyTinyLlama-460m-Chat-awq

Text Generation • 0.1B params • Updated Jan 15 • 9 downloads • 1 like

Note 460-million-parameter version of TeenyTinyLlama, fine-tuned on version 2.0 of the Instruct-Aira Dataset and quantized to 4 bits with AWQ.
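
The AWQ variants trade numerical precision for a smaller footprint. As a rough back-of-the-envelope illustration (not taken from the model cards), the hypothetical helper `weight_storage_gb` below estimates weight storage for a 460M-parameter model at several bit widths, assuming every weight is stored at the same width. Real AWQ checkpoints keep some tensors in 16-bit and store per-group scales and zero points, so actual files are larger than the 4-bit figure suggests.

```python
# Back-of-the-envelope weight-storage estimate for a 460M-parameter model.
# Assumption: every parameter is stored at the given bit width. Real AWQ
# checkpoints keep some tensors in 16-bit and add per-group scales and
# zero points, so actual checkpoint sizes are somewhat larger.


def weight_storage_gb(num_params: int, bits_per_param: int) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 460_000_000  # nominal size taken from the model name

for label, bits in [("float32", 32), ("float16", 16), ("4-bit (AWQ)", 4)]:
    print(f"{label:>12}: ~{weight_storage_gb(NUM_PARAMS, bits):.2f} GB")
```

Under these assumptions the loop prints about 1.84 GB for float32, 0.92 GB for float16 and 0.23 GB for 4-bit weights, which is why the quantized checkpoints are attractive for constrained hardware.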
nicholasKluge/TeenyTinyLlama-460m-HateBR

Text Classification • 0.4B params • Updated Oct 8, 2024 • 4 downloads • 1 like

Note 460-million-parameter version of TeenyTinyLlama, fine-tuned on the HateBR dataset.
nicholasKluge/TeenyTinyLlama-460m-FaQuAD-NLI

Text Classification • 0.4B params • Updated Oct 8, 2024 • 4 downloads

Note 460-million-parameter version of TeenyTinyLlama, fine-tuned on the FaQuAD-NLI dataset.
nicholasKluge/TeenyTinyLlama-460m-IMDB

Text Classification • 0.4B params • Updated Oct 8, 2024 • 5 downloads • 1 like

Note 460-million-parameter version of TeenyTinyLlama, fine-tuned on the IMDB dataset.
nicholasKluge/TeenyTinyLlama-460m-Assin2

Text Classification • 0.4B params • Updated Oct 8, 2024 • 3 downloads

Note 460-million-parameter version of TeenyTinyLlama, fine-tuned on the Assin2 dataset.
nicholasKluge/TeenyTinyLlama-460m-AgNews

Text Classification • 0.4B params • Updated Oct 8, 2024 • 5 downloads

Note 460-million-parameter version of TeenyTinyLlama, fine-tuned on the AgNews dataset.
nicholasKluge/TeenyTinyLlama-160m

Text Generation • 0.2B params • Updated Jan 15 • 1.18k downloads • 7 likes
Note 160-million-parameter version of TeenyTinyLlama.
nicholasKluge/TeenyTinyLlama-160m-HateBR

Text Classification • 0.1B params • Updated Oct 8, 2024 • 13 downloads

Note 160-million-parameter version of TeenyTinyLlama, fine-tuned on the HateBR dataset.
nicholasKluge/TeenyTinyLlama-160m-FaQuAD-NLI

Text Classification • 0.1B params • Updated Oct 8, 2024 • 9 downloads

Note 160-million-parameter version of TeenyTinyLlama, fine-tuned on the FaQuAD-NLI dataset.
nicholasKluge/TeenyTinyLlama-160m-IMDB

Text Classification • 0.1B params • Updated Oct 8, 2024 • 4 downloads

Note 160-million-parameter version of TeenyTinyLlama, fine-tuned on the IMDB dataset.
nicholasKluge/TeenyTinyLlama-160m-Assin2

Text Classification • 0.1B params • Updated Oct 8, 2024 • 4 downloads

Note 160-million-parameter version of TeenyTinyLlama, fine-tuned on the Assin2 dataset.
nicholasKluge/TeenyTinyLlama-160m-AgNews

Text Classification • 0.1B params • Updated Oct 8, 2024 • 18 downloads

Note 160-million-parameter version of TeenyTinyLlama, fine-tuned on the AgNews dataset.
nicholasKluge/Pt-Corpus

Dataset • Updated Jun 18, 2024 • 5.77M rows • 223 downloads • 3 likes

Note Pt-Corpus concatenates portions of several Brazilian Portuguese datasets available on the Hugging Face Hub, totaling approximately 4.1B tokens. This version contains no instructional content.
nicholasKluge/Pt-Corpus-tokenized

Dataset • Updated Jun 18, 2024 • 2.02M rows • 98 downloads

Note Pt-Corpus, tokenized with the TeenyTinyLlama tokenizer.
nicholasKluge/Pt-Corpus-Instruct

Dataset • Updated Jun 18, 2024 • 10.6M rows • 2.54k downloads • 3 likes

Note Pt-Corpus-Instruct concatenates portions of several Brazilian Portuguese datasets available on the Hugging Face Hub, totaling approximately 6.2B tokens. Unlike Pt-Corpus, this version includes conversational and general instructional data.
nicholasKluge/Pt-Corpus-Instruct-tokenized

Dataset • Updated May 31 • 3.06M rows • 304 downloads

Note Pt-Corpus-Instruct, tokenized with the TeenyTinyLlama tokenizer.
nicholasKluge/instruct-aira-dataset-v2

Dataset • Updated Jun 18, 2024 • 163k rows • 73 downloads • 5 likes

Note A collection of single-turn conversations between an assistant and a user.
