🚧"raw" pretrained smol_llama checkpoints - WIP 🚧
BEEspoke Data
community
AI & ML interests
'an LLM is only as good as the dataset it was trained on' - Sun Tzu
Organization Card
🐝📊💁
Collections (7)
smol_llama 220M fine-tunes we did
BEE-spoke-data/smol_llama-220M-openhermes
Text Generation • Updated • 1.32k • 5
BEE-spoke-data/smol_llama-220M-open_instruct
Text Generation • Updated • 12 • 1
BEE-spoke-data/beecoder-220M-python
Text Generation • Updated • 12 • 2
BEE-spoke-data/zephyr-220m-sft-full
Text Generation • Updated • 1.28k • 1
Spaces (1)
Models (52)
BEE-spoke-data/pegasus-x-base-synthsumm_open-16k
Summarization • Updated • 157
BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan
Text2Text Generation • Updated • 21
BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L2
Text2Text Generation • Updated • 22
BEE-spoke-data/tFINE-900m-e16-d32-instruct_2e
Text2Text Generation • Updated • 4
BEE-spoke-data/tFINE-900m-instruct-orpo
Text2Text Generation • Updated • 60
BEE-spoke-data/smol_llama-220M-openhermes
Text Generation • Updated • 1.32k • 5
BEE-spoke-data/tFINE-900m-e16-d32-instruct
Text2Text Generation • Updated • 38
BEE-spoke-data/tFINE-900m-e16-d32-flan
Text2Text Generation • Updated • 2
BEE-spoke-data/slimpajama_tok-48128-BPE-forT5
Updated
BEE-spoke-data/claude-tokenizer-forT5
Updated
Datasets (71)
BEE-spoke-data/TxT360-5M-sample-en
Viewer • Updated • 10M • 182 • 2
BEE-spoke-data/TxT360-500k-sample-no_cc
Viewer • Updated • 500k • 34
BEE-spoke-data/TxT360-1M-sample
Viewer • Updated • 1M • 118
BEE-spoke-data/survivorslib-law-books
Viewer • Updated • 49 • 55
BEE-spoke-data/roastme-filtered
Viewer • Updated • 78.8k • 57
BEE-spoke-data/taskweb
Viewer • Updated • 1.05M • 34
BEE-spoke-data/FLAN-compressed-plusplus
Viewer • Updated • 124M • 223 • 1
BEE-spoke-data/FLAN-compressed
Viewer • Updated • 338M • 59 • 1
BEE-spoke-data/synthsumm-comparisons
Viewer • Updated • 4.67k • 31
BEE-spoke-data/fineweb-cinema-100k
Viewer • Updated • 100k • 35