Update for paper release
README.md
CHANGED
@@ -11,7 +11,7 @@ license: apache-2.0
 
 **Falcon-RW-1B is a 1B parameters causal decoder-only model built by [TII](https://www.tii.ae) and trained on 350B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb). It is made available under the Apache 2.0 license.**
 
-
+See the [paper on arXiv](https://arxiv.org/abs/2306.01116) for more details.
 
 RefinedWeb is a high-quality web dataset built by leveraging stringent filtering and large-scale deduplication. Falcon-RW-1B, trained on RefinedWeb only, matches or outperforms comparable models trained on curated data.
 
@@ -63,7 +63,7 @@ for seq in sequences:
 
 ### Model Source
 
-- **Paper:**
+- **Paper:** [https://arxiv.org/abs/2306.01116](https://arxiv.org/abs/2306.01116).
 
 ## Uses
 
@@ -147,7 +147,7 @@ Training happened in early December 2022 and took about six days.
 
 ## Evaluation
 
-
+See the [paper on arXiv](https://arxiv.org/abs/2306.01116) for in-depth evaluation.
 
 
 ## Technical Specifications
@@ -179,7 +179,15 @@ Falcon-RW-1B was trained a custom distributed training codebase, Gigatron. It us
 
 ## Citation
 
-
+@article{refinedweb,
+title={The {R}efined{W}eb dataset for {F}alcon {LLM}: outperforming curated corpora with web data, and web data only},
+author={Guilherme Penedo and Quentin Malartic and Daniel Hesslow and Ruxandra Cojocaru and Alessandro Cappelli and Hamza Alobeidli and Baptiste Pannier and Ebtesam Almazrouei and Julien Launay},
+journal={arXiv preprint arXiv:2306.01116},
+eprint={2306.01116},
+eprinttype = {arXiv},
+url={https://arxiv.org/abs/2306.01116},
+year={2023}
+}
 
 
 ## Contact