hooking-dev/Hebrew_v1.0-Base

hooking-dev committed on Apr 15

Commit 95cb6ff (parent: ff85c3e)

Update README.md

Files changed (1)

README.md +19 -1

README.md CHANGED

@@ -73,7 +73,7 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ### Training Data
-The model was trained on the OSCAR Hebrew dataset, a large-scale, open corpus consisting of diverse text collected from the web, reflecting common usage of Hebrew in various contexts.
+The model was trained on the OSCAR Hebrew dataset, a large-scale, open corpus consisting of diverse text collected from the web, reflecting common usage of Hebrew in various contexts. For more details on the dataset, see the citations related to OSCAR below.
 ### Training Procedure
@@ -116,3 +116,21 @@ If you use this model in your research, please cite it as follows:
   year={2024},
   url={https://huggingface.co/hooking-dev/Hebrew_v1.0}
 }
+@article{2022arXiv221210440J,
+  author = {{Jansen}, Tim and {Tong}, Yangling and {Zevallos}, Victoria and {Ortiz Suarez}, Pedro},
+  title = "{Perplexed by Quality: A Perplexity-based Method for Adult and Harmful Content Detection in Multilingual Heterogeneous Web Data}",
+  journal = {arXiv e-prints},
+  year = 2022,
+  month = dec,
+  eid = {arXiv:2212.10440},
+  pages = {arXiv:2212.10440},
+  doi = {10.48550/arXiv.2212.10440},
+  archivePrefix = {arXiv},
+  eprint = {2212.10440},
+  primaryClass = {cs.CL},
+  adsurl = {https://ui.adsabs.harvard.edu/abs/2022arXiv221210440J},
+  adsnote = {Provided by the SAO/NASA Astrophysics Data System}
+}
+}